Extractor Classes

The class-based API gives you reusable extractor instances with full configuration control. AutoRank is the recommended default when you do not want to choose a single algorithm manually. TopicRank is available only through the JSON interface.

Class Summary

Class | Extra Constructor Params | Extra Methods
BaseTextRank | -- | extract_keywords(text)
PositionRank | -- | extract_keywords(text)
BiasedTextRank | focus_terms, bias_weight=5.0 | set_focus(terms), extract_keywords(text, focus_terms=None)
SingleRank | -- | extract_keywords(text)
TopicalPageRank | topic_weights, min_weight=0.0 | set_topic_weights(w), extract_keywords(text, topic_weights=None)
AutoRank | focus_terms, bias_weight=5.0, semantic_weights, semantic_min_weight=0.0 | set_focus(terms), set_semantic_weights(w), extract_keywords(text)
MultipartiteRank | similarity_threshold=0.26, alpha=1.1 | extract_keywords(text)

All classes share these constructor parameters:

Parameter | Type | Default | Description
config | TextRankConfig | None | Full configuration object (see TextRankConfig). Overrides all defaults.
top_n | int | None | Number of results. Overrides config.top_n if both are provided.
language | str | None | Language for stopword filtering. Overrides config.language if both are provided.

All classes return a TextRankResult from extract_keywords().

BaseTextRank

The standard TextRank implementation. A good starting point for general-purpose keyword extraction.

from rapid_textrank import BaseTextRank

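# Any input document; later snippets on this page reuse `text` the same way.
text = "Machine learning is a powerful tool for data analysis."

extractor = BaseTextRank(top_n=10, language="en")
result = extractor.extract_keywords(text)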

for phrase in result.phrases:
    print(f"{phrase.text}: {phrase.score:.4f}")

PositionRank

Weights words by their position in the document -- earlier appearances score higher. Useful for academic papers, news articles, and executive summaries where key information appears early.

from rapid_textrank import PositionRank

extractor = PositionRank(top_n=10, language="en")
result = extractor.extract_keywords("""
Quantum Computing Advances in 2024

Researchers have made significant breakthroughs in quantum error correction.
The quantum computing field continues to evolve rapidly...
""")

# "quantum computing" ranks higher due to early position
for phrase in result.phrases:
    print(f"{phrase.text}: {phrase.score:.4f}")
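
To make the position weighting concrete, here is a minimal pure-Python sketch of inverse-position weighting, the general idea behind PositionRank: each word's weight is the sum of 1/position over all of its occurrences, normalized so the weights sum to 1. The helper name position_weights is hypothetical, and the sketch illustrates the technique only; it is not taken from rapid_textrank's internals.

```python
from collections import defaultdict


def position_weights(tokens):
    """Return normalized inverse-position weights for each distinct token.

    Hypothetical helper for illustration; not part of rapid_textrank.
    """
    raw = defaultdict(float)
    for position, token in enumerate(tokens, start=1):
        raw[token] += 1.0 / position  # earlier occurrences contribute more
    total = sum(raw.values())
    return {token: weight / total for token, weight in raw.items()}


tokens = (
    "quantum computing advances researchers improve quantum error correction"
).split()
weights = position_weights(tokens)

# Print tokens from highest to lowest weight.
for token, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{token}: {weight:.3f}")
```

A word that appears early and repeats, such as "quantum" here, accumulates the largest weight, which is why title and lead-sentence terms tend to rise in the ranking.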

BiasedTextRank

Steers extraction toward specific topics using focus terms. The bias_weight parameter controls how strongly results favor the focus terms.

Constructor

BiasedTextRank(
    focus_terms=None,       # list[str] -- terms to bias toward
    bias_weight=5.0,        # float -- higher = stronger bias
    config=None,
    top_n=None,
    language=None,
)

Usage

from rapid_textrank import BiasedTextRank

extractor = BiasedTextRank(
    focus_terms=["security", "privacy"],
    bias_weight=5.0,
    top_n=10,
    language="en",
)

result = extractor.extract_keywords("""
Modern web applications must balance user experience with security.
Privacy regulations require careful data handling. Performance
optimizations should not compromise security measures.
""")

# Results will favor security/privacy-related phrases
for phrase in result.phrases:
    print(f"{phrase.text}: {phrase.score:.4f}")
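
The effect of bias_weight can be pictured as a personalized PageRank in which the random walk restarts at focus terms more often than at other words. The sketch below is pure Python and rests on an assumption that the source does not confirm: focus terms receive a restart weight of bias_weight and every other word receives 1.0. The names biased_restart and personalized_pagerank, and the toy graph, are hypothetical.

```python
def biased_restart(nodes, focus_terms, bias_weight=5.0):
    """Assumed scheme: focus terms get bias_weight, all other nodes get 1.0."""
    focus = set(focus_terms)
    return {node: bias_weight if node in focus else 1.0 for node in nodes}


def personalized_pagerank(graph, restart, damping=0.85, iterations=50):
    """Power iteration over an undirected weighted graph {node: {neighbor: weight}}."""
    total = sum(restart.values())
    restart = {node: restart[node] / total for node in graph}  # normalize to sum to 1
    out_weight = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    scores = dict(restart)
    for _ in range(iterations):
        scores = {
            node: (1 - damping) * restart[node]
            + damping * sum(scores[nbr] * w / out_weight[nbr] for nbr, w in nbrs.items())
            for node, nbrs in graph.items()
        }
    return scores


# Toy co-occurrence graph: a four-node cycle with unit edge weights.
graph = {
    "security": {"web": 1.0, "measures": 1.0},
    "web": {"security": 1.0, "performance": 1.0},
    "performance": {"web": 1.0, "measures": 1.0},
    "measures": {"security": 1.0, "performance": 1.0},
}

uniform = personalized_pagerank(graph, {node: 1.0 for node in graph})
biased = personalized_pagerank(graph, biased_restart(graph, ["security"], bias_weight=5.0))

for node in graph:
    print(f"{node}: uniform={uniform[node]:.3f} biased={biased[node]:.3f}")
```

On this symmetric graph the unbiased scores are all equal, so any difference in the biased column comes from the restart distribution alone. Raising bias_weight widens the gap between the focus term and the rest.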

Updating focus terms

You can update focus terms on an existing extractor in two ways:

# Option 1: set_focus() method
extractor.set_focus(["encryption", "audit"])
result = extractor.extract_keywords(text)

# Option 2: pass focus_terms per call
result = extractor.extract_keywords(text, focus_terms=["neural", "network"])

SingleRank

Extends TextRank with weighted co-occurrence edges and cross-sentence windowing. Edges are weighted by co-occurrence count, and the sliding window ignores sentence boundaries.

from rapid_textrank import SingleRank

extractor = SingleRank(top_n=10, language="en")
result = extractor.extract_keywords("""
Machine learning is a powerful tool. Deep learning is a subset of
machine learning. Neural networks power deep learning systems.
""")

# Cross-sentence co-occurrences strengthen "machine learning" edges
for phrase in result.phrases:
    print(f"{phrase.text}: {phrase.score:.4f}")
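
The two ideas described above, count-weighted edges and a window that ignores sentence boundaries, can be sketched in a few lines of pure Python. This is an illustration of the technique, not rapid_textrank's implementation; cooccurrence_edges is a hypothetical helper, and the token lists stand in for the output of a real tokenizer with stopwords already removed.

```python
from collections import Counter


def cooccurrence_edges(tokens, window_size=3):
    """Count co-occurrences of distinct tokens inside a sliding window.

    `tokens` is the whole document as one flat list, so windows run
    straight across sentence boundaries.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the next (window_size - 1) tokens.
        for right in tokens[i + 1 : i + window_size]:
            if left != right:
                edges[tuple(sorted((left, right)))] += 1
    return edges


sentences = [
    ["machine", "learning", "powerful", "tool"],
    ["deep", "learning", "subset", "machine", "learning"],
    ["neural", "networks", "power", "deep", "learning", "systems"],
]
# Flatten the sentences: this is what makes the window cross-sentence.
tokens = [token for sentence in sentences for token in sentence]
edges = cooccurrence_edges(tokens, window_size=3)

print(edges[("learning", "machine")])  # 3: repeated co-occurrence raises the weight
print(edges[("deep", "tool")])         # 1: an edge that spans two sentences
```

A sentence-bounded window would never link "tool" and "deep", since they sit on either side of a sentence break.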

TopicalPageRank

Extends SingleRank by biasing the random walk toward topically important words. Supply per-word topic weights as a {lemma: weight} dictionary. The weights typically come from a topic model such as LDA or a term-weighting scheme such as TF-IDF, but any source of word importance scores works.

Constructor

TopicalPageRank(
    topic_weights=None,     # dict[str, float] -- per-lemma importance weights
    min_weight=0.0,         # float -- floor for out-of-vocabulary words
    config=None,
    top_n=None,
    language=None,
)

Usage

from rapid_textrank import TopicalPageRank

topic_weights = {
    "neural": 0.9,
    "network": 0.8,
    "learning": 0.7,
    "deep": 0.6,
}

extractor = TopicalPageRank(
    topic_weights=topic_weights,
    min_weight=0.01,
    top_n=10,
    language="en",
)

result = extractor.extract_keywords("""
Deep learning is a subset of machine learning that uses artificial neural
networks. Neural networks with many layers can learn complex patterns.
Convolutional neural networks excel at image recognition tasks.
""")

for phrase in result.phrases:
    print(f"{phrase.text}: {phrase.score:.4f}")
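
One way to picture how topic_weights and min_weight interact is as a restart distribution over the lemmas in the document: listed lemmas keep their weight, unlisted lemmas fall back to min_weight, and the result is normalized. The pure-Python sketch below follows that reading of the parameter descriptions; topic_restart_vector is a hypothetical helper, and the uniform fallback for an all-zero vector is an assumption rather than documented behavior.

```python
def topic_restart_vector(lemmas, topic_weights, min_weight=0.0):
    """Build a normalized restart distribution over the document's lemmas.

    Lemmas missing from topic_weights receive min_weight.
    """
    raw = {lemma: topic_weights.get(lemma, min_weight) for lemma in lemmas}
    total = sum(raw.values())
    if total == 0:
        # Assumed fallback: with no usable weights, restart uniformly.
        return {lemma: 1.0 / len(raw) for lemma in raw}
    return {lemma: weight / total for lemma, weight in raw.items()}


topic_weights = {"neural": 0.9, "network": 0.8, "learning": 0.7, "deep": 0.6}
lemmas = ["deep", "learning", "neural", "network", "image"]

restart = topic_restart_vector(lemmas, topic_weights, min_weight=0.01)
for lemma, probability in restart.items():
    print(f"{lemma}: {probability:.4f}")
```

With min_weight=0.0, a lemma such as "image" would receive no restart mass at all; a small positive floor keeps out-of-vocabulary words reachable by the restart step.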

Updating topic weights

You can update topic weights on an existing extractor in two ways:

# Option 1: set_topic_weights() method
extractor.set_topic_weights({"machine": 0.9, "data": 0.8})
result = extractor.extract_keywords(text)

# Option 2: pass topic_weights per call
result = extractor.extract_keywords(text, topic_weights={"machine": 0.9, "data": 0.8})

See Topic Utilities for computing topic weights from LDA.

AutoRank

Runs every keyword extraction variant that is eligible for the document and fuses their results into one ranked list. Use it when you want a strong default without choosing a variant manually.

Constructor

AutoRank(
    focus_terms=None,
    bias_weight=5.0,
    semantic_weights=None,
    semantic_min_weight=0.0,
    config=None,
    top_n=None,
    language=None,
)

Usage

from rapid_textrank import AutoRank

extractor = AutoRank(
    top_n=10,
    semantic_weights={"machine": 1.0, "learning": 0.8},
)
result = extractor.extract_keywords(text)

for phrase, support in zip(result.phrases, result.consensus.phrase_support):
    print(phrase.text, support.confidence, support.supporting_variants)
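
This page does not say which fusion rule AutoRank applies, so the following sketch uses reciprocal rank fusion (RRF) purely as a stand-in to show what fusing several ranked lists and tracking per-phrase support can look like. The function name, the variant labels, and the constant k=60 are illustrative assumptions, not rapid_textrank's API or behavior.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked phrase lists; return (phrase, score, supporting variants) tuples.

    RRF is used here as an illustrative stand-in for AutoRank's fusion step.
    """
    scores = defaultdict(float)
    support = defaultdict(list)
    for variant, phrases in rankings.items():
        for rank, phrase in enumerate(phrases, start=1):
            scores[phrase] += 1.0 / (k + rank)  # higher ranks contribute more
            support[phrase].append(variant)
    ordered = sorted(scores, key=lambda phrase: scores[phrase], reverse=True)
    return [(phrase, scores[phrase], support[phrase]) for phrase in ordered]


# Hypothetical per-variant rankings, best phrase first.
rankings = {
    "base": ["machine learning", "neural network", "data"],
    "position": ["machine learning", "data", "neural network"],
    "single": ["neural network", "machine learning", "image recognition"],
}

fused = reciprocal_rank_fusion(rankings)
for phrase, score, variants in fused:
    print(f"{phrase}: {score:.4f} (supported by {len(variants)} variants)")
```

Phrases that several variants agree on accumulate score from each list, which is the intuition behind reading the support information alongside the fused ranking.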

MultipartiteRank

Builds a k-partite directed graph where candidates from different topic clusters are connected. Intra-topic edges are removed to reduce competition between similar candidates. An alpha weight adjustment boosts the first-occurring variant in each topic cluster, encoding positional preference.

Constructor

MultipartiteRank(
    similarity_threshold=0.26,  # float -- Jaccard threshold for topic clustering
    alpha=1.1,                  # float -- position boost strength (0 = disabled)
    config=None,
    top_n=None,
    language=None,
)

Usage

from rapid_textrank import MultipartiteRank

extractor = MultipartiteRank(
    similarity_threshold=0.26,
    alpha=1.1,
    top_n=10,
    language="en",
)

result = extractor.extract_keywords("""
Machine learning is a powerful tool for data analysis. Deep learning
is a subset of machine learning. Neural networks power deep learning
systems. Convolutional neural networks excel at image recognition.
""")

for phrase in result.phrases:
    print(f"{phrase.text}: {phrase.score:.4f}")
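
The graph structure described above can be sketched in pure Python: candidates whose word sets are similar enough share a topic, and directed edges are created only between candidates in different topics. This is a simplified illustration with hypothetical helper names. Real implementations typically compare stems and run a proper clustering step, and the alpha adjustment is not modeled here.

```python
from itertools import permutations


def jaccard(phrase_a, phrase_b):
    """Jaccard similarity between the word sets of two non-empty phrases."""
    set_a, set_b = set(phrase_a.split()), set(phrase_b.split())
    return len(set_a & set_b) / len(set_a | set_b)


def multipartite_edges(topic_of):
    """Directed edges between every ordered pair of candidates in different topics."""
    return [
        (source, target)
        for source, target in permutations(topic_of, 2)
        if topic_of[source] != topic_of[target]
    ]


# "machine learning" and "deep learning" share 1 of 3 distinct words
# (similarity 1/3), which clears the 0.26 threshold, so they share a topic.
print(f"{jaccard('machine learning', 'deep learning'):.2f}")

# Hand-assigned topic ids for illustration.
topic_of = {
    "machine learning": 0,
    "deep learning": 0,
    "neural networks": 1,
    "image recognition": 2,
}
edges = multipartite_edges(topic_of)
print(len(edges), "edges")
for source, target in edges[:3]:
    print(source, "->", target)
```

Because the two "learning" candidates share a topic, no edge connects them, so they do not pass score to each other during ranking.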

TopicRank (JSON only)

TopicRank clusters similar candidate phrases into topics, ranks the topics, then selects representatives. It is available only through the JSON interface with variant="topic_rank". This is because TopicRank works best with external tokenization (e.g., spaCy), which provides accurate POS tags and lemmatization for the clustering step.

Using TextRankConfig with Extractor Classes

Any extractor class accepts a TextRankConfig for full control:

from rapid_textrank import BaseTextRank, TextRankConfig

config = TextRankConfig(
    damping=0.85,
    window_size=4,
    top_n=15,
    min_phrase_length=2,
    max_phrase_length=4,
    include_pos=["NOUN", "ADJ", "PROPN"],
    phrase_grouping="scrubbed_text",
    language="en",
)

extractor = BaseTextRank(config=config)
result = extractor.extract_keywords(text)

The top_n and language shortcut parameters override the corresponding values in config if both are provided. See TextRankConfig for the full parameter reference.