TextRankConfig

TextRankConfig controls every tunable aspect of the TextRank algorithm. Pass it to any extractor class via the config parameter.

Parameter Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| damping | float | 0.85 | PageRank damping factor (0-1). Higher values give more weight to the graph structure relative to the uniform distribution. |
| max_iterations | int | 100 | Maximum number of PageRank iterations. |
| convergence_threshold | float | 1e-6 | PageRank convergence threshold. Iteration stops when the score change between iterations falls below this value. |
| window_size | int | 3 | Co-occurrence window size. Two words are connected in the graph if they appear within this many words of each other. |
| top_n | int | 10 | Number of top-scoring phrases to return. Set to 0 to return all phrases. |
| min_phrase_length | int | 1 | Minimum number of words in a phrase. Set to 2 to exclude single-word results. |
| max_phrase_length | int | 4 | Maximum number of words in a phrase. |
| score_aggregation | str | "sum" | How to combine individual word scores into a phrase score. Options: "sum", "mean", "max", "rms" (root mean square). |
| language | str | "en" | Language code for built-in stopword filtering. See Supported Languages. |
| use_edge_weights | bool | True | Whether to use weighted edges in the co-occurrence graph. When False, all edges have weight 1. |
| include_pos | list[str] | ["NOUN","ADJ","PROPN","VERB"] | POS tags to include in the graph. Only words with these POS tags become graph nodes. |
| stopwords | list[str] | [] | Additional stopwords that extend the built-in list for the selected language. |
| use_pos_in_nodes | bool | True | If True, graph nodes are keyed by "lemma\|POS" (e.g., "learning\|NOUN"). If False, nodes are keyed by lemma only. |
| phrase_grouping | str | "scrubbed_text" | How to group phrase variants. "scrubbed_text" groups by lowercased surface form. "lemma" groups by lemmatized form. |
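
The four score_aggregation options can be sketched in plain Python. This illustrates the arithmetic only and is not rapid_textrank's implementation; the aggregate helper and its signature are hypothetical.

```python
import math


def aggregate(word_scores, method="sum"):
    """Combine per-word scores into one phrase score (illustrative helper)."""
    if not word_scores:
        raise ValueError("word_scores must not be empty")
    if method == "sum":
        return sum(word_scores)
    if method == "mean":
        return sum(word_scores) / len(word_scores)
    if method == "max":
        return max(word_scores)
    if method == "rms":
        # Root mean square: square each score, average, then take the root.
        return math.sqrt(sum(s * s for s in word_scores) / len(word_scores))
    raise ValueError(f"unknown aggregation method: {method!r}")


# Made-up word scores for a two-word phrase.
word_scores = [3.0, 4.0]
results = {m: aggregate(word_scores, m) for m in ("sum", "mean", "max", "rms")}
```

With positive word scores, "sum" grows with phrase length and so tends to favor longer phrases, while "mean", "max", and "rms" stay within the range of the individual scores.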

Full Example

from rapid_textrank import TextRankConfig, BaseTextRank

config = TextRankConfig(
    damping=0.85,
    max_iterations=100,
    convergence_threshold=1e-6,
    window_size=3,
    top_n=10,
    min_phrase_length=1,
    max_phrase_length=4,
    score_aggregation="sum",
    language="en",
    use_edge_weights=True,
    include_pos=["NOUN", "ADJ", "PROPN", "VERB"],
    use_pos_in_nodes=True,
    phrase_grouping="scrubbed_text",
    stopwords=["custom", "terms"],
)

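# Any input document; this sentence is only a placeholder.
text = "Graph-based ranking algorithms extract keywords from documents."

extractor = BaseTextRank(config=config)
result = extractor.extract_keywords(text)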

Common Tuning Patterns

SEO-style multi-word phrases

Restrict results to phrases of 2 to 4 words, leave VERB out of include_pos so results skew toward nouns, and group variants by scrubbed text:

config = TextRankConfig(
    min_phrase_length=2,
    max_phrase_length=4,
    include_pos=["NOUN", "ADJ", "PROPN"],
    phrase_grouping="scrubbed_text",
)
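
As a rough illustration of what these settings ask for, the sketch below filters candidate phrases by word count and groups variants by lowercased surface form, which is how the parameter reference describes "scrubbed_text". The helper group_by_scrubbed_text is hypothetical and splits on whitespace for simplicity; the library's own tokenization and grouping may differ.

```python
from collections import defaultdict


def group_by_scrubbed_text(phrases, min_len=2, max_len=4):
    """Keep phrases with min_len..max_len words, grouped by lowercased text."""
    groups = defaultdict(list)
    for phrase in phrases:
        n_words = len(phrase.split())
        if min_len <= n_words <= max_len:
            groups[phrase.lower()].append(phrase)
    return dict(groups)


# Made-up candidate phrases.
candidates = [
    "Machine Learning",
    "machine learning",
    "learning",                          # 1 word: below min_len
    "deep neural network models today",  # 5 words: above max_len
]
grouped = group_by_scrubbed_text(candidates)
```

Here the two casings of "machine learning" collapse into one group, and the single-word and five-word candidates are dropped.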

Larger co-occurrence window

A wider window connects words that sit farther apart, which captures longer-range relationships at the cost of a denser graph:

config = TextRankConfig(window_size=6)
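
To see how the window changes the graph, here is a minimal sketch of co-occurrence edge building. It assumes two tokens are linked when they are at most window_size positions apart and that an edge weight is the number of times the pair co-occurs; the library may count the window or the weights differently. The function cooccurrence_edges is hypothetical.

```python
from collections import Counter


def cooccurrence_edges(tokens, window_size=3, use_edge_weights=True):
    """Return {(word_a, word_b): weight} for an undirected co-occurrence graph."""
    edges = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at the next window_size tokens.
        for right in tokens[i + 1 : i + 1 + window_size]:
            if left != right:  # skip self-loops
                edges[tuple(sorted((left, right)))] += 1
    if not use_edge_weights:
        return {pair: 1 for pair in edges}
    return dict(edges)


# Made-up token sequence (already filtered to content words).
tokens = ["graph", "ranking", "keyword", "extraction", "graph", "ranking"]
narrow = cooccurrence_edges(tokens, window_size=1)
wide = cooccurrence_edges(tokens, window_size=3)
unweighted = cooccurrence_edges(tokens, window_size=3, use_edge_weights=False)
```

On this toy input the narrow window links only adjacent words, while the wider window links every pair of distinct words and raises the weights of pairs that recur.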

Stricter convergence

More iterations with a tighter threshold can improve score stability on long documents:

config = TextRankConfig(
    max_iterations=200,
    convergence_threshold=1e-8,
)
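
The interplay of damping, max_iterations, and convergence_threshold can be sketched with a small power-iteration PageRank over a weighted undirected graph. This is an illustrative version, not the library's code: the pagerank function is hypothetical, and it assumes the "score change" is measured as the summed absolute difference between successive score vectors.

```python
def pagerank(graph, damping=0.85, max_iterations=100, convergence_threshold=1e-6):
    """graph maps node -> {neighbor: weight}; edges must be listed symmetrically."""
    nodes = list(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    total_weight = {node: sum(graph[node].values()) for node in nodes}
    iterations_run = 0
    for iterations_run in range(1, max_iterations + 1):
        new_scores = {}
        for node in nodes:
            # Each neighbor passes on its score in proportion to the edge weight.
            incoming = sum(
                scores[nbr] * weight / total_weight[nbr]
                for nbr, weight in graph[node].items()
            )
            new_scores[node] = (1 - damping) / n + damping * incoming
        delta = sum(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if delta < convergence_threshold:
            break  # converged before reaching max_iterations
    return scores, iterations_run


# Toy path graph a - b - c with unit weights.
graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
default_scores, default_iters = pagerank(graph)
capped_scores, capped_iters = pagerank(graph, convergence_threshold=1e-8)
strict_scores, strict_iters = pagerank(
    graph, max_iterations=200, convergence_threshold=1e-8
)
```

On this toy graph the tighter threshold alone is not enough: the second call stops at the 100-iteration cap before converging, which is why the pattern above raises max_iterations together with the threshold.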

Adding domain-specific stopwords

Extend the built-in stopword list with terms that are too common in your domain:

config = TextRankConfig(
    language="en",
    stopwords=["data", "system", "model", "2024"],
)
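
Conceptually, the custom terms are merged with the built-in list rather than replacing it. The sketch below shows that union on a tiny stand-in list; BUILTIN_EN_SAMPLE is a hypothetical placeholder (not the library's real English list), both helpers are hypothetical, and the case-insensitive matching is an assumption.

```python
# Hypothetical stand-in for a built-in English stopword list.
BUILTIN_EN_SAMPLE = {"the", "a", "of", "and"}


def effective_stopwords(builtin, extra):
    """Union of the built-in list and the user-supplied additions."""
    return set(builtin) | {word.lower() for word in extra}


def filter_tokens(tokens, stopwords):
    """Drop tokens whose lowercased form is a stopword."""
    return [token for token in tokens if token.lower() not in stopwords]


stop = effective_stopwords(BUILTIN_EN_SAMPLE, ["data", "system", "model", "2024"])
kept = filter_tokens(["The", "model", "of", "2024", "ranks", "keywords"], stop)
```

Both the built-in words ("The", "of") and the domain-specific additions ("model", "2024") are filtered out, leaving only "ranks" and "keywords".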

Notes

  • The include_pos parameter expects Universal POS tags as strings (the same tags spaCy uses): "NOUN", "VERB", "ADJ", "ADV", "PROPN", etc.
  • The stopwords parameter extends the built-in list; it does not replace it. To use only your custom stopwords, without the built-in ones, use the JSON interface and set is_stopword flags on individual tokens.
  • TextRankConfig is validated on construction. Invalid values (e.g., a negative damping) raise a ValueError.