How TextRank Works¶

TextRank is a graph-based ranking algorithm for keyword extraction, inspired by Google's PageRank.

The Algorithm¶

Build a co-occurrence graph -- Words become nodes. An edge connects two words if they appear within a sliding window (default: 4 words).
Run PageRank -- The algorithm iteratively distributes "importance" through the graph. Words connected to many important words become important themselves.
Extract phrases -- High-scoring words are grouped into noun chunks (POS-filtered) to form key phrases. Scores are aggregated (sum, mean, or max).

Text: "Machine learning enables systems to learn from data"

Co-occurrence graph (window=2):
    machine <-> learning <-> enables <-> systems <-> learn <-> data
                              |
                            PageRank
                              |
    Scores: machine(0.23) learning(0.31) enables(0.12) ...
                              |
                        Phrase extraction
                              |
    "machine learning" (0.54), "systems" (0.18), ...

Each algorithm variant modifies one or more of these steps:

PositionRank biases PageRank teleportation toward words that appear early.
BiasedTextRank biases teleportation toward user-specified focus terms.
SingleRank weights edges by co-occurrence count and windows across sentence boundaries.
TopicalPageRank biases teleportation using per-word topic weights (e.g., from LDA).
TopicRank clusters candidates into topics, builds a topic-level graph, then picks representatives.
MultipartiteRank keeps individual candidates but removes intra-topic edges and boosts first-occurring variants.

How TextRank Works¶

The Algorithm¶

Further Reading¶