Skip to content

How TextRank Works

TextRank is a graph-based ranking algorithm for keyword extraction, inspired by Google's PageRank.

The Algorithm

  1. Build a co-occurrence graph -- Words become nodes. An edge connects two words if they appear within a sliding window (default: 4 words).

  2. Run PageRank -- The algorithm iteratively distributes "importance" through the graph. Words connected to many important words become important themselves.

  3. Extract phrases -- High-scoring words are grouped into noun chunks (POS-filtered) to form key phrases. Scores are aggregated (sum, mean, or max).

Text: "Machine learning enables systems to learn from data"

Co-occurrence graph (window=2):
    machine <-> learning <-> enables <-> systems <-> learn <-> data
                              |
                            PageRank
                              |
    Scores: machine(0.23) learning(0.31) enables(0.12) ...
                              |
                        Phrase extraction
                              |
    "machine learning" (0.54), "systems" (0.18), ...

Each algorithm variant modifies one or more of these steps:

  • PositionRank biases PageRank teleportation toward words that appear early.
  • BiasedTextRank biases teleportation toward user-specified focus terms.
  • SingleRank weights edges by co-occurrence count and windows across sentence boundaries.
  • TopicalPageRank biases teleportation using per-word topic weights (e.g., from LDA).
  • TopicRank clusters candidates into topics, builds a topic-level graph, then picks representatives.
  • MultipartiteRank keeps individual candidates but removes intra-topic edges and boosts first-occurring variants.

Further Reading

Interactive Notebook

See the algorithm explained step-by-step in the Algorithm Explanation Notebook.