
Choosing a Variant

This page is a decision guide for picking the right algorithm variant for your use case. If you would rather not choose manually, use AutoRank, which lets rapid_textrank run the full set of eligible variants on the document and fuse their keyword results.

Decision Flowchart

flowchart TD
    A[Start] --> B{Do you have\nfocus terms?}
    B -->|Yes| C{Per-word weights\nor term list?}
    C -->|Term list| D[BiasedTextRank]
    C -->|Per-word weights| E[TopicalPageRank]
    B -->|No| F{Document type?}
    F -->|Short / structured\n title+abstract, news| G[PositionRank]
    F -->|Long / multi-topic| H{Need diversity\nacross topics?}
    F -->|General| I[BaseTextRank]
    H -->|Yes, coarse topics| J[TopicRank]
    H -->|Yes, fine-grained| K[MultipartiteRank]
    H -->|No| L[SingleRank]
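
The same decision path can be written as a small function, which is handy for choosing a variant programmatically in a pipeline. This is only a sketch of the flowchart logic: `choose_variant`, `DocType`, and the parameter names are hypothetical helpers invented for illustration and are not part of the rapid_textrank API. The function returns the variant name as a string and does not construct an extractor.

```python
from enum import Enum
from typing import Optional


class DocType(Enum):
    """Hypothetical document categories mirroring the flowchart branches."""
    SHORT_STRUCTURED = "short_structured"   # title+abstract, news
    LONG_MULTI_TOPIC = "long_multi_topic"
    GENERAL = "general"


def choose_variant(
    has_focus_terms: bool = False,
    has_per_word_weights: bool = False,
    doc_type: DocType = DocType.GENERAL,
    diversity: Optional[str] = None,
) -> str:
    """Return the variant name the flowchart would reach.

    `diversity` is None (no diversity needed), "coarse", or "fine";
    it only matters for long, multi-topic documents.
    """
    if diversity not in (None, "coarse", "fine"):
        raise ValueError(f"unknown diversity setting: {diversity!r}")

    # Branch 1: focus terms take priority over document type.
    if has_focus_terms:
        return "TopicalPageRank" if has_per_word_weights else "BiasedTextRank"

    # Branch 2: no focus terms, so decide by document type.
    if doc_type is DocType.SHORT_STRUCTURED:
        return "PositionRank"
    if doc_type is DocType.LONG_MULTI_TOPIC:
        if diversity == "coarse":
            return "TopicRank"
        if diversity == "fine":
            return "MultipartiteRank"
        return "SingleRank"
    return "BaseTextRank"


if __name__ == "__main__":
    print(choose_variant(doc_type=DocType.LONG_MULTI_TOPIC, diversity="fine"))
```

Returning a name rather than an object keeps the sketch independent of any particular constructor signature; map the name to the corresponding extractor class in your own code.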

Scenario Table

| Scenario | Recommended Variant | Why |
| --- | --- | --- |
| Blog post keywords | BaseTextRank | General-purpose, no extra config needed |
| Academic paper abstract | PositionRank | Key terms appear early |
| Security audit extraction | BiasedTextRank | Steer toward specific domains |
| LDA-guided extraction | TopicalPageRank | Data-driven per-word weights |
| Multi-section report | TopicRank or MultipartiteRank | Ensure coverage across topics |
| Long technical document | SingleRank | Cross-sentence windowing captures more co-occurrences |

When in Doubt

Start with AutoRank when you want the best default behavior without tuning. Start with BaseTextRank when you want a simple baseline or you specifically want one standalone algorithm. Once you see how a single variant performs on your data, you can switch to a more specialized extractor:

  • If important keywords are being missed because they appear only in later sections, try SingleRank (weighted edges and cross-sentence windowing help).
  • If results are too generic and you know which domain you care about, try BiasedTextRank with a focus vocabulary.
  • If the keyword list is dominated by a single theme in a multi-topic document, try TopicRank or MultipartiteRank to promote diversity.
  • If you have numeric word-importance scores from a topic model or other source, try TopicalPageRank.
  • If your documents are short and structured (abstracts, news leads), try PositionRank to leverage the early-position signal.