Choosing a Variant
This page is a decision guide for picking the right algorithm variant for your use case. If you would rather not choose manually, use AutoRank, which has rapid_textrank run the full ensemble of eligible keyword extractors on the document and fuse their results.
Decision Flowchart
flowchart TD
A[Start] --> B{Do you have\nfocus terms?}
B -->|Yes| C{Per-word weights\nor term list?}
C -->|Term list| D[BiasedTextRank]
C -->|Per-word weights| E[TopicalPageRank]
B -->|No| F{Document type?}
F -->|Short / structured\n title+abstract, news| G[PositionRank]
F -->|Long / multi-topic| H{Need diversity\nacross topics?}
F -->|General| I[BaseTextRank]
H -->|Yes, coarse topics| J[TopicRank]
H -->|Yes, fine-grained| K[MultipartiteRank]
H -->|No| L[SingleRank]
Scenario Table
| Scenario | Recommended Variant | Why |
|---|---|---|
| Blog post keywords | BaseTextRank | General-purpose, no extra config needed |
| Academic paper abstract | PositionRank | Key terms appear early |
| Security audit extraction | BiasedTextRank | Steer toward specific domains |
| LDA-guided extraction | TopicalPageRank | Data-driven per-word weights |
| Multi-section report | TopicRank or MultipartiteRank | Ensure coverage across topics |
| Long technical document | SingleRank | Cross-sentence windowing captures more co-occurrences |
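The flowchart and scenario table above can be read as a small decision function. The following is a minimal sketch in plain Python, not part of rapid_textrank: the function name `choose_variant`, its parameters, and the string values for `doc_type` and `diversity` are hypothetical names chosen for illustration. It returns only the variant name as a string and does not call any library API.

```python
from typing import Optional


def choose_variant(
    has_focus_terms: bool = False,
    per_word_weights: bool = False,
    doc_type: str = "general",        # hypothetical values: "short", "long", "general"
    diversity: Optional[str] = None,  # hypothetical values: None, "coarse", "fine"
) -> str:
    """Return the variant name that the decision flowchart recommends."""
    if doc_type not in {"short", "long", "general"}:
        raise ValueError(f"unknown doc_type: {doc_type!r}")
    if diversity not in {None, "coarse", "fine"}:
        raise ValueError(f"unknown diversity: {diversity!r}")

    # Branch 1: the caller has focus terms.
    if has_focus_terms:
        # Per-word weights -> TopicalPageRank; plain term list -> BiasedTextRank.
        return "TopicalPageRank" if per_word_weights else "BiasedTextRank"

    # Branch 2: no focus terms, so decide by document type.
    if doc_type == "short":
        return "PositionRank"
    if doc_type == "long":
        if diversity == "coarse":
            return "TopicRank"
        if diversity == "fine":
            return "MultipartiteRank"
        return "SingleRank"

    # General documents fall through to the baseline.
    return "BaseTextRank"


if __name__ == "__main__":
    print(choose_variant(doc_type="long", diversity="fine"))
```

Keeping the choice in one function makes it easy to log which branch was taken for each document before wiring the result to an actual extractor.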
When in Doubt
Start with AutoRank when you want the best default behavior without tuning. Start with BaseTextRank when you want a simple baseline or you specifically want one standalone algorithm. Once you see how a single variant performs on your data, you can switch to a more specialized extractor:
- If important keywords are being missed because they appear only in later sections, try SingleRank (weighted edges and cross-sentence windowing help).
- If results are too generic and you know which domain you care about, try BiasedTextRank with a focus vocabulary.
- If the keyword list is dominated by a single theme in a multi-topic document, try TopicRank or MultipartiteRank to promote diversity.
- If you have numeric word-importance scores from a topic model or other source, try TopicalPageRank.
- If your documents are short and structured (abstracts, news leads), try PositionRank to leverage the early-position signal.
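The symptom-to-variant advice in the list above can also be kept as a lookup table, which is handy when triaging results across many documents. This is a sketch under assumptions: the symptom keys and the helper `suggest_next` are hypothetical names invented for this example, and only the variant names come from this page.

```python
from typing import Dict, Iterable, List

# Hypothetical symptom labels mapped to the variants suggested in the list above.
NEXT_VARIANT: Dict[str, List[str]] = {
    "late_keywords_missed": ["SingleRank"],
    "too_generic": ["BiasedTextRank"],
    "single_theme_dominates": ["TopicRank", "MultipartiteRank"],
    "have_word_scores": ["TopicalPageRank"],
    "short_structured": ["PositionRank"],
}


def suggest_next(symptoms: Iterable[str]) -> List[str]:
    """Return variants worth trying, in order, without duplicates."""
    suggestions: List[str] = []
    for symptom in symptoms:
        # An unknown symptom raises KeyError, so typos surface immediately.
        for variant in NEXT_VARIANT[symptom]:
            if variant not in suggestions:
                suggestions.append(variant)
    return suggestions


if __name__ == "__main__":
    print(suggest_next(["too_generic", "single_theme_dominates"]))
```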