Performance
rapid_textrank is designed for speed. Its Rust core extracts keywords 10-100x faster than pure Python implementations, depending on document size and tokenization method.
Key Performance Features
- Rust core with zero-copy data paths -- most computation happens in compiled Rust code, minimizing Python overhead
- CSR graph format -- Compressed Sparse Row storage for cache-friendly PageRank iteration
- String interning -- StringPool reduces memory allocations 10-100x for typical documents
- Parallel processing -- Rayon provides data parallelism for internal graph construction
- Link-Time Optimization -- full LTO with single codegen unit for maximum inlining
- FxHash -- fast non-cryptographic hashing for internal hash maps
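To make the CSR point above concrete, here is a minimal pure-Python sketch of PageRank iterating over Compressed Sparse Row arrays. It is an illustration only, not the library's Rust implementation: the function names `build_csr` and `pagerank_csr`, the damping factor of 0.85, and the convergence tolerance are all assumptions chosen for the example.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays for an undirected weighted graph.

    `edges` is a list of (u, v, weight) tuples. Returns (row_ptr, col_idx,
    weights), where the neighbours of node i live in the contiguous slice
    col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    # Count how many neighbour slots each node needs.
    counts = [0] * num_nodes
    for u, v, _ in edges:
        counts[u] += 1
        counts[v] += 1

    # Prefix sums turn the counts into row offsets.
    row_ptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        row_ptr[i + 1] = row_ptr[i] + counts[i]

    col_idx = [0] * row_ptr[-1]
    weights = [0.0] * row_ptr[-1]
    cursor = row_ptr[:-1].copy()  # next free slot in each row
    for u, v, w in edges:
        # Store the edge in both directions because the graph is undirected.
        for a, b in ((u, v), (v, u)):
            col_idx[cursor[a]] = b
            weights[cursor[a]] = w
            cursor[a] += 1
    return row_ptr, col_idx, weights


def pagerank_csr(row_ptr, col_idx, weights, damping=0.85, tol=1e-6, max_iter=100):
    """Weighted PageRank by power iteration over CSR arrays."""
    n = len(row_ptr) - 1
    # Total edge weight leaving each node, used to normalise contributions.
    out_weight = [sum(weights[row_ptr[i]:row_ptr[i + 1]]) for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = [0.0] * n
        for i in range(n):
            acc = 0.0
            # One linear scan over a contiguous slice per node.
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                acc += weights[k] * scores[j] / out_weight[j]
            new_scores[i] = (1.0 - damping) / n + damping * acc
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Tiny example: a three-node path graph 0 - 1 - 2.
row_ptr, col_idx, weights = build_csr(3, [(0, 1, 1.0), (1, 2, 1.0)])
print(pagerank_csr(row_ptr, col_idx, weights))
```

Each node's neighbours sit next to each other in `col_idx` and `weights`, so the inner loop reads memory sequentially instead of chasing pointers through per-node containers. That sequential access pattern is what makes the layout cache-friendly in a compiled implementation.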
Approximate Speedups
| Document Size | rapid_textrank | pytextrank + spaCy | Speedup |
|---|---|---|---|
| Small (~20 words) | ~0.1 ms | ~5 ms | ~50x |
| Medium (~100 words) | ~0.3 ms | ~15 ms | ~50x |
| Large (~1000 words) | ~2 ms | ~80 ms | ~40x |
Results are approximate and vary by hardware. See the Benchmarks page for a runnable benchmark script.
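The Speedup column is the baseline time divided by the rapid_textrank time (for example, 5 ms / 0.1 ms = 50x). The following is a minimal timing sketch for checking such a ratio on your own hardware. It is not the benchmark script from the Benchmarks page, and it deliberately avoids calling either library: `placeholder_extractor`, `median_ms`, and `speedup` are hypothetical names introduced here, and you would substitute your own callables that wrap the extractors you want to compare.

```python
import statistics
import time
from collections import Counter


def median_ms(func, text, repeats=50, warmup=5):
    """Return the median wall-clock time of func(text) in milliseconds."""
    # Warm-up runs keep one-time setup costs out of the measurement.
    for _ in range(warmup):
        func(text)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def placeholder_extractor(text):
    # Hypothetical stand-in, NOT a real keyword extractor from either library.
    # Replace it with a callable that wraps the extractor you want to time.
    return Counter(text.lower().split()).most_common(5)


document = "graph based ranking of words in a short sample document " * 10
elapsed = median_ms(placeholder_extractor, document)
print(f"median: {elapsed:.4f} ms")

# Reproducing the arithmetic of the first table row:
print(f"speedup: {speedup(5.0, 0.1):.0f}x")
```

The median is used rather than the mean so that an occasional slow run, such as one interrupted by the operating system scheduler, does not distort the result.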
Learn More
- Benchmarks -- detailed benchmark results and a script to measure performance on your system
- Why Rust is Fast -- deep dive into the performance optimizations used in rapid_textrank
- Comparison -- how rapid_textrank compares to alternative keyword extraction libraries