Release Notes¶

v0.1.4¶

Bug Fixes¶

Fixed the missing MultipartiteRank Python export. The class was registered in the Rust PyO3 module but not re-exported from python/rapid_textrank/__init__.py, so importing it from a PyPI install failed with ImportError: cannot import name 'MultipartiteRank'.

v0.1.3¶

New Algorithm Variants¶

SingleRank¶

New SingleRank variant with edges weighted by co-occurrence frequency, plus cross-sentence windowing: words in adjacent sentences can co-occur within the same sliding window, unlike classic TextRank, which treats each sentence independently.
Full Python bindings: native class, JSON dispatch, and spaCy pipeline component.
Unit tests, integration tests, and documentation.
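
To make the difference between per-sentence and cross-sentence windowing concrete, here is a minimal pure-Python sketch. It is not the library's GraphBuilder; the function name and the cross_sentence flag are hypothetical, and window_size=3 is simply the default mentioned elsewhere in these notes.

```python
from collections import Counter
from itertools import chain


def cooccurrence_weights(sentences, window_size=3, cross_sentence=True):
    """Count how often two distinct words fall inside the same sliding window.

    sentences: list of token lists. With cross_sentence=True the sentences are
    concatenated first, so a window may span a sentence boundary.
    """
    if cross_sentence:
        streams = [list(chain.from_iterable(sentences))]
    else:
        streams = sentences
    weights = Counter()
    for tokens in streams:
        for i, left in enumerate(tokens):
            # Pair token i with the next (window_size - 1) tokens.
            for right in tokens[i + 1 : i + window_size]:
                if left != right:
                    weights[tuple(sorted((left, right)))] += 1
    return weights


sentences = [["graph", "ranking", "works"], ["ranking", "keywords", "fast"]]
per_sentence = cooccurrence_weights(sentences, cross_sentence=False)
spanning = cooccurrence_weights(sentences, cross_sentence=True)
```

In this toy input, the pair ("keywords", "works") only gets an edge when the window is allowed to cross the sentence boundary, and the ("ranking", "works") edge picks up extra weight from the second sentence.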

TopicalPageRank¶

New TopicalPageRank variant that biases PageRank toward user-supplied topic weights, enabling domain-focused keyword extraction via a personalization vector.
Includes PersonalizedPageRank engine in src/pagerank/personalized.rs with automatic normalization of the personalization vector.
7 unit tests and an integration pipeline test.
Python topic_weights_from_lda helper (python/rapid_textrank/topic_utils.py) for deriving topic weights from a Gensim LDA model, with an optional Jaccard pre-filter heuristic.
Full Python bindings and README examples.
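
The idea of biasing PageRank with a normalized personalization vector can be sketched in a few lines of plain Python. This is an illustration only, not the Rust engine in src/pagerank/personalized.rs; the function name, the damping value of 0.85, the iteration count, and the example topic weights are all assumptions.

```python
def personalized_pagerank(edges, personalization, damping=0.85, iterations=50):
    """Power iteration over an undirected weighted graph.

    edges: {(u, v): weight}; personalization: {node: non-negative weight}.
    The personalization weights are normalized to sum to 1 before use.
    """
    nodes = sorted({n for pair in edges for n in pair})
    # Symmetric adjacency plus the total edge weight leaving each node.
    neighbors = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w
    out_weight = {n: sum(neighbors[n].values()) for n in nodes}

    total = sum(personalization.get(n, 0.0) for n in nodes)
    if total > 0:
        teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    else:
        teleport = {n: 1.0 / len(nodes) for n in nodes}  # fall back to uniform

    scores = dict(teleport)
    for _ in range(iterations):
        scores = {
            n: (1 - damping) * teleport[n]
            + damping
            * sum(scores[m] * w / out_weight[m] for m, w in neighbors[n].items())
            for n in nodes
        }
    return scores


edges = {
    ("neural", "network"): 1.0,
    ("network", "training"): 1.0,
    ("training", "data"): 1.0,
}
uniform = personalized_pagerank(edges, {})
# Hypothetical topic weights; they need not sum to 1 because they are normalized.
biased = personalized_pagerank(edges, {"data": 5.0, "training": 1.0})
```

With the topic weights concentrated on "data", its score rises relative to the uniform run while the scores still sum to 1.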

MultipartiteRank¶

New MultipartiteRank variant that performs Hierarchical Agglomerative Clustering (HAC) on candidate keyphrases using Jaccard distance, then applies a multipartite graph ranking.
Shared src/clustering.rs module extracted from the former TopicRank-specific clustering code, now reused by both TopicRank and MultipartiteRank.
Disjoint-set fast-path optimization for Jaccard distance computation.
Full Python bindings, README documentation, and notebook examples.
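
As a rough illustration of the clustering step, the sketch below groups candidates (represented as sets of stems) with a naive average-linkage HAC over Jaccard distance. It is not the code in src/clustering.rs. The early return for non-overlapping sets is one possible reading of the "disjoint-set fast path", and the linkage method, the 0.74 threshold, and all names are assumptions chosen for the example.

```python
def jaccard_distance(a, b):
    """Return 1 - (size of intersection / size of union) for two sets."""
    if a.isdisjoint(b):
        return 1.0  # fast path: nothing shared, so skip the union entirely
    shared = len(a & b)
    return 1.0 - shared / (len(a) + len(b) - shared)


def cluster(candidates, threshold=0.74):
    """Naive average-linkage HAC: merge the closest pair until none is below threshold."""
    clusters = [[c] for c in candidates]

    def linkage(x, y):
        # Average pairwise distance between the members of two clusters.
        return sum(jaccard_distance(p, q) for p in x for q in y) / (len(x) * len(y))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist >= threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


candidates = [
    frozenset({"neural", "network"}),
    frozenset({"deep", "neural", "network"}),
    frozenset({"keyword", "extraction"}),
]
topics = cluster(candidates)
```

Here the two "neural network" candidates end up in one cluster and "keyword extraction" stays on its own, since it shares no stems with the others.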

Core Engine Improvements¶

Cross-sentence windowing added to GraphBuilder -- configurable window that spans sentence boundaries, improving recall for long documents.
Shared clustering module (src/clustering.rs) -- HAC with Jaccard distance, reusable across variants.
TopicRank edge weighting aligned with the PKE reference implementation for correctness.

Bug Fixes¶

Fixed use_pos_in_nodes serde default -- changed from false to true, fixing 2 POS-filtering tests (test_json_include_pos_filtering, test_json_include_pos_multiple_tags).
Fixed cargo fmt violations that were failing CI checks across multiple files.

Documentation & Examples¶

README updated with usage examples for SingleRank, TopicalPageRank, and MultipartiteRank.
Notebooks 02_algorithm_variants and 04_benchmarks updated with examples for all new variants.
CLAUDE.md and AGENTS.md updated with PyO3 + Python 3.14 build caveats.
topic_weights_from_lda usage example added to README.

Benchmarks¶

Added SingleRank, TopicalPageRank, and MultipartiteRank to the Criterion benchmark suite.

Stats¶

20 commits since v0.1.2
27 files changed, 3,409 insertions, 271 deletions

v0.1.2¶

Highlights¶

Added TopicRank support via JSON interface (variant="topic_rank") with spaCy-token examples.
TopicRank behavior aligned more closely with pytextrank.
Docs and notebooks updated to include TopicRank usage and comparisons.

Details¶

New JSON config fields: topic_similarity_threshold, topic_edge_weight, focus_terms, bias_weight.
README updated with TopicRank section + citation.
Benchmarks notebook now compares rapid_textrank TopicRank vs pytextrank TopicRank using spaCy tokens.

v0.1.1¶

Highlights¶

Closer alignment with pytextrank defaults: window size now 3, POS defaults include verbs, and scrubbed-text grouping available by default.
Improved phrase quality: stopword-aware chunking reduces noisy phrases.
Config parity across APIs: new options surfaced in Python, JSON, and spaCy component interfaces.

New¶

use_pos_in_nodes to treat nodes as lemma|POS.
phrase_grouping with lemma or scrubbed_text.
get_stopwords(language) helper to inspect built-in stopwords.
JSON config now supports language, use_pos_in_nodes, phrase_grouping, and additional stopwords (extends built-ins).
spaCy component supports include_pos, use_pos_in_nodes, phrase_grouping, language, and stopwords.
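
The effect of keying nodes as lemma|POS can be shown with a tiny sketch. The node_key helper is hypothetical and only mirrors the naming of the use_pos_in_nodes option; it does not call the library.

```python
def node_key(lemma, pos, use_pos_in_nodes=True):
    """Build a graph node key, optionally qualified by the part-of-speech tag."""
    return f"{lemma}|{pos}" if use_pos_in_nodes else lemma


tokens = [("run", "VERB"), ("run", "NOUN")]
with_pos = {node_key(lemma, pos) for lemma, pos in tokens}
without_pos = {node_key(lemma, pos, use_pos_in_nodes=False) for lemma, pos in tokens}
```

With the POS tag in the key, the verb and noun readings of "run" become two separate nodes; without it, they collapse into one.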

Changed Defaults¶

window_size: 4 to 3
include_pos: NOUN + ADJ + PROPN to NOUN + ADJ + PROPN + VERB
use_pos_in_nodes: false to true
phrase_grouping: lemma to scrubbed_text

Notebook Updates¶

Benchmarks now compare rapid_textrank vs pytextrank with and without spaCy tokens.
Config blocks simplified to use new defaults.
Stopword list printing added to algorithm explanation notebook.

Bug Fixes¶

Stopwords now act as chunk boundaries, preventing phrases like "of NLP is to".
Heuristic POS tagging recognizes common function words to reduce false content tokens.
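
Treating stopwords as chunk boundaries can be sketched as follows. This is a simplified illustration, not the library's chunker; the function name and the four-word stopword set are made up for the example and are not the built-in list.

```python
def chunk_on_stopwords(tokens, stopwords):
    """Split a token sequence into phrases, breaking at every stopword."""
    chunks, current = [], []
    for token in tokens:
        if token.lower() in stopwords:
            if current:
                chunks.append(" ".join(current))
            current = []  # the stopword itself is dropped
        else:
            current.append(token)
    if current:
        chunks.append(" ".join(current))
    return chunks


STOPWORDS = {"the", "of", "is", "to"}  # tiny illustrative set
tokens = "The goal of NLP is to understand language".split()
phrases = chunk_on_stopwords(tokens, STOPWORDS)
# phrases == ["goal", "NLP", "understand language"]
```

Because every stopword ends the current chunk, a span such as "of NLP is to" can no longer surface as a candidate phrase.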

Compatibility Notes¶

If you relied on the old defaults, set them explicitly to preserve behavior:
- window_size=4
- include_pos=["NOUN","ADJ","PROPN"]
- use_pos_in_nodes=False
- phrase_grouping="lemma"
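
One way to keep the old behavior in a single place is to hold the overrides in a dictionary and layer them over the new defaults. The values below are copied from the Changed Defaults list; the dictionary names and the effective_config helper are hypothetical, and how the resulting options are passed to the library is left open here.

```python
# New defaults as of v0.1.1 (values from the Changed Defaults list).
NEW_DEFAULTS = {
    "window_size": 3,
    "include_pos": ["NOUN", "ADJ", "PROPN", "VERB"],
    "use_pos_in_nodes": True,
    "phrase_grouping": "scrubbed_text",
}

# Explicit overrides that restore the pre-v0.1.1 behavior.
LEGACY_OVERRIDES = {
    "window_size": 4,
    "include_pos": ["NOUN", "ADJ", "PROPN"],
    "use_pos_in_nodes": False,
    "phrase_grouping": "lemma",
}


def effective_config(overrides=None):
    """Start from the new defaults and apply any explicit overrides on top."""
    config = dict(NEW_DEFAULTS)
    config.update(overrides or {})
    return config


legacy = effective_config(LEGACY_OVERRIDES)
```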

v0.1.0¶

Release Date: February 5, 2026

This is the initial public release of rapid_textrank, a high-performance TextRank implementation in Rust with Python bindings.

Highlights¶

10-100x faster than pure Python TextRank implementations (depending on document size)
Three algorithm variants: BaseTextRank, PositionRank, and BiasedTextRank
18 languages supported for stopword filtering
Dual API: Native Python classes + JSON interface for batch processing
spaCy integration: Drop-in pipeline component

Features¶

Algorithm Variants¶

Variant	Use Case
`BaseTextRank`	General keyword extraction
`PositionRank`	Documents where key terms appear early (papers, news)
`BiasedTextRank`	Topic-focused extraction with customizable focus terms
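
To illustrate why PositionRank suits documents whose key terms appear early, the sketch below computes position-based word weights in the style described in the PositionRank paper: each occurrence at 1-based position p contributes 1/p, and the weights are then normalized. It is an assumption that the library follows this exact scheme; the function name is hypothetical.

```python
from collections import defaultdict


def position_weights(tokens):
    """Sum 1/position over each word's occurrences, then normalize to sum to 1."""
    raw = defaultdict(float)
    for position, token in enumerate(tokens, start=1):
        raw[token] += 1.0 / position
    total = sum(raw.values())
    return {token: weight / total for token, weight in raw.items()}


weights = position_weights(["textrank", "ranks", "keywords", "textrank", "fast"])
```

A word that appears early, or early and often, receives a larger share of the bias than one that first appears near the end.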

Performance Optimizations¶

CSR (Compressed Sparse Row) graph format for cache-friendly PageRank iteration
String interning via StringPool, reducing memory use 10-100x for typical documents
Parallel graph construction with Rayon
Link-time optimization (LTO) with single codegen unit
FxHash for fast internal hash maps
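
For readers unfamiliar with the CSR layout, here is a small pure-Python sketch of how an edge list maps onto the three CSR arrays. It only illustrates the data structure; the function names are hypothetical and the Rust implementation may differ in detail.

```python
def to_csr(num_nodes, edges):
    """Convert (source, target, weight) edges to CSR arrays.

    Returns (row_ptr, col_idx, values): the neighbors of node n live in
    col_idx[row_ptr[n]:row_ptr[n + 1]], with matching weights in values.
    """
    counts = [0] * num_nodes
    for src, _, _ in edges:
        counts[src] += 1
    row_ptr = [0] * (num_nodes + 1)
    for node in range(num_nodes):
        row_ptr[node + 1] = row_ptr[node] + counts[node]

    col_idx = [0] * len(edges)
    values = [0.0] * len(edges)
    cursor = row_ptr[:-1].copy()  # next free slot for each row
    for src, dst, weight in edges:
        col_idx[cursor[src]] = dst
        values[cursor[src]] = weight
        cursor[src] += 1
    return row_ptr, col_idx, values


def neighbors(row_ptr, col_idx, values, node):
    """Return the (target, weight) pairs stored contiguously for one node."""
    start, end = row_ptr[node], row_ptr[node + 1]
    return list(zip(col_idx[start:end], values[start:end]))


edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 1.0), (2, 0, 0.5)]
row_ptr, col_idx, values = to_csr(3, edges)
```

Because each node's neighbors sit in one contiguous slice, a PageRank sweep reads the arrays front to back, which is what makes the layout cache-friendly.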

Supported Platforms¶

Python 3.9, 3.10, 3.11, 3.12
Linux (manylinux)
macOS (x86_64, arm64)
Windows (x86_64)

Supported Languages¶

en, de, fr, es, it, pt, nl, ru, sv, no, da, fi, hu, tr, pl, ar, zh, ja