Quick Start

Extract Keywords in Five Lines

The fastest way to pull keywords from text is the extract_keywords convenience function:

from rapid_textrank import extract_keywords

text = """
Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience. Deep learning, a type of
machine learning, uses neural networks with many layers.
"""

keywords = extract_keywords(text, top_n=5, language="en")
for phrase in keywords:
    print(f"{phrase.text}: {phrase.score:.4f}")

Output:

machine learning: 0.2341
deep learning: 0.1872
artificial intelligence: 0.1654
neural networks: 0.1432
systems: 0.0891

Each returned Phrase object carries several attributes:

| Attribute | Description |
|-----------|-------------|
| text | Surface form of the phrase (e.g., "machine learning") |
| lemma | Lemmatized form |
| score | TextRank score |
| count | Number of occurrences in the text |
| rank | 1-indexed rank |
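
As a small illustration of working with these attributes, the sketch below filters and orders a list of phrases. So that it runs without the library installed, it uses a stand-in `Phrase` dataclass that only mirrors the attributes listed above. The stand-in class, the helper name `repeated_phrases`, and the sample values are assumptions made for illustration and are not part of rapid_textrank.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """Stand-in mirroring the documented attributes (not the library's class)."""
    text: str
    lemma: str
    score: float
    count: int
    rank: int


def repeated_phrases(phrases, min_count=2):
    """Keep phrases seen at least `min_count` times, best rank first."""
    kept = [p for p in phrases if p.count >= min_count]
    return sorted(kept, key=lambda p: p.rank)


# Illustrative values only; real phrases come from extract_keywords().
sample = [
    Phrase("neural networks", "neural network", 0.14, 1, 2),
    Phrase("machine learning", "machine learning", 0.23, 2, 1),
]

for p in repeated_phrases(sample):
    print(f"{p.rank}. {p.text} (lemma={p.lemma!r}, count={p.count})")
```

The same helper should work on the list returned by extract_keywords, since it reads only the count and rank attributes.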

Class-Based API

For more control, such as choosing an algorithm variant, tuning configuration, or reusing an extractor across multiple documents, use the class-based API:

from rapid_textrank import BaseTextRank, TextRankConfig

config = TextRankConfig(
    top_n=10,
    language="en",
    min_phrase_length=2,
    max_phrase_length=4,
)

extractor = BaseTextRank(config=config)
# `text` is the sample string defined in the first example.
result = extractor.extract_keywords(text)

for phrase in result.phrases:
    print(f"{phrase.text}: {phrase.score:.4f}")

All seven algorithm variants (BaseTextRank, PositionRank, BiasedTextRank, TopicRank, SingleRank, TopicalPageRank, MultipartiteRank) follow the same pattern. See the Extractor Classes reference for the full list of constructors and parameters.
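
Reusing one extractor across several documents can be sketched as below. The helper `keywords_per_document` is a hypothetical name, not a library function. It assumes only the pattern shown above: `extractor.extract_keywords(text)` returns a result whose `phrases` attribute holds objects with `text` and `score`.

```python
def keywords_per_document(extractor, documents):
    """Run a single extractor over many texts.

    `documents` maps a document name to its text. The return value maps
    each name to a list of (phrase text, score) pairs in the order the
    extractor returned them.
    """
    summary = {}
    for name, text in documents.items():
        result = extractor.extract_keywords(text)
        summary[name] = [(p.text, p.score) for p in result.phrases]
    return summary
```

With the configuration above, a call might look like `keywords_per_document(BaseTextRank(config=config), {"intro": text})`. Because the helper touches only `extract_keywords` and `phrases`, any of the other variants should fit as long as it is constructed the same way.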

JSON Interface

If you already tokenize with spaCy (or another NLP pipeline), you can pass the pre-tokenized data directly through the JSON interface, so the text is not tokenized a second time in Rust:

import json
from rapid_textrank import extract_from_json

payload = {
    "tokens": [
        {
            "text": "Machine",
            "lemma": "machine",
            "pos": "NOUN",
            "start": 0,
            "end": 7,
            "sentence_idx": 0,
            "token_idx": 0,
            "is_stopword": False,
        },
        # ... more tokens
    ],
    "variant": "textrank",
    "config": {"top_n": 10, "language": "en"},
}

# extract_from_json takes a JSON string and returns a JSON string.
result = json.loads(extract_from_json(json.dumps(payload)))

See the JSON Interface reference for full details on the payload schema and supported variants.
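
For spaCy users, the token list could be built along the following lines. This is a sketch, not the library's own converter: `tokens_from_doc` is a hypothetical helper, and it assumes that `token_idx` is a document-level index and that `start` and `end` are character offsets, which matches the single token shown above but is not confirmed here. Check the JSON Interface reference before relying on it.

```python
def tokens_from_doc(doc):
    """Convert a spaCy Doc into token dictionaries for the JSON payload.

    Uses only standard spaCy attributes: doc.sents, token.text,
    token.lemma_, token.pos_, token.idx, token.i and token.is_stop.
    """
    tokens = []
    for sentence_idx, sent in enumerate(doc.sents):
        for token in sent:
            tokens.append({
                "text": token.text,
                "lemma": token.lemma_,
                "pos": token.pos_,
                "start": token.idx,                    # character offset
                "end": token.idx + len(token.text),
                "sentence_idx": sentence_idx,
                "token_idx": token.i,                  # assumption: document-level index
                "is_stopword": token.is_stop,
            })
    return tokens
```

Given a pipeline loaded with `nlp = spacy.load("en_core_web_sm")`, the payload's token list would then be `tokens_from_doc(nlp(text))`. Note that `doc.sents` requires a pipeline component that sets sentence boundaries, such as the parser or the sentencizer.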

Interactive Notebook

Explore this topic in the Quick Start Notebook.