Supported Languages¶

rapid_textrank includes built-in stopword lists for 18 languages. These are used for stopword filtering in all APIs: the convenience function, extractor classes, the JSON interface, and the spaCy component.

Language Codes¶

Code	Language	Code	Language	Code	Language
`en`	English	`de`	German	`fr`	French
`es`	Spanish	`it`	Italian	`pt`	Portuguese
`nl`	Dutch	`ru`	Russian	`sv`	Swedish
`no`	Norwegian	`da`	Danish	`fi`	Finnish
`hu`	Hungarian	`tr`	Turkish	`pl`	Polish
`ar`	Arabic	`zh`	Chinese	`ja`	Japanese

Usage¶

Pass the language code to the language parameter in any API:

from rapid_textrank import extract_keywords

# English (default)
phrases = extract_keywords(text, language="en")

# German
phrases = extract_keywords(german_text, language="de")

# Chinese
phrases = extract_keywords(chinese_text, language="zh")

With extractor classes:

from rapid_textrank import BaseTextRank

extractor = BaseTextRank(top_n=10, language="fr")
result = extractor.extract_keywords(french_text)

With TextRankConfig:

from rapid_textrank import TextRankConfig, BaseTextRank

config = TextRankConfig(language="ja")
extractor = BaseTextRank(config=config)

In the JSON interface:

{
    "tokens": [ ... ],
    "config": {
        "language": "es"
    }
}

Inspecting Stopwords¶

You can retrieve the built-in stopword list for any supported language:

import rapid_textrank as rt

stopwords = rt.get_stopwords("en")
print(f"English stopwords: {len(stopwords)} words")
print(stopwords[:10])

stopwords_de = rt.get_stopwords("de")
print(f"German stopwords: {len(stopwords_de)} words")

Extending Stopwords¶

The built-in lists can be extended with domain-specific terms using the stopwords parameter. These additional words are merged with the built-in list, not used as a replacement.

from rapid_textrank import TextRankConfig, BaseTextRank

config = TextRankConfig(
    language="en",
    stopwords=["data", "system", "model"],  # added to built-in English stopwords
)

extractor = BaseTextRank(config=config)

In the JSON interface, the same applies via config.stopwords:

{
    "tokens": [ ... ],
    "config": {
        "language": "en",
        "stopwords": ["data", "system", "model"]
    }
}