nlp

Welcome to londogard-nlp-toolkits, com.londogard:nlp!
This project was created to make NLP tools more accessible to the JVM world.

It includes a multitude of features, such as:

  1. Embeddings (Word & Sentence)

  2. Tokenizers (Word, Char & Subword)

  3. Stopwords, Word Frequencies & Stemming

  4. Vectorizers & Encoders (TF-IDF, BM-25, OneHot, ...)

  5. Classifiers (Naïve Bayes, Logistic Regression with SGD & Transformers including HuggingFace)

  6. Token Classifiers (Hidden Markov Models & Transformers including HuggingFace)

  7. Keyword Extraction

Packages


com.londogard:nlp supports multiple embeddings:


NLP currently supports two types of sentence embeddings, namely:


The word frequencies are taken from wordfreq.py, a library by LuminosoInsight, and are hosted directly on GitHub. The object looks as follows: