Package com.
  Types
                A character tokenizer that returns one token for each character in the string.
                Wraps a HuggingFace tokenizer. These are subword tokenizers, meaning they return subword tokens. Instantiated with the name of the HuggingFace model.
                A SentencePiece tokenizer. This is a subword tokenizer, meaning that it returns subword tokens, e.g. "hey" might be split into "h" and "ey".
                A simple tokenizer that lets you define the whitespace characters to split on.
                Special tokens that are useful for machine learning.
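
The character and whitespace tokenizers described above can be sketched in a few lines of Kotlin. This is only an illustration of the behavior the descriptions imply, not this package's API: `charTokens` and `splitOn` are hypothetical names, and the default separator set is an assumption.

```kotlin
// Hypothetical helpers for illustration; these are not the package's actual types.

// Character tokenization: one token per character in the input string.
fun charTokens(text: String): List<String> = text.map { it.toString() }

// Whitespace tokenization with a caller-defined set of separator characters.
// The default separators (space, tab, newline) are an assumption.
fun splitOn(text: String, separators: Set<Char> = setOf(' ', '\t', '\n')): List<String> =
    text.split(*separators.toCharArray()).filter { it.isNotEmpty() }

fun main() {
    println(charTokens("hey"))            // [h, e, y]
    println(splitOn("a b\tc"))            // [a, b, c]
    println(splitOn("a_b c", setOf('_'))) // [a, b c]
}
```

Subword tokenizers such as the HuggingFace and SentencePiece wrappers rely on a trained vocabulary, so they cannot be reduced to a comparable one-liner.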