SimpleSentenceTokenizer

public final class SimpleSentenceTokenizer implements Tokenizer

A sentence tokenizer that splits text using simple heuristics and returns each sentence as one token.
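This page does not document the exact heuristics. As a rough sketch, assume a sentence ends at `.`, `!`, or `?` when the next character is whitespace or the end of the text. Under that assumption, a minimal Java splitter might look like the following. The class `HeuristicSentenceSplitter` is hypothetical and is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the library's SimpleSentenceTokenizer.
// Assumed heuristic: a sentence ends at '.', '!' or '?' that is followed
// by whitespace or by the end of the text.
final class HeuristicSentenceSplitter {

    List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        int n = text.length();
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            boolean terminator = c == '.' || c == '!' || c == '?';
            boolean atBoundary = i + 1 == n || Character.isWhitespace(text.charAt(i + 1));
            if (terminator && atBoundary) {
                addTrimmed(sentences, text.substring(start, i + 1));
                start = i + 1;
            }
        }
        // Any trailing text without a terminator becomes the final sentence.
        addTrimmed(sentences, text.substring(start));
        return sentences;
    }

    // Adds the candidate without surrounding whitespace, skipping empty pieces.
    private static void addTrimmed(List<String> out, String candidate) {
        String s = candidate.trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

Because the boundary check requires whitespace after the terminator, a decimal such as `3.14` is not split. Abbreviations such as `e.g.` followed by a space still are, which is the usual trade-off of this kind of heuristic.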

Constructors

public SimpleSentenceTokenizer()

Types

public class Companion

Functions

public List<List<String>> batchSplit(List<String> texts)

Splits every text in texts and returns one list of tokens per input text. This is a more efficient approach for native tokenizers such as HuggingFaceTokenizer.
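For a tokenizer written purely in Java or Kotlin, a batch call presumably offers no speed-up, so a fallback batchSplit can simply call split once per text. Native tokenizers gain because they can process the whole batch in a single call. The sketch below assumes an interface with just these two methods. The names `SentenceTokenizer` and `LineTokenizer`, and the default-method design, are hypothetical; the real Tokenizer interface may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the two documented methods; not the library's Tokenizer.
interface SentenceTokenizer {
    List<String> split(String text);

    // Fallback batch implementation: one split() call per input text,
    // with results kept in the same order as the inputs.
    // A native tokenizer could override this to handle the whole batch at once.
    default List<List<String>> batchSplit(List<String> texts) {
        List<List<String>> result = new ArrayList<>(texts.size());
        for (String text : texts) {
            result.add(split(text));
        }
        return result;
    }
}

// Hypothetical toy implementation for illustration: each non-blank line is one token.
final class LineTokenizer implements SentenceTokenizer {
    @Override
    public List<String> split(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isBlank()) {
                lines.add(line.strip());
            }
        }
        return lines;
    }
}
```

With this fallback, `new LineTokenizer().batchSplit(List.of("a\nb", "c"))` yields one inner list per input: `[[a, b], [c]]`.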

public List<String> split(String text)

Splits text into sentences and returns each sentence as one token.