BpeEmbeddings

public final class BpeEmbeddings implements Embeddings

BpeEmbeddings are subword embeddings for data tokenized with SentencePieceTokenizer. Studies show that performance is on par with GloVe (within ±5%) while using only a few MB of data rather than GB. Supports 275 languages through bpemb.
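This page does not state how vectors for individual subword pieces are combined into a vector for a whole word. One common approach with subword embeddings is to average the vectors of a word's pieces. The sketch below assumes that approach purely for illustration. `SubwordAverageSketch`, its `average` method, the pieces `em`, `bed`, and the toy values are all hypothetical, and plain `float[]` stands in for `NDArray<Float, D1>`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of the library.
final class SubwordAverageSketch {

    // Averages the vectors of the given pieces; pieces without a vector are skipped.
    // Returns null when none of the pieces has a vector.
    static float[] average(Map<String, float[]> table, List<String> pieces, int dimensions) {
        float[] sum = new float[dimensions];
        int found = 0;
        for (String piece : pieces) {
            float[] vector = table.get(piece);
            if (vector == null) {
                continue;
            }
            for (int i = 0; i < dimensions; i++) {
                sum[i] += vector[i];
            }
            found++;
        }
        if (found == 0) {
            return null;
        }
        for (int i = 0; i < dimensions; i++) {
            sum[i] /= found;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy two-dimensional table; real embedding files hold many more pieces and dimensions.
        Map<String, float[]> table = new HashMap<>();
        table.put("em", new float[] {1f, 3f});
        table.put("bed", new float[] {3f, 5f});

        float[] word = average(table, Arrays.asList("em", "bed"), 2);
        System.out.println(Arrays.toString(word)); // prints [2.0, 4.0]
    }
}
```

Skipping unknown pieces and returning `null` when nothing matches is a design choice of this sketch only; it is not a statement about how `vector` or `subwordVector` behave.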

Constructors

public BpeEmbeddings BpeEmbeddings(Path filePath, Integer dimensions, Character delimiter, Tokenizer tokenizer)

Types

public class Companion

Functions

public Boolean contains(String word)

Check if the word has an embedding.

public Float cosineDistance(String w1, String w2)
public Float euclideanDistance(String w1, String w2)
public Character getDelimiter()
public Integer getDimensions()
public Map<String, NDArray<Float, D1>> getEmbeddings()
public Path getFilePath()
public Set<String> getVocabulary()
public final NDArray<Float, D1> subwordVector(String subword)
public List<NDArray<Float, D1>> traverseVectors(List<String> words)
public List<NDArray<Float, D1>> traverseVectorsOrNull(List<String> words)
public NDArray<Float, D1> vector(String word)

Fetches the embedding for the given word, if one exists.
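The `cosineDistance` and `euclideanDistance` functions listed above carry no description on this page. As a rough guide to what such functions typically compute, here is a minimal sketch on plain `float[]` arrays, which stand in for `NDArray<Float, D1>`. It assumes cosine distance means 1 minus cosine similarity; the library may define it differently, for example by returning the similarity itself. `DistanceSketch` is a hypothetical helper, not part of the library.

```java
// Hypothetical illustration, not part of the library.
final class DistanceSketch {

    // Euclidean distance: square root of the summed squared differences.
    static float euclideanDistance(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return (float) Math.sqrt(sum);
    }

    // Cosine distance, assumed here to be 1 - cosine similarity.
    static float cosineDistance(float[] a, float[] b) {
        requireSameLength(a, b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine distance is undefined for a zero vector");
        }
        return (float) (1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of dimensions");
        }
    }

    public static void main(String[] args) {
        float[] x = {3f, 0f};
        float[] y = {0f, 4f};
        System.out.println(euclideanDistance(x, y)); // prints 5.0
        System.out.println(cosineDistance(x, y));    // prints 1.0 (orthogonal vectors)
    }
}
```

Under this assumed definition, identical directions give a cosine distance of 0 and orthogonal vectors give 1, whereas Euclidean distance also depends on vector length.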

Properties

private final Character delimiter
private final Integer dimensions
private final Map<String, NDArray<Float, D1>> embeddings
private final Path filePath
private final Set<String> vocabulary