nlp 1.2.0
Package com.londogard.nlp.tokenizer.sentence
Types
SimpleSentenceTokenizer
public final class SimpleSentenceTokenizer implements Tokenizer
A sentence tokenizer which returns each sentence as a token using simple heuristics.
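
This page does not say which heuristics the class applies, so the following is only an illustrative sketch of what "simple heuristics" for sentence splitting can look like. It is not the library's implementation and does not use its API: the class name `HeuristicSentenceSplitter`, the method `split`, and the boundary rule (terminal punctuation, then whitespace, then an uppercase letter) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of com.londogard.nlp.
final class HeuristicSentenceSplitter {
    // Assumed heuristic: a boundary is whitespace that follows '.', '!' or '?'
    // and precedes an uppercase letter. The punctuation stays with its sentence.
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[.!?])\\s+(?=\\p{Lu})");

    private HeuristicSentenceSplitter() {}

    // Returns each detected sentence as one token, in input order.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : BOUNDARY.split(text.trim())) {
            if (!part.isEmpty()) {
                sentences.add(part);
            }
        }
        return sentences;
    }
}
```

Under these assumptions, `HeuristicSentenceSplitter.split("Hello world. How are you? Fine!")` yields three tokens: `Hello world.`, `How are you?` and `Fine!`. A rule this simple also splits after abbreviations such as "Dr. Smith", which is the usual trade-off of heuristic splitters against trained models.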