7.3.65. table_tokenize¶
7.3.65.1. Summary¶
The table_tokenize command tokenizes text with the tokenizer of the specified table.
7.3.65.2. Syntax¶
This command takes several parameters. table and string are required; the other parameters are optional:
table_tokenize table
string
[flags=NONE]
[mode=GET]
[index_column=null]
7.3.65.3. Usage¶
Here is a simple example.
Execution example:
plugin_register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto --token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [[0, 1337566253.89858, 0.000355720520019531], []]
The Terms table uses the TokenBigram tokenizer, the NormalizerAuto normalizer and the TokenFilterStopWord token filter. The command returns the tokens generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. Each token is normalized by the NormalizerAuto normalizer, and the and token is removed by the TokenFilterStopWord token filter.
7.3.65.4. Parameters¶
This section describes all parameters. Parameters are categorized.
7.3.65.4.1. Required parameters¶
There are two required parameters, table and string.
7.3.65.4.1.1. table¶
Specifies the lexicon table. The table_tokenize command uses the tokenizer, the normalizer and the token filters that are set to the lexicon table.
7.3.65.4.1.2. string¶
Specifies any string which you want to tokenize.
7.3.65.4.2. Optional parameters¶
There are optional parameters.
7.3.65.4.2.1. flags¶
Specifies options to customize tokenization. You can specify multiple options separated by “|”.
The default value is NONE.
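For example, flags can be passed explicitly. This sketch assumes that table_tokenize accepts the same flag values as the tokenize command, such as NONE:

table_tokenize Terms "Hello and Good-bye" --flags NONE --mode GET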
7.3.65.4.2.2. mode¶
Specifies the tokenization mode.
The default value is GET.
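The tokenize command accepts GET (tokenize as for searching) and ADD (tokenize as for adding documents to an index); assuming table_tokenize accepts the same values, an ADD-mode call looks like this:

table_tokenize Terms "Hello and Good-bye" --mode ADD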
7.3.65.4.2.3. index_column¶
Specifies an index column. The return value then includes the estimated_size of each token in the index. The estimated_size is useful for checking the estimated frequency of tokens.
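For example, assuming the lexicon has an index column (the memos_content column name below is hypothetical), you can request the estimated sizes like this:

table_tokenize Terms "Hello and Good-bye" --index_column memos_content --mode GET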
7.3.65.5. Return value¶
The table_tokenize command returns tokenized tokens. See the Return value section of tokenize for details.
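As a rough sketch based on the tokenize command's return value, each returned token is an object with at least value and position keys:

# [
#   [0, 1337566253.89858, 0.000355720520019531],
#   [
#     {"value": "hello", "position": 0},
#     ...
#   ]
# ]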