7.3.65. table_tokenize¶
7.3.65.1. Summary¶
The table_tokenize command tokenizes text with the tokenizer of the specified table.
7.3.65.2. Syntax¶
This command takes several parameters. table and string are required; the other parameters are optional:
table_tokenize table
string
[flags=NONE]
[mode=GET]
[index_column=null]
7.3.65.3. Usage¶
Here is a simple example.
Execution example:
plugin_register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto --token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [[0, 1337566253.89858, 0.000355720520019531], []]
The Terms table uses the TokenBigram tokenizer, the NormalizerAuto normalizer and the TokenFilterStopWord token filter. The command returns the tokens generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. Each token is normalized by the NormalizerAuto normalizer, and the and token is removed by the TokenFilterStopWord token filter.
7.3.65.4. Parameters¶
This section describes all parameters. Parameters are categorized.
7.3.65.4.1. Required parameters¶
There are two required parameters, table and string.
7.3.65.4.1.1. table¶
Specifies the lexicon table. The table_tokenize command uses the tokenizer, the normalizer and the token filters that are set to the lexicon table.
7.3.65.4.1.2. string¶
Specifies any string which you want to tokenize.
7.3.65.4.2. Optional parameters¶
There are optional parameters.
7.3.65.4.2.1. flags¶
Specifies options to customize tokenization. You can specify multiple options separated by “|”.
The default value is NONE.
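For example, flags can be passed explicitly. This sketch assumes that table_tokenize accepts the same flag values as the tokenize command, such as NONE:

table_tokenize Terms "Hello and Good-bye" --flags NONE --mode GET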
7.3.65.4.2.2. mode¶
Specifies the tokenization mode.
The default value is GET.
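The tokenize command accepts GET (tokenize as for searching) and ADD (tokenize as for adding documents to an index); assuming table_tokenize accepts the same values, an ADD-mode call looks like this:

table_tokenize Terms "Hello and Good-bye" --mode ADD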
7.3.65.4.2.3. index_column¶
Specifies an index column. The return value then includes the estimated_size of each token in the index. The estimated_size is useful for checking the estimated frequency of tokens.
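For example, assuming the lexicon has an index column (the memos_content column name below is hypothetical), you can request the estimated sizes like this:

table_tokenize Terms "Hello and Good-bye" --index_column memos_content --mode GET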
7.3.65.5. Return value¶
The table_tokenize command returns tokenized tokens. See the Return value section of tokenize for details.
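As a rough sketch based on the tokenize command's return value, each returned token is an object with at least value and position keys:

# [
#   [0, 1337566253.89858, 0.000355720520019531],
#   [
#     {"value": "hello", "position": 0},
#     ...
#   ]
# ]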