7.8.14. TokenPattern
7.8.14.1. Summary
TokenPattern is a tokenizer that extracts tokens with a regular expression.
It extracts only the tokens that match the specified regular expression.
You can also specify multiple regular expression patterns.
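The extraction behavior described above can be sketched in Python. This is only an emulation, not Groonga's implementation: the function name `extract_tokens` is hypothetical, Python's `re` module stands in for Groonga's regular expression engine, and the sketch assumes that a token is extracted when it matches any one of the given patterns.

```python
import re


def extract_tokens(text, patterns):
    """Return the substrings of text that match any of the patterns."""
    # Assumption: multiple patterns behave like a single alternation.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [m.group(0) for m in combined.finditer(text)]


# One pattern: only the matching part of the text becomes a token.
print(extract_tokens("Apple Pie", ["Apple|Orange"]))

# Multiple patterns: a token is kept if any pattern matches it.
print(extract_tokens("Apple and Orange Pie", ["Apple", "Orange"]))

# Text without a match yields no tokens at all.
print(extract_tokens("Raspberry Pie", ["Apple|Orange"]))
```

Text that matches no pattern produces no tokens, which is why such text cannot be found through an index built with this tokenizer.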
7.8.14.2. Syntax
TokenPattern has an optional parameter.
Specify one pattern:
TokenPattern("pattern", PATTERN)
Specify multiple patterns:
TokenPattern("pattern", PATTERN_1, "pattern", PATTERN_2, ... "pattern", PATTERN_N)
TokenPattern can accept multiple patterns, as shown above.
7.8.14.3. Usage
Here is an example of TokenPattern. Because TokenPattern extracts only the tokens that match the specified regular expression, a search can match a record only through those extracted tokens.
For example, let’s compare the search results for two keywords: one that is listed in the TokenPattern pattern and one that is not.
The Foods table contains menus that include each of these keywords.
Here is the sample schema and data:
Execution example:
table_create Foods TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Foods name COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Keywords TABLE_PAT_KEY ShortText --default_tokenizer 'TokenPattern("pattern", "Apple|Orange")'
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Keywords index COLUMN_INDEX Foods name
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Foods
[
{"name": "Apple Pie"},
{"name": "Orange Pie"},
{"name": "Raspberry Pie"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
Then search for Apple Pie with --query Apple.
Execution example:
select Foods --match_columns name --query 'Apple'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "name",
# "Text"
# ]
# ],
# [
# 1,
# "Apple Pie"
# ]
# ]
# ]
# ]
In the above example, Apple matches the pattern specified for TokenPattern, so select matches Apple Pie.
Then search for Raspberry Pie with --query Raspberry.
Execution example:
select Foods --match_columns name --query 'Raspberry'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "name",
# "Text"
# ]
# ]
# ]
# ]
# ]
In the above example, even though the Foods table contains the Raspberry Pie record, select doesn’t match it because Raspberry doesn’t match the TokenPattern pattern.
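The two search results can be reproduced with a small Python emulation of an index whose tokenizer keeps only pattern matches. This is a sketch under stated assumptions, not how Groonga is implemented: the names `tokenize`, `build_index`, and `search` are hypothetical, Python's `re` module stands in for Groonga's regular expression engine, and the query is assumed to be tokenized in the same way as the indexed text.

```python
import re
from collections import defaultdict

# Same pattern as in the Keywords table definition.
PATTERN = re.compile("Apple|Orange")

# Same records as in the Foods table, keyed by _id.
FOODS = {1: "Apple Pie", 2: "Orange Pie", 3: "Raspberry Pie"}


def tokenize(text):
    """Keep only the substrings that match the pattern."""
    return [m.group(0) for m in PATTERN.finditer(text)]


def build_index(records):
    """Map each extracted token to the ids of the records containing it."""
    index = defaultdict(set)
    for record_id, name in records.items():
        for token in tokenize(name):
            index[token].add(record_id)
    return index


def search(index, records, query):
    """Return the names of records that contain every token of the query."""
    # Assumption: the query goes through the same tokenizer as the data.
    tokens = tokenize(query)
    if not tokens:
        # The query produced no tokens, so nothing can match.
        return []
    ids = set.intersection(*(index.get(t, set()) for t in tokens))
    return [records[i] for i in sorted(ids)]


index = build_index(FOODS)
print(search(index, FOODS, "Apple"))      # ['Apple Pie']
print(search(index, FOODS, "Raspberry"))  # []
```

In this emulation, "Raspberry Pie" contributes no tokens to the index and the query "Raspberry" produces no tokens either, so the record is unreachable even though it is stored.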