7.8.14. TokenPattern

7.8.14.1. Summary

TokenPattern is a tokenizer that extracts tokens by regular expression. It extracts only the tokens that match the specified regular expression.

You can also specify multiple regular expression patterns.

7.8.14.2. Syntax

TokenPattern has an optional parameter, pattern.

Specify one pattern:

TokenPattern("pattern", PATTERN)

Specify multiple patterns:

TokenPattern("pattern", PATTERN_1, "pattern", PATTERN_2, ... "pattern", PATTERN_N)

As shown above, each additional pattern is passed as a separate "pattern", PATTERN pair.
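To make the effect of single and multiple patterns concrete, here is a minimal Python sketch. It is only a model, not Groonga's implementation: it uses Python's re module rather than Groonga's regular expression engine, the helper name extract_tokens is hypothetical, and it assumes that a token is extracted whenever any one of the given patterns matches.

```python
import re


def extract_tokens(text, patterns):
    """Return the substrings of text matched by any pattern, in order of appearance.

    Assumption: multiple patterns behave like alternatives, so a token is
    extracted when at least one pattern matches it.
    """
    # Wrap each pattern in a non-capturing group before joining with "|".
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [m.group(0) for m in combined.finditer(text)]


# One pattern that lists both keywords.
single = extract_tokens("Apple Pie and Orange Pie", ["Apple|Orange"])

# Two separate patterns, one keyword each.
multiple = extract_tokens("Apple Pie and Orange Pie", ["Apple", "Orange"])

print(single)    # ['Apple', 'Orange']
print(multiple)  # ['Apple', 'Orange']
```

Under this assumption both forms extract the same tokens, and everything else in the text (such as Pie) is discarded.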

7.8.14.3. Usage

Here is an example of TokenPattern. Because TokenPattern extracts only the tokens that match the specified regular expression, it can restrict search results to records that contain a matching token.

For example, let’s compare search results for two keywords: one that is listed in the TokenPattern pattern and one that is not.

The Foods table contains menu items that include both kinds of keywords.

Here is the sample schema and data:

Execution example:

table_create Foods TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Foods name COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Keywords TABLE_PAT_KEY ShortText --default_tokenizer 'TokenPattern("pattern", "Apple|Orange")'
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Keywords index COLUMN_INDEX Foods name
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Foods
[
{"name": "Apple Pie"},
{"name": "Orange Pie"},
{"name": "Raspberry Pie"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Then search for Apple Pie with --query Apple.

Execution example:

select Foods --match_columns name --query 'Apple'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "name",
#           "Text"
#         ]
#       ],
#       [
#         1,
#         "Apple Pie"
#       ]
#     ]
#   ]
# ]

In the above example, Apple matches the pattern specified for TokenPattern, so select matches Apple Pie.

Then search for Raspberry Pie with --query Raspberry.

Execution example:

select Foods --match_columns name --query 'Raspberry'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         0
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "name",
#           "Text"
#         ]
#       ]
#     ]
#   ]
# ]

In the above example, even though the Foods table contains the Raspberry Pie record, select doesn’t match it because Raspberry doesn’t match the TokenPattern pattern.
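The two searches above can be modeled roughly with a small inverted index in Python. This is a sketch under stated assumptions, not how Groonga works internally: the names build_index and search are hypothetical, Python's re module stands in for Groonga's regular expression engine, and the query is assumed to be tokenized with the same pattern as the indexed text.

```python
import re
from collections import defaultdict

# Same pattern as in the Keywords table definition above.
PATTERN = re.compile("Apple|Orange")

# Records of the Foods table, keyed by _id.
foods = {1: "Apple Pie", 2: "Orange Pie", 3: "Raspberry Pie"}


def build_index(records):
    """Map each extracted token to the set of record IDs that contain it."""
    index = defaultdict(set)
    for record_id, name in records.items():
        for match in PATTERN.finditer(name):
            index[match.group(0)].add(record_id)
    return index


def search(index, query):
    """Return sorted IDs of records containing every token extracted from query."""
    tokens = [m.group(0) for m in PATTERN.finditer(query)]
    if not tokens:
        # Assumption: a query that yields no tokens matches no records.
        return []
    ids = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(ids)


index = build_index(foods)
print(search(index, "Apple"))      # [1]
print(search(index, "Raspberry"))  # []
```

In this model Raspberry Pie is never indexed under any token, and the query Raspberry produces no token either, which mirrors the empty result shown above.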

7.8.14.4. See also