7.3.56. select

7.3.56.1. Summary

select searches records that are matched to specified conditions from a table and then outputs them.

select is the most important command in groonga. You need to understand select to use the full power of Groonga.

7.3.56.2. Syntax

This command takes many parameters.

The required parameter is only table. Other parameters are optional:

select table
       [match_columns=null]
       [query=null]
       [filter=null]
       [scorer=null]
       [sortby=null]
       [output_columns="_id, _key, *"]
       [offset=0]
       [limit=10]
       [drilldown=null]
       [drilldown_sortby=null]
       [drilldown_output_columns="_key, _nsubrecs"]
       [drilldown_offset=0]
       [drilldown_limit=10]
       [cache=yes]
       [match_escalation_threshold=0]
       [query_expansion=null]
       [query_flags=ALLOW_PRAGMA|ALLOW_COLUMN]
       [query_expander=null]
       [adjuster=null]
       [drilldown_calc_types=NONE]
       [drilldown_calc_target=null]
       [drilldown_filter=null]
       [sort_keys=null]
       [drilldown_sort_keys=null]
       [match_escalation=auto]
       [load_table=null]
       [load_columns=null]
       [load_values=null]
       [drilldown_max_n_target_records=-1]
       [n_workers=0]

This command has the following named parameters for dynamic columns:

  • columns[${NAME}].stage=null

  • columns[${NAME}].flags=COLUMN_SCALAR

  • columns[${NAME}].type=null

  • columns[${NAME}].value=null

  • columns[${NAME}].window.sort_keys=null

  • columns[${NAME}].window.group_keys=null

You can use one or more alphabets, digits, _ for ${NAME}. For example, column1 is a valid ${NAME}. This is the same rule as normal column. See also name.

Parameters that have the same ${NAME} are grouped.

For example, the following parameters specify one dynamic column:

  • --columns[name].stage initial

  • --columns[name].type UInt32

  • --columns[name].value 29

The following parameters specify two dynamic columns:

  • --columns[name1].stage initial

  • --columns[name1].type UInt32

  • --columns[name1].value 29

  • --columns[name2].stage filtered

  • --columns[name2].type Float

  • --columns[name2].value '_score * 0.1'

This command has the following named parameters for advanced drilldown:

  • drilldowns[${LABEL}].keys=null

  • drilldowns[${LABEL}].sort_keys=null

  • drilldowns[${LABEL}].output_columns="_key, _nsubrecs"

  • drilldowns[${LABEL}].offset=0

  • drilldowns[${LABEL}].limit=10

  • drilldowns[${LABEL}].calc_types=NONE

  • drilldowns[${LABEL}].calc_target=null

  • drilldowns[${LABEL}].filter=null

  • drilldowns[${LABEL}].max_n_target_records=-1

  • drilldowns[${LABEL}].columns[${NAME}].stage=null

  • drilldowns[${LABEL}].columns[${NAME}].flags=COLUMN_SCALAR

  • drilldowns[${LABEL}].columns[${NAME}].type=null

  • drilldowns[${LABEL}].columns[${NAME}].value=null

  • drilldowns[${LABEL}].columns[${NAME}].window.sort_keys=null

  • drilldowns[${LABEL}].columns[${NAME}].window.group_keys=null

Deprecated since version 6.0.3: drilldown[...] syntax is deprecated, Use drilldowns[...] instead.

You can use one or more alphabets, digits, _ and . for ${LABEL}. For example, parent.sub1 is a valid ${LABEL}.

Parameters that have the same ${LABEL} are grouped.

For example, the following parameters specify one drilldown:

  • --drilldowns[label].keys column

  • --drilldowns[label].sort_keys -_nsubrecs

The following parameters specify two drilldowns:

  • --drilldowns[label1].keys column1

  • --drilldowns[label1].sort_keys -_nsubrecs

  • --drilldowns[label2].keys column2

  • --drilldowns[label2].sort_keys _key

7.3.56.3. Usage

Let’s learn about select usage with examples. This section shows many popular usages.

Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key":    "The first post!",
 "content": "Welcome! This is my first post!",
 "n_likes": 5,
 "tag": "Hello"},
{"_key":    "Groonga",
 "content": "I started to use Groonga. It's very fast!",
 "n_likes": 10,
 "tag": "Groonga"},
{"_key":    "Mroonga",
 "content": "I also started to use Mroonga. It's also very fast! Really fast!",
 "n_likes": 15,
 "tag": "Groonga"},
{"_key":    "Good-bye Senna",
 "content": "I migrated all Senna system!",
 "n_likes": 3,
 "tag": "Senna"},
{"_key":    "Good-bye Tritonn",
 "content": "I also migrated all Tritonn system!",
 "n_likes": 3,
 "tag": "Senna"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has title, content, the number of likes for the entry and tag. Title is key of Entries. Content is value of Entries.content column. The number of likes is value of Entries.n_likes column. Tag is value of Entries.tag column.

Entries._key column and Entries.content column are indexed using TokenBigram tokenizer. So both Entries._key and Entries.content are fulltext search ready.

OK. The schema and data for examples are ready.

7.3.56.3.1. Simple usage

Here is the most simple usage with the above schema and data. It outputs all records in Entries table.

Execution example:

select Entries
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5,
#         "Hello"
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3,
#         "Senna"
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3,
#         "Senna"
#       ]
#     ]
#   ]
# ]

Why does the command output all records? There are two reasons. The first reason is that the command doesn’t specify any search conditions. No search condition means all records are matched. The second reason is that the number of all records is 5. select command outputs 10 records at a maximum by default. There are only 5 records. It is less than 10. So the command outputs all records.

7.3.56.3.2. Search conditions

Search conditions are specified by query or filter. You can also specify both query and filter. It means that selected records must be matched against both query and filter.

7.3.56.3.2.1. Search condition: query

query is designed for search box in Web page. Imagine a search box in google.com. You specify search conditions for query as space separated keywords. For example, search engine means a matched record should contain two words, search and engine.

Normally, query parameter is used for specifying fulltext search conditions. It can be used for non fulltext search conditions but filter is used for the propose.

query parameter is used with match_columns parameter when query parameter is used for specifying fulltext search conditions. match_columns specifies which columnes and indexes are matched against query.

Here is a simple query usage example.

Execution example:

select Entries --match_columns content --query fast
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

The select command searches records that contain a word fast in content column value from Entries table.

query has query syntax but its deatils aren’t described here. See Query syntax for datails.

7.3.56.3.2.2. Search condition: filter

filter is designed for complex search conditions. You specify search conditions for filter as ECMAScript like syntax.

Here is a simple filter usage example.

Execution example:

select Entries --filter 'content @ "fast" && _key == "Groonga"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

The select command searches records that contain a word fast in content column value and has Groonga as _key from Entries table. There are three operators in the command, @, && and ==. @ is fulltext search operator. && and == are the same as ECMAScript. && is logical AND operator and == is equality operator.

filter has more operators and syntax like grouping by (...) its details aren’t described here. See Script syntax for datails.

7.3.56.3.3. Paging

You can specify range of outputted records by offset and limit. Here is an example to output only the 2nd record.

Execution example:

select Entries --offset 1 --limit 1
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

offset is zero-based. --offset 1 means output range is started from the 2nd record.

limit specifies the max number of output records. --limit 1 means the number of output records is 1 at a maximium. If no records are matched, select command outputs no records.

7.3.56.3.4. The total number of records

You can use --limit 0 to retrieve the total number of recrods without any contents of records.

Execution example:

select Entries --limit 0
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ]
#     ]
#   ]
# ]

--limit 0 is also useful for retrieving only the number of matched records.

7.3.56.3.5. Drilldown

You can get additional grouped results against the search result in one select. You need to use two or more SELECT s in SQL but select in Groonga can do it in one select.

This feature is called as drilldown in Groonga. It’s also called as faceted search in other search engine.

For example, think about the following situation.

You search entries that has fast word:

Execution example:

select Entries --filter 'content @ "fast"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

You want to use tag for additional search condition like --filter 'content @ "fast" && tag == "???". But you don’t know suitable tag until you see the result of content @ "fast".

If you know the number of matched records of each available tag, you can choose suitable tag. You can use drilldown for the case:

Execution example:

select Entries --filter 'content @ "fast"' --drilldown tag
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ]
#     ],
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         "Groonga",
#         2
#       ]
#     ]
#   ]
# ]

--drilldown tag returns a list of pair of available tag and the number of matched records. You can avoid “no hit search” case by choosing a tag from the list. You can also avoid “too many search results” case by choosing a tag that the number of matched records is few from the list.

You can create the following UI with the drilldown results:

  • Links to narrow search results. (Users don’t need to input a search query by their keyboard. They just click a link.)

Most EC sites use the UI. See side menu at Amazon.

Groonga supports not only counting grouped records but also finding the maximum and/or minimum value from grouped records, summing values in grouped records and so on. See Drilldown related parameters for details.

7.3.56.3.6. Dynamic column

You can create zero or more columns dynamically while a select execution. You can use them for drilldown by computed value, window function and so on.

Here is an example that uses dynamic column for drilldown by computed value. This example creates a new column named n_likes_class. n_likes_class column has classified value of Entry.n_likes value. This example classifies Entry.n_likes column value 10 step and the lowest number in the class is the classified value. If a Entry.n_likes value is between 0 and 9 such as 3 and 5, n_likes_class value (classified value) is 0. If Entry.n_likes value is between 10 and 19 such as 10 and 15, n_likes_class value (classified value) is 10.

You can use number_classify function for the classification. You need to register functions/number plugin by plugin_register command to use number_classify function.

This example does drilldown by n_likes_class value. The drilldown result will help you to know data trend.

Execution example:

plugin_register functions/number
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
  --table Entries \
  --columns[n_likes_class].stage initial \
  --columns[n_likes_class].type UInt32 \
  --columns[n_likes_class].value 'number_classify(n_likes, 10)' \
  --drilldown n_likes_class \
  --drilldown_sort_keys _nsubrecs \
  --output_columns n_likes,n_likes_class
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "n_likes_class",
#           "UInt32"
#         ]
#       ],
#       [
#         5,
#         0
#       ],
#       [
#         10,
#         10
#       ],
#       [
#         15,
#         10
#       ],
#       [
#         3,
#         0
#       ],
#       [
#         3,
#         0
#       ]
#     ],
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_key",
#           "UInt32"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         10,
#         2
#       ],
#       [
#         0,
#         3
#       ]
#     ]
#   ]
# ]

See Dynamic column related parameters for details.

7.3.56.3.7. Window function

You can compute each record value from values of grouped records. For example, you can compute sums of each group and puts sums to each record. The difference against drilldown is drilldown can compute sums of each group but it puts sums to each group not record.

Here is the result with window function. Each record has sum:

Group No.

Target value

Sum result

1

5

5

2

10

25

2

15

25

3

3

8

3

5

8

Here is the result with drilldown. Each group has sum:

Group No.

Target values

Sum result

1

5

5

2

10, 15

25

3

3, 5

8

Window function is useful for data analysis.

Here is an example that sums Entries.n_likes per Entries.tag:

Execution example:

plugin_register functions/number
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
  --table Entries \
  --columns[n_likes_sum_per_tag].stage initial \
  --columns[n_likes_sum_per_tag].type UInt32 \
  --columns[n_likes_sum_per_tag].value 'window_sum(n_likes)' \
  --columns[n_likes_sum_per_tag].window.group_keys tag \
  --output_columns tag,n_likes,n_likes_sum_per_tag
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "tag",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "n_likes_sum_per_tag",
#           "UInt32"
#         ]
#       ],
#       [
#         "Hello",
#         5,
#         5
#       ],
#       [
#         "Groonga",
#         10,
#         25
#       ],
#       [
#         "Groonga",
#         15,
#         25
#       ],
#       [
#         "Senna",
#         3,
#         6
#       ],
#       [
#         "Senna",
#         3,
#         6
#       ]
#     ]
#   ]
# ]

See Window function related parameters for details.

7.3.56.4. Parameters

This section describes all parameters. Parameters are categorized.

7.3.56.4.1. Required parameters

There is a required parameter, table.

7.3.56.4.1.1. table

Specifies a table to be searched. table must be specified.

If nonexistent table is specified, an error is returned.

Execution example:

select Nonexistent
# [
#   [
#     -22,
#     1337566253.89858,
#     0.000355720520019531,
#     "[select][table] invalid name: <Nonexistent>",
#     [
#       [
#         "grn_select",
#         "lib/proc/proc_select.c",
#         4452
#       ]
#     ]
#   ]
# ]

7.3.56.4.3. Advanced search parameters

7.3.56.4.3.1. match_escalation_threshold

New in version 8.0.1.

Specifies threshold to determine whether search storategy escalation is used or not. The threshold is compared against the number of matched records. If the number of matched records is equal to or less than the threshold, the search storategy escalation is used. See 検索 about the search storategy escalation.

The default threshold is 0. It means that search storategy escalation is used only when no records are matched.

The default threshold can be customized by one of the followings.

  • --with-match-escalation-threshold option of configure

  • --match-escalation-threshold option of groonga command

  • match-escalation-threshold configuration item in configuration file

Here is a simple match_escalation_threshold usage example. The first select doesn’t have match_escalation_threshold parameter. The second select has match_escalation_threshold parameter.

Execution example:

select Entries --match_columns content --query groo
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ]
#     ]
#   ]
# ]
select Entries --match_columns content --query groo --match_escalation_threshold -1
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         0
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ]
#     ]
#   ]
# ]

The first select command searches records that contain a word groo in content column value from Entries table. But no records are matched because the TokenBigram tokenizer tokenizes groonga to groonga not gr|ro|oo|on|ng|ga. (The TokenBigramSplitSymbolAlpha tokenizer tokenizes groonga to gr|ro|oo|on|ng|ga. See Tokenizers for details.) It means that groonga is indexed but groo isn’t indexed. So no records are matched against groo by exact match. In the case, the search storategy escalation is used because the number of matched records (0) is equal to match_escalation_threshold (0). One record is matched against groo by unsplit search.

The second select command also searches records that contain a word groo in content column value from Entries table. And it also doesn’t find matched records. In this case, the search storategy escalation is not used because the number of matched records (0) is larger than match_escalation_threshold (-1). So no more searches aren’t executed. And no records are matched.

7.3.56.4.3.2. match_escalation

Specifies how to use match escalation. See also match_escalation and 検索 about the match escalation.

Here are available values:

Value

Description

auto

Groonga uses match_escalation_threshold to determine whether match escalation is used or not.

This is the default.

yes

Groonga always uses match escalation.

no

Groonga never use match escalation.

--match_escalation yes is stronger than --match_escalation_threshold 9999...999. --filter 'true && column @ "query" with --match_escalation yes uses match escalation. --filter 'true && column @ "query" with --match_escalation_threshold 9999...999 doesn’t use match escalation.

Here is a simple match_escalation usage example. The first select doesn’t have match_escalation parameter. The second select has match_escalation parameter.

Execution example:

select Entries --filter 'true && content @ "groo"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         0
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ]
#     ]
#   ]
# ]
select Entries --filter 'true && content @ "groo"' --match_escalation yes
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

The first select command searches records that contain a word groo in content column value from Entries table. But no records are matched because the TokenBigram tokenizer tokenizes groonga to groonga not gr|ro|oo|on|ng|ga.

The second select command also searches records that contain a word groo in content column value from Entries table. And it uses match escalation. So it can find matched records.

7.3.56.4.3.3. query_expansion

Deprecated since version 3.0.2: Use query_expander instead.

7.3.56.4.3.4. query_flags

It customs query parameter syntax. You cannot update column value by query parameter by default. But if you specify ALLOW_COLUMN|ALLOW_UPDATE as query_flags, you can update column value by query.

Here are available values:

  • ALLOW_PRAGMA

  • ALLOW_COLUMN

  • ALLOW_UPDATE

  • ALLOW_LEADING_NOT

  • QUERY_NO_SYNTAX_ERROR

  • NONE

ALLOW_PRAGMA enables pragma at the head of query. This is not implemented yet.

ALLOW_COLUMN enables search againt columns that are not included in match_columns. To specify column, there are COLUMN:... syntaxes.

ALLOW_UPDATE enables column update by query with COLUMN:=NEW_VALUE syntax. ALLOW_COLUMN is also required to update column because the column update syntax specifies column.

ALLOW_LEADING_NOT enables leading NOT condition with -WORD syntax. The query searches records that doesn’t match WORD. Leading NOT condition query is heavy query in many cases because it matches many records. So this flag is disabled by default. Be careful about it when you use the flag.

QUERY_NO_SYNTAX_ERROR enables never causes syntax error for query. This flag is useful when an application uses user input directly and doesn’t want to show syntax error to the user and in a log. This flag is disabled by default.

NONE is just ignores. You can use NONE for specifying no flags.

They can be combined by separated | such as ALLOW_COLUMN|ALLOW_UPDATE.

The default value is ALLOW_PRAGMA|ALLOW_COLUMN.

Here is a usage example of ALLOW_COLUMN.

Execution example:

select Entries --query content:@mroonga --query_flags ALLOW_COLUMN
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

The select command searches records that contain mroonga in content column value from Entries table.

Here is a usage example of ALLOW_UPDATE.

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "alice", "age": 18},
{"_key": "bob",   "age": 20}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select Users --query age:=19 --query_flags ALLOW_COLUMN|ALLOW_UPDATE
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         19
#       ],
#       [
#         2,
#         "bob",
#         19
#       ]
#     ]
#   ]
# ]
select Users
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         19
#       ],
#       [
#         2,
#         "bob",
#         19
#       ]
#     ]
#   ]
# ]

The first select command sets age column value of all records to 19. The second select command outputs updated age column values.

Here is a usage example of ALLOW_LEADING_NOT.

Execution example:

select Entries --match_columns content --query -mroonga --query_flags ALLOW_LEADING_NOT
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         4
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5,
#         "Hello"
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3,
#         "Senna"
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3,
#         "Senna"
#       ]
#     ]
#   ]
# ]

The select command searches records that don’t contain mroonga in content column value from Entries table.

Here are a schema definition and sample data to describe other flags:

Execution example:

table_create --name Magazine --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Magazine --name title --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Magazine
[
{"_key":"http://test.jp/magazine/webplus","title":"WEB+"},
{"_key":"http://test.jp/magazine/database","title":"DataBase"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

Here is an example of QUERY_NO_SYNTAX_ERROR:

Execution example:

select Magazine --match_columns title --query 'WEB +'  --query_flags ALLOW_PRAGMA|ALLOW_COLUMN|QUERY_NO_SYNTAX_ERROR
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "http://test.jp/magazine/webplus",
#         "WEB+"
#       ]
#     ]
#   ]
# ]

If you don’t specify this flag, the quey causes a syntax error as below.

Execution example:

select Magazine --match_columns title --query 'WEB +'  --query_flags ALLOW_PRAGMA|ALLOW_COLUMN
# [
#   [
#     -63,
#     1337566253.89858,
#     0.000355720520019531,
#     "Syntax error: <WEB +||>",
#     [
#       [
#         "yy_syntax_error",
#         "lib/grn_ecmascript.lemon",
#         168
#       ]
#     ]
#   ]
# ]

Here is a usage example of NONE.

Execution example:

select Entries --match_columns content --query 'mroonga OR _key:Groonga' --query_flags NONE
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

The select command searches records that contain one of two words mroonga or _key:Groonga in content from Entries table. Note that _key:Groonga doesn’t mean that the value of _key column is equal to Groonga. Because ALLOW_COLUMN flag is not specified.

See also Query syntax.

7.3.56.4.3.5. query_expander

It’s for query expansion. Query expansion substitutes specific words to another words in query. Nomally, it’s used for synonym search.

It specifies a column that is used to substitute query parameter value. The format of this parameter value is “${TABLE}.${COLUMN}”. For example, “Terms.synonym” specifies synonym column in Terms table.

Table for query expansion is called “substitution table”. Substitution table’s key must be ShortText. So array table (TABLE_NO_KEY) can’t be used for query expansion. Because array table doesn’t have key.

Column for query expansion is called “substitution column”. Substitution column’s value type must be ShortText. Column type must be vector (COLUMN_VECTOR).

Query expansion substitutes key of substitution table in query with values in substitution column. If a word in query is a key of substitution table, the word is substituted with substitution column value that is associated with the key. Substition isn’t performed recursively. It means that substitution target words in substituted query aren’t substituted.

Here is a sample substitution table to show a simple query_expander usage example.

Execution example:

table_create Thesaurus TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Thesaurus synonym COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Thesaurus
[
{"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
{"_key": "groonga", "synonym": ["groonga", "senna"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

Thesaurus substitution table has two synonyms, "mroonga" and "groonga". If an user searches with "mroonga", Groonga searches with "((mroonga) OR (tritonn) OR (groonga mysql))". If an user searches with "groonga", Groonga searches with "((groonga) OR (senna))".

Normally, it’s good idea that substitution table uses a normalizer. For example, if normalizer is used, substitute target word is matched in case insensitive manner. See Normalizers for available normalizers.

Note that those synonym values include the key value such as "mroonga" and "groonga". It’s recommended that you include the key value. If you don’t include key value, substituted value doesn’t include the original substitute target value. Normally, including the original value is better search result. If you have a word that you don’t want to be searched, you should not include the original word. For example, you can implement “stop words” by an empty vector value.

Here is a simple query_expander usage example.

Execution example:

select Entries --match_columns content --query "mroonga"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ]
#     ]
#   ]
# ]
select Entries --match_columns content --query "mroonga" --query_expander Thesaurus.synonym
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3,
#         "Senna"
#       ]
#     ]
#   ]
# ]
select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3,
#         "Senna"
#       ]
#     ]
#   ]
# ]

The first select command doesn’t use query expansion. So a record that has "tritonn" isn’t found. The second select command uses query expansion. So a record that has "tritonn" is found. The third select command doesn’t use query expansion but it is same as the second select command. The third one uses expanded query.

Each substitute value can contain any Query syntax syntax such as (...) and OR. You can use complex substitution by using those syntax.

Here is a complex substitution usage example that uses query syntax.

Execution example:

load --table Thesaurus
[
{"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select Entries --match_columns content --query "popular" --query_expander Thesaurus.synonym
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "tag",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10,
#         "Groonga"
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15,
#         "Groonga"
#       ]
#     ]
#   ]
# ]

The load command registers a new synonym "popular". It is substituted with ((popular) OR (n_likes:>=10)). The substituted query means that “popular” is containing the word “popular” or 10 or more liked entries.

The select command outputs records that n_likes column value is equal to or more than 10 from Entries table.

7.3.56.4.3.6. n_workers

New in version 12.0.5.

Note

This is an experimental feature. Currently, this feature is still not stable.

This feature requires Command version 3 or later.

This feature requires that Apache Arrow is enabled in Groonga.

It depends on package provider whether Apache Arrow is enabled or not.

To check whether Apache Arrow is enabled, you can use status command that show the result of apache_arrow is true or not.

If Apache Arrow is disabled, you should build Groonga from the source code with enabling Apache Arrow following the steps in Install or request to enable Apache Arrow to the package provider.

drilldown , drilldowns and slices are executed in parallel when this parameter is specified -1 or 2 or more.

In a default setting, drilldown, drilldowns and slices are executed in serial. In other words, a next process is executed after a current process is finished. So, queries tend to take a long time if there are a lot of drilldown, drilldowns and slices.

n_workers enables to execute independent drilldown, drilldowns and slices in parallel. The execution time of the total sum of processes can be shourtend by executing them in parallel. This parallel execution is done for each select command.

“independent” means not using dorilldowns.table to reference the results of other drilldowns or slices.

If there are dependencies as same meaning as using dorilldowns.table, it wait for finish the dependent drilldowns or slices. Therefore, the degree of parallelism is reduced if they have dependencies.

Executing in parallel means using multiple CPUs at the same time. If executing in parallel without free CPU resource, it may actually slow down the execution time. This is because they have to wait for the other process being executed by the target CPU to finish.

It depends on a system configuration whether or not there are free CPU resources and how many n_workers should be specified.

For example, consider using Groonga HTTP server on a system with 6 CPUs.

Groonga HTTP server allocates 1 thread (= 1CPU) for each request.

When the average number of concurrent connections is 6, there are no free CPU resources because 6 CPUs are already in use. All the CPU is used to process each request.

When the average number of concurrent connections is 2, there are 4 free CPU resources because only 2 CPUs are already in use. When specifying 2 for n_workers, the select command will use at most 3 CPUs, including the thread for processing requests. Therefore, if two select commands with 2 specified for n_workers are requested at the same time, they will use at most 6 CPUs in total and will be processed fastly by using all of the resources. When specifying greater than 2, the degree of parallelism can be higher than the CPU resources, so it may actually slow down the execution time.

n_workers behaves as follows depending on the specified value.

  • When specifying 0 or 1

    • Executes the select command in serial

  • When specifying 2 or more

    • Executes the select command in parallel with at most the specified number of threads.

  • When specifying -1 or less

    • Executes the select command in parallel with the threads of at most the number of CPU cores.

The default value of this parameter is 0 . It means that the select command is executed in serial in default.

Note

The default value can be changed by specifying the environment variable GRN_SELECT_N_WORKERS_DEFAULT.

7.3.56.5. Return value

The command returns a response with the following format:

[
  HEADER,
  [
    SEARCH_RESULT,
    DRILLDOWN_RESULT_1,
    DRILLDOWN_RESULT_2,
    ...,
    DRILLDOWN_RESULT_N
  ]
]

If the command fails, error details are in HEADER.

See Output format for HEADER.

There are zero or more DRILLDOWN_RESULT. If no drilldown and drilldowns[${LABEL}].keys are specified, they are omitted like the following:

[
  HEADER,
  [
    SEARCH_RESULT
  ]
]

If drilldown has two or more keys like --drilldown "_key, column1, column2", multiple DRILLDOWN_RESULT exist:

[
  HEADER,
  [
    SEARCH_RESULT,
    DRILLDOWN_RESULT_FOR_KEY,
    DRILLDOWN_RESULT_FOR_COLUMN1,
    DRILLDOWN_RESULT_FOR_COLUMN2
  ]
]

If drilldowns[${LABEL}].keys is used, only one DRILLDOWN_RESULT exist:

[
  HEADER,
  [
    SEARCH_RESULT,
    DRILLDOWN_RESULT_FOR_LABELED_DRILLDOWN
  ]
]

DRILLDOWN_RESULT format is different between drilldown and drilldowns[${LABEL}].keys. It’s described later.

SEARCH_RESULT is the following format:

[
  [N_HITS],
  COLUMNS,
  RECORDS
]

See Simple usage for concrete example of the format.

N_HITS is the number of matched records before limit is applied.

COLUMNS describes about output columns specified by output_columns. It uses the following format:

[
  [COLUMN_NAME_1, COLUMN_TYPE_1],
  [COLUMN_NAME_2, COLUMN_TYPE_2],
  ...,
  [COLUMN_NAME_N, COLUMN_TYPE_N]
]

COLUMNS includes one or more output column information. Each output column information includes the followings:

  • Column name as string

  • Column type as string or null

Column name is extracted from value specified as output_columns.

Column type is Groonga’s type name or null. It doesn’t describe whether the column value is vector or scalar. You need to determine it by whether real column value is array or not.

See Data types for type details.

null is used when column value type isn’t determined. For example, function call in output_columns such as --output_columns "snippet_html(content)" uses null.

Here is an example of COLUMNS:

[
  ["_id",     "UInt32"],
  ["_key",    "ShortText"],
  ["n_likes", "UInt32"],
]

RECORDS includes column values for each matched record. Included records are selected by offset and limit. It uses the following format:

[
  [
    RECORD_1_COLUMN_1,
    RECORD_1_COLUMN_2,
    ...,
    RECORD_1_COLUMN_N
  ],
  [
    RECORD_2_COLUMN_1,
    RECORD_2_COLUMN_2,
    ...,
    RECORD_2_COLUMN_N
  ],
  ...
  [
    RECORD_N_COLUMN_1,
    RECORD_N_COLUMN_2,
    ...,
    RECORD_N_COLUMN_N
  ]
]

Here is an example RECORDS:

[
  [
    1,
    "The first post!",
    5
  ],
  [
    2,
    "Groonga",
    10
  ],
  [
    3,
    "Mroonga",
    15
  ]
]

DRILLDOWN_RESULT format is different between drilldown and drilldowns[${LABEL}].keys.

drilldown uses the same format as SEARCH_RESULT:

[
  [N_HITS],
  COLUMNS,
  RECORDS
]

And drilldown generates one or more DRILLDOWN_RESULT when drilldown has one ore more keys.

drilldowns[${LABEL}].keys uses the following format. Multiple drilldowns[${LABEL}].keys are mapped to one object (key-value pairs):

{
  "LABEL_1": [
    [N_HITS],
    COLUMNS,
    RECORDS
  ],
  "LABEL_2": [
    [N_HITS],
    COLUMNS,
    RECORDS
  ],
  ...,
  "LABEL_N": [
    [N_HITS],
    COLUMNS,
    RECORDS
  ]
}

Each drilldowns[${LABEL}].keys corresponds to the following:

"LABEL": [
  [N_HITS],
  COLUMNS,
  RECORDS
]

The following value part is the same format as SEARCH_RESULT:

[
  [N_HITS],
  COLUMNS,
  RECORDS
]

See also Output format for drilldowns[${LABEL}] style for drilldowns[${LABEL}] style drilldown output format.

7.3.56.6. See also