7.3.11. column_create
¶
7.3.11.1. Summary¶
column_create
creates a new column in a table.
You need to create one or more columns to store multiple data in one record.
Groonga provides an index as a column. It’s different from other systems. An index is just an index in other systems. Implementing an index as a column provides flexibility. For example, you can add metadata to each token.
See Column for column details.
7.3.11.2. Syntax¶
This command takes many parameters.
Most parameters are required:
column_create table
name
flags
type
[source=null]
[path=null]
7.3.11.3. Usage¶
This section describes about the followings:
Here is the People
table definition. The People
table is used
in examples:
Execution example:
table_create \
--name People \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
7.3.11.3.1. Create a scalar column¶
Groonga provides scalar column to store one value. For example, scalar column should be used for storing age into a person record. Because a person record must have only one age.
If you want to store multiple values into a record, scalar column isn’t suitable. Use Create a vector column instead.
You must specify COLUMN_SCALAR
to the flags
parameter to
create a scalar column.
Here is an example to create the age
column to the People
table. age
column is a scalar column. It can store one UInt8
(0-255
) value as its value:
Execution example:
column_create \
--table People \
--name age \
--flags COLUMN_SCALAR \
--type UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can store one value (7
) by the following load command:
Execution example:
load --table People
[
{"_key": "alice", "age": 7}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
You can confirm the stored one value (7
) by the following
select command:
Execution example:
select --table People
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt8"
# ]
# ],
# [
# 1,
# "alice",
# 7
# ]
# ]
# ]
# ]
7.3.11.3.2. Create a vector column¶
Groonga provides vector column to store multiple values. For example, vector column may be used for storing roles into a person record. Because a person record may have multiple roles.
If you want to store only one value into a record, vector column isn’t suitable. Use Create a scalar column instead.
You must specify COLUMN_VECTOR
to the flags
parameter to
create a vector column.
Here is an example to create the roles
column to the People
table. roles
column is a vector column. It can store zero or more
ShortText
values as its value:
Execution example:
column_create \
--table People \
--name roles \
--flags COLUMN_VECTOR \
--type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can store multiple values (["adventurer", "younger-sister"]
)
by the following load command:
Execution example:
load --table People
[
{"_key": "alice", "roles": ["adventurer", "younger-sister"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
You can confirm the stored multiple values (["adventurer",
"younger-sister"]
) by the following select command:
Execution example:
select --table People
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt8"
# ],
# [
# "roles",
# "ShortText"
# ]
# ],
# [
# 1,
# "alice",
# 7,
# [
# "adventurer",
# "younger-sister"
# ]
# ]
# ]
# ]
# ]
7.3.11.3.3. Create a weight vector column¶
TODO: See also Weight vector column and adjuster.
7.3.11.3.4. Create a column that refers a table’s record¶
Both scalar column and vector column can store reference to record of an existing table as column value. It’s useful to store relationship between records.
For example, using a column that refers a person record is better for storing a character into a book record. Because one person may be appeared in some books.
You must specify table name to be referenced to the type
parameter
to create a column that refers a table’s record.
Here is an example to create the character
column to the Books
table. The character
column refers the People
table. It can
store one People
table’s record.
Here is the Books
table definition:
Execution example:
table_create \
--name Books \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
Here is the character
column definition in the Books
table. --type People
is important:
Execution example:
column_create \
--table Books \
--name character \
--flags COLUMN_SCALAR \
--type People
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can store one reference ("alice"
) by the following load
command. You can use key value (People._key
value) for referring a
record:
Execution example:
load --table Books
[
{"_key": "Alice's Adventure in Wonderland", "character": "alice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
You can confirm the stored reference ("alice"
record) by the
following select command. It retrieves the age
column and
the roles
column values:
Execution example:
select \
--table Books \
--output_columns _key,character._key,character.age,character.roles
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "character._key",
# "ShortText"
# ],
# [
# "character.age",
# "UInt8"
# ],
# [
# "character.roles",
# "ShortText"
# ]
# ],
# [
# "Alice's Adventure in Wonderland",
# "alice",
# 7,
# [
# "adventurer",
# "younger-sister"
# ]
# ]
# ]
# ]
# ]
7.3.11.3.5. Create an index column¶
Groonga provides index column for fast search. It doesn’t store your data. It stores data for fast search.
You don’t need to update index column by yourself. Index column is updated automatically when you store data into a data column (scalar column or vector column) that is marked as index target column. You can set multiple columns as index target columns to one index column.
If you make a new index, it is invisible until finishing of index build.
If Groonga has an index column for the age
column of the
People
table, Groonga can do fast equal search, fast comparison
search and fast range search against age
column values.
You must specify the following parameters to create an index column:
The
flags
parameter:COLUMN_INDEX
The
type
parameter: The table name of index target column such asPeople
The
source
parameter: The index target column name such asage
You don’t need additional flags to the flags
parameter for equal
search, comparison search and range search index column. You need
additional flags to the flags
parameter for full text search index
column or multiple column index column. See
Create an index column for full text search and
Create a multiple columns index column for details.
Here is an example to create an index column for the age
column of
the People
table.
First, you need to create a table for range index column. See
Create range index table for details. This example
creates the Ages
table as TABLE_PAT_KEY:
Execution example:
table_create \
--name Ages \
--flags TABLE_PAT_KEY \
--key_type UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
Now, you can create an index column for the age
column of the
People
table. COLUMN_INDEX
in the flags
parameter,
People
in the type
parameter and age
in the source
parameter are important:
Execution example:
column_create \
--table Ages \
--name people_age_index \
--flags COLUMN_INDEX \
--type People \
--source age
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can confirm that age > 5
is evaluated by the
Ages.people_age_index
newly created index column from log. Groonga
reports used index columns in info
log level. You can change log
level dynamically by log_level command.
Execution example:
log_level --level info
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
--table People \
--filter 'age > 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt8"
# ],
# [
# "roles",
# "ShortText"
# ]
# ],
# [
# 1,
# "alice",
# 7,
# [
# "adventurer",
# "younger-sister"
# ]
# ]
# ]
# ]
# ]
# log: 2022-03-28 15:41:00.299778|i| [table-selector][select][index][range] <Ages.people_age_index>
log_level --level notice
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can confirm that the Ages.people_age_index
is used from the
following log:
[table][select][index][range] <Ages.people_age_index>
The log says Ages.people_age_index
index column is used for range
search.
7.3.11.3.6. Create an index column for full text search¶
There is a difference between for non full text search (equal search,
comparison search or range search) index column and for full text
search index column. You need to add WITH_POSITION
to the
flags
parameter. It means that you need to specify
COLUMN_INDEX|WITH_POSITION
to the flags
parameter. It’s the
difference.
Here is an example to create a full text search index column for the
key of the People
table.
First, you need to create a table for full text search index
column. See Create lexicon for details. This example
creates the Terms
table as TABLE_PAT_KEY with
TokenBigram tokenizer and NormalizerAuto normalizer:
Execution example:
table_create \
--name Terms \
--flags TABLE_PAT_KEY \
--key_type ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
Now, you can create a full text search index column for the key of the
People
table. COLUMN_INDEX|WITH_POSITION
in the flags
parameter, People
in the type
parameter and _key
in the
source
parameter are important:
Execution example:
column_create \
--table Terms \
--name people_key_index \
--flags COLUMN_INDEX|WITH_POSITION \
--type People \
--source _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can confirm that --match_columns _key
and --query Alice
are evaluated by the Terms.people_key_index
newly created full
text search index column from log. Groonga reports used index columns
in info
log level. You can change log level dynamically by
log_level command.
Execution example:
log_level --level info
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
--table People \
--match_columns _key \
--query Alice
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt8"
# ],
# [
# "roles",
# "ShortText"
# ]
# ],
# [
# 1,
# "alice",
# 7,
# [
# "adventurer",
# "younger-sister"
# ]
# ]
# ]
# ]
# ]
# log: 2022-03-28 15:41:00.809971|i| [object][search][index][key][exact] <Terms.people_key_index>
# log: 2022-03-28 15:41:00.809998|i| grn_ii_sel > (Alice)
# log: 2022-03-28 15:41:00.810064|i| [ii][select] n=1 (Alice)
# log: 2022-03-28 15:41:00.810116|i| exact: 1
# log: 2022-03-28 15:41:00.810125|i| hits=1
log_level --level notice
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can confirm that the Terms.people_key_index
is used from the
following log:
[object][search][index][key][exact] <Terms.people_key_index>
The log says Terms.people_key_index
index column is used for full
text search. (To be precise, the index column is used for exact term
search by inverted index.)
7.3.11.3.7. Create a multiple columns index column¶
You can create an index column for multiple columns. It means that you can do fast search for multiple columns with one index column. Multiple columns index column has better space efficiency than single column index column only when multiple columns have many same tokens. Multiple columns index column may be slower than single column index column. Because multiple columns index column will be a bigger index column.
You can’t use multiples columns in different tables as index target
columns in the same multiple columns index column. You must specify
columns in the same tables as index target columns to one multiple
columns index column. For example, you can’t create a multiple columns
index for People._key
and Books._key
because they are columns
of different tables. You can create a multiple columns index column
for People._key
and People.roles
because they are columns of
the same table.
There is a difference between for single column index column and for
multiple columns index column. You need to add WITH_SECTION
to the
flags
parameter. It means that you need to specify
COLUMN_INDEX|WITH_SECTION
to the flags
parameter. It’s the
difference.
If you want to create a multiple columns index column for full text
search, you need to specify
COLUMN_INDEX|WITH_POSITION|WITH_SECTION
to the flags
parameter. See Create an index column for full text search for full
text search index column details.
Here is an example to create a multiple columns index column for the
key of the People
table and the roles
column of the People
table.
There is no difference between index table for single column index column and multiple columns index column.
In this example, Names
table is created for equal search and
prefix search. It uses TABLE_PAT_KEY
because TABLE_PAT_KEY
supports prefix search. See Tables for details.
Execution example:
table_create \
--name Names \
--flags TABLE_PAT_KEY \
--key_type ShortText \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can create a multiple columns index column for the key of the
People
table and roles
column of the People
table. COLUMN_INDEX|WITH_SECTION
in the flags
parameter,
People
in the type
parameter and _key,roles
in the
source
parameter are important:
Execution example:
column_create \
--table Names \
--name people_key_roles_index \
--flags COLUMN_INDEX|WITH_SECTION \
--type People \
--source _key,roles
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can confirm that --filter 'roles @^ "Younger"
is evaluated by
the Names.people_key_roles_index
newly created multiple columns
index column from log. Groonga reports used index columns in info
log level. You can change log level dynamically by log_level
command.
Execution example:
log_level --level info
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
--table People \
--filter 'roles @^ "Younger"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt8"
# ],
# [
# "roles",
# "ShortText"
# ]
# ],
# [
# 1,
# "alice",
# 7,
# [
# "adventurer",
# "younger-sister"
# ]
# ]
# ]
# ]
# ]
# log: 2022-03-28 15:41:01.320091|i| [table-selector][select][index][prefix] <Names.people_key_roles_index>
log_level --level notice
# [[0, 1337566253.89858, 0.000355720520019531], true]
You can confirm that the Names.people_key_roles_index
is used from
the following log:
[table][select][index][prefix] <Names.people_key_roles_index>
The log says Names.people_key_roles_index
index column is used for
prefix search.
7.3.11.3.8. Create a small index column¶
If you know index target data are small, you can reduce memory usage
for the index column. Memory usage is 1/256
of the default index
column.
How many data are small? It depends on data. Small index column can’t
handle 1 billion records at least. If index target is only one
scalar column with no text family type (ShortText
, Text
or
LongText
), the maximum handleable records are depends of the
number of kinds of index target data. If index target column has
1
, 1
, 2
and 3
, the number of kinds of them are 3
(1
and 2
and 3
). The following table shows the
relationship between the number of kinds of index target data and the
number of handleable records:
The number of kinds of index target data |
The number of hanleable records |
---|---|
1 |
16779234 |
2 |
4648070 |
4 |
7238996 |
8 |
8308622 |
16 |
11068624 |
32 |
12670817 |
64 |
18524211 |
128 |
38095511 |
256 |
51265384 |
You need to add INDEX_SMALL
to the flags
parameter such as
COLUMN_INDEX|INDEX_SMALL
to create a small index column.
If the People
table has only 1 million records, you can use a
small index column for the age
column:
Execution example:
column_create \
--table Ages \
--name people_age_small_index \
--flags COLUMN_INDEX|INDEX_SMALL \
--type People \
--source age
# [[0, 1337566253.89858, 0.000355720520019531], true]
7.3.11.3.9. Create a medium index column¶
If you know index target data are medium, you can reduce memory usage
for the index column. Memory usage is 5/24
of the default index
column.
How many data are medium? It depends on data.
If index target is only one scalar column, a medium index column can handle all records.
A medium index column may not handle all records at the following cases:
Index target is one text family (
ShortText
,Text
orLongText
) scalar columnIndex target is one vector column
Index targets are multiple columns
Index table has tokenizer
You need to add INDEX_MEDIUM
to the flags
parameter such as
COLUMN_INDEX|INDEX_MEDIUM
to create a medium index column.
You can use a medium index column for an index column of the age
column of the People
table safely. Because it’s one scalar column
with UInt8
type.
Here is an example to create a medium index column:
Execution example:
column_create \
--table Ages \
--name people_age_medium_index \
--flags COLUMN_INDEX|INDEX_MEDIUM \
--type People \
--source age
# [[0, 1337566253.89858, 0.000355720520019531], true]
7.3.11.3.10. Create a large index column¶
If you know index target data are large, you need to use large index column. It uses increases memory usage for the index column but it can accept more data. Memory usage is 2 times larger than the default index column.
How many data are large? It depends on data.
If index target is only one scalar column, it’s not large data.
Large data must have many records (normally at least 10 millions records) and at least one of the following features:
Index targets are multiple columns
Index table has tokenizer
You need to add INDEX_LARGE
to the flags
parameter such as
COLUMN_INDEX|INDEX_LARGE
to create a large index column.
You can use a large index column for an index column of the _key
of the People
table and the role
column of the People
table.
Here is an example to create a large index column:
Execution example:
column_create \
--table Terms \
--name people_roles_large_index \
--flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION|INDEX_LARGE \
--type People \
--source roles
# [[0, 1337566253.89858, 0.000355720520019531], true]
7.3.11.3.11. Missing mode¶
New in version 12.0.2.
You can control how to process a nonexistent key in the specified new
value of a reference column by a MISSING_*
flag. Here are
available MISSING_*
flags:
MISSING_ADD
(default)MISSING_IGNORE
MISSING_NIL
You can’t specify multiple MISSING_*
flags for a column.
MISSING_*
flags are meaningful only for a reference column.
The following table describes the differences between MISSING_*
flags when a nonexistent key is specified to a reference scalar
column:
Flag |
Description |
An example given value |
An example set value |
---|---|---|---|
|
The given nonexistent key is added to the referred table automatically and the ID of the newly added record is set. This is the default. |
|
The record ID of the newly added record whose key is
|
|
The given nonexistent key is ignored and There is no difference between |
|
|
|
The given nonexistent key is ignored and There is no difference between |
|
|
Here is an example to show differences between MISSING_*
flags for
a reference scalar column.
First this example defines columns for all MISSING_*
flags:
Execution example:
table_create \
--name MissingModeScalarReferred \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create \
--name MissingModeScalar \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table MissingModeScalar \
--name missing_add \
--flags COLUMN_SCALAR|MISSING_ADD \
--type MissingModeScalarReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table MissingModeScalar \
--name missing_ignore \
--flags COLUMN_SCALAR|MISSING_IGNORE \
--type MissingModeScalarReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table MissingModeScalar \
--name missing_nil \
--flags COLUMN_SCALAR|MISSING_NIL \
--type MissingModeScalarReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
Then this example loads nonexistent keys to all columns. The specified
nonexistent key for MISSING_ADD
is only added to
MissingModeScalarReferred
automatically and the specified
nonexistent keys for MISSING_IGNORE
and MISSING_NIL
aren’t
added to MissingModeScalarReferred
. missing_ignore
’s value and
missing_nil
’s value are showed as ""
because they refer a
record whose ID is 0
and record whose ID is 0
never exist:
Execution example:
load --table MissingModeScalar
[
{"_key": "key", "missing_add": "nonexistent1"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table MissingModeScalar
[
{"_key": "key", "missing_ignore": "nonexistent2"}
]
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "<MissingModeScalar.missing_ignore>: failed to cast to <MissingModeScalarReferred>: <\"nonexistent2\">",
# [
# [
# "grn_ra_cast_value",
# "lib/store.c",
# 494
# ]
# ]
# ],
# 1
# ]
load --table MissingModeScalar
[
{"_key": "key", "missing_nil": "nonexistent3"}
]
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "<MissingModeScalar.missing_nil>: failed to cast to <MissingModeScalarReferred>: <\"nonexistent3\">",
# [
# [
# "grn_ra_cast_value",
# "lib/store.c",
# 494
# ]
# ]
# ],
# 1
# ]
select --table MissingModeScalar
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "missing_add",
# "MissingModeScalarReferred"
# ],
# [
# "missing_ignore",
# "MissingModeScalarReferred"
# ],
# [
# "missing_nil",
# "MissingModeScalarReferred"
# ]
# ],
# [
# 1,
# "key",
# "nonexistent1",
# "",
# ""
# ]
# ]
# ]
# ]
select --table MissingModeScalarReferred
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "nonexistent1"
# ]
# ]
# ]
# ]
The following table describes the differences between MISSING_*
flags when a vector value that has a nonexistent key element is
specified to a reference vector column:
Flag |
Description |
An example given value |
An example set value |
---|---|---|---|
|
The given nonexistent key is added to the referred table automatically and the ID of the newly added record is used for the element. This is the default. |
|
The record IDs of |
|
The given nonexistent key element is ignored. |
|
The record IDs of |
|
The given nonexistent key element is ignored. If See also Invalid mode. |
|
The record ID of The record IDs of |
Here is an example to show differences between MISSING_*
flags for
a reference vector column.
First this example defines columns for all MISSING_*
flags:
Execution example:
table_create \
--name MissingModeVectorReferred \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create \
--name MissingModeVector \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table MissingModeVector \
--name missing_add \
--flags COLUMN_VECTOR|MISSING_ADD \
--type MissingModeVectorReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table MissingModeVector \
--name missing_ignore \
--flags COLUMN_VECTOR|MISSING_IGNORE|INVALID_IGNORE \
--type MissingModeVectorReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table MissingModeVector \
--name missing_nil \
--flags COLUMN_VECTOR|MISSING_NIL|INVALID_IGNORE \
--type MissingModeVectorReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
Then this example loads a vector that includes a nonexistent key to
all columns. The specified nonexistent key for MISSING_ADD
is only
added to MissingModeVectorReferred
automatically and the specified
nonexistent keys for MISSING_IGNORE
and MISSING_NIL
aren’t
added to MissingModeVectorReferred
. The specified nonexistent key
element is removed from missing_ignore
’s value. The specified
nonexistent key element is replaced with 0
in missing_nil
’s
value because INVALID_IGNORE
is also specified. And the element
replaced with 0
is showed as ""
because it refers a record
whose ID is 0
and record whose ID is 0
never exist:
Execution example:
load --table MissingModeVectorReferred
[
{"_key": "existent1"},
{"_key": "existent2"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table MissingModeVector
[
{"_key": "key", "missing_add": ["existent1", "nonexistent1", "existent2"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table MissingModeVector
[
{"_key": "key", "missing_ignore": ["existent1", "nonexistent2", "existent2"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table MissingModeVector
[
{"_key": "key", "missing_nil": ["existent1", "nonexistent3", "existent2"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table MissingModeVector
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "missing_add",
# "MissingModeVectorReferred"
# ],
# [
# "missing_ignore",
# "MissingModeVectorReferred"
# ],
# [
# "missing_nil",
# "MissingModeVectorReferred"
# ]
# ],
# [
# 1,
# "key",
# [
# "existent1",
# "nonexistent1",
# "existent2"
# ],
# [
# "existent1",
# "existent2"
# ],
# [
# "existent1",
# "",
# "existent2"
# ]
# ]
# ]
# ]
# ]
select --table MissingModeVectorReferred
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "existent1"
# ],
# [
# 2,
# "existent2"
# ],
# [
# 3,
# "nonexistent1"
# ]
# ]
# ]
# ]
7.3.11.3.12. Invalid mode¶
New in version 12.0.2.
You can control how to process an invalid value in the specified new
value of a data column by a INVALID_*
flag. Here are available
INVALID_*
flags:
INVALID_ERROR
(default)INVALID_WARN
INVALID_IGNORE
You can’t specify multiple INVALID_*
flags for a column.
INVALID_*
flags are meaningful only for a COLUMN_SCALAR
column
and a COLUMN_VECTOR
column.
If the target column is a reference column, an invalid value depends
on Missing mode. If you specify
MISSING_IGNORE
or MISSING_NIL
, a nonexistent key is an invalid
value. Note that an empty string key and string keys that are empty
strings by normalization aren’t an invalid value with all
MISSING_*
flags. They are special.
If the target column isn’t a reference column, an invalid value
depends on column’s value type. For example, "invalid"
is an
invalid value for an Int32
scalar column.
The following table describes the differences between INVALID_*
flags when an invalid value is specified to an Int32
scalar column:
Flag |
Description |
An example given value |
An example set value |
---|---|---|---|
|
The given invalid value is reported as an error in Process log and by load. The given invalid value isn’t set. This is the default. |
|
The column isn’t updated. |
|
The given invalid value is reported as a warning in Process log. The given invalid value is replaced with the default value of
the target scalar column. For example, |
|
|
|
The given invalid value is ignored. The given invalid value is replaced with the default value of
the target scalar column. For example, |
|
|
Here is an example to show differences between INVALID_*
flags for
an Int32
scalar column.
First this example defines columns for all INVALID_*
flags:
Execution example:
table_create \
--name InvalidModeScalar \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table InvalidModeScalar \
--name invalid_error \
--flags COLUMN_SCALAR|INVALID_ERROR \
--type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table InvalidModeScalar \
--name invalid_warn \
--flags COLUMN_SCALAR|INVALID_WARN \
--type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table InvalidModeScalar \
--name invalid_ignore \
--flags COLUMN_SCALAR|INVALID_IGNORE \
--type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
Then this example loads 29
as initial values for all columns to
show differences between them on update:
Execution example:
load --table InvalidModeScalar
[
{
"_key": "key",
"invalid_error": 29,
"invalid_warn": 29,
"invalid_ignore": 29
}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select \
--table InvalidModeScalar \
--output_columns invalid_error,invalid_warn,invalid_ignore
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "invalid_error",
# "Int32"
# ],
# [
# "invalid_warn",
# "Int32"
# ],
# [
# "invalid_ignore",
# "Int32"
# ]
# ],
# [
# 29,
# 29,
# 29
# ]
# ]
# ]
# ]
Then this example update existing column values with invalid values.
The specified invalid value is reported as an error by load
only with INVALID_ERROR
. And the existing value isn’t updated only
with INVALID_ERROR
. The existing value is updated with 0
with
INVALID_WARN
and INVALID_IGNORE
. You can’t see differences
between INVALID_WARN
and INVALID_IGNORE
with this example but
a warning message is logged in Process log only with
INVALID_WARN
:
Execution example:
load --table InvalidModeScalar
[
{"_key": "key", "invalid_error": "invalid"},
]
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "<InvalidModeScalar.invalid_error>: failed to cast to <Int32>: <\"invalid\">",
# [
# [
# "grn_ra_cast_value",
# "lib/store.c",
# 494
# ]
# ]
# ],
# 1
# ]
load --table InvalidModeScalar
[
{"_key": "key", "invalid_warn": "invalid"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table InvalidModeScalar
[
{"_key": "key", "invalid_ignore": "invalid"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select \
--table InvalidModeScalar \
--output_columns invalid_error,invalid_warn,invalid_ignore
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "invalid_error",
# "Int32"
# ],
# [
# "invalid_warn",
# "Int32"
# ],
# [
# "invalid_ignore",
# "Int32"
# ]
# ],
# [
# 29,
# 0,
# 0
# ]
# ]
# ]
# ]
The following table describes the differences between INVALID_*
flags when a vector value that has an invalid element is specified to
an Int32
vector column:
Flag |
Description |
An example given value |
An example set value |
---|---|---|---|
|
The given invalid element is reported as an error in Process log but load doesn’t report an error. If the target column is a reference vector column and
|
|
|
|
The given invalid element is reported as a warning in Process log. If the target column is a reference vector column and
|
|
|
|
The given invalid element is ignored. If the target column is a reference vector column and
|
|
|
Here is an example to show differences between INVALID_*
flags for
a reference vector column.
First this example defines columns for all INVALID_*
flags:
Execution example:
table_create \
--name InvalidModeVector \
--flags TABLE_HASH_KEY \
--key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table InvalidModeVector \
--name invalid_error \
--flags COLUMN_VECTOR|INVALID_ERROR \
--type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table InvalidModeVector \
--name invalid_warn \
--flags COLUMN_VECTOR|INVALID_WARN \
--type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
--table InvalidModeVector \
--name invalid_ignore \
--flags COLUMN_VECTOR|INVALID_IGNORE \
--type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
Then this example loads a vector that includes an invalid element to
all columns. The all specified invalid elements are ignored regardless
of INVALID_*
flags. Messages in Process log are different
by INVALID_*
flags. If INVALID_ERROR
is specified, an error
message is logged in Process log. If INVALID_WARN
is
specified, a warning message is logged in
Process log. INVALID_IGNORE
is specified, no message is
logged in Process log:
Execution example:
load --table InvalidModeVector
[
{"_key": "key", "invalid_error": [1, "invalid", 3]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table InvalidModeVector
[
{"_key": "key", "invalid_warn": [1, "invalid", 3]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table InvalidModeVector
[
{"_key": "key", "invalid_ignore": [1, "invalid", 3]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select \
--table InvalidModeVector \
--output_columns invalid_error,invalid_warn,invalid_ignore
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "invalid_error",
# "Int32"
# ],
# [
# "invalid_warn",
# "Int32"
# ],
# [
# "invalid_ignore",
# "Int32"
# ]
# ],
# [
# [
# 1,
# 3
# ],
# [
# 1,
# 3
# ],
# [
# 1,
# 3
# ]
# ]
# ]
# ]
# ]
7.3.11.4. Parameters¶
This section describes all parameters.
7.3.11.4.1. Required parameters¶
There are some required parameters.
7.3.11.4.1.1. table
¶
Specifies an existing table name for the new column.
7.3.11.4.1.2. name
¶
Specifies the column name to be created.
The column name must be unique in the same table.
Here are available characters:
0
..9
(digit)a
..z
(alphabet, lower case)A
..Z
(alphabet, upper case)#
(hash)@
(at mark)-
(hyphen)_
(underscore) (NOTE: Underscore can’t be used as the first character.)
You need to create a name with one or more the above characters. Note
that you can’t use _
as the first character such as _name
.
7.3.11.4.1.3. flags
¶
Specifies the column type and column customize options.
Here are available flags:
Flag |
Description |
---|---|
|
Scalar column. It can store one value. See also Scalar column or Create a scalar column. |
|
Vector column. It can store multiple values. See also Vector column or Create a vector column. |
|
Index column. It stores data for fast search. See also Index column or Create an index column. |
|
It enables column value compression by zlib. You need Groonga that enables zlib support. Compression by zlib is higher space efficiency than compression by LZ4. But compression by zlib is slower than compression by LZ4. This flag is available only for |
|
It enables column value compression by LZ4. You need Groonga that enables LZ4 support. Compression by LZ4 is faster than compression by zlib. But compression by LZ4 is lower space efficiency than compression by zlib. This flag is available only for |
|
It enables column value compression by Zstandard. You need Groonga that enables Zstandard support. Compression by Zstandard is faster than compression by zlib and the same space efficiency as zlib. This flag is available only for |
|
It enables section support to index column. If section support is enabled, you can support multiple documents in the same index column. You must specify this flag to create a multiple columns index column. See also Create a multiple columns index column for details. Section support requires additional spaces. If you don’t need section support, you should not enable section support. This flag is available only for |
|
It enables weight support to vector column or index column. If weight support is enabled for vector column, you can add weight for each element. If weight support is enabled for index column, you can add weight for each posting. They are useful to compute suitable search score. You must specify this flag to use adjuster. See also Create a weight vector column for details. Weight support requires additional spaces. If you don’t need weight support, you should not enable weight support. This flag is available only for |
|
New in version 10.0.3. You can use 32bit floating point instead of 32bit unsigned integer for weight value. You also need to specify This flag is available only for |
|
It enables position support to index column. It means that the index column is full inverted index. (Index column is implemented as inverted index.) If position support is enabled, you can add position in the document for each posting. It’s required for phrase search. It means that index column for full text search must enable position support because most full text search uses phrase search. You must specify this flag to create a full text search index column. See also Create an index column for full text search for details. Position support requires additional spaces. If you don’t need position support, you should not enable position support. This flag is available only for |
|
New in version 6.0.8. It requires to create a small index column. If index target data are small, small index column is enough. Small index column uses fewer memory than a normal index column or a medium index column. See also Create a small index column for knowing what are “small data” and how to use this flag. This flag is available only for |
|
New in version 6.0.8. It requires to create a medium index column. If index target data are medium, medium index column is enough. Medium index column uses fewer memory than a normal index column. See also Create a medium index column for knowing what are “medium data” and how to use this flag. This flag is available only for |
|
New in version 9.0.2. It requires to create a large index column. If index target data are large, you need to use large index column. Large index column uses more memory than a normal index column but accepts more data than a normal index column. See also Create a large index column for knowing what are “large data” and how to use this flag. This flag is available only for |
|
New in version 12.0.2. You can’t specify multiple This is meaningful only for reference scalar and vector columns. If this flag is specified and nonexistent key in the referred table is specified to the column’s value, a new record is created in the referred table automatically. If you don’t specify any See also Missing mode. This flag is available only for |
|
New in version 12.0.2. You can’t specify multiple This is meaningful only for reference scalar and vector columns. If this flag is specified and nonexistent key in the referred
table is specified to the column’s value, the value is just
ignored. If the column is a scalar column, See also Missing mode. This flag is available only for |
|
New in version 12.0.2. You can’t specify multiple This is meaningful only for reference scalar and vector columns. If this flag is specified and nonexistent key in the referred
table is specified to the column’s value, the value is replaced
with See also Missing mode. This flag is available only for |
|
New in version 12.0.2. You can’t specify multiple If this flag is specified and an invalid value is specified, an error is reported to Process log. For example, If the column is a scalar column, load also reports an error. If the column is a vector column, load doesn’t reports
an error but invalid values in a vector value are removed or
replaced with Note This is an incompatible change at 12.0.2. load also reports an error for a vector column before 12.0.2. If you don’t specify any See also Invalid mode. This flag is available only for |
|
New in version 12.0.2. You can’t specify multiple If this flag is specified and an invalid value is specified, a warning is reported to Process log but no error is reported. For example, If the column is a vector column, invalid values in a vector
value are removed or replaced with See also Invalid mode. This flag is available only for |
|
New in version 12.0.2. You can’t specify multiple If this flag is specified and an invalid value is specified, it’s just ignored. For example, If the column is a vector column, invalid values in a vector
value are removed or replaced with See also Invalid mode. This flag is available only for |
You must specify one of COLUMN_${TYPE}
flags. You can’t specify
two or more COLUMN_${TYPE}
flags. For example,
COLUMN_SCALAR|COLUMN_VECTOR
is invalid.
You can combine flags with |
(vertical bar) such as
COLUMN_INDEX|WITH_SECTION|WITH_POSITION
.
7.3.11.4.1.4. type
¶
Specifies type of the column value.
If the column is scalar column or vector column, here are available types:
Builtin types described in Data types
Tables defined by users
If the column is index column, here are available types:
Tables defined by users
See also the followings:
7.3.11.4.2. Optional parameters¶
There is an optional parameter.
7.3.11.4.2.1. source
¶
Specifies index target columns. You can specify one or more columns to
the source
parameter.
This parameter is only available for index column.
You can only specify columns of the table specified as
type. You can also use the _key
pseudo column
for specifying the table key as index target.
If you specify multiple columns to the source
parameter, separate
columns with ,
(comma) such as _key,roles
.
7.3.11.4.2.2. path
¶
New in version 10.0.7.
Specifies a path for storing a column.
This option is useful if you want to store a column that you often use to fast storage (e.g. SSD) and store it that you don’t often use to slow storage (e.g. HDD).
You can use a relative path or an absolute path in this option.
If you specify a relative path, it is resolved from the current directory for the groonga
process.
The default value is none.
7.3.11.5. Return value¶
column_create
returns true
as body on success such as:
[HEADER, true]
If column_create
fails, column_create
returns false
as
body:
[HEADER, false]
Error details are in HEADER
.
See Output format for HEADER
.