7.3.11. column_create

7.3.11.1. Summary

column_create creates a new column in a table.

You need to create one or more columns to store multiple data in one record.

Groonga provides an index as a column. It’s different from other systems. An index is just an index in other systems. Implementing an index as a column provides flexibility. For example, you can add metadata to each token.

See Column for column details.

7.3.11.2. Syntax

This command takes many parameters.

Most parameters are required:

column_create table
              name
              flags
              type
              [source=null]
              [path=null]

7.3.11.3. Usage

This section describes about the followings:

Here is the People table definition. The People table is used in examples:

Execution example:

table_create \
  --name People \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

7.3.11.3.1. Create a scalar column

Groonga provides scalar column to store one value. For example, scalar column should be used for storing age into a person record. Because a person record must have only one age.

If you want to store multiple values into a record, scalar column isn’t suitable. Use Create a vector column instead.

You must specify COLUMN_SCALAR to the flags parameter to create a scalar column.

Here is an example to create the age column to the People table. age column is a scalar column. It can store one UInt8 (0-255) value as its value:

Execution example:

column_create \
  --table People \
  --name age \
  --flags COLUMN_SCALAR \
  --type UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can store one value (7) by the following load command:

Execution example:

load --table People
[
{"_key": "alice", "age": 7}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can confirm the stored one value (7) by the following select command:

Execution example:

select --table People
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7
#       ]
#     ]
#   ]
# ]

7.3.11.3.2. Create a vector column

Groonga provides vector column to store multiple values. For example, vector column may be used for storing roles into a person record. Because a person record may have multiple roles.

If you want to store only one value into a record, vector column isn’t suitable. Use Create a scalar column instead.

You must specify COLUMN_VECTOR to the flags parameter to create a vector column.

Here is an example to create the roles column to the People table. roles column is a vector column. It can store zero or more ShortText values as its value:

Execution example:

column_create \
  --table People \
  --name roles \
  --flags COLUMN_VECTOR \
  --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can store multiple values (["adventurer", "younger-sister"]) by the following load command:

Execution example:

load --table People
[
{"_key": "alice", "roles": ["adventurer", "younger-sister"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can confirm the stored multiple values (["adventurer", "younger-sister"]) by the following select command:

Execution example:

select --table People
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ],
#         [
#           "roles",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]

7.3.11.3.3. Create a weight vector column

TODO: See also Weight vector column and adjuster.

7.3.11.3.4. Create a column that refers a table’s record

Both scalar column and vector column can store reference to record of an existing table as column value. It’s useful to store relationship between records.

For example, using a column that refers a person record is better for storing a character into a book record. Because one person may be appeared in some books.

You must specify table name to be referenced to the type parameter to create a column that refers a table’s record.

Here is an example to create the character column to the Books table. The character column refers the People table. It can store one People table’s record.

Here is the Books table definition:

Execution example:

table_create \
  --name Books \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the character column definition in the Books table. --type People is important:

Execution example:

column_create \
  --table Books \
  --name character \
  --flags COLUMN_SCALAR \
  --type People
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can store one reference ("alice") by the following load command. You can use key value (People._key value) for referring a record:

Execution example:

load --table Books
[
{"_key": "Alice's Adventure in Wonderland", "character": "alice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can confirm the stored reference ("alice" record) by the following select command. It retrieves the age column and the roles column values:

Execution example:

select \
 --table Books \
 --output_columns _key,character._key,character.age,character.roles
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "character._key",
#           "ShortText"
#         ],
#         [
#           "character.age",
#           "UInt8"
#         ],
#         [
#           "character.roles",
#           "ShortText"
#         ]
#       ],
#       [
#         "Alice's Adventure in Wonderland",
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]

7.3.11.3.5. Create an index column

Groonga provides index column for fast search. It doesn’t store your data. It stores data for fast search.

You don’t need to update index column by yourself. Index column is updated automatically when you store data into a data column (scalar column or vector column) that is marked as index target column. You can set multiple columns as index target columns to one index column.

If you make a new index, it is invisible until finishing of index build.

If Groonga has an index column for the age column of the People table, Groonga can do fast equal search, fast comparison search and fast range search against age column values.

You must specify the following parameters to create an index column:

  • The flags parameter: COLUMN_INDEX

  • The type parameter: The table name of index target column such as People

  • The source parameter: The index target column name such as age

You don’t need additional flags to the flags parameter for equal search, comparison search and range search index column. You need additional flags to the flags parameter for full text search index column or multiple column index column. See Create an index column for full text search and Create a multiple columns index column for details.

Here is an example to create an index column for the age column of the People table.

First, you need to create a table for range index column. See Create range index table for details. This example creates the Ages table as TABLE_PAT_KEY:

Execution example:

table_create \
  --name Ages \
  --flags TABLE_PAT_KEY \
  --key_type UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]

Now, you can create an index column for the age column of the People table. COLUMN_INDEX in the flags parameter, People in the type parameter and age in the source parameter are important:

Execution example:

column_create \
  --table Ages \
  --name people_age_index \
  --flags COLUMN_INDEX \
  --type People \
  --source age
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that age > 5 is evaluated by the Ages.people_age_index newly created index column from log. Groonga reports used index columns in info log level. You can change log level dynamically by log_level command.

Execution example:

log_level --level info
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
  --table People \
  --filter 'age > 5'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ],
#         [
#           "roles",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]
# log: 2022-03-28 15:41:00.299778|i| [table-selector][select][index][range] <Ages.people_age_index>
log_level --level notice
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that the Ages.people_age_index is used from the following log:

[table][select][index][range] <Ages.people_age_index>

The log says Ages.people_age_index index column is used for range search.

7.3.11.3.7. Create a multiple columns index column

You can create an index column for multiple columns. It means that you can do fast search for multiple columns with one index column. Multiple columns index column has better space efficiency than single column index column only when multiple columns have many same tokens. Multiple columns index column may be slower than single column index column. Because multiple columns index column will be a bigger index column.

You can’t use multiples columns in different tables as index target columns in the same multiple columns index column. You must specify columns in the same tables as index target columns to one multiple columns index column. For example, you can’t create a multiple columns index for People._key and Books._key because they are columns of different tables. You can create a multiple columns index column for People._key and People.roles because they are columns of the same table.

There is a difference between for single column index column and for multiple columns index column. You need to add WITH_SECTION to the flags parameter. It means that you need to specify COLUMN_INDEX|WITH_SECTION to the flags parameter. It’s the difference.

If you want to create a multiple columns index column for full text search, you need to specify COLUMN_INDEX|WITH_POSITION|WITH_SECTION to the flags parameter. See Create an index column for full text search for full text search index column details.

Here is an example to create a multiple columns index column for the key of the People table and the roles column of the People table.

There is no difference between index table for single column index column and multiple columns index column.

In this example, Names table is created for equal search and prefix search. It uses TABLE_PAT_KEY because TABLE_PAT_KEY supports prefix search. See Tables for details.

Execution example:

table_create \
  --name Names \
  --flags TABLE_PAT_KEY \
  --key_type ShortText \
  --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can create a multiple columns index column for the key of the People table and roles column of the People table. COLUMN_INDEX|WITH_SECTION in the flags parameter, People in the type parameter and _key,roles in the source parameter are important:

Execution example:

column_create \
  --table Names \
  --name people_key_roles_index \
  --flags COLUMN_INDEX|WITH_SECTION \
  --type People \
  --source _key,roles
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that --filter 'roles @^ "Younger" is evaluated by the Names.people_key_roles_index newly created multiple columns index column from log. Groonga reports used index columns in info log level. You can change log level dynamically by log_level command.

Execution example:

log_level --level info
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
  --table People \
  --filter 'roles @^ "Younger"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ],
#         [
#           "roles",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]
# log: 2022-03-28 15:41:01.320091|i| [table-selector][select][index][prefix] <Names.people_key_roles_index>
log_level --level notice
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that the Names.people_key_roles_index is used from the following log:

[table][select][index][prefix] <Names.people_key_roles_index>

The log says Names.people_key_roles_index index column is used for prefix search.

7.3.11.3.8. Create a small index column

If you know index target data are small, you can reduce memory usage for the index column. Memory usage is 1/256 of the default index column.

How many data are small? It depends on data. Small index column can’t handle 1 billion records at least. If index target is only one scalar column with no text family type (ShortText, Text or LongText), the maximum handleable records are depends of the number of kinds of index target data. If index target column has 1, 1, 2 and 3, the number of kinds of them are 3 (1 and 2 and 3). The following table shows the relationship between the number of kinds of index target data and the number of handleable records:

The number of kinds of index target data and the number of handleable records in a small index column

The number of kinds of index target data

The number of hanleable records

1

16779234

2

4648070

4

7238996

8

8308622

16

11068624

32

12670817

64

18524211

128

38095511

256

51265384

You need to add INDEX_SMALL to the flags parameter such as COLUMN_INDEX|INDEX_SMALL to create a small index column.

If the People table has only 1 million records, you can use a small index column for the age column:

Execution example:

column_create \
  --table Ages \
  --name people_age_small_index \
  --flags COLUMN_INDEX|INDEX_SMALL \
  --type People \
  --source age
# [[0, 1337566253.89858, 0.000355720520019531], true]

7.3.11.3.9. Create a medium index column

If you know index target data are medium, you can reduce memory usage for the index column. Memory usage is 5/24 of the default index column.

How many data are medium? It depends on data.

If index target is only one scalar column, a medium index column can handle all records.

A medium index column may not handle all records at the following cases:

  • Index target is one text family (ShortText, Text or LongText) scalar column

  • Index target is one vector column

  • Index targets are multiple columns

  • Index table has tokenizer

You need to add INDEX_MEDIUM to the flags parameter such as COLUMN_INDEX|INDEX_MEDIUM to create a medium index column.

You can use a medium index column for an index column of the age column of the People table safely. Because it’s one scalar column with UInt8 type.

Here is an example to create a medium index column:

Execution example:

column_create \
  --table Ages \
  --name people_age_medium_index \
  --flags COLUMN_INDEX|INDEX_MEDIUM \
  --type People \
  --source age
# [[0, 1337566253.89858, 0.000355720520019531], true]

7.3.11.3.10. Create a large index column

If you know index target data are large, you need to use large index column. It uses increases memory usage for the index column but it can accept more data. Memory usage is 2 times larger than the default index column.

How many data are large? It depends on data.

If index target is only one scalar column, it’s not large data.

Large data must have many records (normally at least 10 millions records) and at least one of the following features:

  • Index targets are multiple columns

  • Index table has tokenizer

You need to add INDEX_LARGE to the flags parameter such as COLUMN_INDEX|INDEX_LARGE to create a large index column.

You can use a large index column for an index column of the _key of the People table and the role column of the People table.

Here is an example to create a large index column:

Execution example:

column_create \
  --table Terms \
  --name people_roles_large_index \
  --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION|INDEX_LARGE \
  --type People \
  --source roles
# [[0, 1337566253.89858, 0.000355720520019531], true]

7.3.11.3.11. Missing mode

New in version 12.0.2.

You can control how to process a nonexistent key in the specified new value of a reference column by a MISSING_* flag. Here are available MISSING_* flags:

  • MISSING_ADD (default)

  • MISSING_IGNORE

  • MISSING_NIL

You can’t specify multiple MISSING_* flags for a column.

MISSING_* flags are meaningful only for a reference column.

The following table describes the differences between MISSING_* flags when a nonexistent key is specified to a reference scalar column:

Flag

Description

An example given value

An example set value

MISSING_ADD

The given nonexistent key is added to the referred table automatically and the ID of the newly added record is set.

This is the default.

"nonexistent"

The record ID of the newly added record whose key is "nonexistent".

MISSING_IGNORE

The given nonexistent key is ignored and 0 is set.

There is no difference between MISSING_IGNORE and MISSING_NIL for a reference scalar column.

"nonexistent"

0

MISSING_NIL

The given nonexistent key is ignored and 0 is set.

There is no difference between MISSING_IGNORE and MISSING_NIL for a reference scalar column.

"nonexistent"

0

Here is an example to show differences between MISSING_* flags for a reference scalar column.

First this example defines columns for all MISSING_* flags:

Execution example:

table_create \
  --name MissingModeScalarReferred \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create \
  --name MissingModeScalar \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table MissingModeScalar \
  --name missing_add \
  --flags COLUMN_SCALAR|MISSING_ADD \
  --type MissingModeScalarReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table MissingModeScalar \
  --name missing_ignore \
  --flags COLUMN_SCALAR|MISSING_IGNORE \
  --type MissingModeScalarReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table MissingModeScalar \
  --name missing_nil \
  --flags COLUMN_SCALAR|MISSING_NIL \
  --type MissingModeScalarReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]

Then this example loads nonexistent keys to all columns. The specified nonexistent key for MISSING_ADD is only added to MissingModeScalarReferred automatically and the specified nonexistent keys for MISSING_IGNORE and MISSING_NIL aren’t added to MissingModeScalarReferred. missing_ignore’s value and missing_nil’s value are showed as "" because they refer a record whose ID is 0 and record whose ID is 0 never exist:

Execution example:

load --table MissingModeScalar
[
{"_key": "key", "missing_add":    "nonexistent1"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table MissingModeScalar
[
{"_key": "key", "missing_ignore": "nonexistent2"}
]
# [
#   [
#     -22,
#     1337566253.89858,
#     0.000355720520019531,
#     "<MissingModeScalar.missing_ignore>: failed to cast to <MissingModeScalarReferred>: <\"nonexistent2\">",
#     [
#       [
#         "grn_ra_cast_value",
#         "lib/store.c",
#         494
#       ]
#     ]
#   ],
#   1
# ]
load --table MissingModeScalar
[
{"_key": "key", "missing_nil":    "nonexistent3"}
]
# [
#   [
#     -22,
#     1337566253.89858,
#     0.000355720520019531,
#     "<MissingModeScalar.missing_nil>: failed to cast to <MissingModeScalarReferred>: <\"nonexistent3\">",
#     [
#       [
#         "grn_ra_cast_value",
#         "lib/store.c",
#         494
#       ]
#     ]
#   ],
#   1
# ]
select --table MissingModeScalar
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "missing_add",
#           "MissingModeScalarReferred"
#         ],
#         [
#           "missing_ignore",
#           "MissingModeScalarReferred"
#         ],
#         [
#           "missing_nil",
#           "MissingModeScalarReferred"
#         ]
#       ],
#       [
#         1,
#         "key",
#         "nonexistent1",
#         "",
#         ""
#       ]
#     ]
#   ]
# ]
select --table MissingModeScalarReferred
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "nonexistent1"
#       ]
#     ]
#   ]
# ]

The following table describes the differences between MISSING_* flags when a vector value that has a nonexistent key element is specified to a reference vector column:

Flag

Description

An example given value

An example set value

MISSING_ADD

The given nonexistent key is added to the referred table automatically and the ID of the newly added record is used for the element.

This is the default.

["existent1", "nonexistent", "existent2"]

The record IDs of "existent1", "nonexistent" and "existent2".

MISSING_IGNORE

The given nonexistent key element is ignored.

["existent1", "nonexistent", "existent2"]

The record IDs of "existent1" and "existent2".

MISSING_NIL

The given nonexistent key element is ignored.

If INVALID_WARN or INVALID_IGNORE are also specified, the element is replaced with 0. If INVALID_ERROR is specified or no INVALID_* are specified, the element is ignored.

See also Invalid mode.

["existent1", "nonexistent", "existent2"]

The record ID of "existent1" and 0 and the record ID of "existent2" for INVALID_WARN and INVALID_IGNORE.

The record IDs of "existent1" and "existent2" for INVALID_ERROR and no INVALID_*.

Here is an example to show differences between MISSING_* flags for a reference vector column.

First this example defines columns for all MISSING_* flags:

Execution example:

table_create \
  --name MissingModeVectorReferred \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create \
  --name MissingModeVector \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table MissingModeVector \
  --name missing_add \
  --flags COLUMN_VECTOR|MISSING_ADD \
  --type MissingModeVectorReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table MissingModeVector \
  --name missing_ignore \
  --flags COLUMN_VECTOR|MISSING_IGNORE|INVALID_IGNORE \
  --type MissingModeVectorReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table MissingModeVector \
  --name missing_nil \
  --flags COLUMN_VECTOR|MISSING_NIL|INVALID_IGNORE \
  --type MissingModeVectorReferred
# [[0, 1337566253.89858, 0.000355720520019531], true]

Then this example loads a vector that includes a nonexistent key to all columns. The specified nonexistent key for MISSING_ADD is only added to MissingModeVectorReferred automatically and the specified nonexistent keys for MISSING_IGNORE and MISSING_NIL aren’t added to MissingModeVectorReferred. The specified nonexistent key element is removed from missing_ignore’s value. The specified nonexistent key element is replaced with 0 in missing_nil’s value because INVALID_IGNORE is also specified. And the element replaced with 0 is showed as "" because it refers a record whose ID is 0 and record whose ID is 0 never exist:

Execution example:

load --table MissingModeVectorReferred
[
{"_key": "existent1"},
{"_key": "existent2"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table MissingModeVector
[
{"_key": "key", "missing_add":    ["existent1", "nonexistent1", "existent2"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table MissingModeVector
[
{"_key": "key", "missing_ignore": ["existent1", "nonexistent2", "existent2"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table MissingModeVector
[
{"_key": "key", "missing_nil":    ["existent1", "nonexistent3", "existent2"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table MissingModeVector
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "missing_add",
#           "MissingModeVectorReferred"
#         ],
#         [
#           "missing_ignore",
#           "MissingModeVectorReferred"
#         ],
#         [
#           "missing_nil",
#           "MissingModeVectorReferred"
#         ]
#       ],
#       [
#         1,
#         "key",
#         [
#           "existent1",
#           "nonexistent1",
#           "existent2"
#         ],
#         [
#           "existent1",
#           "existent2"
#         ],
#         [
#           "existent1",
#           "",
#           "existent2"
#         ]
#       ]
#     ]
#   ]
# ]
select --table MissingModeVectorReferred
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "existent1"
#       ],
#       [
#         2,
#         "existent2"
#       ],
#       [
#         3,
#         "nonexistent1"
#       ]
#     ]
#   ]
# ]

7.3.11.3.12. Invalid mode

New in version 12.0.2.

You can control how to process an invalid value in the specified new value of a data column by a INVALID_* flag. Here are available INVALID_* flags:

  • INVALID_ERROR (default)

  • INVALID_WARN

  • INVALID_IGNORE

You can’t specify multiple INVALID_* flags for a column.

INVALID_* flags are meaningful only for a COLUMN_SCALAR column and a COLUMN_VECTOR column.

If the target column is a reference column, an invalid value depends on Missing mode. If you specify MISSING_IGNORE or MISSING_NIL, a nonexistent key is an invalid value. Note that an empty string key and string keys that are empty strings by normalization aren’t an invalid value with all MISSING_* flags. They are special.

If the target column isn’t a reference column, an invalid value depends on column’s value type. For example, "invalid" is an invalid value for an Int32 scalar column.

The following table describes the differences between INVALID_* flags when an invalid value is specified to an Int32 scalar column:

Flag

Description

An example given value

An example set value

INVALID_ERROR

The given invalid value is reported as an error in Process log and by load.

The given invalid value isn’t set.

This is the default.

"invalid"

The column isn’t updated.

INVALID_WARN

The given invalid value is reported as a warning in Process log.

The given invalid value is replaced with the default value of the target scalar column. For example, 0 is the default value for an Int32 scalar column.

"nonexistent"

0

INVALID_IGNORE

The given invalid value is ignored.

The given invalid value is replaced with the default value of the target scalar column. For example, 0 is the default value for an Int32 scalar column.

"invalid"

0

Here is an example to show differences between INVALID_* flags for an Int32 scalar column.

First this example defines columns for all INVALID_* flags:

Execution example:

table_create \
  --name InvalidModeScalar \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table InvalidModeScalar \
  --name invalid_error \
  --flags COLUMN_SCALAR|INVALID_ERROR \
  --type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table InvalidModeScalar \
  --name invalid_warn \
  --flags COLUMN_SCALAR|INVALID_WARN \
  --type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table InvalidModeScalar \
  --name invalid_ignore \
  --flags COLUMN_SCALAR|INVALID_IGNORE \
  --type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]

Then this example loads 29 as initial values for all columns to show differences between them on update:

Execution example:

load --table InvalidModeScalar
[
  {
    "_key": "key",
    "invalid_error":  29,
    "invalid_warn":   29,
    "invalid_ignore": 29
  }
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select \
  --table InvalidModeScalar \
  --output_columns invalid_error,invalid_warn,invalid_ignore
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "invalid_error",
#           "Int32"
#         ],
#         [
#           "invalid_warn",
#           "Int32"
#         ],
#         [
#           "invalid_ignore",
#           "Int32"
#         ]
#       ],
#       [
#         29,
#         29,
#         29
#       ]
#     ]
#   ]
# ]

Then this example update existing column values with invalid values. The specified invalid value is reported as an error by load only with INVALID_ERROR. And the existing value isn’t updated only with INVALID_ERROR. The existing value is updated with 0 with INVALID_WARN and INVALID_IGNORE. You can’t see differences between INVALID_WARN and INVALID_IGNORE with this example but a warning message is logged in Process log only with INVALID_WARN:

Execution example:

load --table InvalidModeScalar
[
{"_key": "key", "invalid_error":  "invalid"},
]
# [
#   [
#     -22,
#     1337566253.89858,
#     0.000355720520019531,
#     "<InvalidModeScalar.invalid_error>: failed to cast to <Int32>: <\"invalid\">",
#     [
#       [
#         "grn_ra_cast_value",
#         "lib/store.c",
#         494
#       ]
#     ]
#   ],
#   1
# ]
load --table InvalidModeScalar
[
{"_key": "key", "invalid_warn":   "invalid"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table InvalidModeScalar
[
{"_key": "key", "invalid_ignore": "invalid"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select \
  --table InvalidModeScalar \
  --output_columns invalid_error,invalid_warn,invalid_ignore
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "invalid_error",
#           "Int32"
#         ],
#         [
#           "invalid_warn",
#           "Int32"
#         ],
#         [
#           "invalid_ignore",
#           "Int32"
#         ]
#       ],
#       [
#         29,
#         0,
#         0
#       ]
#     ]
#   ]
# ]

The following table describes the differences between INVALID_* flags when a vector value that has an invalid element is specified to an Int32 vector column:

Flag

Description

An example given value

An example set value

INVALID_ERROR

The given invalid element is reported as an error in Process log but load doesn’t report an error.

If the target column is a reference vector column and MISSING_NIL flag is specified, invalid elements are replaced with 0. Invalid elements are ignored otherwise.

[1, "invalid", 3]

[1, 3]

INVALID_WARN

The given invalid element is reported as a warning in Process log.

If the target column is a reference vector column and MISSING_NIL flag is specified, invalid elements are replaced with 0. Invalid elements are ignored otherwise.

[1, "invalid", 3]

[1, 3]

INVALID_IGNORE

The given invalid element is ignored.

If the target column is a reference vector column and MISSING_NIL flag is specified, invalid elements are replaced with 0. Invalid elements are ignored otherwise.

[1, "invalid", 3]

[1, 3]

Here is an example to show differences between INVALID_* flags for a reference vector column.

First this example defines columns for all INVALID_* flags:

Execution example:

table_create \
  --name InvalidModeVector \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table InvalidModeVector \
  --name invalid_error \
  --flags COLUMN_VECTOR|INVALID_ERROR \
  --type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table InvalidModeVector \
  --name invalid_warn \
  --flags COLUMN_VECTOR|INVALID_WARN \
  --type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create \
  --table InvalidModeVector \
  --name invalid_ignore \
  --flags COLUMN_VECTOR|INVALID_IGNORE \
  --type Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]

Then this example loads a vector that includes an invalid element to all columns. The all specified invalid elements are ignored regardless of INVALID_* flags. Messages in Process log are different by INVALID_* flags. If INVALID_ERROR is specified, an error message is logged in Process log. If INVALID_WARN is specified, a warning message is logged in Process log. INVALID_IGNORE is specified, no message is logged in Process log:

Execution example:

load --table InvalidModeVector
[
{"_key": "key", "invalid_error":  [1, "invalid", 3]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table InvalidModeVector
[
{"_key": "key", "invalid_warn":   [1, "invalid", 3]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table InvalidModeVector
[
{"_key": "key", "invalid_ignore": [1, "invalid", 3]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select \
  --table InvalidModeVector \
  --output_columns invalid_error,invalid_warn,invalid_ignore
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "invalid_error",
#           "Int32"
#         ],
#         [
#           "invalid_warn",
#           "Int32"
#         ],
#         [
#           "invalid_ignore",
#           "Int32"
#         ]
#       ],
#       [
#         [
#           1,
#           3
#         ],
#         [
#           1,
#           3
#         ],
#         [
#           1,
#           3
#         ]
#       ]
#     ]
#   ]
# ]

7.3.11.4. Parameters

This section describes all parameters.

7.3.11.4.1. Required parameters

There are some required parameters.

7.3.11.4.1.1. table

Specifies an existing table name for the new column.

7.3.11.4.1.2. name

Specifies the column name to be created.

The column name must be unique in the same table.

Here are available characters:

  • 0 .. 9 (digit)

  • a .. z (alphabet, lower case)

  • A .. Z (alphabet, upper case)

  • # (hash)

  • @ (at mark)

  • - (hyphen)

  • _ (underscore) (NOTE: Underscore can’t be used as the first character.)

You need to create a name with one or more the above characters. Note that you can’t use _ as the first character such as _name.

7.3.11.4.1.3. flags

Specifies the column type and column customize options.

Here are available flags:

Flag

Description

COLUMN_SCALAR

Scalar column. It can store one value. See also Scalar column or Create a scalar column.

COLUMN_VECTOR

Vector column. It can store multiple values. See also Vector column or Create a vector column.

COLUMN_INDEX

Index column. It stores data for fast search. See also Index column or Create an index column.

COMPRESS_ZLIB

It enables column value compression by zlib. You need Groonga that enables zlib support.

Compression by zlib is higher space efficiency than compression by LZ4. But compression by zlib is slower than compression by LZ4.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

COMPRESS_LZ4

It enables column value compression by LZ4. You need Groonga that enables LZ4 support.

Compression by LZ4 is faster than compression by zlib. But compression by LZ4 is lower space efficiency than compression by zlib.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

COMPRESS_ZSTD

It enables column value compression by Zstandard. You need Groonga that enables Zstandard support.

Compression by Zstandard is faster than compression by zlib and the same space efficiency as zlib.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

WITH_SECTION

It enables section support to index column.

If section support is enabled, you can support multiple documents in the same index column.

You must specify this flag to create a multiple columns index column. See also Create a multiple columns index column for details.

Section support requires additional spaces. If you don’t need section support, you should not enable section support.

This flag is available only for COLUMN_INDEX.

WITH_WEIGHT

It enables weight support to vector column or index column.

If weight support is enabled for vector column, you can add weight for each element. If weight support is enabled for index column, you can add weight for each posting. They are useful to compute suitable search score.

You must specify this flag to use adjuster. See also Create a weight vector column for details.

Weight support requires additional spaces. If you don’t need weight support, you should not enable weight support.

This flag is available only for COLUMN_VECTOR or COLUMN_INDEX.

WEIGHT_FLOAT32

New in version 10.0.3.

You can use 32bit floating point instead of 32bit unsigned integer for weight value.

You also need to specify WITH_WEIGHT.

This flag is available only for COLUMN_VECTOR or COLUMN_INDEX.

WITH_POSITION

It enables position support to index column. It means that the index column is full inverted index. (Index column is implemented as inverted index.)

If position support is enabled, you can add position in the document for each posting. It’s required for phrase search. It means that index column for full text search must enable position support because most full text search uses phrase search.

You must specify this flag to create a full text search index column. See also Create an index column for full text search for details.

Position support requires additional spaces. If you don’t need position support, you should not enable position support.

This flag is available only for COLUMN_INDEX.

INDEX_SMALL

New in version 6.0.8.

It requires to create a small index column.

If index target data are small, small index column is enough. Small index column uses fewer memory than a normal index column or a medium index column. See also Create a small index column for knowing what are “small data” and how to use this flag.

This flag is available only for COLUMN_INDEX.

INDEX_MEDIUM

New in version 6.0.8.

It requires to create a medium index column.

If index target data are medium, medium index column is enough. Medium index column uses fewer memory than a normal index column. See also Create a medium index column for knowing what are “medium data” and how to use this flag.

This flag is available only for COLUMN_INDEX.

INDEX_LARGE

New in version 9.0.2.

It requires to create a large index column.

If index target data are large, you need to use large index column. Large index column uses more memory than a normal index column but accepts more data than a normal index column. See also Create a large index column for knowing what are “large data” and how to use this flag.

This flag is available only for COLUMN_INDEX.

MISSING_ADD

New in version 12.0.2.

You can’t specify multiple MISSING_* flags. They are exclusive.

This is meaningful only for reference scalar and vector columns.

If this flag is specified and nonexistent key in the referred table is specified to the column’s value, a new record is created in the referred table automatically.

If you don’t specify any MISSING_* flag, MISSING_ADD is used as the default.

See also Missing mode.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

MISSING_IGNORE

New in version 12.0.2.

You can’t specify multiple MISSING_* flags. They are exclusive.

This is meaningful only for reference scalar and vector columns.

If this flag is specified and nonexistent key in the referred table is specified to the column’s value, the value is just ignored. If the column is a scalar column, GRN_ID_NIL (0) is stored because Groonga doesn’t support the NULL value. If the column is a vector column, the element is just removed from the value. For example, ["existent1", "nonexistent", "existent2"] is set to the vector column and "nonexistent" record doesn’t exist in the referred table, ["existent1", "existent2"] are set to the vector column.

See also Missing mode.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

MISSING_NIL

New in version 12.0.2.

You can’t specify multiple MISSING_* flags. They are exclusive.

This is meaningful only for reference scalar and vector columns.

If this flag is specified and nonexistent key in the referred table is specified to the column’s value, the value is replaced with GRN_ID_NIL (0).

See also Missing mode.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

INVALID_ERROR

New in version 12.0.2.

You can’t specify multiple INVALID_* flags. They are exclusive.

If this flag is specified and an invalid value is specified, an error is reported to Process log.

For example, "STRING" for Int32 column is an invalid value.

If the column is a scalar column, load also reports an error.

If the column is a vector column, load doesn’t reports an error but invalid values in a vector value are removed or replaced with GRN_ID_NIL (0) depending on MISSING_* flag of the column.

Note

This is an incompatible change at 12.0.2. load also reports an error for a vector column before 12.0.2.

If you don’t specify any INVALID_* flag, INVALID_ERROR is used as the default.

See also Invalid mode.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

INVALID_WARN

New in version 12.0.2.

You can’t specify multiple INVALID_* flags. They are exclusive.

If this flag is specified and an invalid value is specified, a warning is reported to Process log but no error is reported.

For example, "STRING" for Int32 column is an invalid value.

If the column is a vector column, invalid values in a vector value are removed or replaced with GRN_ID_NIL (0) depending on MISSING_* flag of the column.

See also Invalid mode.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

INVALID_IGNORE

New in version 12.0.2.

You can’t specify multiple INVALID_* flags. They are exclusive.

If this flag is specified and an invalid value is specified, it’s just ignored.

For example, "STRING" for Int32 column is an invalid value.

If the column is a vector column, invalid values in a vector value are removed or replaced with GRN_ID_NIL (0) depending on MISSING_* flag of the column.

See also Invalid mode.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

You must specify one of COLUMN_${TYPE} flags. You can’t specify two or more COLUMN_${TYPE} flags. For example, COLUMN_SCALAR|COLUMN_VECTOR is invalid.

You can combine flags with | (vertical bar) such as COLUMN_INDEX|WITH_SECTION|WITH_POSITION.

7.3.11.4.1.4. type

Specifies type of the column value.

If the column is scalar column or vector column, here are available types:

  • Builtin types described in Data types

  • Tables defined by users

If the column is index column, here are available types:

  • Tables defined by users

See also the followings:

7.3.11.4.2. Optional parameters

There is an optional parameter.

7.3.11.4.2.1. source

Specifies index target columns. You can specify one or more columns to the source parameter.

This parameter is only available for index column.

You can only specify columns of the table specified as type. You can also use the _key pseudo column for specifying the table key as index target.

If you specify multiple columns to the source parameter, separate columns with , (comma) such as _key,roles.

7.3.11.4.2.2. path

New in version 10.0.7.

Specifies a path for storing a column.

This option is useful if you want to store a column that you often use to fast storage (e.g. SSD) and store it that you don’t often use to slow storage (e.g. HDD).

You can use a relative path or an absolute path in this option. If you specify a relative path, it is resolved from the current directory for the groonga process.

The default value is none.

7.3.11.5. Return value

column_create returns true as body on success such as:

[HEADER, true]

If column_create fails, column_create returns false as body:

[HEADER, false]

Error details are in HEADER.

See Output format for HEADER.

7.3.11.6. See also