7.3.25. load

7.3.25.1. Summary

load loads data as records in the current database and updates values of each columns.

7.3.25.2. Syntax

The required parameters are only values and table. Other parameters are optional:

load values
     table
     [columns=null]
     [ifexists=null]
     [input_type=json]
     [each=null]
     [output_ids=no]
     [output_errors=no]
     [lock_table=no]

This command is a special command. Other commands need to pass all parameters to one line but this command can accept values as followed data.

If you use command line style, you can pass values like the following:

load --table Bookmarks
[
{"_key": "http://groonga.org/", "title": "Groonga"},
{"_key": "http://mroonga.org/", "title": "Mroonga"}
]

[...] is value of values.

If you use HTTP style, you can pass values as body:

% curl \
    --request POST \
    --header "Content-Type: application/json" \
    --data-raw '[{"_key": "http://groonga.org/"}]' \
    http://localhost:10041/d/load?table=Bookmarks"

7.3.25.3. Usage

Here is a schema definition to show usage:

Execution example:

table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to add records to Entries table by parameter:

Execution example:

load \
  --table Entries \
  --values "[{\"_key\":\"Groonga\",\"content\":\"It's very fast!!\"}]"
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Here is an example to add records to Entries table from standard input:

Execution example:

load --table Entries
[
{"_key": "Groonga", "content": "It's very fast!!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Here is an example to lock table while updating columns:

Execution example:

load --table Entries --lock_table yes
[
{"_key": "Groonga", "content": "It's very fast!!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

7.3.25.4. Parameters

This section describes all parameters. Parameters are categorized.

7.3.25.4.1. Required parameters

There are some required parameters.

7.3.25.4.1.1. values

Specifies values to be loaded.

Values should satisfy input_type format. If you specify json as input_type, you can choose a format from below:

Bracket style:

[
[COLUMN_NAME1, COLUMN_NAME2, ...],
[VALUE1, VALUE2, ...],
[VALUE1, VALUE2, ...],
...
]

Brace style:

[
{"COLUMN_NAME1": VALUE1, "COLUMN_NAME2": VALUE2, ...},
{"COLUMN_NAME1": VALUE1, "COLUMN_NAME2": VALUE2, ...},
...
]

[COLUMN_NAME1, COLUMN_NAME2, ...] in bracket style is effective only when columns parameter isn’t specified.

When a target table contains primary key, you must specify _key column (pseudo column associated primary key) as the one of COLUMN_NAME.

If you specify apache-arrow as input_type, you must use Apache Arrow IPC Streaming Format. You can’t use Apache Arrow IPC File Format.

You must use HTTP interface to use Apache Arrow. You can’t use Apache Arrow in command line interface.

You must set application/x-apache-arrow-streaming to Content-Type HTTP request header.

You must choose suitable record batch size. Groonga loads data per record batch. If you choose very large record batch size, Groonga can’t start loading until whole data of a record batch are received. If you choose very small record batch size, Groonga can load data incrementally but overhead will be large. Suitable record batch size depends on your system but 1024 or so will be suitable.

If values isn’t specified any values, they are read from the standard input in command line style or body in HTTP style.

7.3.25.4.1.2. table

Specifies a table name you want to add records.

7.3.25.4.2. Optional parameters

There are some optional parameters.

7.3.25.4.2.1. columns

Specifies column names in added records with comma separations.

7.3.25.4.2.2. ifexists

Specifies executed expression in Script syntax when the same primary key as added records already exists in your table.

If ifexists specifies expression and its value is true, values in other (all columns excluding _key column) columns is updated.

7.3.25.4.2.3. input_type

Specifies an input format for values.

Note that you must also specify suitable HTTP Content-Type header value when you use input_type with HTTP interface.

Here are available types and Content-Type values:

Type

Content-Type

Description

json

application/json

Use JSON for values format.

This is the default.

apache-arrow

application/x-apache-arrow-streaming

New in version 9.1.1.

Use Apache Arrow for values format.

7.3.25.4.2.4. each

TODO

7.3.25.4.2.5. output_ids

TODO

7.3.25.4.2.6. output_errors

TODO

7.3.25.4.2.7. lock_table

New in version 8.0.6.

Specifies whether locking table while updating columns.

The default is no.

If you may run destructive commands such as load, delete and so on concurrently, it may break database. For example, if you’re updating a record by load and deleting the updating record by delete, the load may refer the delete record.

You can guard the update conflict by locking the target table but it reduces load performance.

If you specify yes to this parameter, you can lock the target table while updating columns. Here is the update sequence of each record:

  1. Lock the target table

  2. Add or refer a record to the target table

  3. Unlock the target table

  4. Lock the target table when lock_table is yes

  5. Update columns of the target record

  6. Unlock the target table when lock_table is yes

7.3.25.5. Return value

The command returns a response with the following format:

[THE_NUMBER_OF_LOADED_RECORDS]

The command returns a response with the following format with Command version 3 or later:

{
  "n_loaded_records": THE_NUMBER_OF_LOADED_RECORDS,
  "loaded_ids": [
    LOADED_RECORD'S_ID1,
    LOADED_RECORD'S_ID2,
    ...
  ],
  "errors": [
    {
      "return_code": RETURN_CODE_FOR_1ST_RECORD,
      "message": MESSAGE_FOR_1ST_RECORD
    },
    {
      "return_code": RETURN_CODE_FOR_2ND_RECORD,
      "message": MESSAGE_FOR_2ND_RECORD
    },
    ...
  ]
}

loaded_ids is only included when output_ids is yes.

errors is only included when output_errors is yes.

7.3.25.6. See also