4.5. Drilldown

You learned how to search and sort searched results in the previous sections. Now that you can search as you likes, but how do you summarize the number of records which has specific value in the column?

As you know, there is a naive solution to execute query by every the value of column, then you can get the number of records as a result. It is a simple way, but it is not reasonable to many records.

If you are familiar with SQL, you will doubt with “Is there a similar SQL functionality to GROUP BY in Groonga?”.

Of course, Groonga provides such a functionality. It’s called as drilldown.

drilldown enables you to get the number of records which belongs to specific the value of column at once.

To illustrate this feature, imagine the case that classification by domain and grouping by country that domain belongs to.

Here is the concrete examples how to use this feature.

In this example, we add two columns to Site table. domain column is used for TLD (top level domain). country column is used for country name. The type of these columns are SiteDomain table which uses domain name as a primary key and SiteCountry table which uses country name as a primary key.

Execution example:

table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","domain":".org","country":"japan"},
{"_key":"http://example.net/","domain":".net","country":"brazil"},
{"_key":"http://example.com/","domain":".com","country":"japan"},
{"_key":"http://example.net/afr","domain":".net","country":"usa"},
{"_key":"http://example.org/aba","domain":".org","country":"korea"},
{"_key":"http://example.com/rab","domain":".com","country":"china"},
{"_key":"http://example.net/atv","domain":".net","country":"china"},
{"_key":"http://example.org/gat","domain":".org","country":"usa"},
{"_key":"http://example.com/vdw","domain":".com","country":"japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]

Here is a example of drilldown with domain column. Three kind of values are used in domain column - “.org”, “.net” and “.com”.

Execution example:

select --table Site --limit 0 --drilldown domain
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         9
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "country",
#           "SiteCountry"
#         ],
#         [
#           "domain",
#           "SiteDomain"
#         ],
#         [
#           "link",
#           "Site"
#         ],
#         [
#           "links",
#           "Site"
#         ],
#         [
#           "location",
#           "WGS84GeoPoint"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ]
#     ],
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         ".org",
#         3
#       ],
#       [
#         ".net",
#         3
#       ],
#       [
#         ".com",
#         3
#       ]
#     ]
#   ]
# ]

Here is a summary of above query.

Drilldown by domain column

Group by

The number of group records

Group records means following records

.org

3

.net

3

.com

3

The value of drilldown are returned as the value of _nsubrecs column. In this case, Site table is grouped by “.org”, “.net”, “.com” domain. _nsubrecs shows that each three domain has three records.

If you execute drildown to the column which has table as a type, you can get the value of column which is stored in referenced table. _nsubrecs pseudo column is added to the table which is used for drilldown. this pseudo column stores the number of records which is grouped by.

Then, investigate referenced table in detail. As Site table use SiteDomain table as column type of domain, you can use --drilldown_output_columns to know detail of referenced column.

Execution example:

select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         9
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "country",
#           "SiteCountry"
#         ],
#         [
#           "domain",
#           "SiteDomain"
#         ],
#         [
#           "link",
#           "Site"
#         ],
#         [
#           "links",
#           "Site"
#         ],
#         [
#           "location",
#           "WGS84GeoPoint"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ]
#     ],
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         1,
#         ".org",
#         3
#       ],
#       [
#         2,
#         ".net",
#         3
#       ],
#       [
#         3,
#         ".com",
#         3
#       ]
#     ]
#   ]
# ]

Now, you can see detail of each grouped domain, drilldown by country column which has “.org” as column value.

Execution example:

select --table Site --limit 0 --filter "domain._id == 1" --drilldown country
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "country",
#           "SiteCountry"
#         ],
#         [
#           "domain",
#           "SiteDomain"
#         ],
#         [
#           "link",
#           "Site"
#         ],
#         [
#           "links",
#           "Site"
#         ],
#         [
#           "location",
#           "WGS84GeoPoint"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ]
#     ],
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         "japan",
#         1
#       ],
#       [
#         "korea",
#         1
#       ],
#       [
#         "usa",
#         1
#       ]
#     ]
#   ]
# ]

4.5.1. Drilldown with multiple column

Drilldown feature supports multiple column. Use comma separated multiple column names as drildown parameter. You can get the each result of drilldown at once.

Execution example:

select --table Site --limit 0 --drilldown domain,country
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         9
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "country",
#           "SiteCountry"
#         ],
#         [
#           "domain",
#           "SiteDomain"
#         ],
#         [
#           "link",
#           "Site"
#         ],
#         [
#           "links",
#           "Site"
#         ],
#         [
#           "location",
#           "WGS84GeoPoint"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ]
#     ],
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         ".org",
#         3
#       ],
#       [
#         ".net",
#         3
#       ],
#       [
#         ".com",
#         3
#       ]
#     ],
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         "japan",
#         3
#       ],
#       [
#         "brazil",
#         1
#       ],
#       [
#         "usa",
#         2
#       ],
#       [
#         "korea",
#         1
#       ],
#       [
#         "china",
#         2
#       ]
#     ]
#   ]
# ]

4.5.2. Sorting drildown results

Use --drilldown_sort_keys if you want to sort the result of drilldown. For example, specify _nsubrecs as ascending order.

Execution example:

select --table Site --limit 0 --drilldown country --drilldown_sort_keys _nsubrecs
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         9
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "country",
#           "SiteCountry"
#         ],
#         [
#           "domain",
#           "SiteDomain"
#         ],
#         [
#           "link",
#           "Site"
#         ],
#         [
#           "links",
#           "Site"
#         ],
#         [
#           "location",
#           "WGS84GeoPoint"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ]
#     ],
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         "brazil",
#         1
#       ],
#       [
#         "korea",
#         1
#       ],
#       [
#         "usa",
#         2
#       ],
#       [
#         "china",
#         2
#       ],
#       [
#         "japan",
#         3
#       ]
#     ]
#   ]
# ]

4.5.3. limits drildown results

The number of drilldown results is limited to 10 as a default. Use drilldown_limit and drilldown_offset parameter to customize orilldown results.

Execution example:

select --table Site --limit 0 --drilldown country --drilldown_sort_keys _nsubrecs --drilldown_limit 2 --drilldown_offset 2
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         9
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "country",
#           "SiteCountry"
#         ],
#         [
#           "domain",
#           "SiteDomain"
#         ],
#         [
#           "link",
#           "Site"
#         ],
#         [
#           "links",
#           "Site"
#         ],
#         [
#           "location",
#           "WGS84GeoPoint"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ]
#     ],
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_nsubrecs",
#           "Int32"
#         ]
#       ],
#       [
#         "usa",
#         2
#       ],
#       [
#         "china",
#         2
#       ]
#     ]
#   ]
# ]

Note that drilldown to the column which stores string is slower than the columns which stores the other types. If you drilldown to string type of column, create the table that type of primary key is string, then create the column which refers that table.