7.10.1. QueryExpanderTSV
7.10.1.1. Summary
QueryExpanderTSV is a query expander plugin that reads synonyms from a TSV (Tab Separated Values) file. It provides fewer features than the embedded query expansion feature. For example, it doesn’t support word normalization. However, it may be easier to use because you manage your synonyms in a TSV file, which you can edit with a spreadsheet application such as Excel. With the embedded query expansion feature, you manage your synonyms in a Groonga table.
7.10.1.2. Install
You need to register query_expanders/tsv as a plugin before you use QueryExpanderTSV:
plugin_register query_expanders/tsv
7.10.1.3. Usage
Add the --query_expander QueryExpanderTSV parameter to the select command:
select --query "QUERY" --query_expander QueryExpanderTSV
If a word in QUERY has registered synonyms, it is expanded to them. For example, suppose the following synonyms are registered.
word | synonym 1 | synonym 2 |
---|---|---|
groonga | groonga | Senna |
mroonga | mroonga | groonga MySQL |
The table means that synonym 1 and synonym 2 are synonyms of word. For example, groonga and Senna are synonyms of groonga, and mroonga and groonga MySQL are synonyms of mroonga.
Here is an example of query expansion that uses groonga as the query:
select --query "groonga" --query_expander QueryExpanderTSV
The above command is equivalent to the following command:
select --query "groonga OR Senna" --query_expander QueryExpanderTSV
Here is another example of query expansion that uses mroonga search as the query:
select --query "mroonga search" --query_expander QueryExpanderTSV
The above command is equivalent to the following command:
select --query "(mroonga OR (groonga MySQL)) search" --query_expander QueryExpanderTSV
Note that only registered words (groonga and mroonga) are expanded to synonyms. Unregistered words (search) are left as they are. Query expansion is not applied recursively: groonga appears in the expansion result (mroonga OR (groonga MySQL)), but it isn’t expanded again.
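The expansion rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the plugin's implementation: the hypothetical expand_query function and SYNONYMS dictionary are invented for this sketch, the query is split on whitespace only (the real plugin works on Groonga's query syntax), and the exact parenthesization produced by Groonga may differ.

```python
# Hypothetical in-memory copy of the synonym table shown above.
SYNONYMS = {
    "groonga": ["groonga", "Senna"],
    "mroonga": ["mroonga", "groonga MySQL"],
}


def expand_query(query, synonyms):
    """Expand each whitespace-separated word once (no recursion)."""
    expanded = []
    for word in query.split():
        candidates = synonyms.get(word)
        if not candidates:
            # Unregistered words are passed through unchanged.
            expanded.append(word)
            continue
        # Multi-word synonyms are grouped so that OR applies to the whole phrase.
        parts = [f"({s})" if " " in s else s for s in candidates]
        if len(parts) == 1:
            expanded.append(parts[0])
        else:
            expanded.append("(" + " OR ".join(parts) + ")")
    return " ".join(expanded)


print(expand_query("groonga", SYNONYMS))         # (groonga OR Senna)
print(expand_query("mroonga search", SYNONYMS))  # (mroonga OR (groonga MySQL)) search
```

Because each word of the original query is looked up exactly once, the groonga inside (groonga MySQL) is never looked up again, which mirrors the non-recursive behavior described above.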
Normally, you need to include word itself in its synonyms. For example, groonga and mroonga are included in their own synonyms. If you want word itself to be ignored, leave it out of the synonyms. For example, to use query expansion for spelling correction, use the following synonyms.
word | synonym |
---|---|
gronga | groonga |
gronga in word has a typo: an o is missing. groonga in synonym is the correct word.
Here is an example of using query expansion as spelling correction:
select --query "gronga" --query_expander QueryExpanderTSV
The above command is equivalent to the following command:
select --query "groonga" --query_expander QueryExpanderTSV
The former command has a typo in the --query value, but the latter command doesn’t.
7.10.1.4. TSV File
Synonyms are defined in a TSV file. This section describes it.
7.10.1.4.1. Location
The file name should be synonyms.tsv and the file is located in the configuration directory. For example, /etc/groonga/synonyms.tsv is a possible TSV file location. The location is decided at build time.
You can change the location at run time with the GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE environment variable:
% env GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE=/tmp/synonyms.tsv groonga
With the above command, the /tmp/synonyms.tsv file is used.
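The lookup order described above (environment variable first, build-time default otherwise) can be pictured with the following Python sketch. The resolve_synonyms_file function is hypothetical, and DEFAULT_SYNONYMS_FILE simply reuses the example path from this section. The actual default depends on how Groonga was built, and how the plugin treats an empty variable is not covered here.

```python
import os

# Example default taken from the text; the real default is fixed at build time.
DEFAULT_SYNONYMS_FILE = "/etc/groonga/synonyms.tsv"


def resolve_synonyms_file(environ=None):
    """Return the path selected by the environment variable, else the default."""
    if environ is None:
        environ = os.environ
    return environ.get("GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE",
                       DEFAULT_SYNONYMS_FILE)


print(resolve_synonyms_file({}))
print(resolve_synonyms_file(
    {"GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE": "/tmp/synonyms.tsv"}))
```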
7.10.1.4.2. Format
You can define zero or more synonyms in a TSV file. Each line defines one pair of word and synonyms. word is expanded to synonyms in the --query value. Synonyms are combined by OR. For example, the synonyms groonga and Senna are expanded as groonga OR Senna.
The first column is word and the remaining columns are synonyms of the word. Here is a sample line where word is groonga and synonyms are groonga and Senna. (TAB) means a tab character (U+0009):
groonga(TAB)groonga(TAB)Senna
Comment lines are supported: lines that start with # are ignored. Here is an example of a comment line. The groonga line is ignored as a comment:
#groonga(TAB)groonga(TAB)Senna
mroonga(TAB)mroonga(TAB)groonga MySQL
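To make the format concrete, here is a minimal Python sketch of a reader for such a file. It is not the plugin's parser: parse_synonyms is a hypothetical helper, and skipping blank lines is an assumption, since this section only specifies the handling of # comment lines.

```python
def parse_synonyms(text):
    """Return {word: [synonym, ...]} from TSV text, skipping comment lines."""
    table = {}
    for line in text.splitlines():
        # Assumption: blank lines are skipped; the text only mentions "#" comments.
        if not line or line.startswith("#"):
            continue
        # The first column is the word; the remaining columns are its synonyms.
        word, *synonyms = line.split("\t")
        table[word] = synonyms
    return table


# Same content as the comment example above, with (TAB) written as "\t".
sample = "#groonga\tgroonga\tSenna\nmroonga\tmroonga\tgroonga MySQL\n"
print(parse_synonyms(sample))  # {'mroonga': ['mroonga', 'groonga MySQL']}
```

A multi-word synonym such as groonga MySQL stays in a single column because only tab characters separate columns.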
7.10.1.5. Limitation
You need to restart groonga to reload your synonyms. The TSV file is loaded only when the plugin is loaded.