Extracted from Pike v7.8 release 866 at 2016-11-06.
pike.ida.liu.se

Method Sql.mysql()->set_unicode_encode_mode()


Method set_unicode_encode_mode

int(0..1) set_unicode_encode_mode(int enable)

Description

Enables or disables unicode encode mode.

In this mode, big_query handles wide unicode queries, provided that the server supports UTF-8 and the connection charset is latin1 (the default) or unicode. The mode is enabled by default.

Unicode encode mode works as follows: Eight bit strings are sent as latin1 and wide strings are sent using utf8. big_query sends SET character_set_client statements as necessary to update the charset on the server side. If the server doesn't support that statement then the query fails, but a wide string query would have failed on such a server anyway.

To make this transparent, string literals with introducers (e.g. _binary 'foo') are excluded from the UTF-8 encoding. This means that big_query needs to do some superficial parsing of the query when it is a wide string.
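
The basic rule above (eight bit strings unchanged, wide strings as utf8) can be sketched in a few lines of Pike. This is only an illustration: encode_query is a hypothetical helper, not part of Sql.mysql, and it deliberately ignores both the SET character_set_client bookkeeping and the special handling of literals with introducers.

```pike
// Hypothetical helper for illustration only; not part of Sql.mysql.
// It does not exclude literals with introducers (e.g. _binary 'foo').
string encode_query(string query)
{
  // String.width() returns 8, 16 or 32 depending on the widest character.
  if (String.width(query) == 8)
    return query;                 // Eight bit string: sent as it is (latin1).
  return string_to_utf8(query);   // Wide string: sent using utf8.
}

int main()
{
  // Eight bit query (0xe9 fits in latin1): the bytes are left unchanged.
  write("%O\n", encode_query("SELECT 'caf\xe9'"));
  // Wide query (U+20AC does not fit in eight bits): UTF-8 encoded.
  write("%O\n", encode_query("SELECT '\u20ac'"));
  return 0;
}
```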

Returns

1
    Unicode encode mode is enabled.

0
    Unicode encode mode couldn't be enabled because an incompatible connection charset is set. You need to do set_charset ("latin1") or set_charset ("unicode") to enable it.
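
A minimal sketch of checking the return value might look like the following. The host, database, user and password are placeholders, and falling back to set_charset ("unicode") is just one of the two options mentioned above.

```pike
int main()
{
  // Placeholder connection parameters; adjust them for your server.
  Sql.mysql db = Sql.mysql("localhost", "test", "user", "password");

  if (!db->set_unicode_encode_mode(1)) {
    // An incompatible connection charset is set. Switch to a compatible
    // one and try again.
    db->set_charset("unicode");
    if (!db->set_unicode_encode_mode(1))
      error("Failed to enable unicode encode mode.\n");
  }

  // Wide string queries can now be passed to big_query.
  db->big_query("SELECT '\u20ac'");
  return 0;
}
```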


Note

This mode doesn't affect the MySQL system variable character_set_connection, i.e. it will still be set to latin1 by default, which means that server functions like UPPER() won't handle non-latin1 characters correctly in all cases.

To fix that, do set_charset ("unicode"). That will allow unicode encode mode to work while utf8 is fully enabled at the server side.

Tip: If you enable utf8 on the server side, you need to send raw binary strings as _binary'...'. Otherwise they will get UTF-8 encoded by the server.
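
As a sketch of the two points above, the following enables utf8 on the server side with set_charset ("unicode") and sends a raw binary string as a _binary literal. The connection parameters and the table files with its columns name and data are hypothetical.

```pike
int main()
{
  // Placeholder connection parameters; adjust them for your server.
  Sql.mysql db = Sql.mysql("localhost", "test", "user", "password");

  // Enable utf8 at the server side; unicode encode mode keeps working.
  db->set_charset("unicode");

  // Raw bytes that must reach the server unchanged.
  string raw = "\xff\xfe\0\1";

  // Hypothetical table and columns. The _binary introducer keeps the
  // second literal from being treated as text in the client charset.
  db->big_query("INSERT INTO files (name, data) VALUES ('" +
                db->quote("caf\xe9") + "', _binary'" +
                db->quote(raw) + "')");
  return 0;
}
```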

Note

When unicode encode mode is enabled and the connection charset is latin1, the charset accepted by big_query is not quite Unicode since latin1 is based on cp1252. The differences are in the range 0x80..0x9f where Unicode has control chars.

This small discrepancy is not present when the connection charset is unicode.
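
One way to observe the discrepancy is to ask the server what it makes of a byte in the affected range. The sketch below uses placeholder connection parameters. On a server whose latin1 is cp1252 based, as described above, byte 0x80 would be expected to come back as the UTF-8 encoding of the euro sign (U+20AC) rather than that of the control character U+0080.

```pike
int main()
{
  // Placeholder connection parameters; adjust them for your server.
  Sql.mysql db = Sql.mysql("localhost", "test", "user", "password");

  // Let the server convert the latin1 byte 0x80 to utf8 and return
  // the resulting bytes in hexadecimal form.
  array row =
    db->big_query("SELECT HEX(CONVERT(_latin1 0x80 USING utf8))")
      ->fetch_row();

  write("latin1 0x80 as UTF-8 bytes: %s\n", row[0]);
  return 0;
}
```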

See also

set_unicode_decode_mode, set_charset