Format Text for Display
Functions related to displaying unicode text. Unicode characters don’t all have the same width so we need helper functions for displaying them.
New in version 0.2: kitchen.display API 1.0.0
- kitchen.text.display.textual_width(msg, control_chars='guess', encoding='utf-8', errors='replace')
Get the textual width of a string
- Parameters
msg – str string or byte string (bytes) to get the width of
control_chars – specify how to deal with control characters. Possible values are:
- guess: (default) will take a guess for control character widths. Most codes will return zero width. backspace, delete, and clear delete return -1. escape currently returns -1 as well, but this is not guaranteed as it’s not always correct
- strict: will raise kitchen.text.exceptions.ControlCharError if a control character is encountered
encoding – If we are given a byte string (bytes), this is used to decode it into a str string. Any characters that are not decodable in this encoding will get a value dependent on the errors parameter.
errors – How to treat errors decoding the byte string (bytes) to a str string. Legal values are the same as for kitchen.text.converters.to_unicode(). The default value of replace will cause undecodable byte sequences to have a width of one. ignore will have a width of zero.
- Raises
ControlCharError – if msg contains a control character and control_chars is strict.
- Returns
Textual width of msg. This is the number of cell positions (columns) that the string will take up on a monospace display. It is not the number of glyphs in the string.
Note
This function can be wrong sometimes because Unicode does not specify a strict width value for all of the code points. In particular, we’ve found that some Tamil characters take up to four character cells but we return a lesser amount.
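To make the idea of textual width concrete, here is a minimal sketch that uses only the standard library. It is not kitchen's implementation: approx_textual_width is a hypothetical name, control characters are ignored, and the widths come from unicodedata.east_asian_width, so it shares the limitation described in the note above.

```python
import unicodedata


def approx_textual_width(text):
    """Approximate the number of monospace columns that text occupies.

    Hypothetical helper for illustration; it does not handle control characters.
    """
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks are drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(char) in ("W", "F"):
            # Wide and fullwidth characters take two columns.
            width += 2
        else:
            width += 1
    return width


print(approx_textual_width("caf\u00e9"))   # 4 characters, 4 columns
print(approx_textual_width("一二三"))       # 3 characters, 6 columns
print(approx_textual_width("e\u0301"))     # 2 code points, 1 column
```

The three calls show why neither the byte count nor the character count is a reliable stand-in for the column count.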
- kitchen.text.display.textual_width_chop(msg, chop, encoding='utf-8', errors='replace')
Given a string, return it chopped to a given textual width
- Parameters
msg – str string or byte string (bytes) to chop
chop – Chop msg if it exceeds this textual width
encoding – If we are given a byte string (bytes), this is used to decode it into a str string. Any characters that are not decodable in this encoding will be assigned a width of one.
errors – How to treat errors decoding the byte string (bytes) to a str string. Legal values are the same as for kitchen.text.converters.to_unicode()
- Return type
str string
- Returns
str string of the msg chopped at the given textual width
This is what you want to use instead of %.*s, as it does the “right” thing with regard to UTF-8 sequences, control characters, and characters that take more than one cell position. Eg:
>>> # Wrong: only displays 8 characters because it is operating on bytes
>>> print "%.*s" % (10, 'café ñunru!')
café ñun
>>> # Properly operates on graphemes
>>> '%s' % (textual_width_chop('café ñunru!', 10))
café ñunru
>>> # takes too many columns because the kanji need two cell positions
>>> print '1234567890\n%.*s' % (10, u'一二三四五六七八九十')
1234567890
一二三四五六七八九十
>>> # Properly chops at 10 columns
>>> print '1234567890\n%s' % (textual_width_chop(u'一二三四五六七八九十', 10))
1234567890
一二三四五
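The chopping behaviour described above can be approximated in Python 3 with the standard library alone. This is a sketch under assumptions, not kitchen's code: cell_width and chop_to_width are hypothetical names, and control characters are not treated specially.

```python
import unicodedata


def cell_width(char):
    """Columns used by one character: 0 for combining marks, 2 for wide, else 1."""
    if unicodedata.combining(char):
        return 0
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1


def chop_to_width(text, limit):
    """Return the longest prefix of text that fits within limit columns."""
    kept = []
    used = 0
    for char in text:
        width = cell_width(char)
        if used + width > limit:
            # Adding this character would overflow the field, so stop here.
            break
        kept.append(char)
        used += width
    return "".join(kept)


print(chop_to_width("caf\u00e9 \u00f1unru!", 10))
print(chop_to_width("一二三四五六七八九十", 10))
```

Because the loop counts columns rather than characters, the kanji string is cut after five characters while the Latin string keeps ten.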
- kitchen.text.display.textual_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='')
Expand a str string to a specified textual width or chop to same
- Parameters
msg – str string to format
fill – pad string until the textual width of the string is this length
chop – before doing anything else, chop the string to this length. Default: Don’t chop the string at all
left – If True (default) left justify the string and put the padding on the right. If False, pad on the left side.
prefix – Attach this string before the field we’re filling
suffix – Append this string to the end of the field we’re filling
- Return type
str string
- Returns
msg formatted to fill the specified width. If no chop is specified, the string could exceed the fill length when completed. If prefix or suffix are printable characters, the string could be longer than the fill width.
Note
prefix and suffix should be used for “invisible” characters like highlighting, color changing escape codes, etc. The fill characters are appended outside of any prefix or suffix elements. This allows you to only highlight msg inside of the field you’re filling.
Warning
msg, prefix, and suffix should all be representable as unicode characters. In particular, any escape sequences in prefix and suffix need to be convertible to str. If you need to use byte sequences here rather than unicode characters, use byte_string_textual_width_fill() instead.
This function expands a string to fill a field of a particular textual width. Use it instead of %*.*s, as it does the “right” thing with regard to UTF-8 sequences, control characters, and characters that take more than one cell position in a display. Example usage:
>>> msg = u'一二三四五六七八九十'
>>> # Wrong: This uses 10 characters instead of 10 cells:
>>> u":%-*.*s:" % (10, 10, msg[:9])
:一二三四五六七八九 :
>>> # This uses 10 cells like we really want:
>>> u":%s:" % (textual_width_fill(msg[:9], 10, 10))
:一二三四五:
>>> # Wrong: Right aligned in the field, but too many cells
>>> u"%20.10s" % (msg)
          一二三四五六七八九十
>>> # Correct: Right aligned with proper number of cells
>>> u"%s" % (textual_width_fill(msg, 20, 10, left=False))
          一二三四五
>>> # Wrong: Adding some escape characters to highlight the line but too many cells
>>> u"%s%20.10s%s" % (prefix, msg, suffix)
u'\x1b[7m          一二三四五六七八九十\x1b[0m'
>>> # Correct highlight of the line
>>> u"%s%s%s" % (prefix, display.textual_width_fill(msg, 20, 10, left=False), suffix)
u'\x1b[7m          一二三四五\x1b[0m'
>>> # Correct way to not highlight the fill
>>> u"%s" % (display.textual_width_fill(msg, 20, 10, left=False, prefix=prefix, suffix=suffix))
u'          \x1b[7m一二三四五\x1b[0m'
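The placement of padding relative to prefix and suffix can be illustrated with a small Python 3 sketch. The names display_width and fill_to_width are hypothetical, the chop step is left out, and the width calculation is a rough stand-in built on unicodedata rather than kitchen's own logic.

```python
import unicodedata


def display_width(text):
    """Rough column count: combining marks 0, wide characters 2, others 1."""
    total = 0
    for char in text:
        if unicodedata.combining(char):
            continue
        total += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return total


def fill_to_width(text, fill, left=True, prefix="", suffix=""):
    """Pad text to fill columns, keeping the padding outside prefix and suffix."""
    padding = " " * max(fill - display_width(text), 0)
    field = prefix + text + suffix
    # left=True pads on the right; left=False pads on the left.
    return field + padding if left else padding + field


highlighted = fill_to_width(
    "一二三四五", 20, left=False, prefix="\x1b[7m", suffix="\x1b[0m"
)
print(repr(highlighted))
```

Only the width of the text itself is measured, so the escape sequences in prefix and suffix do not shrink the padding, and the padding stays unhighlighted.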
- kitchen.text.display.wrap(text, width=70, initial_indent='', subsequent_indent='', encoding='utf-8', errors='replace')
Works like we want textwrap.wrap() to work.
- Parameters
text – str string or byte string (bytes) to wrap
width – textual width at which to wrap. Default: 70
initial_indent – string to use to indent the first line. Default: do not indent.
subsequent_indent – string to use to indent subsequent lines. Default: do not indent
encoding – Encoding to use if text is a byte string (bytes)
errors – error handler to use if text is a byte string (bytes) and contains some undecodable characters.
- Return type
list of str strings
- Returns
list of lines that have been text wrapped and indented.
textwrap.wrap() from the python standard library has two drawbacks that this attempts to fix:
- It does not handle textual width. It only operates on bytes or characters, which are both inadequate (due to multi-byte and double width characters).
- It malforms lists and blocks.
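The first drawback is easy to reproduce with the standard library on Python 3. The snippet below is only a demonstration of textwrap.wrap() counting characters; the column count uses unicodedata.east_asian_width as an approximation of textual width.

```python
import textwrap
import unicodedata

text = "一二三四五六七八九十"

# textwrap measures len(), so ten kanji "fit" in a width of 10.
lines = textwrap.wrap(text, width=10)

# Each of these kanji is a wide character that needs two columns.
columns = sum(
    2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    for char in lines[0]
)

print(len(lines), len(lines[0]), columns)  # 1 line, 10 characters, 20 columns
```

The single returned line is twice as wide as requested once it reaches a terminal, which is the gap a width-aware wrap function is meant to close.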
- kitchen.text.display.fill(text, *args, **kwargs)
Works like we want textwrap.fill() to work
- Parameters
text – str string or byte string (bytes) to process
- Returns
str string with each line separated by a newline
See also
kitchen.text.display.wrap() for other parameters that you can give this command.
This function is a light wrapper around kitchen.text.display.wrap(). Where that function returns a list of lines, this function returns one string with each line separated by a newline.
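The same wrap/fill relationship exists in the standard library, which makes it easy to see what "light wrapper" means here. The snippet uses textwrap only and says nothing about kitchen's internals.

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog"

lines = textwrap.wrap(text, width=16)    # list of wrapped lines
joined = textwrap.fill(text, width=16)   # one string, newline separated

# fill() produces the wrapped lines joined with newlines.
print(joined == "\n".join(lines))
```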
- kitchen.text.display.byte_string_textual_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='', encoding='utf-8', errors='replace')
Expand a byte string (bytes) to a specified textual width or chop to same
- Parameters
msg – byte string (bytes) encoded in UTF-8 that we want formatted
fill – pad msg until the textual width is this long
chop – before doing anything else, chop the string to this length. Default: Don’t chop the string at all
left – If True (default) left justify the string and put the padding on the right. If False, pad on the left side.
prefix – Attach this byte string (bytes) before the field we’re filling
suffix – Append this byte string (bytes) to the end of the field we’re filling
- Return type
byte string (bytes)
- Returns
msg formatted to fill the specified textual width. If no chop is specified, the string could exceed the fill length when completed. If prefix or suffix are printable characters, the string could be longer than fill width.
Note
prefix and suffix should be used for “invisible” characters like highlighting, color changing escape codes, etc. The fill characters are appended outside of any prefix or suffix elements. This allows you to only highlight msg inside of the field you’re filling.
See also
textual_width_fill() for example usage. This function has only two differences:
- It takes byte strings (bytes) for prefix and suffix so you can pass in arbitrary sequences of bytes, not just unicode characters.
- It returns a byte string (bytes) instead of a str string.
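A byte-oriented variant can be sketched in Python 3 as follows. This is an assumption-laden illustration rather than kitchen's implementation: byte_fill and display_width are hypothetical names, the chop step is omitted, and the message is decoded only to measure its width.

```python
import unicodedata


def display_width(text):
    """Rough column count: combining marks 0, wide characters 2, others 1."""
    total = 0
    for char in text:
        if unicodedata.combining(char):
            continue
        total += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return total


def byte_fill(msg, fill, left=True, prefix=b"", suffix=b"",
              encoding="utf-8", errors="replace"):
    """Pad a byte string to fill columns; prefix and suffix are raw bytes."""
    # Decode only to measure the width; the original bytes are returned as-is.
    text = msg.decode(encoding, errors)
    padding = b" " * max(fill - display_width(text), 0)
    field = prefix + msg + suffix
    return field + padding if left else padding + field


result = byte_fill("一二".encode("utf-8"), 6, prefix=b"\x1b[7m", suffix=b"\x1b[0m")
print(result)
```

Since prefix and suffix are never decoded, they can hold byte sequences that are not valid text in the chosen encoding.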
Internal Data
There are a few internal functions and variables in this module. Code outside of kitchen shouldn’t use them but people coding on kitchen itself may find them useful.
- kitchen.text.display._COMBINING = ((768, 879), (1155, 1161), (1425, 1469), (1471, 1471), (1473, 1474), (1476, 1477), (1479, 1479), (1536, 1539), (1552, 1562), (1611, 1631), (1648, 1648), (1750, 1764), (1767, 1768), (1770, 1773), (1807, 1807), (1809, 1809), (1840, 1866), (1958, 1968), (2027, 2035), (2045, 2045), (2070, 2073), (2075, 2083), (2085, 2087), (2089, 2093), (2137, 2139), (2259, 2273), (2275, 2303), (2305, 2306), (2364, 2364), (2369, 2376), (2381, 2381), (2385, 2388), (2402, 2403), (2433, 2433), (2492, 2492), (2497, 2500), (2509, 2509), (2530, 2531), (2558, 2558), (2561, 2562), (2620, 2620), (2625, 2626), (2631, 2632), (2635, 2637), (2672, 2673), (2689, 2690), (2748, 2748), (2753, 2757), (2759, 2760), (2765, 2765), (2786, 2787), (2817, 2817), (2876, 2876), (2879, 2879), (2881, 2883), (2893, 2893), (2902, 2902), (2946, 2946), (3008, 3008), (3021, 3021), (3134, 3136), (3142, 3144), (3146, 3149), (3157, 3158), (3260, 3260), (3263, 3263), (3270, 3270), (3276, 3277), (3298, 3299), (3387, 3388), (3393, 3395), (3405, 3405), (3530, 3530), (3538, 3540), (3542, 3542), (3633, 3633), (3636, 3642), (3655, 3662), (3761, 3761), (3764, 3772), (3784, 3789), (3864, 3865), (3893, 3893), (3895, 3895), (3897, 3897), (3953, 3966), (3968, 3972), (3974, 3975), (3984, 3991), (3993, 4028), (4038, 4038), (4141, 4144), (4146, 4146), (4150, 4151), (4153, 4154), (4184, 4185), (4237, 4237), (4448, 4607), (4957, 4959), (5906, 5908), (5938, 5940), (5970, 5971), (6002, 6003), (6068, 6069), (6071, 6077), (6086, 6086), (6089, 6099), (6109, 6109), (6155, 6157), (6313, 6313), (6432, 6434), (6439, 6440), (6450, 6450), (6457, 6459), (6679, 6680), (6752, 6752), (6773, 6780), (6783, 6783), (6832, 6845), (6912, 6915), (6964, 6964), (6966, 6970), (6972, 6972), (6978, 6978), (6980, 6980), (7019, 7027), (7082, 7083), (7142, 7142), (7154, 7155), (7223, 7223), (7376, 7378), (7380, 7392), (7394, 7400), (7405, 7405), (7412, 7412), (7416, 7417), (7616, 7673), (7675, 7679), (8203, 8207), (8234, 8238), 
(8288, 8291), (8298, 8303), (8400, 8432), (11503, 11505), (11647, 11647), (11744, 11775), (12330, 12335), (12441, 12442), (42607, 42607), (42612, 42621), (42654, 42655), (42736, 42737), (43014, 43014), (43019, 43019), (43045, 43046), (43204, 43204), (43232, 43249), (43307, 43309), (43347, 43347), (43443, 43443), (43456, 43456), (43696, 43696), (43698, 43700), (43703, 43704), (43710, 43711), (43713, 43713), (43766, 43766), (44013, 44013), (64286, 64286), (65024, 65039), (65056, 65071), (65279, 65279), (65529, 65531), (66045, 66045), (66272, 66272), (66422, 66426), (68097, 68099), (68101, 68102), (68108, 68111), (68152, 68154), (68159, 68159), (68325, 68326), (68900, 68903), (69446, 69456), (69702, 69702), (69759, 69759), (69817, 69818), (69888, 69890), (69939, 69940), (70003, 70003), (70080, 70080), (70090, 70090), (70197, 70198), (70377, 70378), (70459, 70460), (70477, 70477), (70502, 70508), (70512, 70516), (70722, 70722), (70726, 70726), (70750, 70750), (70850, 70851), (71103, 71104), (71231, 71231), (71350, 71351), (71467, 71467), (71737, 71738), (72160, 72160), (72244, 72244), (72263, 72263), (72345, 72345), (72767, 72767), (73026, 73026), (73028, 73029), (73111, 73111), (92912, 92916), (92976, 92982), (113822, 113822), (119141, 119145), (119149, 119170), (119173, 119179), (119210, 119213), (119362, 119364), (122880, 122886), (122888, 122904), (122907, 122913), (122915, 122916), (122918, 122922), (123184, 123190), (123628, 123631), (125136, 125142), (125252, 125258), (917505, 917505), (917536, 917631), (917760, 917999))
Internal table, provided by this module to list code points which combine with other characters and therefore should have no textual width. This is a sorted tuple of non-overlapping intervals. Each interval is a tuple listing a starting code point and ending code point. Every code point between the two end points is a combining character.
See also
_generate_combining_table() for how this table is generated
This table was last regenerated on python-3.8.0a3 with unicodedata.unidata_version 12.0.0
- kitchen.text.display._generate_combining_table()
Combine Markus Kuhn’s data with unicodedata to make combining char list
- Return type
tuple of tuples
- Returns
tuple of intervals of code points that are combining characters. Each interval is a 2-tuple of the starting code point and the ending code point for the combining characters.
In normal use, this function serves to tell how we’re generating the combining char list. For speed reasons, we use this to generate a static list and just use that later.
Markus Kuhn’s list of combining characters is more complete than what’s in the python unicodedata library, but the python unicodedata is synced against later versions of the unicode database.
This is used to generate the _COMBINING table.
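The unicodedata half of this process can be sketched as below. It is a simplified illustration, not the function documented here: combining_intervals is a hypothetical name, Markus Kuhn's data is not merged in, and the scan is limited to a small range, so the result will not match _COMBINING exactly.

```python
import unicodedata


def combining_intervals(limit):
    """Collect runs of code points below limit that unicodedata marks as combining."""
    intervals = []
    start = None
    for code_point in range(limit):
        if unicodedata.combining(chr(code_point)):
            # Open a new interval if we are not already inside one.
            if start is None:
                start = code_point
        elif start is not None:
            # The run ended on the previous code point.
            intervals.append((start, code_point - 1))
            start = None
    if start is not None:
        intervals.append((start, limit - 1))
    return tuple(intervals)


table = combining_intervals(0x0500)
print(table[:3])
```

The output is a sorted tuple of non-overlapping (start, end) pairs, which is the same shape as the static table.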
- kitchen.text.display._print_combining_table()
Print out a new _COMBINING table
This will print a new _COMBINING table in the format used in kitchen/text/display.py. It’s useful for updating the _COMBINING table with updated data from a new python as the format won’t change from what’s already in the file.
- kitchen.text.display._interval_bisearch(value, table)
Binary search in an interval table.
- Parameters
value – numeric value to search for
table – Ordered list of intervals. This is a list of two-tuples. The elements of the two-tuple define an interval’s start and end points.
- Returns
If value is found within an interval in the table, return True. Otherwise, False
This function checks whether a numeric value is present within a table of intervals. It checks using a binary search algorithm, dividing the list of values in half and checking against the values until it determines whether the value is in the table.
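A binary search over an interval table can be written in a few lines. The version below is a sketch with a hypothetical name (interval_bisearch), not the module's own code; the sample table reuses the first three intervals of _COMBINING.

```python
def interval_bisearch(value, table):
    """Return True if value lies inside one of the sorted, non-overlapping intervals."""
    if not table or value < table[0][0] or value > table[-1][1]:
        # Outside the overall range covered by the table.
        return False
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        start, end = table[mid]
        if value > end:
            low = mid + 1      # look in the upper half
        elif value < start:
            high = mid - 1     # look in the lower half
        else:
            return True        # start <= value <= end
    return False


sample = ((768, 879), (1155, 1161), (1425, 1469))
print(interval_bisearch(800, sample))   # inside the first interval
print(interval_bisearch(1000, sample))  # falls in a gap between intervals
```

Each step halves the number of candidate intervals, so a lookup needs only a handful of comparisons even for a table with hundreds of entries.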
- kitchen.text.display._ucp_width(ucs, control_chars='guess')
Get the textual width of a ucs character
- Parameters
ucs – integer representing a single unicode code point
control_chars – specify how to deal with control characters. Possible values are:
- guess: (default) will take a guess for control character widths. Most codes will return zero width. backspace, delete, and clear delete return -1. escape currently returns -1 as well, but this is not guaranteed as it’s not always correct
- strict: will raise ControlCharError if a control character is encountered
- Raises
ControlCharError – if the code point is a unicode control character and control_chars is set to ‘strict’
- Returns
textual width of the character.
Note
It’s important to remember this is textual width and not the number of characters or bytes.
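The guess and strict modes can be illustrated with a short Python 3 sketch. Everything here is an assumption made for illustration: code_point_width is a hypothetical name, the ControlCharError class is a local stand-in for kitchen's exception, only backspace, delete, and escape are given a width of -1 (clear delete is left out), and non-control widths come from unicodedata.

```python
import unicodedata


class ControlCharError(Exception):
    """Local stand-in for kitchen.text.exceptions.ControlCharError."""


def code_point_width(code_point, control_chars="guess"):
    """Approximate width of one code point, with a guess or strict control policy."""
    if code_point < 0x20 or 0x7F <= code_point < 0xA0:
        if control_chars == "strict":
            raise ControlCharError("control character: %#x" % code_point)
        if code_point in (0x08, 0x7F, 0x1B):
            # backspace, delete, and escape are documented above as returning -1
            return -1
        return 0
    char = chr(code_point)
    if unicodedata.combining(char):
        return 0
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1


print(code_point_width(ord("a")))    # ordinary character
print(code_point_width(0x08))        # backspace under the guess policy
print(code_point_width(ord("一")))   # wide character
```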
- kitchen.text.display._textual_width_le(width, *args)
Optimize the common case when deciding which textual width is larger
- Parameters
width – textual width to compare against.
*args – str strings to check the total textual width of
- Returns
True if the total length of args is less than or equal to width. Otherwise False.
We often want to know “does X fit in Y”. It takes a while to use textual_width() to calculate this. However, we know that each character of a canonically composed str string has a textual width of 1 or 2. With this we can take the following shortcuts:
- If the number of canonically composed characters is more than width, the true textual width cannot be less than or equal to width.
- If the number of canonically composed characters * 2 is less than or equal to width, then the textual width must be ok.
The textual width of a canonically composed str string will always be greater than or equal to the number of str characters. So we can first compare the number of composed str characters against the asked for width. If one of the shortcuts above applies, we can return immediately. If not, then we must do a full textual width lookup.
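The two shortcuts can be sketched as follows, under the stated assumption that every canonically composed character is one or two columns wide. fits_within and display_width are hypothetical names, and the full lookup is a rough unicodedata-based stand-in for textual_width().

```python
import unicodedata


def display_width(text):
    """Rough column count: combining marks 0, wide characters 2, others 1."""
    total = 0
    for char in text:
        if unicodedata.combining(char):
            continue
        total += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return total


def fits_within(width, *parts):
    """Return True if the combined parts fit in width columns."""
    text = unicodedata.normalize("NFC", "".join(parts))
    count = len(text)
    if count > width:
        # Each character needs at least one column, so it cannot fit.
        return False
    if count * 2 <= width:
        # Even if every character were double width, it would still fit.
        return True
    # Neither shortcut applies: fall back to the full width calculation.
    return display_width(text) <= width


print(fits_within(10, "一二三四五"))   # decided by the second shortcut
print(fits_within(3, "abcd"))          # decided by the first shortcut
print(fits_within(9, "一二三四五"))    # needs the full lookup
```

The shortcuts only need len(), so the expensive per-character lookup runs just for strings whose length falls between half of width and width.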