Format Text for Display

Functions for displaying Unicode text. Unicode characters do not all have the same display width, so helper functions are needed to lay them out correctly.

New in version 0.2: kitchen.display API 1.0.0

kitchen.text.display.textual_width(msg, control_chars='guess', encoding='utf-8', errors='replace')

Get the textual width of a string

Parameters

msg – str string or byte string (bytes) to get the width of
control_chars –
specify how to deal with control characters. Possible values are:

guess

(default) will take a guess at control character widths. Most codes return a width of zero. Backspace, delete, and clear delete return -1. Escape currently returns -1 as well, but this is not guaranteed because it is not always correct.

strict

will raise kitchen.text.exceptions.ControlCharError if a control character is encountered
encoding – If msg is a byte string (bytes), this encoding is used to decode it into a str string. Any bytes that cannot be decoded in this encoding get a width that depends on the errors parameter.
errors – How to treat errors when decoding the byte string to a str string. Legal values are the same as for kitchen.text.converters.to_unicode(). With the default value, replace, each undecodable byte sequence has a width of one. With ignore, it has a width of zero.

Raises

ControlCharError – if msg contains a control character and control_chars is strict.

Returns

Textual width of msg: the number of cell positions (columns) the string occupies on a monospace display. This is not the number of glyphs in the string.

Note

This function can be wrong sometimes because Unicode does not specify a strict width value for all of the code points. In particular, we’ve found that some Tamil characters take up to four character cells but we return a lesser amount.
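The idea of textual width can be illustrated with the standard library alone. The sketch below is not kitchen's implementation. It assumes Python 3, and `approx_width` is a hypothetical helper that counts East Asian wide and fullwidth characters as two cells, combining marks as zero, and everything else as one. It does no control character handling, so it corresponds to neither `guess` nor `strict`.

```python
import unicodedata


def approx_width(text):
    """Approximate the number of terminal cells needed to display text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the cell of the preceding character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two cells
        else:
            width += 1
    return width


print(approx_width("café"))      # 4 characters, 4 cells
print(approx_width("一二三"))     # 3 characters, 6 cells
print(approx_width("e\u0301"))   # 2 code points, 1 cell
```

The last call shows why width is not the same as the number of code points: the combining acute accent is drawn over the preceding letter.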

kitchen.text.display.textual_width_chop(msg, chop, encoding='utf-8', errors='replace')

Given a string, return it chopped to a given textual width

Parameters

msg – str string or byte string (bytes) to chop
chop – Chop msg if it exceeds this textual width
encoding – If msg is a byte string (bytes), this encoding is used to decode it into a str string. Any characters that are not decodable in this encoding will be assigned a width of one.
errors – How to treat errors when decoding the byte string to a str string. Legal values are the same as for kitchen.text.converters.to_unicode()

Return type

str string

Returns

str string of the msg chopped at the given textual width

Use this instead of %.*s, as it does the “right” thing with regard to UTF-8 sequences, control characters, and characters that take more than one cell position. For example:

>>> # Wrong: only displays 8 characters because it is operating on bytes
>>> print "%.*s" % (10, 'café ñunru!')
café ñun
>>> # Properly operates on graphemes
>>> print '%s' % (textual_width_chop('café ñunru!', 10))
café ñunru
>>> # takes too many columns because the kanji need two cell positions
>>> print '1234567890\n%.*s' % (10, u'一二三四五六七八九十')
1234567890
一二三四五六七八九十
>>> # Properly chops at 10 columns
>>> print '1234567890\n%s' % (textual_width_chop(u'一二三四五六七八九十', 10))
1234567890
一二三四五
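Chopping by width amounts to accumulating cell widths and stopping before the limit is exceeded. The following is a minimal sketch of that idea, assuming Python 3 and the standard library only. `cell_width` and `approx_chop` are hypothetical names, not part of kitchen, and the width rule (two cells for East Asian wide or fullwidth characters, zero for combining marks) is a simplification.

```python
import unicodedata


def cell_width(ch):
    """Approximate cell width of a single character."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def approx_chop(text, chop):
    """Return the longest prefix of text that fits within chop cells."""
    used = 0
    for index, ch in enumerate(text):
        used += cell_width(ch)
        if used > chop:
            return text[:index]  # this character would overflow the limit
    return text


print(approx_chop("一二三四五六七八九十", 10))
print(approx_chop("café ñunru!", 10))
```

A wide character that would straddle the limit is dropped entirely, so the result can be one cell narrower than requested.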

kitchen.text.display.textual_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='')

Expand a str string to a specified textual width or chop to same

Parameters

msg – str string to format
fill – pad string until the textual width of the string is this length
chop – before doing anything else, chop the string to this length. Default: Don’t chop the string at all
left – If True (default) left justify the string and put the padding on the right. If False, pad on the left side.
prefix – Attach this string before the field we’re filling
suffix – Append this string to the end of the field we’re filling

Return type

str string

Returns

msg formatted to fill the specified width. If no chop is specified, the string could exceed the fill length when completed. If prefix or suffix contain printable characters, the string could be longer than the fill width.

Note

prefix and suffix should be used for “invisible” characters like highlighting, color changing escape codes, etc. The fill characters are appended outside of any prefix or suffix elements. This allows you to only highlight msg inside of the field you’re filling.

Warning

msg, prefix, and suffix should all be representable as unicode characters. In particular, any escape sequences in prefix and suffix need to be convertible to str. If you need to use byte sequences here rather than unicode characters, use byte_string_textual_width_fill() instead.

This function expands a string to fill a field of a particular textual width. Use it instead of %*.*s, as it does the “right” thing with regard to UTF-8 sequences, control characters, and characters that take more than one cell position in a display. Example usage:

>>> msg = u'一二三四五六七八九十'
>>> # Wrong: This uses 10 characters instead of 10 cells:
>>> print u":%-*.*s:" % (10, 10, msg[:9])
:一二三四五六七八九 :
>>> # This uses 10 cells like we really want:
>>> print u":%s:" % (textual_width_fill(msg[:9], 10, 10))
:一二三四五:

>>> # Wrong: Right aligned in the field, but too many cells
>>> print u"%20.10s" % (msg)
          一二三四五六七八九十
>>> # Correct: Right aligned with proper number of cells
>>> print u"%s" % (textual_width_fill(msg, 20, 10, left=False))
          一二三四五

>>> # prefix and suffix hold escape sequences: reverse video on, attributes off
>>> prefix, suffix = u'\x1b[7m', u'\x1b[0m'
>>> # Wrong: Adding some escape characters to highlight the line but too many cells
>>> u"%s%20.10s%s" % (prefix, msg, suffix)
u'\x1b[7m          一二三四五六七八九十\x1b[0m'
>>> # Correct highlight of the line
>>> u"%s%s%s" % (prefix, textual_width_fill(msg, 20, 10, left=False), suffix)
u'\x1b[7m          一二三四五\x1b[0m'

>>> # Correct way to not highlight the fill
>>> u"%s" % (textual_width_fill(msg, 20, 10, left=False, prefix=prefix, suffix=suffix))
u'          \x1b[7m一二三四五\x1b[0m'
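The placement of the padding relative to prefix and suffix can be sketched in a few lines. This is an illustration under assumptions, not kitchen's code: it targets Python 3, `approx_width` and `approx_fill` are hypothetical helpers, the width rule is simplified, and the chop step is left out.

```python
import unicodedata


def approx_width(text):
    """Approximate cell width: wide characters count 2, combining marks 0."""
    return sum(
        0 if unicodedata.combining(ch)
        else 2 if unicodedata.east_asian_width(ch) in ("W", "F")
        else 1
        for ch in text
    )


def approx_fill(msg, fill, left=True, prefix="", suffix=""):
    """Pad msg with spaces up to fill cells; padding stays outside prefix/suffix."""
    padding = " " * max(fill - approx_width(msg), 0)
    body = prefix + msg + suffix  # only msg sits between the escape codes
    return body + padding if left else padding + body


highlighted = approx_fill("一二三四五", 20, left=False,
                          prefix="\x1b[7m", suffix="\x1b[0m")
print(repr(highlighted))
```

Because the width is computed from msg alone, invisible escape sequences in prefix and suffix do not shrink the padding, and the padding itself is not highlighted.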

kitchen.text.display.wrap(text, width=70, initial_indent='', subsequent_indent='', encoding='utf-8', errors='replace')

Works like we want textwrap.wrap() to work.

Parameters

text – str string or byte string (bytes) to wrap
width – textual width at which to wrap. Default: 70
initial_indent – string to use to indent the first line. Default: do not indent.
subsequent_indent – string to use to indent subsequent lines. Default: do not indent.
encoding – Encoding to use if text is a byte string (bytes)
errors – error handler to use if text is a byte string (bytes) and contains undecodable byte sequences.

Return type

list of str strings

Returns

list of lines that have been text wrapped and indented.

textwrap.wrap() from the python standard library has two drawbacks that this attempts to fix:

It does not handle textual width. It only operates on bytes or characters which are both inadequate (due to multi-byte and double width characters).
It malforms lists and blocks.
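The first drawback is easy to reproduce with the standard library. The snippet below assumes Python 3. `approx_width` is a hypothetical helper that approximates cell width and is not part of kitchen.

```python
import textwrap
import unicodedata


def approx_width(text):
    """Approximate cell width: wide characters count 2, combining marks 0."""
    return sum(
        0 if unicodedata.combining(ch)
        else 2 if unicodedata.east_asian_width(ch) in ("W", "F")
        else 1
        for ch in text
    )


kanji = "一二三四五六七八九十"
lines = textwrap.wrap(kanji, width=10)

# textwrap counts characters, so all ten kanji are kept on one line
print(len(lines), len(lines[0]))
# ...but that line needs twice the requested number of columns
print(approx_width(lines[0]))
```

A width-aware wrapper has to measure each candidate line in cells rather than in characters before deciding where to break.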

kitchen.text.display.fill(text, *args, **kwargs)

Works like we want textwrap.fill() to work

Parameters: text – str string or byte string (bytes) to process
Returns: str string with each line separated by a newline

Internal Data

There are a few internal functions and variables in this module. Code outside of kitchen shouldn’t use them but people coding on kitchen itself may find them useful.

kitchen.text.display._COMBINING = ((768, 879), (1155, 1161), (1425, 1469), (1471, 1471), (1473, 1474), (1476, 1477), (1479, 1479), (1536, 1539), (1552, 1562), (1611, 1631), (1648, 1648), (1750, 1764), (1767, 1768), (1770, 1773), (1807, 1807), (1809, 1809), (1840, 1866), (1958, 1968), (2027, 2035), (2045, 2045), (2070, 2073), (2075, 2083), (2085, 2087), (2089, 2093), (2137, 2139), (2259, 2273), (2275, 2303), (2305, 2306), (2364, 2364), (2369, 2376), (2381, 2381), (2385, 2388), (2402, 2403), (2433, 2433), (2492, 2492), (2497, 2500), (2509, 2509), (2530, 2531), (2558, 2558), (2561, 2562), (2620, 2620), (2625, 2626), (2631, 2632), (2635, 2637), (2672, 2673), (2689, 2690), (2748, 2748), (2753, 2757), (2759, 2760), (2765, 2765), (2786, 2787), (2817, 2817), (2876, 2876), (2879, 2879), (2881, 2883), (2893, 2893), (2902, 2902), (2946, 2946), (3008, 3008), (3021, 3021), (3134, 3136), (3142, 3144), (3146, 3149), (3157, 3158), (3260, 3260), (3263, 3263), (3270, 3270), (3276, 3277), (3298, 3299), (3387, 3388), (3393, 3395), (3405, 3405), (3530, 3530), (3538, 3540), (3542, 3542), (3633, 3633), (3636, 3642), (3655, 3662), (3761, 3761), (3764, 3772), (3784, 3789), (3864, 3865), (3893, 3893), (3895, 3895), (3897, 3897), (3953, 3966), (3968, 3972), (3974, 3975), (3984, 3991), (3993, 4028), (4038, 4038), (4141, 4144), (4146, 4146), (4150, 4151), (4153, 4154), (4184, 4185), (4237, 4237), (4448, 4607), (4957, 4959), (5906, 5908), (5938, 5940), (5970, 5971), (6002, 6003), (6068, 6069), (6071, 6077), (6086, 6086), (6089, 6099), (6109, 6109), (6155, 6157), (6313, 6313), (6432, 6434), (6439, 6440), (6450, 6450), (6457, 6459), (6679, 6680), (6752, 6752), (6773, 6780), (6783, 6783), (6832, 6845), (6912, 6915), (6964, 6964), (6966, 6970), (6972, 6972), (6978, 6978), (6980, 6980), (7019, 7027), (7082, 7083), (7142, 7142), (7154, 7155), (7223, 7223), (7376, 7378), (7380, 7392), (7394, 7400), (7405, 7405), (7412, 7412), (7416, 7417), (7616, 7673), (7675, 7679), (8203, 8207), (8234, 8238), (8288, 
8291), (8298, 8303), (8400, 8432), (11503, 11505), (11647, 11647), (11744, 11775), (12330, 12335), (12441, 12442), (42607, 42607), (42612, 42621), (42654, 42655), (42736, 42737), (43014, 43014), (43019, 43019), (43045, 43046), (43204, 43204), (43232, 43249), (43307, 43309), (43347, 43347), (43443, 43443), (43456, 43456), (43696, 43696), (43698, 43700), (43703, 43704), (43710, 43711), (43713, 43713), (43766, 43766), (44013, 44013), (64286, 64286), (65024, 65039), (65056, 65071), (65279, 65279), (65529, 65531), (66045, 66045), (66272, 66272), (66422, 66426), (68097, 68099), (68101, 68102), (68108, 68111), (68152, 68154), (68159, 68159), (68325, 68326), (68900, 68903), (69446, 69456), (69702, 69702), (69759, 69759), (69817, 69818), (69888, 69890), (69939, 69940), (70003, 70003), (70080, 70080), (70090, 70090), (70197, 70198), (70377, 70378), (70459, 70460), (70477, 70477), (70502, 70508), (70512, 70516), (70722, 70722), (70726, 70726), (70750, 70750), (70850, 70851), (71103, 71104), (71231, 71231), (71350, 71351), (71467, 71467), (71737, 71738), (72160, 72160), (72244, 72244), (72263, 72263), (72345, 72345), (72767, 72767), (73026, 73026), (73028, 73029), (73111, 73111), (92912, 92916), (92976, 92982), (113822, 113822), (119141, 119145), (119149, 119170), (119173, 119179), (119210, 119213), (119362, 119364), (122880, 122886), (122888, 122904), (122907, 122913), (122915, 122916), (122918, 122922), (123184, 123190), (123628, 123631), (125136, 125142), (125252, 125258), (917505, 917505), (917536, 917631), (917760, 917999))¶

Internal table, provided by this module, listing code points which combine with other characters and therefore should have no textual width. This is a sorted tuple of non-overlapping intervals. Each interval is a tuple giving a starting code point and an ending code point. Every code point from the start to the end, inclusive, is a combining character.
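Because the intervals are sorted and do not overlap, membership can be tested with a binary search instead of a linear scan. The sketch below shows one way to do that with the standard `bisect` module. `SAMPLE_INTERVALS` is only the first three entries of the table, `in_intervals` is a hypothetical name, and this is not necessarily how kitchen performs the lookup.

```python
import bisect

# A short excerpt in the same shape as _COMBINING: sorted, non-overlapping,
# with inclusive (start, end) code point pairs.
SAMPLE_INTERVALS = ((768, 879), (1155, 1161), (1425, 1469))
_STARTS = [start for start, _end in SAMPLE_INTERVALS]


def in_intervals(code_point):
    """Return True if code_point falls inside one of the sample intervals."""
    # Find the last interval whose start is <= code_point.
    index = bisect.bisect_right(_STARTS, code_point) - 1
    return index >= 0 and code_point <= SAMPLE_INTERVALS[index][1]


print(in_intervals(0x0301))    # COMBINING ACUTE ACCENT, inside (768, 879)
print(in_intervals(ord("a")))  # below the first interval
print(in_intervals(880))       # just past the end of the first interval
```

With a table of a few hundred intervals, this takes under ten comparisons per character.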
