Text zones

See also

Lizardtech DjVu Reference (8.3.5 Text Chunk).

Representing text zones as S-expressions is DjVuLibre-specific; see the djvused manual for reference.

class djvu.decode.PageText(page[, details=TEXT_DETAILS_ALL])

A wrapper around page text.

details controls the level of detail in the returned S-expression:

  • TEXT_DETAILS_PAGE, or

  • TEXT_DETAILS_COLUMN, or

  • TEXT_DETAILS_REGION, or

  • TEXT_DETAILS_PARAGRAPH, or

  • TEXT_DETAILS_LINE, or

  • TEXT_DETAILS_WORD, or

  • TEXT_DETAILS_CHARACTER, or

  • TEXT_DETAILS_ALL.

wait()

Wait until the associated S-expression is available.

page
Return type

Page

sexpr
Return type

djvu.sexpr.Expression

Raises
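
As a usage illustration, the following minimal sketch requests the text of one page at line-level detail. It assumes python-djvulibre is installed and that the document is opened through Context.new_document() with a FileURI, as in the library's bundled examples; the function name page_text_sexpr and its parameters are hypothetical.

```python
def page_text_sexpr(path, page_no=0):
    """Return the line-level text S-expression of one page of a DjVu file."""
    # Imported lazily so that defining this function does not itself
    # require python-djvulibre to be installed.
    import djvu.decode

    context = djvu.decode.Context()
    document = context.new_document(djvu.decode.FileURI(path))
    document.decoding_job.wait()  # block until the document is decoded
    page = document.pages[page_no]

    # Ask only for zones down to the line level, not words or characters.
    text = djvu.decode.PageText(page, djvu.decode.TEXT_DETAILS_LINE)
    text.wait()  # block until the S-expression is available
    return text.sexpr
```

Calling wait() before reading sexpr keeps the sketch synchronous; a caller that prefers not to block would need to handle the case where the S-expression is not yet available.
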
class djvu.const.TextZoneType

A type of a text zone.

To create objects of this class, use the get_text_zone_type() function.

djvu.const.get_text_zone_type(symbol)

Return one of the following text zone types:

djvu.const.TEXT_ZONE_PAGE
>>> get_text_zone_type(Symbol('page')) is TEXT_ZONE_PAGE
True
djvu.const.TEXT_ZONE_COLUMN
>>> get_text_zone_type(Symbol('column')) is TEXT_ZONE_COLUMN
True
djvu.const.TEXT_ZONE_REGION
>>> get_text_zone_type(Symbol('region')) is TEXT_ZONE_REGION
True
djvu.const.TEXT_ZONE_PARAGRAPH
>>> get_text_zone_type(Symbol('para')) is TEXT_ZONE_PARAGRAPH
True
djvu.const.TEXT_ZONE_LINE
>>> get_text_zone_type(Symbol('line')) is TEXT_ZONE_LINE
True
djvu.const.TEXT_ZONE_WORD
>>> get_text_zone_type(Symbol('word')) is TEXT_ZONE_WORD
True
djvu.const.TEXT_ZONE_CHARACTER
>>> get_text_zone_type(Symbol('char')) is TEXT_ZONE_CHARACTER
True

You can compare text zone types using the > operator:

>>> TEXT_ZONE_PAGE > TEXT_ZONE_COLUMN > TEXT_ZONE_REGION > TEXT_ZONE_PARAGRAPH
True
>>> TEXT_ZONE_PARAGRAPH > TEXT_ZONE_LINE > TEXT_ZONE_WORD > TEXT_ZONE_CHARACTER
True
djvu.decode.cmp_text_zone(zonetype1, zonetype2)
Returns

a negative integer if zonetype1 is more concrete than zonetype2.

Returns

zero if zonetype1 is the same as zonetype2.

Returns

a positive integer if zonetype1 is more general than zonetype2.
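
The sign convention can be illustrated with a small stand-alone sketch. It does not call the library: plain strings stand in for TextZoneType objects, and ZONE_ORDER and cmp_zone_names are hypothetical names introduced only for illustration, not the library's implementation.

```python
# Zone names ordered from most general to most concrete, following the
# ordering shown in the comparison examples above.
ZONE_ORDER = ['page', 'column', 'region', 'para', 'line', 'word', 'char']


def cmp_zone_names(name1, name2):
    """Three-way comparison of zone names by generality.

    Negative if name1 is more concrete than name2, zero if they are the
    same, positive if name1 is more general.
    """
    rank1 = ZONE_ORDER.index(name1)
    rank2 = ZONE_ORDER.index(name2)
    # A smaller index means a more general zone, so subtract in this order.
    return rank2 - rank1


print(cmp_zone_names('word', 'line') < 0)   # word is more concrete than line
print(cmp_zone_names('line', 'line') == 0)  # same zone type
print(cmp_zone_names('page', 'char') > 0)   # page is more general than char
```
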

djvu.const.TEXT_ZONE_SEPARATORS

Dictionary that maps text zone types to their separators.

>>> pprint(TEXT_ZONE_SEPARATORS)
{<djvu.const.TextZoneType: char>: '',
 <djvu.const.TextZoneType: word>: ' ',
 <djvu.const.TextZoneType: line>: '\n',
 <djvu.const.TextZoneType: para>: '\x1f',
 <djvu.const.TextZoneType: region>: '\x1d',
 <djvu.const.TextZoneType: column>: '\x0b',
 <djvu.const.TextZoneType: page>: '\x0c'}
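
One way such separators might be used is to flatten a zone tree into plain text. The sketch below is stand-alone and makes several assumptions: zones are modelled as nested tuples of the form (type, x0, y0, x1, y1, children...) in the spirit of the djvused text format, zone types are plain strings rather than TextZoneType objects, and sibling zones are joined with the separator of their own type. SEPARATORS and zone_text are hypothetical names.

```python
# String-keyed copy of the separator table shown above.
SEPARATORS = {
    'char': '', 'word': ' ', 'line': '\n', 'para': '\x1f',
    'region': '\x1d', 'column': '\x0b', 'page': '\x0c',
}


def zone_text(zone):
    """Flatten a (type, x0, y0, x1, y1, *children) zone into plain text."""
    children = zone[5:]
    # A leaf zone carries its text directly as a single string.
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]
    # Assumption: all children of a zone share one type, and the separator
    # registered for that type goes between consecutive siblings.
    separator = SEPARATORS[children[0][0]] if children else ''
    return separator.join(zone_text(child) for child in children)


page = ('page', 0, 0, 100, 100,
        ('line', 0, 50, 100, 100,
         ('word', 0, 50, 40, 100, 'Hello'),
         ('word', 50, 50, 100, 100, 'world')),
        ('line', 0, 0, 100, 50,
         ('word', 0, 0, 40, 50, 'again')))

print(repr(zone_text(page)))  # 'Hello world\nagain'
```
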