3.3.3 Arrays and Tabular Environments

Arrays and tabular environments are the most complex structures in a plasTeX document. This is because tables can include spanning columns, spanning rows, and borders specified on the table, rows, and individual cells. In addition, each column has an associated alignment, and any \multicolumn command can specify an alignment of its own. Some packages also let you create your own column declarations. Add to that the fact that the longtable package allows you to specify multiple headers, footers, and captions, and you can see why tabular environments can be rather tricky to deal with.

As with all parts of the document, plasTeX tries to normalize all tables to have a consistent structure. The structure for arrays and tables is shown in Figure 3.5.

\includegraphics[width=4in]{tablestruct}
Figure 3.5: Normalized structure of all tables and arrays

Luckily, the array macro class that comes with plasTeX was made to handle all of the work for you. In fact, it also handles the work of some extra packages such as longtable to make processing them transparent. The details of the tabular environments are described in the following sections.

With this normalized structure, you can traverse all array and table structures with code like the following.

# Iterate through all rows in the table
for row in tablenode:

    # Iterate through all cells in the row
    for cell in row:

        # Iterate through all paragraphs in the cell
        for par in cell:

            # Print the text content of each paragraph
            print('   ' + par.textContent)

        # Print a blank line after each cell
        print()

    # Print a blank line after each row
    print()

Borders

Borders in a tabular environment are generally specified with \hline, \vline, and \cline, as well as with the column specifications on the tabular environment and on \multicolumn commands. plasTeX merges all of the border specifications and stores them as CSS-formatted values in the style attribute of each table cell node. To get the CSS information in a form that can be used as an inline style, access the inline property of the style object.

Here is an example of a tabular environment.

\begin{tabular}{|l|l|}\hline
x & y \\
1 & 2 \\\hline
\end{tabular}

The table node can be traversed as follows.

# Print the CSS for the borders of each cell
for rownum, row in enumerate(table):
    for cellnum, cell in enumerate(row):
        print('(%s,%s) %s -- %s' % (rownum, cellnum,
              cell.textContent.strip(), cell.style.inline))

The code above will print the following output (whitespace has been added to make the output easier to read).

(0,0) x -- border-top-style:solid; 
           border-left:1px solid black; 
           border-right:1px solid black; 
           border-top-color:black; 
           border-top-width:1px; 
           text-align:left
(0,1) y -- border-top-style:solid; 
           text-align:left; 
           border-top-color:black; 
           border-top-width:1px; 
           border-right:1px solid black
(1,0) 1 -- border-bottom-style:solid; 
           border-bottom-width:1px; 
           border-left:1px solid black; 
           border-right:1px solid black; 
           text-align:left; 
           border-bottom-color:black
(1,1) 2 -- border-bottom-color:black; 
           border-bottom-width:1px; 
           text-align:left; 
           border-bottom-style:solid; 
           border-right:1px solid black
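
As a rough illustration of how a renderer might consume these values, the sketch below writes each cell as an HTML td element and copies the inline CSS into its style attribute. It does not import plasTeX: the cells are stand-in objects that expose only the textContent and style.inline attributes described above, and the names make_cell and render_rows are hypothetical helpers introduced for this example.

```python
from html import escape
from types import SimpleNamespace


def make_cell(text, css):
    # Stand-in for a plasTeX cell node (assumption): only the two
    # attributes used in this section are modelled.
    return SimpleNamespace(textContent=text,
                           style=SimpleNamespace(inline=css))


def render_rows(rows):
    # Hypothetical helper: emit one <tr> per row and one <td> per cell,
    # copying the inline CSS into the style attribute.
    lines = ['<table>']
    for row in rows:
        lines.append('  <tr>')
        for cell in row:
            lines.append('    <td style="%s">%s</td>' % (
                escape(cell.style.inline, quote=True),
                escape(cell.textContent.strip())))
        lines.append('  </tr>')
    lines.append('</table>')
    return '\n'.join(lines)


# Sample data shaped like the output shown above (shortened).
rows = [
    [make_cell('x', 'border-left:1px solid black; text-align:left'),
     make_cell('y', 'border-right:1px solid black; text-align:left')],
]
print(render_rows(rows))
```

With real table nodes, the same loop would iterate over the table node and its rows directly instead of over the stand-in list.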

Alignments

Alignments can be specified in the column specification of the tabular environment as well as in the column specification of \multicolumn commands. Like the border information, the alignment information is stored as CSS-formatted values in each cell’s style attribute.
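
If only the alignment is needed, one option is to pick it out of the inline CSS string. The following is a minimal sketch that relies only on the "name:value; name:value" format seen in the border output earlier in this section; parse_inline_style is a hypothetical helper, not part of plasTeX, and it assumes values contain no embedded semicolons.

```python
def parse_inline_style(inline):
    # Split 'name:value; name:value' into a dict. Assumes the simple
    # format shown in the sample output (no ';' inside values).
    properties = {}
    for declaration in inline.split(';'):
        name, sep, value = declaration.partition(':')
        if sep:
            properties[name.strip()] = value.strip()
    return properties


# Example string modelled on the output for cell (0,1) above.
css = 'border-top-style:solid; text-align:left; border-right:1px solid black'
style = parse_inline_style(css)

# Fall back to a placeholder when no alignment was recorded.
print(style.get('text-align', 'default'))
```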

Longtables

Longtables are treated just like regular tables. Only the first header and the last footer are supported in the resulting table structure. To mark these as header or footer cells, the isHeader attribute of the corresponding cells is set to True. A renderer can use this information to represent the table cells more accurately.
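
A renderer could use the isHeader flag to choose between header and body markup. The sketch below shows one possible approach under stated assumptions: the cells are stand-in objects rather than real plasTeX nodes, cell_tag is a hypothetical helper, and mapping flagged cells to the HTML th element is a choice made for this example.

```python
from types import SimpleNamespace


def cell_tag(cell):
    # Cells flagged with isHeader become <th>; everything else becomes <td>.
    # getattr guards against stand-in cells that lack the attribute.
    return 'th' if getattr(cell, 'isHeader', False) else 'td'


# Stand-in cells (assumption): real cells come from the parsed document.
header = SimpleNamespace(textContent='Name', isHeader=True)
body = SimpleNamespace(textContent='plasTeX', isHeader=False)

for cell in (header, body):
    tag = cell_tag(cell)
    print('<%s>%s</%s>' % (tag, cell.textContent, tag))
```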