Tabular data

Table handles tabular data, storing it as columns in a, you guessed it, columns attribute. The latter acts like a dictionary, with the column names as the keys and the column values as numpy.ndarray instances. The table itself is iterable over rows.

Note

Table is immutable in the sense that the individual ndarray instances holding the column values are not writable.

Loading a csv file

We load a tab-separated data file using the load_table() function. The format is inferred from the filename suffix and, as you will note, in this case it’s not actually a csv file.

Note

The known filename suffixes for reading are .csv, .tsv and .pkl or .pickle (Python’s pickle format).

Note

If you invoke the static column types argument, i.e. load_table(…, static_column_types=True), and the column data are not static, those columns will be left as a string type.

Loading from a url

The cogent3 load functions support loading from a url. We load the above .tsv file directly from GitHub.

Loading delimited specifying the format

Although unnecessary in this case, it’s possible to override the suffix by specifying the delimiter using the sep argument.

Loading delimited data without a header line

To create a table from the following examples, you specify your header and use make_table().

Using load_delimited()

This is just a standard parsing function which does not do any filtering or converting elements to non-string types.

Using FilteringParser

Selectively loading parts of a big file

Loading a set number of lines from a file

The limit argument specifies the number of lines to read.

Loading only some rows

If you only want a subset of the contents of a file, use the FilteringParser. This allows skipping certain lines by using a callback function. We illustrate this with stats.tsv, skipping any rows with "Ratio" > 10.

You can also negate a condition, which is useful if the condition is complex. In this example, it means keep the rows for which Ratio > 10.

Loading only some columns

Specify the columns by their names.

Or, by their index.

Note

The negate argument does not affect the columns evaluated.

Load raw data as a list of lists of strings

We just use FilteringParser.

We just display the first two lines.

Note

The individual elements are all str.

Make a table from header and rows

Make a table from a dict

For a dict with keys as column headers.

Specify the column order when creating from a dict.

Create the table with an index

A Table can be indexed like a dict if you designate a column as the index (and that column has a unique value for every row).

Note

The index_name argument also applies when using make_table().

Create a table from a pandas.DataFrame

Create a table from header and rows

Create a table from dict

make_table() is the utility function for creating Table objects from standard python objects.

Create a table from a 2D dict

Create a table that has complex python objects as elements

Create an empty table

Adding a new column

Add a title and a legend to a table

This can be done when you create the table.

It can be done by directly assigning to the corresponding attributes.

Iterating over table rows

Table is a row-oriented object. Iterating over the table returns each row as a new Table instance.

The resulting rows can be indexed using their column names.

How many rows are there?

The Table.shape attribute is like that of a numpy array. The first element (Table.shape[0]) is the number of rows.

How many columns are there?

Table.shape[1] is the number of columns. Using the table from above.

Iterating over table columns

The Table.columns attribute is a Columns instance, an object with dict-like attributes.

So iteration is the same as for dicts.

Table slicing using column names

Slice using the column name.

Table slicing using indices

Changing displayed numerical precision

We change the Ratio column to use scientific notation.

Change digits or column spacing

This can be done on table loading,

or, for spacing at least, by modifying the attributes

Wrapping tables for display

Wrapping generates neat looking tables whether or not you index the table rows. We demonstrate here

Display the top of a table using head()

You can change how many rows are displayed.

The table shape is that of the original table.

Display the bottom of a table using tail()

You can change how many rows are displayed.

Display random rows from a table

Change the number of rows displayed by repr()

Note

The ... indicates the break between the top and bottom rows.

Changing column headings

The table header is immutable. Changing column headings is done as follows.

Adding a new column

Create a new column from existing ones

This can be used to take a single, or multiple columns and generate a new column of values. Here we’ll take 2 columns and return True/False based on a condition.

Get table data as a numpy array

Get a table column as a list

Via the Table.tolist() method.

Or directly from the column array object.

Get multiple table columns as a list

This returns a row oriented list.

Note

The order of the column names dictates the element order within each row.

Get the table as a row oriented dict

Keys in the resulting dict are the row indices, the value is a dict of column name, value pairs.

Get the table as a column oriented dict

Keys in the resulting dict are the column names, the value is a list.

Get the table as a pandas.DataFrame

You can also specify that one or more columns are categories.

Get a table of counts as a contingency table

If our table consists of counts data, the Table can convert it into a CategoryCount instance that can be used for performing basic contingency table statistical tests, e.g. chi-square, G-test of independence, etc. To do this, we must specify which column contains the row names using the index_name argument.

Alternatively, you could also specify the index_name of the category column as

Appending tables

Warning

Only for tables with the same columns.

This can be done without specifying a new column (set the first argument of appended() to None). Here we simply use the same table data.

Specifying with a new column. In this case, the value of the table.title becomes the value for the new column.

Note

We assigned an empty string to title, otherwise the resulting table has the same title attribute as that of table1.

Summing a single column

Because each column is just a numpy.ndarray, this can also be done directly via the array methods.

Summing multiple columns or rows - strictly numerical data

We define a strictly numerical table,

and sum all columns (default condition)

and all rows

Summing multiple columns or rows with mixed non-numeric/numeric data

We define a table with mixed data, like a distance matrix.

and sum all columns (default condition), ignoring non-numerical data

and all rows

Filtering table rows

We can do this by providing a reference to an external function

or using valid python syntax within a string, which is executed

You can also filter for values in multiple columns

Filtering table columns

We select only columns that have a sum > 20 from the all_numeric table constructed above.

Standard sorting

Reverse sorting

Sorting involving multiple columns, one reversed

Getting raw data for a single column

Getting raw data for multiple columns

Getting distinct values

Counting occurrences of values

Counting unique values

This returns a CategoryCounter, a dict like class.

For multiple columns.

Joining or merging tables

We do a standard inner join here for a restricted subset. We must specify the columns that will be used for the join. Here we just use Locus.

Note

If the tables have titles, column names are prefixed with those instead of right_.

Note

The joined() method is just a wrapper for the inner_join() and cross_join() (row cartesian product) methods, which you can use directly.

Transpose a table

We require a new column heading for the current header data. We also need to specify which existing column will become the header.

Specify markdown as the str() format

Using the method provides finer control over formatting.

Specify latex as the str() format

Using the method provides finer control over formatting.

Get a table as a markdown formatted string

We use the justify argument to indicate the column justification.

Get a table as a latex formatted string

Get a table as a restructured text csv-table

Get a table as a restructured text grid table

Getting a latex format table with to_string()

It is also possible to specify column alignment, table caption and other arguments.

Getting a bedGraph format with to_string()

This format allows display of annotation tracks on genome browsers. A small sample of a bigger table.

Then converted.

Getting a table as html

What formats can be written?

Appending any of the following to a filename will cause that format to be used for writing.

Writing a latex formatted file

Writing delimited formats

The delimiter can be specified explicitly using the sep argument or implicitly via the file name suffix.