VOTable XML Handling (astropy.io.votable)

Introduction

The astropy.io.votable sub-package converts VOTable XML files to and from numpy record arrays. This subpackage was originally developed as vo.table.

Getting Started

This section provides a quick introduction to using astropy.io.votable. The goal is to demonstrate the package’s basic features without getting into too much detail.

Note

If you want to read or write a single table in VOTable format, the recommended method is via the high-level Unified File Read/Write Interface. In particular, see the Unified I/O VOTables section.

Reading a VOTable File

To read in a VOTable file, pass a file path to parse:

from astropy.io.votable import parse
votable = parse("votable.xml")

votable is a VOTableFile object, which can be used to retrieve and manipulate the data and save it back out to disk.

VOTable files are made up of nested RESOURCE elements, each of which may contain one or more TABLE elements. The TABLE elements contain the arrays of data.

To get at the TABLE elements, you can write a loop over the resources in the VOTABLE file:

for resource in votable.resources:
    for table in resource.tables:
        # ... do something with the table ...
        pass

However, if the nested structure of the resources is not important, you can use iter_tables to iterate over all tables as a flat sequence:

for table in votable.iter_tables():
    # ... do something with the table ...
    pass

Finally, if you expect only one table in the file, it might be most convenient to use get_first_table:

table = votable.get_first_table()

Alternatively, there is a convenience method to parse a VOTable file and return the first table all in one step:

from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml")

From a Table object, you can get the data itself in the array member variable:

data = table.array

This data is a numpy record array.

The columns get their names from both the ID and name attributes of the FIELD elements in the VOTABLE file.

Examples

Suppose we had a FIELD specified as follows:

<FIELD ID="Dec" name="dec_targ" datatype="char" ucd="POS_EQ_DEC_MAIN"
       unit="deg">
 <DESCRIPTION>
  representing the ICRS declination of the center of the image.
 </DESCRIPTION>
</FIELD>

Note

The mapping from VOTable name and ID attributes to numpy dtype names and titles is highly confusing.

In VOTable, ID is guaranteed to be unique, but is not required. name is not guaranteed to be unique, but is required.

In numpy record dtypes, names are required and must be unique. titles are optional and need not be unique.

Therefore, VOTable’s ID most closely maps to numpy’s names, and VOTable’s name most closely maps to numpy’s titles. However, in some cases where a VOTable ID is not provided, a numpy name will be generated based on the VOTable name. Unfortunately, VOTable fields do not have an attribute that is both unique and required, which would be the most convenient mechanism to uniquely identify a column.

When converting from an astropy.io.votable.tree.Table object to an astropy.table.Table object, you can specify whether to give preference to name or ID attributes when naming the columns. By default, ID is given preference. To give name preference, pass the keyword argument use_names_over_ids=True:

>>> votable.get_first_table().to_table(use_names_over_ids=True)

This column of data can be extracted from the record array using:

>>> table.array['dec_targ']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
       17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
       17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
       17.1553884541, 17.15539736932, 17.15539752176,
       17.25736014763,
       # ...
       17.2765703], dtype=object)

or equivalently:

>>> table.array['Dec']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
       17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
       17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
       17.1553884541, 17.15539736932, 17.15539752176,
       17.25736014763,
       # ...
       17.2765703], dtype=object)
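
The reason both keys work is numpy's support for field titles. The sketch below uses plain numpy only, with no VOTable involved, to show a structured dtype whose single field has the name Dec and the title dec_targ, mirroring the ID-to-name and name-to-title mapping described in the note above. The values are made up.

```python
import numpy as np

# A field specification of the form ((title, name), format) attaches a title
# to the field in addition to its name.
dtype = np.dtype([(("dec_targ", "Dec"), "f8")])
data = np.array([(17.15,), (17.25,)], dtype=dtype)

# Only the name appears in dtype.names ...
print(dtype.names)

# ... but the title can be used as an index key as well.
print(data["Dec"], data["dec_targ"])
```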

Building a New Table from Scratch

It is also possible to build a new table, define some field datatypes, and populate it with data.

Example

To build a new VOTable file containing a single table:

from astropy.io.votable.tree import VOTableFile, Resource, Table, Field

# Create a new VOTable file...
votable = VOTableFile()

# ...with one resource...
resource = Resource()
votable.resources.append(resource)

# ... with one table
table = Table(votable)
resource.tables.append(table)

# Define some fields
table.fields.extend([
        Field(votable, name="filename", datatype="char", arraysize="*"),
        Field(votable, name="matrix", datatype="double", arraysize="2x2")])

# Now, use those field definitions to create the numpy record arrays, with
# the given number of rows
table.create_arrays(2)

# Now table.array can be filled with data
table.array[0] = ('test1.xml', [[1, 0], [0, 1]])
table.array[1] = ('test2.xml', [[0.5, 0.3], [0.2, 0.1]])

# Now write the whole thing to a file.
# Note, we have to use the top-level votable file object
votable.to_xml("new_votable.xml")

Outputting a VOTable File

This section describes writing table data in the VOTable format using the votable package directly. In many cases, however, the high-level Unified File Read/Write Interface will suffice and is somewhat more convenient to use. See the Unified I/O VOTable section for details.

To save a VOTable file, call the to_xml method. It accepts either a path string or a Python file-like object:

votable.to_xml('output.xml')

There are a number of data storage formats supported by astropy.io.votable. The TABLEDATA format is XML-based and stores values as strings representing numbers. The BINARY format is more compact, and stores numbers in base64-encoded binary. VOTable version 1.3 adds the BINARY2 format, which allows for masking of any data type, including integers and bit fields which cannot be masked in the older BINARY format. The storage format can be set on a per-table basis using the format attribute, or globally using the set_all_tables_format method:

votable.get_first_table().format = 'binary'
votable.set_all_tables_format('binary')
votable.to_xml('binary.xml')

Using astropy.io.votable

Standard Compliance

astropy.io.votable.tree.Table supports the VOTable Format Definition Version 1.1, Version 1.2, Version 1.3, and Version 1.4. Some flexibility is provided to support the 1.0 draft version and other nonstandard usage in the wild; see Verifying VOTables for more details.

Note

Each warning and VOTABLE-specific exception emitted has a number and is documented in more detail in Warnings and Exceptions.

Output always conforms to the 1.1, 1.2, 1.3, or 1.4 spec, depending on the input.

Verifying VOTables

Many VOTable files in the wild do not conform to the VOTable specification. You can set what should happen when a violation is encountered with the verify keyword, which can take three values: 'ignore', 'warn', or 'exception'.

The verify keyword can be used with the parse() or parse_single_table() functions:

from astropy.io.votable import parse
votable = parse("votable.xml", verify='warn')

It is possible to change the default verify value through the astropy.io.votable.conf.verify item in the Configuration System (astropy.config).

Note that with 'ignore' or 'warn', astropy will attempt to parse the VOTable, but success cannot be guaranteed if the specification has been violated.

It is good practice to report any errors to the author of the application that generated the VOTable file to bring the file into compliance with the specification.
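
The sketch below contrasts a lenient and a strict parse of the same nonconforming input. The XML is hand-written for this example: its FIELD omits the name attribute, which the specification requires. The strict run uses verify='exception'; the concrete exception class is deliberately not assumed, so the code catches Exception and simply reports what was raised.

```python
import io

from astropy.io.votable import parse

# Nonconforming on purpose: the FIELD element has an ID but no name.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
 <RESOURCE>
  <TABLE>
   <FIELD ID="flux" datatype="double"/>
   <DATA>
    <TABLEDATA>
     <TR><TD>1.5</TD></TR>
     <TR><TD>2.5</TD></TR>
    </TABLEDATA>
   </DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>
"""

# Lenient: parse silently and make the best of the input.
lenient = parse(io.BytesIO(XML), verify="ignore")
n_rows = len(lenient.get_first_table().array)

# Strict: stop at the first violation of the specification.
try:
    parse(io.BytesIO(XML), verify="exception")
    strict_error = None
except Exception as exc:  # the exact class is not assumed in this sketch
    strict_error = exc

print(n_rows)
print(type(strict_error).__name__)
```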

Missing Values

Any value in the table may be “missing”. astropy.io.votable stores a numpy masked array in each Table instance. This behaves like an ordinary numpy masked array, except for variable-length fields. For those fields, the datatype of the column is “object” and another numpy masked array is stored there. As a result, operations on variable-length columns will not work, because numpy masked arrays do not directly support variable-length columns.
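
A short sketch of how a missing value shows up in practice: the hand-written table below (an assumption made up for this example) leaves the second cell of a double column empty, and the mask of the parsed array marks that row.

```python
import io

import numpy as np
from astropy.io.votable import parse

# The second row has an empty TD, i.e. a missing value.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
 <RESOURCE>
  <TABLE>
   <FIELD ID="flux" name="flux" datatype="double"/>
   <DATA>
    <TABLEDATA>
     <TR><TD>1.5</TD></TR>
     <TR><TD></TD></TR>
     <TR><TD>2.5</TD></TR>
    </TABLEDATA>
   </DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>
"""

array = parse(io.BytesIO(XML)).get_first_table().array

# The table data is a numpy masked array; the mask flags the missing cell.
print(isinstance(array, np.ma.MaskedArray))
print(array["flux"].mask)
```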

Datatype Mappings

The datatype specified by a FIELD element is mapped to a numpy type according to the following table:

VOTABLE type                     NumPy type
-------------------------------  --------------------
boolean                          b1
bit                              b1
unsignedByte                     u1
char (variable length)           O (a bytes object)
char (fixed length)              S
unicodeChar (variable length)    O (a str object)
unicodeChar (fixed length)       U
short                            i2
int                              i4
long                             i8
float                            f4
double                           f8
floatComplex                     c8
doubleComplex                    c16

If the field is a fixed-size array, the data is stored as a numpy fixed-size array.

If the field is a variable-size array (that is, arraysize contains a ‘*’), the cell will contain a Python list of numpy values. Each value may be either an array or scalar depending on the arraysize specifier.
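
The mapping can be checked on a small example. The sketch below parses a hand-written table (made up for illustration) with an int column, a double column, a variable-length char column, and a fixed-size float array column, and then inspects the dtype of the resulting array.

```python
import io

from astropy.io.votable import parse

XML = b"""<?xml version="1.0" encoding="utf-8"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
 <RESOURCE>
  <TABLE>
   <FIELD ID="n" name="n" datatype="int"/>
   <FIELD ID="x" name="x" datatype="double"/>
   <FIELD ID="label" name="label" datatype="char" arraysize="*"/>
   <FIELD ID="pair" name="pair" datatype="float" arraysize="2"/>
   <DATA>
    <TABLEDATA>
     <TR><TD>1</TD><TD>0.5</TD><TD>alpha</TD><TD>1.0 2.0</TD></TR>
     <TR><TD>2</TD><TD>1.5</TD><TD>beta</TD><TD>3.0 4.0</TD></TR>
    </TABLEDATA>
   </DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>
"""

array = parse(io.BytesIO(XML)).get_first_table().array

# Scalar numeric fields map to fixed-width numpy types.
print(array.dtype["n"], array.dtype["x"])

# A variable-length char field is stored as Python objects.
print(array.dtype["label"].kind)

# A fixed-size array field becomes a fixed-size numpy subarray per row.
print(array["pair"].shape)
```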

Examining Field Types

To look up more information about a field in a table, you can use the get_field_by_id method, which returns the Field object with the given ID.

Example

To look up more information about a field:

>>> field = table.get_field_by_id('Dec')
>>> field.datatype
'char'
>>> field.unit
'deg'

Note

Field descriptors should not be mutated. To change the set of columns, convert the Table to an astropy.table.Table, make the changes, and then convert it back.

Data Serialization Formats

VOTable supports a number of different serialization formats.

  • TABLEDATA stores the data in pure XML, where the numerical values are written as human-readable strings.

  • BINARY is a binary representation of the data, stored in the XML as an opaque base64-encoded blob.

  • BINARY2 was added in VOTable 1.3, and is identical to “BINARY”, except that it explicitly records the position of missing values rather than identifying them by a special value.

  • FITS stores the data in an external FITS file. This serialization is not supported by the astropy.io.votable writer, since it requires writing multiple files.

The serialization format can be selected in two ways:

1) By setting the format attribute of an astropy.io.votable.tree.Table object:

votable.get_first_table().format = "binary"
votable.to_xml("new_votable.xml")

2) By overriding the format of all tables using the tabledata_format keyword argument when writing out a VOTable file:

votable.to_xml("new_votable.xml", tabledata_format="binary")

Converting to/from an astropy.table.Table

The VOTable standard does not map conceptually to an astropy.table.Table. However, a single table within the VOTable file may be converted to and from an astropy.table.Table:

from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml").to_table()

As a convenience, there is also a function to create an entire VOTable file with just a single table:

from astropy.io.votable import from_table, writeto
votable = from_table(table)
writeto(votable, "output.xml")

Note

By default, to_table will use the ID attributes of the fields to create the column names for the Table object. However, you may want to use the name attributes instead. To do so, set the use_names_over_ids keyword to True. Note that since field names are not guaranteed to be unique in the VOTable specification, but column names are required to be unique in numpy structured arrays (and thus astropy.table.Table objects), columns may be renamed by appending numbers to the end of their names in some cases.

Performance Considerations

File reads will be moderately faster if the TABLE element includes an nrows attribute. If the number of rows is not specified, the record array must be resized repeatedly during load.

See Also

Reference/API

astropy.io.votable Package

This package reads and writes data formats used by the Virtual Observatory (VO) initiative, particularly the VOTable XML format.

Functions

parse(source[, columns, invalid, verify, ...])

Parses a VOTABLE xml file (or file-like object), and returns a VOTableFile object.

parse_single_table(source, **kwargs)

Parses a VOTABLE xml file (or file-like object), reading and returning only the first Table instance.

validate(source[, output, xmllint, filename])

Prints a validation report for the given file.

from_table(table[, table_id])

Given a Table object, return a VOTableFile file structure containing just that single table.

is_votable(source)

Reads the header of a file to determine if it is a VOTable file.

writeto(table, file[, tabledata_format])

Writes a VOTableFile to a VOTABLE xml file.

Classes

Conf()

Configuration parameters for astropy.io.votable.

astropy.io.votable.tree Module

Classes

Link([ID, title, value, href, action, id, ...])

LINK elements: used to reference external documents and servers through a URI.

Info([ID, name, value, id, xtype, ref, ...])

INFO elements: arbitrary key-value pairs for extensions to the standard.

Values(votable, field[, ID, null, ref, ...])

VALUES element: used within FIELD and PARAM elements to define the domain of values.

Field(votable[, ID, name, datatype, ...])

FIELD element: describes the datatype of a particular column of data.

Param(votable[, ID, name, value, datatype, ...])

PARAM element: constant-valued columns in the data.

CooSys([ID, equinox, epoch, system, id, ...])

COOSYS element: defines a coordinate system.

TimeSys([ID, timeorigin, timescale, ...])

TIMESYS element: defines a time system.

FieldRef(table, ref[, ucd, utype, config, pos])

FIELDref element: used inside of GROUP elements to refer to remote FIELD elements.

ParamRef(table, ref[, ucd, utype, config, pos])

PARAMref element: used inside of GROUP elements to refer to remote PARAM elements.

Group(table[, ID, name, ref, ucd, utype, ...])

GROUP element: groups FIELD and PARAM elements.

Table(votable[, ID, name, ref, ucd, utype, ...])

TABLE element: optionally contains data.

Resource([name, ID, utype, type, id, ...])

RESOURCE element: Groups TABLE and RESOURCE elements.

VOTableFile([ID, id, config, pos, version])

VOTABLE element: represents an entire file.

Element()

A base class for all classes that represent XML elements in the VOTABLE file.

astropy.io.votable.converters Module

This module handles the conversion of various VOTABLE datatypes to/from TABLEDATA and BINARY formats.

Functions

get_converter(field[, config, pos])

Get an appropriate converter instance for a given field.

table_column_to_votable_datatype(column)

Given an astropy.table.Column instance, returns the attributes necessary to create a VOTable FIELD element that corresponds to the type of the column.

Classes

Converter(field[, config, pos])

The base class for all converters.

astropy.io.votable.ucd Module

This file contains routines to verify the correctness of UCD strings.

Functions

parse_ucd(ucd[, ...])

Parse the UCD into its component parts.

check_ucd(ucd[, ...])

Returns False if ucd is not a valid unified content descriptor.

astropy.io.votable.util Module

Various utilities and cookbook-like things.

Functions

convert_to_writable_filelike(fd[, compressed])

Returns a writable file-like object suitable for streaming output.

coerce_range_list_param(p[, frames, numeric])

Coerces and/or verifies the object p into a valid range-list-format parameter.

astropy.io.votable.validator Package

Validates a large collection of web-accessible VOTable files, and generates a report as a directory tree of HTML files.

Functions

make_validation_report([urls, destdir, ...])

Validates a large collection of web-accessible VOTable files.

astropy.io.votable.xmlutil Module

Various XML-related utilities.

Functions

check_id(ID[, name, config, pos])

Raises a VOTableSpecError if ID is not a valid XML ID.

fix_id(ID[, config, pos])

Given an arbitrary string, create one that can be used as an xml id.

check_token(token, attr_name[, config, pos])

Raises a ValueError if token is not a valid XML token.

check_mime_content_type(content_type[, ...])

Raises a VOTableSpecError if content_type is not a valid MIME content type.

check_anyuri(uri[, config, pos])

Raises a VOTableSpecError if uri is not a valid URI.

validate_schema(filename[, version])

Validates the given file against the appropriate VOTable schema.

astropy.io.votable.exceptions Module