Deprecated extension API¶
This page documents the original asdf
extension API, which has been
deprecated in favor of Extensions. Since support
for the deprecated API will be removed in asdf
3.0, we recommend that
all new extensions be implemented with the new API.
Extensions provide a way for ASDF to represent complex types that are not defined by the ASDF standard. Examples of types that require custom extensions include types from third-party libraries, user-defined types, and complex types that are part of the Python standard library but are not handled in the ASDF standard. From ASDF’s perspective, these are all considered ‘custom’ types.
Supporting new types in ASDF is easy. Three components are required:
A YAML Schema file for each new type.
A tag class (inheriting from asdf.CustomType) corresponding to each new custom type. The class must override to_tree and from_tree from asdf.CustomType in order to define how ASDF serializes and deserializes the custom type.
A Python class to define an “extension” to ASDF, which is a set of related types. This class must implement the asdf.extension.AsdfExtension abstract base class. In general, a third-party library that defines multiple custom types can group them all in the same extension.
Note
The mechanisms of tag classes and extension classes are specific to this particular implementation of ASDF. As of this writing, this is the only complete implementation of the ASDF Standard. However, other language implementations may use other mechanisms for processing custom types.
All implementations of ASDF, regardless of language, will make use of the same schemas for abstract data type definitions. This allows all ASDF files to be language-agnostic, and also enables interoperability.
An Example¶
As an example, we will write an extension for ASDF that allows us to represent
Python’s standard fractions.Fraction
class for representing rational numbers.
We will call our new ASDF type fraction
.
First, the YAML Schema, defining the type as a pair of integers:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/fraction-1.0.0"
title: An example custom type for handling fractions
tag: "tag:nowhere.org:custom/fraction-1.0.0"
type: array
items:
  type: integer
minItems: 2
maxItems: 2
...
Then, the Python implementation of the tag class and extension class. See the
asdf.CustomType
and asdf.extension.AsdfExtension
documentation for more information:
Note that the method to_tree
of the tag class
FractionType
defines how the library converts fractions.Fraction
into a
tree that can be stored by ASDF. Conversely, the method
from_tree
defines how the library reads a serialized
representation of the object and converts it back into an instance of
fractions.Fraction
.
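The two conversions can be sketched without asdf installed. The class below is a hypothetical stand-in named FractionConverter, not the FractionType tag class itself: a real tag class would inherit from asdf.CustomType and also set the name, organization, standard, version, and types attributes. Only the conversion logic is shown, and the ctx argument is passed as None because the sketch does not use it.

```python
import fractions


class FractionConverter:
    """Hypothetical stand-in for a tag class (no asdf dependency)."""

    @classmethod
    def to_tree(cls, node, ctx):
        # Store the fraction as a [numerator, denominator] pair,
        # matching the two-integer array required by the schema.
        return [node.numerator, node.denominator]

    @classmethod
    def from_tree(cls, tree, ctx):
        # Rebuild the Fraction from the stored pair.
        return fractions.Fraction(tree[0], tree[1])


original = fractions.Fraction(2, 3)
tree = FractionConverter.to_tree(original, ctx=None)
restored = FractionConverter.from_tree(tree, ctx=None)
```

The round trip only works because both methods agree on the order of the two list items.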
Note that the values of the name
,
organization
, standard
, and
version
fields are all reflected in the id
and tag
definitions in the schema.
Note also that the base of the tag value (up to the name and version components) is reflected in the tag_mapping property of the FractionExtension type, which is used to map tags to URLs. The url_mapping is used to map URLs (of the same form as the id field in the schema) to the actual location of a schema file.
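The two-step lookup (tag to URL, then URL to file location) can be illustrated with a small standalone resolver. This is a sketch under assumptions, not the library's implementation: it assumes each mapping is a list of (prefix, template) pairs whose templates use {tag_suffix} or {url_suffix} placeholders, and the file:///opt/mypackage/schemas location is purely hypothetical.

```python
# Assumed mapping format: (prefix, template) pairs. The schema location
# below is a made-up path used only for illustration.
TAG_MAPPING = [
    ("tag:nowhere.org:custom", "http://nowhere.org/schemas/custom{tag_suffix}"),
]
URL_MAPPING = [
    (
        "http://nowhere.org/schemas/custom/",
        "file:///opt/mypackage/schemas/{url_suffix}.yaml",
    ),
]


def resolve(value, mapping, placeholder):
    """Fill the first template whose prefix matches the given value."""
    for prefix, template in mapping:
        if value.startswith(prefix):
            suffix = value[len(prefix):]
            return template.format(**{placeholder: suffix})
    return value  # no mapping applies, so return the value unchanged


tag = "tag:nowhere.org:custom/fraction-1.0.0"
schema_url = resolve(tag, TAG_MAPPING, "tag_suffix")
schema_location = resolve(schema_url, URL_MAPPING, "url_suffix")
```

Here schema_url has the same form as the id field in the schema, and schema_location is where the schema file would be read from.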
Once these classes and the schema have been defined, we can save an ASDF file using them:
Defining custom types¶
In the example above, we showed how to create an extension that is capable of
serializing fractions.Fraction
. The custom tag type that we created was
defined as a subclass of asdf.CustomType
.
Custom type attributes¶
We overrode the following attributes of CustomType
in order to define
FractionType
(each bullet is also a link to the API documentation):
Each of these attributes is important, and each is described in more detail in the linked API documentation.
The choice of name should be descriptive of the custom type that is being serialized. The choices of organization and standard are fairly arbitrary, but still important: custom types that are provided by the same package should be grouped into the same standard and organization.
These three values, along with the version
, are used to
define the YAML tag that will mark the serialized type in ASDF files. In our
example, the tag becomes tag:nowhere.org:custom/fraction-1.0.0
. The tag
is important when defining the asdf.extension.AsdfExtension
subclass.
Critically, these values must all be reflected in the associated schema.
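The relationship between the attributes and the schema can be made concrete with plain string formatting. This is only an illustration of the pattern visible in the example schema (its tag and id fields), not the code the library uses internally.

```python
# Attribute values taken from the fraction example.
name = "fraction"
organization = "nowhere.org"
standard = "custom"
version = (1, 0, 0)

version_string = ".".join(str(part) for part in version)

# Pattern read off the example schema's "tag" and "id" fields.
tag = f"tag:{organization}:{standard}/{name}-{version_string}"
schema_id = f"http://{organization}/schemas/{standard}/{name}-{version_string}"
```

If any of the four attributes changes, both strings change, which is why the schema must be kept in sync with the tag class.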
Custom type methods¶
In addition to the attributes mentioned above, we also overrode the following
methods of CustomType
(each bullet is also a link to the API
documentation):
The to_tree
method defines how an instance of a custom data
type is converted into data structures that represent a YAML tree that can be
serialized to a file.
The from_tree
method defines how a YAML tree can be
converted back into an instance of the original custom data type.
In the example above, we used a list
to contain the important attributes of
fractions.Fraction
. However, this choice is fairly arbitrary, as long as it
is consistent between the way that to_tree
and
from_tree
are defined. For example, we could have also
chosen to use a dict
:
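A dict-based pair of conversion methods might look like the sketch below. As before, FractionDictConverter is a hypothetical stand-in that does not inherit from asdf.CustomType, so it runs without asdf; the key names numerator and denominator are chosen to match the object schema for this representation.

```python
import fractions


class FractionDictConverter:
    """Hypothetical stand-in for a tag class using a dict representation."""

    @classmethod
    def to_tree(cls, node, ctx):
        # Named keys instead of positional list items.
        return {"numerator": node.numerator, "denominator": node.denominator}

    @classmethod
    def from_tree(cls, tree, ctx):
        # Read back the same keys that to_tree wrote.
        return fractions.Fraction(tree["numerator"], tree["denominator"])


tree = FractionDictConverter.to_tree(fractions.Fraction(5, 7), ctx=None)
restored = FractionDictConverter.from_tree(tree, ctx=None)
```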
In this case, the associated schema would look like the following:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/fraction-1.0.0"
title: An example custom type for handling fractions
tag: "tag:nowhere.org:custom/fraction-1.0.0"
type: object
properties:
  numerator:
    type: integer
  denominator:
    type: integer
...
We can compare the output using this representation to the example above:
Serializing more complex types¶
Sometimes the custom types that we wish to represent in ASDF themselves have
attributes which are also custom types. As a somewhat contrived example,
consider a 2D cartesian coordinate that uses fractions.Fraction to represent each of the components. We will call this type Fractional2DCoordinate.
First we need to define a schema to represent this new type:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/fractional_2d_coord-1.0.0"
title: An example custom type for handling components
tag: "tag:nowhere.org:custom/fractional_2d_coord-1.0.0"
type: object
properties:
  x:
    $ref: fraction-1.0.0
  y:
    $ref: fraction-1.0.0
...
Note that in the schema, the x
and y
attributes are expressed as
references to our fraction-1.0.0
schema. Since both of these schemas are
defined under the same standard and organization, we can simply use the name
and version of the fraction-1.0.0
schema to refer to it. However, if the referenced type were defined in a different organization and standard, it would be necessary to use the entire YAML tag in the reference (e.g. tag:nowhere.org:custom/fraction-1.0.0). Relative tag references are also allowed where appropriate.
We also need to define the custom tag type that corresponds to our new type:
In previous versions of this library, it was necessary for our
Fractional2DCoordinateType
class to call yamlutil
functions
explicitly to convert the x
and y
components to and from
their tree representations. Now, the library will automatically
convert nested custom types before calling from_tree
,
and after receiving the result from to_tree
.
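Because nested custom types are converted automatically, the conversion methods for the coordinate type can hand Fraction objects through untouched. The sketch below assumes a simple Fractional2DCoordinate class with x and y attributes (its definition here is hypothetical) and uses a stand-in converter class rather than a real asdf.CustomType subclass.

```python
import fractions


class Fractional2DCoordinate:
    """Hypothetical coordinate type with Fraction-valued components."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Fractional2DCoordinateConverter:
    """Stand-in for the tag class; no explicit handling of nested types."""

    @classmethod
    def to_tree(cls, node, ctx):
        # The Fraction values are returned as-is; converting them is
        # left to the library, as described above.
        return {"x": node.x, "y": node.y}

    @classmethod
    def from_tree(cls, tree, ctx):
        # tree["x"] and tree["y"] are assumed to be Fraction objects already.
        return Fractional2DCoordinate(tree["x"], tree["y"])


coord = Fractional2DCoordinate(fractions.Fraction(1, 2), fractions.Fraction(3, 4))
tree = Fractional2DCoordinateConverter.to_tree(coord, ctx=None)
restored = Fractional2DCoordinateConverter.from_tree(tree, ctx=None)
```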
Since Fractional2DCoordinateType
shares the same
organization
and standard
as
FractionType
, it can be added to the same extension class:
Now we can use this extension to create an ASDF file:
Note that in the resulting ASDF file, the x and y components of our new fractional_2d_coord type are tagged as fraction-1.0.0.
Serializing reference cycles¶
Special considerations must be made when deserializing a custom type that
contains a reference to itself among its descendants. Consider a
fractions.Fraction
subclass that maintains a reference to its multiplicative
inverse:
The inverse of the inverse of a fraction is the fraction itself, so you might wish to construct your objects in the following way:
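One possible shape for such a subclass, together with the mutual assignment, is sketched here. The attribute name inverse comes from the text; the rest of the class body is an assumption made for illustration.

```python
import fractions


class FractionWithInverse(fractions.Fraction):
    """Fraction that can hold a reference to its multiplicative inverse."""

    def __init__(self, *args, **kwargs):
        # Fraction builds its value in __new__, so __init__ only needs
        # to set up the extra attribute.
        self._inverse = None

    @property
    def inverse(self):
        return self._inverse

    @inverse.setter
    def inverse(self, value):
        self._inverse = value


f1 = FractionWithInverse(3, 5)
f2 = FractionWithInverse(5, 3)

# Each object now refers to the other, forming a reference cycle.
f1.inverse = f2
f2.inverse = f1
```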
This creates an “infinite loop” between the two fractions. An ordinary
CustomType
wouldn’t be able to deserialize this, since each object
requires that the other be deserialized first! Let’s see what happens
when we define our from_tree
method in a naive way:
After adding our type to the extension class, the tree will serialize correctly:
But upon deserialization, we notice a problem:
The presence of _PendingValue
is asdf
’s way of telling you
that the value corresponding to the key inverse
was not fully deserialized
at the time that you retrieved it. We can handle this situation by making our
from_tree
a generator function:
The generator version of from_tree
yields the partially constructed
FractionWithInverse
object before setting its inverse property. This allows
asdf to proceed to constructing the inverse FractionWithInverse
object,
and resume the original from_tree
execution only when the inverse
is actually available.
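The yield-then-finish pattern can be demonstrated without asdf by driving two generators by hand. Everything around from_tree below is a simplified, hypothetical driver written for this sketch: it stands in for the library's deserialization machinery, and the tree layout (numerator, denominator, and inverse keys) is an assumption.

```python
import fractions


class FractionWithInverse(fractions.Fraction):
    def __init__(self, *args, **kwargs):
        self.inverse = None


def from_tree(tree):
    # Step 1: build the object from data that is already available.
    result = FractionWithInverse(tree["numerator"], tree["denominator"])
    yield result
    # Step 2: runs only when the generator is resumed, by which time
    # the driver has filled in tree["inverse"].
    result.inverse = tree["inverse"]


# Hand-rolled driver standing in for the library.
tree_a = {"numerator": 3, "denominator": 5, "inverse": None}
tree_b = {"numerator": 5, "denominator": 3, "inverse": None}

gen_a, gen_b = from_tree(tree_a), from_tree(tree_b)
a, b = next(gen_a), next(gen_b)  # partially constructed objects

tree_a["inverse"], tree_b["inverse"] = b, a  # both objects now exist

for gen in (gen_a, gen_b):
    next(gen, None)  # resume each generator so it can finish
```

Yielding early gives the driver an object to hand out as a reference before the object's own references are resolved, which is what breaks the cycle.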
With this new version of from_tree
, we can successfully deserialize
our ASDF file:
Assigning schema and tag versions¶
Authors of new tags and schemas should strive to use the conventions described
by semantic versioning. Tags and schemas for types
that have not been serialized before should begin at 1.0.0
. Versions for a
particular tag type need not move in lock-step with other tag types in the same
extension.
The patch version should be bumped for bug fixes and other minor, backwards-compatible changes. New features can be indicated with increments to the minor version, as long as they remain backwards compatible with older versions of the schema. Any changes that break backwards compatibility must be indicated by a major version update.
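These rules can be summarized as a small helper. The function name bump and the change labels "fix", "feature", and "breaking" are invented for this sketch; they are not part of asdf.

```python
def bump(version, change):
    """Return the next (major, minor, patch) tuple for a kind of change."""
    major, minor, patch = version
    if change == "breaking":
        # Backwards-incompatible change: new major version.
        return (major + 1, 0, 0)
    if change == "feature":
        # Backwards-compatible addition: new minor version.
        return (major, minor + 1, 0)
    if change == "fix":
        # Bug fix or other minor compatible change: new patch version.
        return (major, minor, patch + 1)
    raise ValueError(f"unknown change kind: {change!r}")


next_after_fix = bump((1, 0, 0), "fix")
next_after_feature = bump((1, 0, 1), "feature")
next_after_break = bump((1, 1, 0), "breaking")
```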
Since ASDF is intended to be an archival file format, authors of tags and schemas should work to ensure that ASDF files created with older extensions can continue to be processed. This means that every time a schema version is bumped (with the possible exception of patch updates), a new schema file should be created.
For example, if we currently have a schema for xyz-1.0.0
, and we wish to
make changes and bump the version to xyz-1.1.0
, we should leave the
original schema intact. A new schema file should be created for
xyz-1.1.0
, which can exist in parallel with the old file. The version of
the corresponding tag type should be bumped to 1.1.0
.
For more details on the behavior of schema and tag versioning from a user perspective, see Versioning and Compatibility, and also Custom types, extensions, and versioning.
Explicit version support¶
To some extent schemas and tag classes will be closely tied to the custom data types that they represent. This means that in some cases API changes or other changes to the representation of the underlying types will force us to modify our schemas and tag classes. ASDF’s schema versioning allows us to handle changes in schemas over time.
Let’s consider an imaginary custom type called Person
that we want to
serialize in ASDF. The first version of Person
was constructed using a
first and last name:
person = Person("James", "Webb")
print(person.first, person.last)
Our version 1.0.0 YAML schema for Person
might look like the following:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/person-1.0.0"
title: An example custom type for representing a Person
tag: "tag:nowhere.org:custom/person-1.0.0"
type: array
items:
  type: string
minItems: 2
maxItems: 2
...
And our tag implementation would look something like this:
import asdf
from people import Person


class PersonType(asdf.CustomType):
    name = "person"
    organization = "nowhere.org"
    version = (1, 0, 0)
    standard = "custom"
    types = [Person]

    @classmethod
    def to_tree(cls, node, ctx):
        return [node.first, node.last]

    @classmethod
    def from_tree(cls, tree, ctx):
        return Person(tree[0], tree[1])
However, a newer version of Person
now requires a middle name in the
constructor as well:
person = Person("James", "Edwin", "Webb")
print(person.first, person.middle, person.last)
So we update our YAML schema to version 1.1.0 in order to support newer versions of Person:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/person-1.1.0"
title: An example custom type for representing a Person
tag: "tag:nowhere.org:custom/person-1.1.0"
type: array
items:
  type: string
minItems: 3
maxItems: 3
...
We need to update our tag class implementation as well. However, we need to be
careful. We still want to be able to read version 1.0.0 of our schema and be
able to convert it to the newer version of Person
objects. To accomplish
this, we will make use of the supported_versions
attribute
for our tag class. This will allow us to declare explicit support for the
schema versions our tag class implements.
Under the hood, asdf
creates multiple copies of our PersonType
tag class,
each with a different version
attribute corresponding to one
of the supported versions. This means that in our new tag class implementation,
we can condition our from_tree
implementation on the value
of version
to determine which schema version should be used when reading:
import asdf
from people import Person


class PersonType(asdf.CustomType):
    name = "person"
    organization = "nowhere.org"
    version = (1, 1, 0)
    supported_versions = [(1, 0, 0), (1, 1, 0)]
    standard = "custom"
    types = [Person]

    @classmethod
    def to_tree(cls, node, ctx):
        return [node.first, node.middle, node.last]

    @classmethod
    def from_tree(cls, tree, ctx):
        # Handle the older version of the person schema
        if cls.version == (1, 0, 0):
            # Construct a Person object with an empty middle name field
            return Person(tree[0], "", tree[1])
        else:
            # The newer version of the schema stores the middle name too
            return Person(tree[0], tree[1], tree[2])
Note that the implementation of to_tree
is not conditioned on
cls.version
since we do not need to convert new Person
objects back to
the older version of the schema.
Handling subclasses¶
By default, if a custom type is serialized by an asdf
tag class, then all
subclasses of that type can also be serialized. However, no attributes that are
specific to the subclass will be stored in the file. When reading the file, an
instance of the base custom type will be returned instead of the subclass that
was written.
To properly handle subclasses of custom types already recognized by asdf
, it is
necessary to implement a separate tag class that is specific to the subclass to
be serialized.
Previous versions of this library implemented an experimental feature that allowed ASDF to serialize subclass attributes using the same tag class, but this feature was dropped as it produced files that were not portable.
Creating custom schemas¶
All custom types to be serialized by asdf
require custom schemas. The best
resource for creating ASDF schemas can be found in the ASDF Standard documentation.
In most cases, ASDF schemas will be included as part of a packaged software
distribution. In these cases, it is important for the
url_mapping
of the corresponding AsdfExtension
extension class to map the schema URL to an actual location on disk. However,
it is possible for schemas to be hosted online as well, in which case the URL
mapping can map (perhaps trivially) to an actual network location. See
Defining custom extension classes for more information.
It is also important for packages that provide custom schemas to test them, both to make sure that they are valid, and to ensure that any examples they provide are also valid. See Testing custom schemas for more information.
Adding custom validators¶
A new type may also add new validation keywords to the schema language. This can be used to impose type-specific restrictions on the values in an ASDF file. This feature is used internally so a schema can specify the required datatype of an array.
To support custom validation keywords, set the validators
member of a CustomType
subclass to a dictionary where the keys are the
validation keyword name and the values are validation functions. The
validation functions are of the same form as the validation functions in the
underlying jsonschema
library, and are passed the following arguments:
validator: A jsonschema.Validator instance.
value: The value of the schema keyword.
instance: The instance to validate. This will be made up of basic datatypes as represented in the YAML file (list, dict, number, strings), and not include any object types.
schema: The entire schema that applies to instance. Useful to get other related schema keywords.
The validation function should either return None
if the instance
is valid or yield
one or more jsonschema.ValidationError
objects if
the instance is invalid.
To continue the example from above, say we want to add a validation keyword “simplified” to FractionType that, when true, asserts that the corresponding fraction is in simplified form:
import fractions

from asdf import ValidationError


def validate_simplified(validator, simplified, instance, schema):
    # Only check the fraction when the schema sets "simplified: true"
    if simplified:
        # Fraction reduces to lowest terms on construction
        reduced = fractions.Fraction(instance[0], instance[1])
        if reduced.numerator != instance[0] or reduced.denominator != instance[1]:
            yield ValidationError("Fraction is not in simplified form.")


FractionType.validators = {"simplified": validate_simplified}
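To see how such a validation function behaves, it can be called directly. The sketch below repeats the function with a local ValidationError class standing in for the library's exception, so that it runs without asdf or jsonschema; the validator and schema arguments are dummies because this particular function ignores them.

```python
import fractions


class ValidationError(Exception):
    """Local stand-in for the library's ValidationError."""


def validate_simplified(validator, simplified, instance, schema):
    if simplified:
        reduced = fractions.Fraction(instance[0], instance[1])
        if reduced.numerator != instance[0] or reduced.denominator != instance[1]:
            yield ValidationError("Fraction is not in simplified form.")


# The function is a generator, so collect whatever errors it yields.
errors_for_unreduced = list(validate_simplified(None, True, [2, 4], {}))
errors_for_reduced = list(validate_simplified(None, True, [1, 2], {}))
errors_when_disabled = list(validate_simplified(None, False, [2, 4], {}))
```

An empty result means the instance passed; a valid instance simply causes the generator to finish without yielding.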
Defining custom extension classes¶
Extension classes are the mechanism that asdf
uses to register custom tag types
so that they can be used when processing ASDF files. Packages that define their
own custom tag types must also define extensions in order for those types to be
used.
All extension classes must implement the asdf.extension.AsdfExtension
abstract base
class. A custom extension will override each of the following properties of
asdf.extension.AsdfExtension
(the text in each bullet is also a link to the corresponding
documentation):
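As a rough picture of what such a class provides, the sketch below uses the three property names that appear elsewhere on this page: types, tag_mapping, and url_mapping. It is a standalone stand-in, not a working extension: a real class would implement asdf.extension.AsdfExtension, the format of the mapping tuples (including the {tag_suffix} and {url_suffix} placeholders) is an assumption, and the schema location is hypothetical.

```python
class FractionType:
    """Placeholder for a tag class; a real one would subclass asdf.CustomType."""


class FractionExtension:
    """Stand-in showing the shape of an extension class."""

    @property
    def types(self):
        # Tag classes provided by this extension.
        return [FractionType]

    @property
    def tag_mapping(self):
        # Maps a tag prefix to a schema URL (assumed tuple format).
        return [
            ("tag:nowhere.org:custom", "http://nowhere.org/schemas/custom{tag_suffix}"),
        ]

    @property
    def url_mapping(self):
        # Maps a schema URL prefix to where the files live (hypothetical path).
        return [
            (
                "http://nowhere.org/schemas/custom/",
                "file:///opt/mypackage/schemas/{url_suffix}.yaml",
            ),
        ]


extension = FractionExtension()
```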
Overriding built-in extensions¶
It is possible for externally defined extensions to override tag types that are
provided by asdf
’s built-in extension. For example, maybe an external package
wants to provide a different implementation of NDArrayType
.
In this case, the external package does not need to provide custom schemas
since the schema for the type to be overridden is already provided as part of
the ASDF standard.
Instead, the extension class may inherit from asdf
’s
asdf.extension.BuiltinExtension
and simply override the
types
property to indicate the type that is being
overridden. Doing this preserves the tag_mapping and url_mapping that are used by the BuiltinExtension, which allows the schemas that are packaged by asdf to be located.
asdf will give precedence to the type that is provided by the external extension, effectively overriding the corresponding type in the built-in extension. Note that the behavior is currently undefined if multiple external extensions override the same built-in type.
Packaging custom extensions¶
Packaging schemas¶
If a package provides custom schemas, the schema files must be installed as
part of that package distribution. In general, schema files must be installed
into a subdirectory of the package distribution. The asdf
extension class must
supply a url_mapping
that maps to the installed location
of the schemas. See Defining custom extension classes for more details.
Registering entry points¶
Packages that provide their own ASDF extensions can (and should!) install them
so that they are automatically detectable by the asdf
Python package. This is
accomplished using Python’s setuptools
entry points. Entry points are registered in a package’s setup.py
file.
Consider a package that provides an extension class MyPackageExtension
in the
submodule mypackage.asdf.extensions
. We need to register this class as an
extension entry point that asdf
will recognize. First, we create a dictionary:
entry_points = {}
entry_points["asdf_extensions"] = [
    "mypackage = mypackage.asdf.extensions:MyPackageExtension"
]
The key used in the entry_points
dictionary must be 'asdf_extensions'
.
The value must be an array of one or more strings, each with the following
format:
extension_name = fully.specified.submodule:ExtensionClass
The extension name can be any arbitrary string, but it should be descriptive of the package and the extension. In most cases the package name itself will suffice.
Note that depending on individual package requirements, there may be other
entries in the entry_points
dictionary.
The entry points must be passed to the call to setuptools.setup
:
from setuptools import setup

entry_points = {}
entry_points["asdf_extensions"] = [
    "mypackage = mypackage.asdf.extensions:MyPackageExtension"
]

setup(
    # We omit other package-specific arguments that are not
    # relevant to this example
    entry_points=entry_points,
)
When running python setup.py install
or python setup.py develop
on this
package, the entry points will be registered automatically. This allows the
asdf
package to recognize the extensions without any user intervention. Users
of your package that wish to read ASDF files using types that you have
registered will not need to use any extension explicitly. Instead, asdf
will
automatically recognize the types you have registered and will process them
appropriately. See Extensions from other packages for more information on using
extensions.
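To check which extensions are registered in the current environment, the entry point group can be inspected with the standard library. This sketch uses importlib.metadata directly and only illustrates the discovery mechanism; it is not how asdf itself is documented to load extensions, and the resulting list will be empty if no installed package registers the asdf_extensions group.

```python
from importlib.metadata import entry_points

all_entry_points = entry_points()

# Newer Python versions return an object with select(); older ones
# return a plain dict keyed by group name.
if hasattr(all_entry_points, "select"):
    found = all_entry_points.select(group="asdf_extensions")
else:
    found = all_entry_points.get("asdf_extensions", [])

# Each entry point has a name and a "module:attribute" value.
extension_entry_points = sorted((ep.name, ep.value) for ep in found)

for ep_name, ep_value in extension_entry_points:
    print(f"{ep_name} = {ep_value}")
```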
Testing custom schemas¶
Packages that provide their own schemas can test them using asdf
’s
pytest plugin for schema testing.
Schemas are tested for overall validity, and any examples given within the
schemas are also tested.
The schema tester plugin is automatically registered when the asdf
package is
installed. In order to enable testing, it is necessary to add the directory
containing your schema files to the pytest section of your project’s build configuration
(pyproject.toml
or setup.cfg
). If you do not already have such a file, creating
one with the following should be sufficient:
The schema directory paths should be paths that are relative to the top of the
package directory when it is installed. If this is different from the path
in the source directory, then both paths can be used to facilitate in-place
testing (see asdf
’s own pyproject.toml
for an example of this).
Note
Older versions of asdf
(prior to 2.4.0) required the plugin to be registered
in your project’s conftest.py
file. As of 2.4.0, the plugin is now
registered automatically and so this line should be removed from your
conftest.py
file, unless you need to retain compatibility with older
versions of asdf
.
The asdf_schema_skip_names
configuration variable can be used to skip
schema files that live within one of the asdf_schema_root
directories but
should not be tested. The names should be given as simple base file names
(without directory paths or extensions). Again, see asdf
’s own pyproject.toml
file
for an example.
The schema tests do not run by default. In order to enable the tests by
default for your package, add asdf_schema_tests_enabled = 'true'
to the
[tool.pytest.ini_options]
section of your pyproject.toml
file (or [tool:pytest]
in setup.cfg
).
If you do not wish to enable the schema tests by default, you can add the --asdf-tests
option to
the pytest
command line to enable tests on a per-run basis.