.. _utils-data:

***************************************************
Downloadable Data Management (`astropy.utils.data`)
***************************************************

Introduction
============

A number of Astropy's tools work with data sets that are either awkwardly
large (e.g., `~astropy.coordinates.solar_system_ephemeris`) or
regularly updated (e.g., `~astropy.utils.iers.IERS_B`) or both
(e.g., `~astropy.utils.iers.IERS_A`). This kind of data - authoritative data
made available on the Web, and possibly updated from time to time - is
reasonably common in astronomy. The Astropy Project therefore provides some
tools for working with such data.

The primary tool for this is the ``astropy`` *cache*. This is a repository of
downloaded data, indexed by the URL where it was obtained. The tool
`~astropy.utils.data.download_file` and various other things built upon it can
use this cache to request the contents of a URL, and (if they choose to use
the cache) the data will only be downloaded if it is not already present in
the cache. The tools can be instructed to obtain a new copy of data that is in
the cache but has been updated online.

The ``astropy`` cache is stored in a centralized place (on Linux machines by
default it is ``$HOME/.astropy/cache``; see :ref:`astropy_config` for more
details). You can check its location on your machine::

  >>> import astropy.config.paths
  >>> astropy.config.paths.get_cache_dir()  # doctest: +SKIP
  '/home/burnell/.astropy/cache'

This centralization means that the cache is persistent and shared between all
``astropy`` runs in any virtualenv by one user on one machine (possibly more
if your home directory is shared between multiple machines). This can
dramatically accelerate ``astropy`` operations and reduce the load on servers,
like those of the IERS, that were not designed for heavy Web traffic.
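
The cache behavior described above (look up by URL, download only when the
entry is missing, refresh on request) can be illustrated with a small
standalone sketch. This is not the ``astropy`` implementation: the class
``UrlCache``, its methods, and the ``fetch`` callable are hypothetical names
introduced for illustration, and naming each entry by the MD5 digest of its
URL is an assumption of the sketch.

```python
import hashlib
from pathlib import Path


class UrlCache:
    """Toy URL-indexed download cache (illustrative only, not astropy's code)."""

    def __init__(self, cache_dir, fetch):
        # `fetch` is a hypothetical callable mapping a URL string to bytes.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch

    def path_for(self, url):
        # Assumption of this sketch: one file per URL, named by the
        # MD5 hex digest of the URL string.
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        return self.cache_dir / digest

    def get(self, url, update=False):
        """Return the local path for `url`, fetching only when needed."""
        path = self.path_for(url)
        # Fetch when the entry is missing, or when a refresh is requested.
        if update or not path.exists():
            path.write_bytes(self.fetch(url))
        return path
```

Calling ``get`` twice with the same URL invokes ``fetch`` only once, while
``get(url, update=True)`` fetches again and overwrites the stored copy. In
``astropy`` itself, the analogous behavior is requested through
`~astropy.utils.data.download_file` and its ``cache`` argument.
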
If you find the cache has corrupted or outdated data in it, you can remove an
entry or clear the whole thing with
`~astropy.utils.data.clear_download_cache`.

The files in the cache directory are named according to a cryptographic hash
of their URL (currently MD5, so in principle malevolent entities can cause
collisions, though the security risks this poses are marginal at most). The
modification times on these files normally indicate when they were last
downloaded from the Internet.

Usage Within Astropy
====================

For the most part, you can ignore the caching mechanism and rely on
``astropy`` to have the correct data when you need it. For example, precise
time conversions and sky locations need measured tables of the Earth's
rotation from the IERS. The table `~astropy.utils.iers.IERS_Auto` provides the
infrastructure for many of these calculations. It makes available Earth
rotation parameters, and if you request them for a time more recent than its
tables cover, it will download updated tables from the IERS. So for example
asking what time it is in UT1 (a timescale that reflects the irregularity of
the Earth's rotation) probably triggers a download of the IERS data::

  >>> from astropy.time import Time
  >>> Time.now().ut1  # doctest: +SKIP
  Downloading https://maia.usno.navy.mil/ser7/finals2000A.all
  |============================================| 3.2M/3.2M (100.00%)         1s
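
Because the modification time of a cached file normally records when it was
last downloaded, a script can estimate how old a cached table is before
deciding whether to ask for a fresh copy. The following is a minimal sketch
using only the standard library. The helper names ``cache_age_days`` and
``is_stale`` and the 30-day default threshold are hypothetical, chosen for
illustration, and are not part of ``astropy``.

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def cache_age_days(path, now=None):
    """Return the number of days since `path` was last modified."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) / SECONDS_PER_DAY


def is_stale(path, max_age_days=30.0, now=None):
    """Return True if the file is older than `max_age_days`.

    The 30-day default is an assumption of this sketch.
    """
    return cache_age_days(path, now=now) > max_age_days
```

A file that has been copied or touched since it was downloaded reports a
different modification time, so a check like this is only a heuristic.
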