An overview of the interoperability standards being pursued in the ESCAPE project to enable FAIR data and services in a wide range of astronomy infrastructures, plus consideration of some of the data stewardship aspects of multi-wavelength and multi-messenger astrophysics.
The ESCAPE project (European Science Cluster of Astronomy & Particle physics ESFRI research Infrastructure) is addressing the Open Science challenges shared by the astrophysics ESFRI facilities (SKA, CTA, KM3NeT, EST) as well as other pan-European research infrastructures (ESO, JIVE) in the context of the European Open Science Cloud (EOSC). One of the activities of ESCAPE is to identify the best practices for data stewardship in astronomy and in particular how to use common IVOA standards to implement the FAIR principles. We present a summary of the interoperability topics being pursued in ESCAPE in the different scientific domain areas, and relate these activities to data stewardship.
The ESCAPE project (European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures) is addressing the Open Science challenges shared by the astrophysics and accelerator particle physics facilities that are labelled as ESFRI by the European Strategy Forum for Research Infrastructures. The astrophysics ESFRI infrastructures are the CTA (Cherenkov Telescope Array), ELT (ESO Extremely Large Telescope), EST (European Solar Telescope), KM3NeT (Kilometre Cube Neutrino Telescope), and SKA (Square Kilometre Array). The other pan-European research infrastructures involved are ESO (European Southern Observatory), JIVE (Joint Institute for VLBI ERIC) and EGO/Virgo (European Gravitational Observatory Virgo collaboration). The project is carried out in the context of the European Open Science Cloud (EOSC), a major European initiative for Open Science.
The CEVO Work Package of the ESCAPE project [1] addresses the implementation of FAIR principles for the Astronomy and Astroparticle ESFRI by using IVOA (International Virtual Observatory Alliance) standards. The FAIR principles [2] are guidelines for data resources to be findable, accessible, interoperable and re-usable. These principles are key to Open Science as described in the EOSC Association Strategic Research and Innovation Agenda. Implementing FAIR in practice requires scientists, software engineers and data experts to face the ‘data stewardship’ challenges that arise across the different facilities and data types. The work of the project is built around the requirements of the various ESFRI, in particular their needs for standardisation and interoperable tools. A Technology Forum event was recently held to track the progress of the work. This poster paper summarises the most important interoperability topics that have been identified in the different scientific domain areas, the implications for data stewardship of the different types of data and services, and how this is evolving with the development of EOSC. All the IVOA standards cited in the text can be found at https://www.ivoa.net/documents/.
Radio and millimetre astronomy produce a wide range of data with special characteristics related to the interferometric nature of the observations. These data have typically been difficult for non-specialists to use, but current facilities (such as ALMA, JIVE, LOFAR) and new facilities (e.g. SKA) are putting much effort into making the data widely usable. The ESCAPE project contributed to a recent review, published as an IVOA Note, of how the Virtual Observatory can be better adapted to work with radio data. The emphasis of the work, based on the ALMA, JIVE and SKA requirements, is on data discovery and visualisation. The first goal is to "make the data easy to find", which requires standards for the discovery of interferometric data and a definition of the metadata needed for interoperability. The second is to provide a way to encode "what happened to these data?", for which the IVOA Provenance Data Model [3] has been developed and then tested for radio astronomy [4]. In terms of visualisation, there is a strong motivation to implement the all-sky zoomable interfaces that are enabled by the IVOA HiPS standard [5], and to provide access to VO services for radio data from Python programs and notebooks.
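As an illustration of how a zoomable HiPS interface can be driven from Python, the sketch below computes the relative path of a single HiPS tile from its HEALPix order and pixel index, assuming the directory layout of the HiPS standard (Norder/Dir/Npix, with tiles grouped in directories of 10000). The base URL and the function name are hypothetical and serve only as an example.

```python
def hips_tile_path(order: int, npix: int, ext: str = "png") -> str:
    """Relative path of one HiPS tile (HEALPix order and pixel index)."""
    n_tiles = 12 * 4 ** order  # number of HEALPix cells at this order
    if order < 0 or not 0 <= npix < n_tiles:
        raise ValueError(f"pixel {npix} is not valid at order {order}")
    # Tiles are grouped into directories of 10000 tiles each.
    directory = (npix // 10000) * 10000
    return f"Norder{order}/Dir{directory}/Npix{npix}.{ext}"


# Hypothetical HiPS base URL, used only to show how a tile URL is assembled.
BASE_URL = "https://example.org/hips/my_radio_survey"
tile_url = f"{BASE_URL}/{hips_tile_path(3, 271)}"
print(tile_url)
```

A client that zooms in simply requests tiles at a higher order for the region in view, which is why the same layout works for both all-sky and detailed views.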
The future Cherenkov Telescope Array (CTA) will pursue high energy astrophysics with an observatory operational model, and many of the interoperability challenges involve the development of archiving and data access services for the complex CTA data. CTA will detect the Cherenkov radiation from particle showers triggered in the Earth's atmosphere by high energy particles from astrophysical sources. The data processing chains require the ‘provenance’ information to be captured at each step, so that what happened to the data is described using a common standard. The CTA requirements have been used in the use-cases for the development of the IVOA Provenance Data Model [3]. The focus of this effort is to allow users to access quality and reliability information. This work for CTA has been used as an example of how the IVOA Provenance Data Model can be applied to a large, complex project. Its applicability to ESFRI projects and scientific research has been highlighted in a recent ESCAPE workshop and in a review of practical issues in implementing the provenance standard [4], and this led to a proposed management system for provenance information [6].
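The idea of capturing provenance at each processing step can be sketched with the entity and activity concepts that the IVOA Provenance Data Model shares with W3C PROV. The following is a minimal illustration only: the class layout, the `lineage` helper and all identifiers (data product and pipeline names) are hypothetical, and it is not the CTA implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A data product; `generated_by` points to the step that produced it."""
    id: str
    generated_by: Optional["Activity"] = None


@dataclass
class Activity:
    """A processing step that used some entities, run by a named agent."""
    id: str
    agent: str
    used: List[Entity] = field(default_factory=list)


def lineage(entity: Entity) -> List[str]:
    """Walk backwards from `entity`, listing upstream activity and entity ids."""
    steps: List[str] = []
    activity = entity.generated_by
    if activity is not None:
        steps.append(activity.id)
        for source in activity.used:
            steps.append(source.id)
            steps.extend(lineage(source))
    return steps


# Hypothetical two-step processing chain.
raw = Entity("raw_events")
calibration = Activity("calibration", agent="pipeline_v1", used=[raw])
calibrated = Entity("calibrated_events", generated_by=calibration)
reconstruction = Activity("reconstruction", agent="pipeline_v1", used=[calibrated])
event_list = Entity("event_list", generated_by=reconstruction)

print(lineage(event_list))
```

Answering "what happened to these data?" then amounts to following the `generated_by` and `used` links back to the original inputs.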
The KM3NeT neutrino telescope is involved in ESCAPE CEVO, with the work focused on improving the use of common IVOA standards for KM3NeT. Some of the first steps towards making neutrino data available in the VO include establishing expertise within KM3NeT for the VO publication of tabular data and the publication of events (e.g. VOEvents). This is strongly motivated by the need to integrate neutrino data services into multi-messenger astrophysics systems, and to complement the services of the KM3NeT Open Data Center. The ESCAPE project work has identified synergies between the requirements for CTA and KM3NeT in terms of provenance standards, as well as in the use of VO applications and tools (e.g. TOPCAT, Aladin).
The use of VO standards is embedded in the operations of many archives and data centres of optical, ultraviolet and infrared astronomy. IVOA has taken an approach of engaging with the large astronomy projects to develop the necessary standards and tools (e.g. [7] and [8]). The ESCAPE project supports the development of IVOA standards based on these needs, which have been updated recently to include standards for time-domain astrophysics and alerts for multi-messenger astronomy, as well as standards for handling and accessing multi-dimensional data (e.g. cut-outs and visualisation). This is in addition to the maintenance and evolution of the standards that are implemented in services such as the ESO archive [9]. The relevant standards are ADQL 2.0 [10], DataLink v1.0 [11], ObsCore v1.1 [12], SSAP v1.1 [13] and TAP v1.1 [14]. A major improvement to the Aladin Lite web visualiser is also underway for use in multiple astronomy services, making use of the HiPS 1.0 standard [15]. The progress on all these standards is recorded in Milestone reports of CEVO, which are available in the public library of ESCAPE Milestones and Reports.
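To show how these standards fit together, the sketch below builds an ADQL cone search on the ObsCore table (`ivoa.obscore`) and encodes it as a synchronous TAP request. It only constructs the query string and URL and does not contact any service; the TAP base URL, the function names and the example coordinates are assumptions made for illustration.

```python
from urllib.parse import urlencode


def obscore_cone_query(ra_deg, dec_deg, radius_deg,
                       product_type="spectrum", limit=10):
    """ADQL query for ObsCore records of one product type within a sky circle."""
    if not product_type.isalpha():
        raise ValueError("product_type must be a plain word, e.g. 'spectrum'")
    return (
        f"SELECT TOP {int(limit)} obs_publisher_did, access_url, t_min, t_max "
        "FROM ivoa.obscore "
        f"WHERE dataproduct_type = '{product_type}' "
        "AND 1 = CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {float(ra_deg)}, {float(dec_deg)}, {float(radius_deg)}))"
    )


def tap_sync_url(base_url, adql):
    """URL of a synchronous TAP query (built only, never sent here)."""
    # REQUEST=doQuery is required by TAP 1.0 and still accepted by TAP 1.1.
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


# Hypothetical service URL and target position.
adql = obscore_cone_query(10.68, 41.27, 0.1)
url = tap_sync_url("https://example.org/tap/", adql)
print(adql)
print(url)
```

Each returned record carries an `access_url`, which is the natural entry point to the data itself or to a DataLink document listing related products.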
One of the innovative activities being pursued in ESCAPE is the prototyping of services for value-added data in astronomy archives. A highlight of this work is the application of Deep Learning techniques to enable searches for ‘similar data’ within the ESO archive of HARPS instrument spectra (see [16]). Various prototypes have been tested to explore how these types of services can go beyond the traditional data archive interfaces and offer ways to "let the data speak", in the sense of enabling new views of the data based on characteristics identified by machine learning or other techniques.
The detection of gravitational waves, and in particular the example where an electromagnetic counterpart was found by follow-up observations, has marked the beginning of a new era of multi-messenger astrophysics [17]. Interoperability of multi-wavelength data is essential for these studies, which involve measurements from a wide range of instruments. The EGO/Virgo partner in ESCAPE identified the need for tools for visualising and managing the sky regions covered by the credible region probability maps and by the complex footprints of sky survey coverages. The key development supported in the ESCAPE project is the addition of the time dimension to the IVOA standardised Multi-Order Coverage (MOC) maps. Adding temporal coverage to the spatial sky coverage maps enables searches for observations that overlap in both space and time. This builds on the work done in the ASTERICS project [18], and the new functions have been implemented in Python libraries for the use of space-time MOCs [19].
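The principle of a combined space and time overlap search can be illustrated with a deliberately simplified toy model. Real space-time MOCs use hierarchical HEALPix cells and a standardised encoding of time ranges; the sketch below instead assumes flat sky-cell indices at a single resolution and arbitrary time values, and all names and numbers in it are hypothetical.

```python
def st_overlap(cov_a, cov_b):
    """Intersect two toy space-time coverages.

    Each coverage maps a sky-cell index to a list of (t_start, t_end)
    intervals during which that cell was covered.
    """
    result = {}
    for cell in cov_a.keys() & cov_b.keys():  # cells present in both
        common = []
        for a0, a1 in cov_a[cell]:
            for b0, b1 in cov_b[cell]:
                start, end = max(a0, b0), min(a1, b1)
                if start < end:  # keep only non-empty time overlaps
                    common.append((start, end))
        if common:
            result[cell] = sorted(common)
    return result


# Hypothetical credible region of an alert and a survey footprint,
# with times given as days on an arbitrary scale.
alert_region = {101: [(58000.0, 58000.5)], 102: [(58000.0, 58000.5)]}
survey = {102: [(57999.9, 58000.2)], 103: [(58000.0, 58001.0)]}

print(st_overlap(alert_region, survey))
```

Only cell 102 appears in the result, because it is the only cell covered by both data sets during a common time interval.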
The European Solar Telescope (EST) is one of the astrophysics ESFRI participating in the ESCAPE project. The emphasis of the activities in the CEVO work is the application of IVOA standards to solar physics data, and setting up the basis for interoperability between the astronomy Virtual Observatory and the SOLARNET virtual observatory. This involves a first trial installation of an IVOA standard TAP service at the Royal Observatory of Belgium, and a study on how to link a TAP service with a solar web browser, JHelioviewer. The ESCAPE-supported participation of EST partners in the IVOA interoperability meetings has enabled extensions of the IVOA metadata for the semantics of data content (Unified Content Descriptors, ‘UCD’) to cover solar data. This is being used to support cross-column, or even cross-catalogue, search in the SOLARNET VO.
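A cross-catalogue search by UCD can be sketched as follows: columns with different names in different catalogues are matched through the UCD words attached to them. The catalogue names, column names and UCD assignments below are hypothetical examples and do not describe the actual SOLARNET VO tables.

```python
def find_columns_by_ucd(catalogs, ucd_word):
    """Return sorted (catalogue, column) pairs whose UCD contains `ucd_word`.

    A UCD is a semicolon-separated list of words, e.g. "pos.eq.ra;meta.main".
    """
    wanted = ucd_word.strip().lower()
    matches = []
    for cat_name, columns in catalogs.items():
        for col_name, ucd in columns.items():
            words = [word.strip().lower() for word in ucd.split(";")]
            if wanted in words:
                matches.append((cat_name, col_name))
    return sorted(matches)


# Hypothetical catalogues: differently named columns share the same UCD word.
catalogs = {
    "flare_list": {
        "t_peak": "time.epoch",
        "ra": "pos.eq.ra;meta.main",
    },
    "magnetogram_index": {
        "date_obs": "time.epoch;meta.main",
        "b_field": "phys.magField",
    },
}

print(find_columns_by_ucd(catalogs, "time.epoch"))
```

Because the match is made on the semantic label rather than on the column name, the same search works across catalogues that were designed independently.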
The ESCAPE project brings the Astronomy and Particle Physics ESFRI infrastructures together to address their common challenges. In doing so, we find that there is a strong motivation to support Open Science via the use of the FAIR principles. One of the concepts reinforced by the project is that it is essential to have disciplinary level standards, and that these standards need to include or allow interfaces with wider interdisciplinary standards and frameworks. Disciplinary level standards are necessary because there are specific data and information characteristics within a given domain that must be managed by experts in order to maintain the scientific integrity of the data and services. In astrophysics, examples include the disciplinary level metadata for semantic content (IVOA standard UCDs), the systems for sky coordinates and indexing (e.g. World Coordinate Systems [20] and the IVOA HiPS/MOC standards), and the models for characterising astronomical observations (such as the IVOA Observational Core Data Model). In practice, it is critical for the bodies that will implement the standards to be involved in their definition and maintenance, which has implications for the data stewardship roles that are needed to support open science in astronomy infrastructures.
With the creation of Open Science frameworks such as EOSC in Europe, and in particular the inclusion of existing infrastructures into these, the astrophysics standards and data services can be seen as part of systems that have a wider scope of data sharing across all areas of science. This requires that the astronomy standards be compatible at some level with more generic standards. This is already the case with the IVOA Resource Registry, which uses Dublin Core metadata and OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), and enables the harvesting of records by EUDAT/B2FIND in readiness for inclusion in EOSC catalogues. The IVOA standards definitions themselves are also included in the FAIRSharing database of standards. Detailed interdisciplinary science will demand higher levels of cross-discipline data compatibility, but this is already a big step forward in terms of findability, access and interoperability of resource descriptions. In terms of the EOSC architecture, the astronomy standards would contribute to the "interoperability layer" and VO services can be considered as "thematic services" alongside those from other domains (e.g. Environmental Science, Life Science, Social Sciences, Photon and Nuclear Sciences). CEVO is also following the efforts to define the cross-disciplinary semantic standard layer in EOSC, so that we will be ready to interface the IVOA semantic standards with it. Various services, such as authentication and authorisation, may be provided generically via shared systems (e.g. the EOSC Core).
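The role of the generic standards in harvesting can be illustrated with a short sketch: it builds an OAI-PMH `ListRecords` request for Dublin Core metadata and extracts the Dublin Core fields from a record. The endpoint URL and the sample record are invented for the example, and no request is sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url):
    """OAI-PMH request URL asking for all records in Dublin Core format."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def dublin_core_fields(xml_text):
    """Collect the Dublin Core elements of one oai_dc record into a dict."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root:
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


# Hypothetical record, of the kind a harvester would receive per resource.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example survey catalogue</dc:title>
  <dc:identifier>ivo://example.org/survey</dc:identifier>
  <dc:subject>radio astronomy</dc:subject>
</oai_dc:dc>
"""

print(list_records_url("https://example.org/oai"))
print(dublin_core_fields(SAMPLE_RECORD))
```

Because both the protocol and the metadata vocabulary are generic, a cross-disciplinary harvester needs no astronomy-specific knowledge to index such resource descriptions.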
Data stewardship requires that scientists, software engineers, data publication experts and librarians be aware of the wider context of their work. For example, in these various roles it will be important to use interoperable metadata and to benefit from systems, tools and services that can be made common across domains. This is challenging because it requires a level of participation in activities that extend beyond the traditional boundaries of a discipline. The EOSC initiative is promoting widespread uptake of Open Science, and projects such as FAIRsFAIR are fostering FAIR data practices, so there is a rapidly growing set of training resources which can be of great value. The work on the interoperability topics in the different areas of astrophysics described above is building up data stewardship expertise in the various ESFRI projects, so that common standards can be implemented for data sharing. A longer term aspiration is that data stewardship skills become more widely recognised and rewarded in career paths, and that this stimulates new levels of innovation to enable new kinds of science.
ESCAPE - The European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement no. 824064.