DOI in the IVOA: the VizieR implementation example

DOIs are fully integrated into the Web architecture and provide a canvas of citable, cross-referable resources. We report on the International Virtual Observatory Alliance DOI usage survey, with a focus on DOI implementation in the CDS VizieR catalogue service.

Published on Apr 27, 2022

Abstract

DOIs (Digital Object Identifiers) have become essential in the scientific landscape for data citation and preservation. DOI design follows open data requirements: DOIs are fully integrated in the Web architecture, and they provide a canvas of documented resources that can be linked by cross references. The key to success is the DOI metadata description and its documentation, which improve data visibility and citation. In this context, members of the Data Curation and Preservation (DCP) Interest Group of the International Virtual Observatory Alliance (IVOA) worked together to identify the state of the art of different aspects of DOIs in operation in astronomy. This state of the art covers several types of data in the landscape. For example, the VizieR catalogue service of the Strasbourg Astronomical Data Center (CDS) provides DOIs for catalogues (tables and associated data). We highlight the status of the IVOA Note, and give feedback from the DOI implementation experience in VizieR.

DOI landscape

DOI in Open Science

Open data is a concept required by authorities for publicly-funded science products. Authors have to submit their results to open data repositories. It means that the data centers provide free access to well-documented data, enabling their citation and displaying references.

In the Open Data World, data are provided according to FAIR principles (Findable, Accessible, Interoperable, and Reusable). This acronym has been explained by the Research Data Alliance (RDA) with a list of criteria available in the document [1]. The acronym covers different aspects of what is required for data to be open, including data citation and data-reuse capabilities that can be exploited in an interconnected network. A reading of the RDA document shows a close connection with the DOI (Digital Object Identifier).

A DOI is a persistent identifier dedicated to digital objects, associated with a URL and described with standardized metadata. Some of these metadata are mandatory and others optional. They include usage and rights, but also "Provenance" such as authors, date of publication, etc. This can be considered the minimum information required to understand and index the resource. Finally, the DOI metadata include references, with a vocabulary to specify the relationship with external resources. A reference can be another DOI, a URL or any other identifier. This last capability matches the RDA "Interoperability" concept of an interconnected world where resources are linked together.

We can conclude that a DOI covers the RDA FAIR criteria when its metadata are well documented.

The "Interoperability" concept is extended in astronomy through the Virtual Observatory (VO), which enables the comparison or combination of data provided by different data centers.

The DOI is well established in the publication process and is widely used for citation. Links between resources are the concern of the journals, which can provide an interconnected DOI network based on article citations.

DOIs are also available for data, with specific metadata defined by the DataCite schema (https://datacite.org/). The metadata vocabulary used in DataCite is based on Scholix (see http://www.scholix.org/), which is also used by Crossref (https://www.crossref.org/), whose scope is articles.

Comparison of identifiers

In astronomy, data centers and journal publishers provide identifiers for the data or articles they publish. The nature of the identifiers is adapted to their specific usage: citation, preservation, data-reuse (see Figure 1).

The bibcode is the oldest of these identifiers, dedicated to articles and commonly used to cite resources. Very popular, it is used in data centers like NED (NASA/IPAC Extragalactic Database) or CDS and is indexed in the ADS (Astrophysics Data System) bibliographic database. However, the bibcode is a bare identifier, without any metadata, and the resolution mechanism (to access the resource from an identifier) is left to each data center.

The ivoid, commonly used for data and services, is the identifier used in the Virtual Observatory (VO). In the VO network, the resources are indexed through registries which provide access to data hosted in a long list of data centers. The role of the registries is similar to that of DataCite for DOIs or of ADS [2] for bibcodes. All include a search engine and an identifier resolver.

The ivoid is a machine-readable identifier used in VO registries [3], with a description that can be interpreted by VO protocols. The identifier is hidden from the end user, who cannot use it for citation.

At the other end of the scale, the DOI is human-usable in the sense that it can be easily included in web pages. Then, thanks to the URL resolver mechanism, the DOI links to a human-readable web page called a "landing page".
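
The resolver mechanism described above can be illustrated with a minimal Python sketch. It only builds the resolver URL from a DOI string using the public https://doi.org/ proxy; the function name `doi_to_url` and the DOI used in the example are hypothetical placeholders, not identifiers taken from VizieR.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the '/' between prefix and suffix; percent-encode unsafe characters.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
landing_page_link = doi_to_url("doi:10.1234/example.catalogue")
```

Following the resulting link in a browser is what redirects the reader to the landing page registered for that DOI.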

Figure 1

| | bibcode | ivoid | DOI |
|---|---|---|---|
| Visibility/popularity | yes | no | yes |
| Standard scope | biblio/astro. | VO | WWW |
| Citation | yes | no | yes |
| Preservation guarantee | no | no | yes |
| URL resolution | no | yes | yes |
| API available | no | yes | yes |
| Metadata | no | yes | yes |
| Cross references mechanism | no | yes | yes |

The technologies used for ivoids and for DOIs are similar: both provide XML records that include metadata such as provenance information (date, authors, abstract), rights, and references. In addition, the VO record includes the data description. For instance, a table is described with its columns, and the protocol to access the data is described and can be exploited by remote VO applications. Moreover, both services use OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). OAI-PMH is a common HTTP-based protocol for sustainable data; it provides a search mechanism and a harvesting API to access resources according to criteria such as the date of registration (see Figure 2).
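
As a rough illustration of harvesting by date, the sketch below builds an OAI-PMH `ListRecords` request and extracts the record headers from a response. The verb, the `metadataPrefix` and `from` parameters, the `oai_dc` prefix and the XML namespace come from the OAI-PMH 2.0 specification; the base URL `https://example.org/oai`, the helper names and the sample response are assumptions made for this example, not the actual DataCite or VO registry endpoints.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build a ListRecords request, optionally restricted to records since from_date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # e.g. "2022-01-01"
    return base_url + "?" + urlencode(params)


def record_headers(xml_text):
    """Return (identifier, datestamp) pairs found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [
        (
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
        )
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]


# Hand-written sample response (hypothetical record), so no network access is needed.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:record-1</identifier>
        <datestamp>2022-01-15</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai", "oai_dc", from_date="2022-01-01")
headers = record_headers(SAMPLE_RESPONSE)
```

A real harvester would also follow the `resumptionToken` returned for long result lists, which this sketch leaves out.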

Figure 2

DOI and VO-registry architecture

DOI in the Virtual Observatory

DOIs are not a component of the Virtual Observatory: the standards provided through the Virtual Observatory are based on the ivoid identifier. DOI implementation is managed by data providers, who decide which DOI metadata they want to publish. Two networks emerge, one focused on the editorial aspects (links, citation with DOIs), and the second providing a very high level of data interoperability. The question is then how to build a bridge between the two networks.

The Data Curation and Preservation Interest Group of the IVOA (International Virtual Observatory Alliance) created a working group composed of people from different institutes in order to exchange knowledge on different aspects of DOIs.

The goal was to survey existing DOIs in astronomy, to share expertise and feedback and eventually to propose DOI implementation.

DOIs are already in the data landscape but they are not included in the VO standards. For instance, ObsCore [4], a common standard for describing observations, includes bibcodes in its metadata but not DOIs.

Example of implementation

The state of the art shows different usages and granularities of DOIs. DOIs are generated for articles by the publishers, for catalogues (VizieR), for observations and datasets (Chandra), for query result sets, for services, etc.

DOI management depends on the level of detail, or granularity (what do I want to identify?), and on the workflows.

For instance, Chandra provides a DOI architecture [5] allowing links between observations and datasets in its archive, each of which has its own DOI.

There are data centers, like the Chinese VO, which offer authors a DOI pre-registration mechanism for data before article acceptance.

The VAMDC (Virtual Atomic and Molecular Data Centre) provides a Query Store service that allows the user to create a DOI for a request executed on its archive.

Each implementation uses the standardized DOI metadata, but the choice of how to fill in the metadata, as well as the assignment process, is specific. For instance, the CADC (Canadian Astronomy Data Centre) provides a service where the users assign the metadata, while the CDS builds metadata for data published by authors in an automated process.

An example of topic and proposal

DOI metadata make it possible to link one resource to another and to specify the relationship between the two resources. The IVOA working group proposed verbs that can be used to link resources. Figure 3 summarizes the proposed links.

Figure 3

DOI proposal for cross references in the VO

For example, a dataset can link to a remote resource as a "variant form of" the original data. The link can be an identifier, which is not necessarily a DOI, or a URL.

The case of multiple versions of datasets which are mirrored in different institutes was also discussed. For identical data available in different institutes, we propose to link the copy to the original resource, either with the relation "Is identical to" pointing to the original DOI or with "Cite" pointing to the original repository.

Another proposed link is "Is Supplement To", which links a dataset to its reference publication.

Finally, when an ivoid and a DOI exist for the same resource, we also propose to link the resource with "alternate Identifiers", which exist in both schemas.
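
The proposals above can be sketched as a DataCite-style metadata fragment. The property names (`relatedIdentifiers`, `relationType`, `alternateIdentifiers`) and the relation values follow the DataCite metadata schema; the helper functions, the identifier values and the use of "ivoid" as an alternate identifier type are assumptions made for this illustration, not the working group's reference encoding.

```python
# DataCite relationType values matching the links proposed in the text.
PROPOSED_RELATIONS = {"IsVariantFormOf", "IsIdenticalTo", "Cites", "IsSupplementTo"}


def related_identifier(value, identifier_type, relation_type):
    """Describe a link from this resource to another identified resource."""
    if relation_type not in PROPOSED_RELATIONS:
        raise ValueError(f"relation not among the proposed links: {relation_type}")
    return {
        "relatedIdentifier": value,
        "relatedIdentifierType": identifier_type,  # e.g. "DOI", "URL", "bibcode"
        "relationType": relation_type,
    }


def alternate_identifier(value, identifier_type):
    """Record another identifier of the same resource (free-text type)."""
    return {"alternateIdentifier": value, "alternateIdentifierType": identifier_type}


def mirrored_catalogue_links(original_doi, article_bibcode, ivoid):
    """Links for a mirrored dataset that also has an ivoid and a reference article."""
    return {
        "relatedIdentifiers": [
            related_identifier(original_doi, "DOI", "IsIdenticalTo"),
            related_identifier(article_bibcode, "bibcode", "IsSupplementTo"),
        ],
        "alternateIdentifiers": [alternate_identifier(ivoid, "ivoid")],
    }


# All identifiers below are hypothetical placeholders.
links = mirrored_catalogue_links(
    original_doi="10.1234/original.dataset",
    article_bibcode="2022Jrnl....1....1A",
    ivoid="ivo://example.org/catalogue",
)
```

Replacing "IsIdenticalTo" with "IsVariantFormOf" would express the first case in the text, where the local copy differs in form from the original data.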

VizieR DOI implementation

Definition of a VizieR DOI

At CDS, we chose the catalogue granularity. A VizieR catalogue has a reference article and is composed of a set of tables and associated data like spectra or images. The catalogues enriched with metadata are accessible via services developed at CDS and compatible with the VO standards.

Data curation is carried out by CDS documentalists. The metadata includes systems (position, time, spectral band) and a precise definition of the columns. Documentalists also enrich the catalogue with visualization, links to other catalogues or external databases, etc. (see [6]).

The whole is identified with a DOI.

Implementation status

In VizieR, DOIs are created after article publication, when the catalogue becomes available in the VizieR service. It is an automated process, carried out for refereed catalogues for which we have obtained the editors' acceptance. Today, the workflow is active for catalogues coming from the A&A and AAS journals.

The metadata received particular attention at CDS. The idea was to provide DOIs for catalogues in addition to the DOIs provided by publishers. The CDS documentalists and software engineers held discussions to list the relevant metadata: keywords, authors, ORCID, links, etc. We then submitted our metadata to the editors for acceptance.

For example, the abstract is a metadata field highlighted in search engines like the one provided by DataCite. However, the article abstract can be subject to a licence, in which case we removed it because DataCite metadata need to be CC0 compliant. In the end, we replaced the abstract with a text containing the title and the bibliographic reference.
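
The abstract substitution described above could look like the following sketch. The `description` and `descriptionType` names and the "Abstract" and "Other" values exist in the DataCite schema, but the function, its parameters, the choice of "Other" for the replacement text and the sample values are assumptions made here; the source does not specify how VizieR encodes this field.

```python
def build_description(title, bibliographic_reference, abstract=None, abstract_is_cc0=False):
    """Return a DataCite-style description entry that is safe to publish under CC0."""
    if abstract and abstract_is_cc0:
        # The abstract can be exposed as-is.
        return {"description": abstract, "descriptionType": "Abstract"}
    # Licensed or missing abstract: fall back to title plus bibliographic reference.
    return {
        "description": f"{title} ({bibliographic_reference})",
        "descriptionType": "Other",
    }


# Hypothetical catalogue, for illustration only.
entry = build_description(
    title="Example catalogue of stars",
    bibliographic_reference="Doe J. et al., Example Journal 1, 1 (2022)",
    abstract="An abstract under a restrictive licence.",
    abstract_is_cc0=False,
)
```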

VizieR catalogues identified with a DOI are now indexed in the DataCite search engine. Moreover, the VizieR DOIs were added to the metadata of the VO registries and to the search engine of the European collaborative infrastructure EUDAT (https://eudat.eu/services/b2find). The DOI metadata also contain related identifiers, such as the article reference, which can be a DOI or a bibcode. In the case of data coming from space agencies, the metadata can be enriched with an additional identifier such as the DOI of the original dataset: for instance, the Gaia catalogue provided at CDS is a "variant form of" the original dataset provided by ESA.

Figure 4

Links used in VizieR DOI

Finally, VizieR provides a dedicated landing page for each catalogue. These resources are web pages whose URL is included in the catalogue DOI (see Figure 5). In VizieR, landing pages are built from a template: they offer a human-readable view that exposes links, authors, keywords, a sky footprint and access to the data.

Figure 5

VizieR landing page example

Conclusion

The DOI is valuable for its URL resolution mechanism and its metadata.

We compared DOIs and the Virtual Observatory identifiers. The comparison reveals a similar architecture that can be used to both identify and describe datasets. Both have similar metadata but each is adapted to its own usage: the DOI is human-readable, easy to include in Web pages, adapted for citation and well implemented in article publication. The VO identifier is machine-readable, hidden from users, and enables data discovery using software and services compatible with the Virtual Observatory.

DOIs, like the existing Virtual Observatory, form a network of interconnected resources. Both offer the capability to link resources, using identifiers or URLs, which improves the connection between data and articles. Combined, they are in accordance with the FAIR principles to improve data reuse and data citation. They open capabilities that can be implemented in the future: links between data and articles, but also with other journals in VizieR.

The DOI is flexible: its metadata can be updated after publication to add new metadata available in the DOI schema. This allows the metadata to follow evolving standards, for instance by updating the "local" VizieR keywords to the Unified Astronomy Thesaurus (UAT).

All the work done in VizieR is the result of the VizieR team's collaborative work, especially the documentalists E. Perret, P. Vannier, C. fix, M. Brouty and T. Pouvreau.

We also thank the Data Curation and Preservation Interest Group of the IVOA (International Virtual Observatory Alliance) for all the discussions, ADS for its feedback, and the journal editors (A&A, AAS) who accept VizieR DOIs as part of their articles.
