Are the metrics the NASA Astrophysics Data System (ADS) offers as responsible as they could be? ADS is a pioneer, and the principles it was built on are surprisingly sound. However, users should be wary of some of the later add-ons, like the h-index.
The Astrophysics Data System is well established as one of the most prominent discipline-specific citation databases. As such, it is a rich source of astronomy-related metrics. It has been described as “designed to be useful to astronomers, not bibliometricians”. There is general agreement that the coverage, metadata, and classifications found in more general citation databases such as Web of Science and Scopus do not serve the needs of the astronomy and astrophysics community equally well.
The Leiden Manifesto for Research Metrics (2015) calls for bibliometricians to account for variation by field in publication and citation practices. We need certain standards to ensure that the metrics from the various citation databases we use are responsible and employ state-of-the-art methods. This must be true for all citation databases, including ADS.
Using a 2012–2016 publication dataset from the University of Helsinki research evaluation, we check the astronomy/physics divide of this set using (1) ADS and (2) a citation-network clustering algorithm promoted as suitable for research evaluation. We also check (3) the Scopus-based Topics and (4) the research field results of the new Dimensions citation index, which uses AI tools.
We know the Astrophysics Data System1 as an excellent tool for astronomers everywhere. It is comprehensive, easy to use, and most astronomers work happily with it. Besides being the definitive source of bibliographic data in astronomy, it includes abstracts, full text, and links to data. It is particularly interesting from a bibliometrics point of view, as ADS also pioneered the use of online citation indices.
Meanwhile, bibliometrics has become a popular tool for measuring research impact. It has been used extensively, and also abused. Given the rise of the responsible metrics movement in the 2010s, we wanted to check ADS: are the metrics it offers as responsible as they could be?
Before answering that question, let us first take a brief look at the history. In the beginning, citation indices were a tool of scientific exploration. Their initial function was to aid the discovery of relevant research through citations. This is also the way ADS uses citations today.
The first citation indices, produced by the Institute for Scientific Information, were printed. It was hard work to produce printed citation data without today’s computing power, and at the same time the number of publications was growing. This called for sustainable business models. Citation counting started, and over time it was developed into products like impact factors that could be sold to universities and research institutes, making bibliometrics a commercial effort.
In the 1990s, the World Wide Web made online databases a handy tool. At that time, ADS also went live on the web, a pioneer in providing citations to astronomy publications online. In the next decade, many new tools appeared on the market. These included the ISI database, renamed Web of Science online, followed by Elsevier’s Scopus and Google Scholar. Bibliometrics became a standard evaluation tool. Soon there were plenty of metrics products being used to evaluate everything from researchers to universities.
Other, less citation-oriented services worth a mention are Crossref2, which provides DOIs for publications, and Microsoft Academic Graph (MAG)3, which has provided valuable metadata and citation links for bibliometricians. However, it was announced in May 2021 that MAG would shut down by the end of the year.
In the 2010s, there was a reaction to the over-dependence on quantitative metrics. Critical voices started to warn about the misuse of bibliometrics. This resulted, among other things, in the San Francisco Declaration on Research Assessment (DORA, 2013)4, the Leiden Manifesto (2015)5, and the Initiative for Open Citations (I4OC, 2017)6. All these initiatives cover quite a wide ground. We will not go through their full statements. Instead, we have chosen to look at a few recommendations that are particularly relevant for ADS.
First, three points from DORA:
Remove all reuse limitations on reference lists in research articles and make them available under CC0
Make available a range of article-level metrics
Consider the value and impact of all research outputs (including datasets and software)
Second, we have this recommendation from the Leiden Manifesto:
Keep data collection and analytical processes open, transparent, and simple
These sound very much like ADS. Reference lists are openly available. ADS offers a range of article-level metrics. There are data links. Data collection and analytical processes look fine. All this sounds very good indeed.
Next, we want to consider two further recommendations from the Leiden Manifesto.
Account for variation by field in publication and citation practices.
ADS has astronomy at its core, with a selection of journals as its starting point. The astronomy collection is central, but ADS also includes physics publications, as well as other materials that astronomers cite. However, the astronomy/physics/etc. classification is not particularly well defined. Instead, the additional classifications seem to describe the role of some of the indexed publications as supplementary sources. We will look at ADS astronomy publications and compare them with a dataset.
Scrutinize indicators regularly and update them.
ADS was made for astronomers, not for bibliometricians or librarians7. The indicators that ADS uses are either homemade or, like the h-index, chosen because they are widely used. We will not scrutinize the whole list of indicators provided by ADS. Instead, we surveyed the Finnish astronomy community about their use of metrics, in order to find out how these indicators are used in practice.
University-level assessment of research usually happens every six years. Our astronomy dataset is a subset of the dataset from the University of Helsinki research assessment 2012–2016.
This is how we created it. All University of Helsinki publications are routinely collected into the University of Helsinki research publication database. Publication data is imported from Web of Science and other sources, and researchers are obliged to check and complete their publication records. Practically all astronomy papers affiliated with the University of Helsinki can be found there, so the database can be considered a solid basis for further analysis. The basic identifiers we collect include digital object identifiers (DOIs) and Web of Science UTs. We harvested the latter IDs for the analysis units and sent them to Leiden, where CWTS (Centre for Science and Technology Studies, Leiden University), a leading European bibliometrics centre, performed the analysis for us.
Our research assessment follows the principles of DORA and the Leiden Manifesto, so only those units that have good enough coverage are analyzed. To make the assessment more responsible, the way publications are classified has changed. Initially, CWTS used Web of Science categories, which are based on journals rather than article contents. The analysis used to include quite a lot of statistics for exploring trends, but the resulting statistics were often too detailed given the annual publication volumes and other similar factors. The new approach is to create clusters of publications based on citations. These clusters are organized into levels and given names like “astronomy and astrophysics”. At the University of Helsinki, all astronomy publications belong to the Faculty of Science. We matched these faculty publications with ADS records and selected those that ADS labels as astronomy. As a result, we had a dataset of 478 publications. CWTS uses Web of Science, so we use that for comparisons.
Next, let us look at what has been going on with bibliometrics tools. We loaded our dataset of 478 publications into three commercial analytics tools to see how much of our publication set was labeled as astronomy in those other bibliometric databases. After that, we did some further exploration with our dataset.
First, we entered the DOIs of our dataset into SciVal, a Scopus-based analytics platform. It has a clustering tool that arranges publications into topics and topic clusters based on citations. Figure 1 shows a visualization of the results. On the wheel, you can see the Scopus subject areas; inside it are topics, the biggest of which reads “galaxies, stars, planets”. Most of the topics are found in the obvious areas: physics and astronomy.
Next, we have Dimensions, a new database by Digital Science launched in early 2018. It uses a classification scheme called Fields of Research. This scheme is not based on journals; instead, articles are classified individually by artificial intelligence. About 78 % of our set was identified as astronomy. The rest fall under physics, geosciences, and chemistry (Figure 2). When Dimensions started, some astronomy papers were labeled with unlikely fields like psychology, but the algorithm seems to be working somewhat better now.
Next, here is the Leiden clustering tool again, this time launched as part of InCites, a Web of Science-based analytics platform, where it became available in December 2020. You can access the most recent development, the so-called Leiden algorithm, and if you have a large enough network for which you know a meaningful partition based on scientific disciplines, you can also try it yourself, as it is available on GitHub.8 As Figure 3 shows, about three quarters of our set is labeled as astronomy, physics, and space science.
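The basic idea of citation-network clustering is easy to sketch. The toy Python pass below uses simple label propagation over an undirected citation network; it is not the Leiden algorithm itself (which optimizes a quality function with local moves and refinement), and the paper IDs are invented for illustration.

```python
import random
from collections import Counter, defaultdict

def label_propagation(edges, seed=0, max_iter=100):
    """Toy community detection on an undirected citation network:
    each paper repeatedly adopts the most common label among its
    neighbours, so densely cross-citing groups converge to one label."""
    rng = random.Random(seed)
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = {n: n for n in neighbours}          # start with unique labels
    for _ in range(max_iter):
        changed = False
        nodes = list(neighbours)
        rng.shuffle(nodes)                       # random update order
        for n in nodes:
            counts = Counter(labels[m] for m in neighbours[n])
            best = max(counts.values())
            new = min(l for l, c in counts.items() if c == best)  # tie-break
            if new != labels[n]:
                labels[n], changed = new, True
        if not changed:
            break
    clusters = defaultdict(set)
    for n, l in labels.items():
        clusters[l].add(n)
    return list(clusters.values())

# Two tightly cross-citing groups joined by one citation link:
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a3", "b1")]
for cluster in label_propagation(edges):
    print(sorted(cluster))
```

Production tools add a resolution parameter and hierarchical levels on top of this kind of pass, which is how the CWTS clusters end up organized into levels with names like “astronomy and astrophysics”.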
All three approaches seem to tell us that publications labeled as astronomy in ADS are not necessarily labeled as astronomy in other databases. Our starting point in ADS was the rough astronomy label, but we can get an idea of the research topics of our set from a visualization. Figure 4 shows the ADS paper network for our astronomy set. Papers are grouped based on shared references, and the group names are words from paper titles. Figure 4 also depicts trends in within-group publication counts.
As a sample of the data, Table 1 lists the 15 most cited publications in ADS from our entire research assessment set of 3531 publications. The list also shows four different classifications. One is the ADS database label. Two are based on citation network clustering: the CWTS Leiden microlevels and modularity clustering. The fourth, MAG-main, refers to the main level of the Microsoft Academic Graph Fields of Study. We assessed the correspondence between the partitions of our set obtained with these classifications through some tentative analyses. The focus was on how discriminative the ADS astronomy database label is. With respect to the network clustering methods, the astronomy label reduced the number of both microlevels and clusters to two thirds or less of what would be expected if an equal number of microlevels or clusters were randomly chosen from our set. The microlevels and the modularity clusters also correlated strongly, which makes it less likely that the reduction of network-based classes corresponding to the ADS astronomy label is a coincidence. We found this correlation by cross-tabulating the numbers of publications in microlevels and modularity clusters and then calculating the association.
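The association step can be illustrated with a small, self-contained computation. The sketch below cross-tabulates two categorical labellings of the same publications and computes Cramér’s V, a common chi-squared-based association measure; the text does not specify which measure was actually used, so this is only an example, and the labels are invented.

```python
from collections import Counter
from math import sqrt

def cramers_v(pairs):
    """Cramér's V for two categorical labellings of the same items.
    `pairs` is a list of (label_a, label_b) tuples, one per publication."""
    n = len(pairs)
    rows = Counter(a for a, _ in pairs)   # marginals of the first labelling
    cols = Counter(b for _, b in pairs)   # marginals of the second
    cells = Counter(pairs)                # the cross-tabulation itself
    chi2 = 0.0
    for r, nr in rows.items():
        for c, nc in cols.items():
            expected = nr * nc / n
            observed = cells.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Perfectly aligned partitions give V = 1; independent ones give V near 0:
aligned = [("astronomy", "cluster 1")] * 5 + [("physics", "cluster 2")] * 5
print(round(cramers_v(aligned), 2))  # 1.0
```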
So, these directional numerical results show that the classifications used are not contradictory, although problematic disagreement could well be found if finer details were studied.
One reason to perform these tests is the responsible bibliometrics point of view: how publications are classified is central, as is the content, or coverage, of the databases in which the assessed publications are indexed. Related to this, an ambitious goal would be to understand how individual publications spread information into the scientific literature, especially publications with few citations. The result will depend on the database.
We took a first step and tried modelling the impact of some publications in ADS and Web of Science. For this, we adapted a model for the spreading power of a node in a network. In this framework, a publication citing another allows information from the cited publication to spread in the network. The spreading power of a node (here, a publication) depends most strongly on the number of citation links toward it or, when that number is low, on the number of links toward its neighbors. The results obtained in this study tell us that ADS can show higher impact, likely because it has better coverage of core astronomy publications.
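As a rough illustration of the idea, and not the actual model we adapted (whose exact formula is not reproduced here), a paper's score can be taken as its citation count plus a discounted sum of its citing papers' own citation counts, so a sparsely cited paper still gains power if its citers are themselves well cited. The paper IDs and the discount factor are invented.

```python
def spreading_power(citations, alpha=0.5):
    """Toy spreading-power score on a citation network.
    `citations` is a list of (citing, cited) pairs; alpha discounts
    the contribution of the citing papers' own citation counts."""
    in_links = {}
    for citing, cited in citations:
        in_links.setdefault(cited, set()).add(citing)
    scores = {}
    for paper, citers in in_links.items():
        neighbour_power = sum(len(in_links.get(c, ())) for c in citers)
        scores[paper] = len(citers) + alpha * neighbour_power
    return scores

# "a" is cited by "b" and "c"; "b" is itself cited by "d":
print(spreading_power([("b", "a"), ("c", "a"), ("d", "b")]))
# "a" outscores "b": two direct citations plus credit for "b" being cited
```

Because the score depends on which citation links a database actually indexes, the same paper can score differently in ADS and Web of Science, which is exactly the coverage effect described above.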
The approaches above take the librarian’s and bibliometrician’s point of view. However, ADS is specifically designed for astronomers, which means we need to check what users think about the metrics ADS provides.
For this purpose, we made a questionnaire and mailed it to astronomers working at four Finnish universities (University of Helsinki, University of Turku, University of Oulu, and Aalto University). They are not a big community, so 60 answers give a comprehensive enough picture. Respondents did not need to be astronomers; it was enough to be an ADS user. Most respondents said they are astronomers or astrophysicists (73 %); the others chose the physicist (20 %) or space physicist (5 %) option.
First, we asked about the level of ADS use. Most respondents see it as their number one search tool, and almost half of them never need anything else for their astronomy-related searches. Google Scholar came second.
The next question was about the use of metrics tools. Two thirds of respondents are familiar with ADS metrics, but quite a few turn to other sources of bibliometric data, like Web of Science and Google Scholar. Two thirds say they do not use any visualizations, but the questionnaire seemed to alert a few researchers to these, and some commented that they would look into them in the future.
There are quite a few metrics-related indices in ADS, and we are of course interested in their use. It is no surprise that the h-index is the clear favorite. It does not take into account several variables, e.g. the order or number of authors or the length of their career, and it is cumulative. From the responsible metrics point of view, it is not a good indicator, but it is well known, so people use it. When asked which indicators they would choose to compare researchers for a position or funding, 35 respondents chose the h-index, while 24 answered they would choose none.
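For reference, the h-index itself is simple to compute: it is the largest h such that the author has at least h papers with at least h citations each. A minimal Python version, with invented citation counts:

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([100, 2, 1]))       # 2: one highly cited paper barely moves it
```

Note how author order, team size, and career stage never enter the computation, which is exactly the criticism above.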
There was quite a lot of feedback. Responsible metrics and responsible researcher evaluation have clearly been in the spotlight, as the respondents were aware of them. Many answers told us that metrics are a waste of time, meaningless, used too much, or easy to game. We work to give astronomers high-quality metrics, and they are telling us that metrics are bad, or worse.
Metrics only provide a rough (and biased) indication of an article’s impact. They are hugely misleading when used without the appropriate caveats (as is often the case). They tend to exacerbate gender imbalances.
I think "personal" metrics should be used with caution (if at all), as they fail at describing the various realities across fields and research groups
Some of the feedback was less critical. There were suggestions on how to make metrics better, or how to use them more wisely. All of those who wrote such answers were astronomers, while the more critical voices included physicists.
As ADS is the most complete database and metrics system for astronomy and astrophysics, we should somehow try to promote this knowledge also to the other fields (and especially administrators etc. who are also using these) so that they would NOT use the others for astronomers.
The way forward in assessing research merit would be to move away from the number of publications and give more weight to quality instead of quantity.
I put a lot of weight on normalized citation rate and the normalized number of publications/number of first author publications. I also look at the tori-index as this helps at least partly to remove the effect of different citation traditions between different subfields (short vs long reference lists).
The above was only a brief look into ADS and responsible metrics. What did we learn?
We sent a questionnaire to astronomers and found out that Finnish ADS users are critical of metrics. It seems a contradiction that the indicator they use most is the h-index, which is not a good indicator, nor one we would recommend.
We had a look at classifications and how astronomy is defined in various bibliometrics services. There are new tools, like clustering and classification by AI, and there will be even more tools in the future. Some of these can give us interesting views into research topics within astronomy. After looking into several databases and tools, we think that there is not a definitive way to label publications as astronomy. However, it is important to describe how the classification is defined. Currently, ADS does not make this as clear as it should.
However, ADS chose a good starting point, and it is on solid ground. Still, we would advise ADS to take into account this Leiden Manifesto recommendation for responsible metrics from bibliometricians: “Scrutinize indicators regularly and update them.”