Three major subject branches have been added to the UAT since LISA VIII in 2017: software, laboratory astrophysics, and astrostatistics. This paper also explores how the Unified Astronomy Thesaurus has been used during its first two years of implementation at AAS Journals.
The Unified Astronomy Thesaurus (UAT) has been formally recognized and supported by the American Astronomical Society (AAS). Starting in June of 2019, AAS journals such as the Astrophysical Journal and the Astronomical Journal have required UAT concepts for all new article submissions. The Hubble Space Telescope Call for Proposals Cycle 28 and the James Webb Space Telescope Call for Proposals Cycle 1 have both adopted the UAT as the source for descriptive keywords. We look back at usage of the UAT during its first two years of implementation and look forward to continuing to meet the needs of the astronomical community. The Unified Astronomy Thesaurus thrives on user input for growth, and the UAT Curator has seen an increase in feedback and submissions since the UAT was integrated into various systems. To better support continuous community feedback, the UAT Curator has defined a development roadmap to guide the yearly release cycle. UAT enhancements from the past two years will also be shared with the community. These include new concept branches, concept definitions derived from the Etymological Dictionary of Astronomy and Astrophysics, and enhanced search capabilities.
The Unified Astronomy Thesaurus (UAT) is an open, interoperable, and community-supported project that brings together existing vocabularies and keyword lists from the fields of astronomy and astrophysics into a single, unified, and freely available thesaurus. In practice, the UAT can often be thought of as a controlled vocabulary of astronomy and astronomy-related concepts that is maintained and updated once a year. However, instead of being presented as a flat list of keywords as many previous systems have been, the 2122 concepts within the UAT are organized into a hierarchy, 11 levels at its deepest, collected underneath 11 top level categories.
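The following Python sketch makes the idea of hierarchy depth concrete. It stores a small fragment of a concept tree as a mapping from each concept to its children and then measures the deepest level. The concept names and their placement are assumptions chosen for illustration, not an exact extract of the UAT. The function names are hypothetical.

```python
from collections import deque

# Illustrative fragment of a concept hierarchy: each concept maps to its children.
# Names and placement are assumptions, not an exact copy of the UAT.
children = {
    "Stellar astronomy": ["Stellar types"],
    "Stellar types": ["Stellar evolutionary types"],
    "Stellar evolutionary types": ["Horizontal branch stars"],
    "Horizontal branch stars": [],
    "Solar physics": ["Solar activity"],
    "Solar activity": ["Solar flares"],
    "Solar flares": [],
}

def top_level(children):
    """Concepts that never appear as a child are the top level categories."""
    all_children = {c for kids in children.values() for c in kids}
    return [c for c in children if c not in all_children]

def max_depth(children):
    """Breadth-first walk from every top level concept; the top level counts as depth 1."""
    deepest = 0
    queue = deque((root, 1) for root in top_level(children))
    while queue:
        concept, depth = queue.popleft()
        deepest = max(deepest, depth)
        for child in children.get(concept, []):
            queue.append((child, depth + 1))
    return deepest

print(top_level(children))  # ['Stellar astronomy', 'Solar physics']
print(max_depth(children))  # 4
```

A concept may have more than one parent. In that case the walk reaches it once along each path, and the maximum depth is unchanged.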
This hierarchy helps to define basic relationships between concepts, where child concepts are typically a type of the parent concept, a specific example of the parent concept, or a part of the parent concept. In addition to these parent-child links, the UAT can establish related links between concepts that sit on very different hierarchical paths. For example, solar physics and stellar astronomy are two different branches of astronomical research, even though the Sun is just one particular star. The difference in how stellar and solar astronomy are framed is reflected in the UAT: Solar Physics is one top level concept and Stellar Astronomy is a different top level concept. Even though a solar flare is clearly a specific instance of a stellar flare, these concepts are found in two completely different sections of the UAT.
As another example, consider the nuanced difference between “Horizontal branch stars” and the “Horizontal branch” itself. The “Horizontal branch” is a specific region of the Hertzsprung-Russell diagram, which is used to model stellar evolution. Stars identified as being in the “Horizontal branch” phase of their lives do not belong hierarchically underneath the concept of a stellar evolutionary model. Instead, “Horizontal branch stars” are found under “Stellar evolutionary types.” Of course, the concept of “Horizontal branch stars” would not exist without the stellar evolutionary model that describes them, so these two concepts are connected with a related concept link. In both examples, the related concept link connects two concepts in separate sections of the UAT without forcing them into a strict hierarchy.
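The Python sketch below shows how the two kinds of links differ. Parent links and related links are stored separately, so following parents never crosses into a related concept. The parent assignments are illustrative assumptions and may not match the UAT exactly. For simplicity, each concept has a single parent here.

```python
# Illustrative parent ("broader") links; the parents chosen are assumptions.
broader = {
    "Horizontal branch stars": "Stellar evolutionary types",
    "Horizontal branch": "Hertzsprung Russell diagram",
    "Solar flares": "Solar activity",
    "Stellar flares": "Stellar activity",
}

# Related links are symmetric, so each pair is stored once as a frozenset.
related = {
    frozenset({"Horizontal branch stars", "Horizontal branch"}),
    frozenset({"Solar flares", "Stellar flares"}),
}

def ancestors(concept):
    """Follow parent links upward until reaching a concept with no recorded parent."""
    chain = []
    while concept in broader:
        concept = broader[concept]
        chain.append(concept)
    return chain

def are_related(a, b):
    """Related links work in either direction."""
    return frozenset({a, b}) in related

print(ancestors("Horizontal branch stars"))                         # ['Stellar evolutionary types']
print(are_related("Horizontal branch", "Horizontal branch stars"))  # True
print("Horizontal branch" in ancestors("Horizontal branch stars"))  # False
```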
Another important feature of the UAT is how each concept is identified. Though every concept has an English language label, the UAT is not a simple collection of words and phrases. A concept is more accurately identified by its Uniform Resource Identifier, or URI. This is a unique sequence of characters, permanently assigned to a specific concept, in a form that web technologies can use. UAT URIs are written as web addresses, so information about a concept can be found at its URI. More importantly, a web application can query the UAT through its API to pull out a concept's place within the hierarchy, its related links, and other associated metadata. Using UAT concepts with URIs enables topical links to be made between papers, datasets, objects, images, and any other type of information.
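The following Python sketch shows why it matters to identify concepts by URI. Records keyed on the URI stay linked even when a concept's labels are edited between releases. The sketch assumes that UAT URIs follow the pattern http://astrothesaurus.org/uat/ plus a numeric ID. The ID 12345 is a placeholder, not a real concept, and the `Concept` class is hypothetical.

```python
from dataclasses import dataclass

# Assumed URI pattern for illustration: http://astrothesaurus.org/uat/<numeric id>.
UAT_BASE = "http://astrothesaurus.org/uat/"

@dataclass(frozen=True)
class Concept:
    uri: str                 # permanent identifier for the concept
    pref_label: str          # English label, which may change between releases
    alt_labels: tuple = ()   # alternate labels, also editable

    @property
    def numeric_id(self) -> int:
        if not self.uri.startswith(UAT_BASE):
            raise ValueError(f"not a UAT URI: {self.uri}")
        return int(self.uri[len(UAT_BASE):])

# Placeholder ID; not a real UAT identifier.
old = Concept(UAT_BASE + "12345", "Infrared galaxies")
new = Concept(UAT_BASE + "12345", "Infrared galaxies", ("IR galaxies",))  # a later release adds an alt label

# Links keyed on the URI survive the label edit.
papers_by_concept = {old.uri: ["paper-A"]}
papers_by_concept.setdefault(new.uri, []).append("paper-B")
print(papers_by_concept[new.uri])  # ['paper-A', 'paper-B']
print(new.numeric_id)              # 12345
```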
Another important benefit of adopting the UAT is that it helps to codify and standardize the language used to describe astronomy. Without a standard vocabulary, one astronomer might describe their observational dataset with the phrase “infrared emitting galaxies,” while another astronomer, with a completely different dataset of infrared galaxies, might describe their observations as “galaxies that peak in a far-red wavelength.” A human reader understands the meaning and context of both phrases and can easily see that they describe similar objects. An astronomer looking for data about infrared galaxies who came across these two descriptions would recognize that both datasets are useful.
On the other hand, a computer program that came across these descriptions may not be smart enough to make any connection, beyond using word matching to understand that both datasets contain information about galaxies. Something must specifically tell a computer program that the phrases “infrared emitting” and “peak in a far-red wavelength” (despite having no words in common) are essentially synonymous. However, if the astronomers were both pulling descriptive keywords from the same controlled vocabulary, they would likely assign overlapping concepts, such as “Infrared galaxies,” to their datasets. Then a researcher, also using the same controlled vocabulary, could use that same concept to pull both datasets together into a useful search result.
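A short Python sketch can show the contrast between word matching and a shared vocabulary. As in the example above, it assumes that both astronomers tagged their datasets with the concept “Infrared galaxies.” The dataset names and search functions are hypothetical.

```python
import re

# The two free-text descriptions from the example, each with an assumed concept tag.
datasets = {
    "dataset-1": {"text": "infrared emitting galaxies",
                  "concepts": {"Infrared galaxies"}},
    "dataset-2": {"text": "galaxies that peak in a far-red wavelength",
                  "concepts": {"Infrared galaxies"}},
}

def words(text):
    """Lowercased alphabetic words in a description."""
    return set(re.findall(r"[a-z]+", text.lower()))

def free_text_search(query):
    """Datasets whose description contains every word of the query."""
    q = words(query)
    return sorted(name for name, d in datasets.items() if q <= words(d["text"]))

def concept_search(concept):
    """Datasets tagged with the given controlled-vocabulary concept."""
    return sorted(name for name, d in datasets.items() if concept in d["concepts"])

print(free_text_search("infrared galaxies"))  # ['dataset-1']
print(concept_search("Infrared galaxies"))    # ['dataset-1', 'dataset-2']
```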
As mentioned earlier, the UAT is updated once every year, incorporating new concepts, removing deprecated concepts, and adding new content and context. These updates are driven by feedback from a community of engaged and interested astronomers, which is crucial for the continued health and development of the Thesaurus. The UAT is open in two senses. First, anyone in the community can contribute to its maintenance by suggesting term additions, refinements, revisions, and deletions. Second, it is released under a Creative Commons license that allows free use and sharing for any purpose.
Since LISA VIII was held in Strasbourg, France, in 2017, there have been two major releases, one minor release, and one patch release of the Unified Astronomy Thesaurus. Changes across these four version updates have ranged from adding new branches to making small edits to a concept’s alternate labels. Overall, about 350 new concepts have been added and another 80 concepts have been deprecated, a net increase of about 270 concepts since version 2.0.0 of the UAT was released in early 2017. During the UAT presentation at LISA VIII in 2017, three areas for expansion were mentioned, and we are happy to report that all three subjects have been added to the UAT as new branches.
The first new branch covers software in astronomy. This branch was originally suggested by Alice Allen of the Astrophysics Source Code Library and was filled out based on feedback from other astronomers and journal editors. As in other areas of the UAT, we did not want to end up with a list of specifically named programs. Instead, the concepts in this branch describe different types of software, such as astronomical simulations or web services. We also added concepts to describe software documentation and licensing. These concepts are intended to be versatile. They can be used to tag full software packages, small scripts, papers that describe and explain software products, or even astronomy file formats.
The next new branch of the UAT covers laboratory astrophysics, and this section is undergoing a major revision for this year’s release. It is a good example of what happens when we start with a blank slate: apart from some concepts covering elemental abundances, the UAT had no concepts about laboratory astrophysics research. I started by reaching out to astronomers and researchers at the Center for Astrophysics in Cambridge, MA, meeting with several scientists one on one to get at least an initial sketch of the section. Since this branch was released, we have been told about areas where coverage is lacking. These are topics and concepts used in laboratory astrophysics that are not well represented by the existing set of concepts. This was to be expected. Starting a new section from scratch is quite difficult, but we have found that once there are some seed concepts to work with, it becomes easier to springboard into more. The structure of a branch often goes through a revision process as well, especially as new concepts are added. We expect these changes to go live in the upcoming December release.
The final major section that was added to the UAT in the last few years was a set of concepts about astrostatistics. We didn’t have much to work from in the UAT, but the collection of keywords that we’ve curated after working with astronomers and journal editors seems to be doing well so far. Of course, like the other new branches, feedback is always welcome if there are areas of astrostatistics that need improvement or clarification.
In addition to these new branches, another major improvement to the UAT has been the addition of concept definitions. Over the last year it became apparent that definitions would be useful for several reasons. First, the additional content and context attached to each concept can help developers working on automated tagging systems, since machine learning programs then have more information to work with when matching content to concepts. Second, definitions add clarity and disambiguation for anyone browsing the Unified Astronomy Thesaurus. For example, the UAT has a concept called Opposition, and its definition (Figure 1) makes it clear that this word refers to an astronomical position and not to someone with a different point of view.
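The Python sketch below roughly illustrates how a definition gives an automated tagger extra context. It assigns the concept only when the surrounding text shares words with the concept's definition. The definition text is a hypothetical paraphrase written for this example, not the UAT’s wording. The stopword list and the overlap threshold are arbitrary assumptions. Real taggers would use much richer models, so this only shows the principle.

```python
import re

# Hypothetical paraphrase of a definition; NOT the UAT's actual wording.
concept = {
    "label": "Opposition",
    "definition": ("The position of a planet or other solar system body when "
                   "it lies on the opposite side of the Earth from the Sun."),
}

# Arbitrary, minimal stopword list for the illustration.
STOPWORDS = {"the", "of", "a", "or", "on", "it", "when", "from", "to",
             "in", "and", "was", "is", "its", "at"}

def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def should_tag(text, concept, min_overlap=2):
    """Tag only if the (single-word) label appears AND the text shares at least
    min_overlap content words with the definition. The threshold is illustrative."""
    words = content_words(text)
    if concept["label"].lower() not in words:
        return False
    return len(words & content_words(concept["definition"])) >= min_overlap

astro = "Mars reached opposition, so the planet was opposite the Sun in our sky."
politics = "The opposition party voted against the budget."
print(should_tag(astro, concept))     # True
print(should_tag(politics, concept))  # False
```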
Overall, about 850 concepts, or one third of the current UAT, were given definitions in the most recent release. Almost all of these definitions were sourced from the Etymological Dictionary of Astronomy and Astrophysics, produced at the Paris Observatory. Going forward, we hope to source definitions for the remaining two thirds of the UAT.
Over the last few years, we have made substantial updates to the documentation, concept browsing, and search features on the UAT website. The new documentation, which focuses on the processes and procedures for managing and updating the Unified Astronomy Thesaurus, is located under the “About” heading of the main menu. The “Curation Process” page describes how feedback is tracked, how decisions are made, and where to read about decisions and follow up with additional feedback. The “Release Cycle” page describes the yearly process for updating the UAT and details the time frames for evaluations and decisions. The last document added to the website is the “Versioning” page, which describes what constitutes a major or minor change. These three pages were added over the last year or two to better document the process of updating and managing the Unified Astronomy Thesaurus.
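The following Python sketch shows how a versioning policy can turn a list of changes into a major.minor.patch version number. The mapping of change types to major, minor, and patch releases is a hypothetical assumption based on common semantic versioning practice. It is not the policy on the UAT’s “Versioning” page, which gives the actual rules. The change-type names are invented for this example.

```python
# Hypothetical change categories; the UAT's "Versioning" page defines the real policy.
MAJOR_CHANGES = {"deprecate_concept", "move_branch"}
MINOR_CHANGES = {"add_concept", "add_definition"}
PATCH_CHANGES = {"fix_label_typo", "add_alt_label"}

def next_version(current, changes):
    """Bump the most significant part touched by any change; reset the lower parts."""
    major, minor, patch = (int(p) for p in current.split("."))
    kinds = set(changes)
    unknown = kinds - MAJOR_CHANGES - MINOR_CHANGES - PATCH_CHANGES
    if unknown:
        raise ValueError(f"unclassified change types: {sorted(unknown)}")
    if kinds & MAJOR_CHANGES:
        return f"{major + 1}.0.0"
    if kinds & MINOR_CHANGES:
        return f"{major}.{minor + 1}.0"
    if kinds:
        return f"{major}.{minor}.{patch + 1}"
    return current  # nothing changed, so no new release number

print(next_version("2.0.0", ["add_concept", "deprecate_concept"]))  # 3.0.0
print(next_version("3.1.0", ["add_alt_label"]))                     # 3.1.1
```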
Some of the most substantive changes to the UAT website are found under the “Explore” heading of the main menu. This section contains new browsing and search pages that help users understand the concepts and coverage of the current version of the Unified Astronomy Thesaurus. First, the “Alphabetical Browse” is what you would expect: a simple alphabetical list of all the concepts. Clicking on a letter of the alphabet at the top of the page jumps down to that section of the list. Clicking on any concept brings up a page with more detailed information about it, such as its parent concepts, child concepts, definition, and related concepts, where available. The “Hierarchy Browse,” also found under the “Explore” heading, shows how the UAT is organized underneath its 11 top level concepts. As in the alphabetical view, clicking on a particular concept provides more detail. The search feature has also been greatly improved and streamlined over the last year or so. Entering a word, or partial word, into the search box returns a list of all concepts whose preferred or alternate terms contain the search string, including partial matches. The portion of each preferred or alternate term that matches the query is highlighted in the results, which makes it clear why each concept was returned. As with the browsing features, clicking on a term shows more information about that concept.
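The search behavior described above can be approximated in a few lines of Python. The search is case-insensitive, matches partial words in preferred and alternate labels, and highlights the matching text. The concept list is a small illustrative stand-in. Square brackets take the place of the visual highlighting on the website. This is not the website’s actual implementation.

```python
import re

# Small illustrative stand-in for the concept list.
concepts = [
    {"pref": "Infrared galaxies", "alt": ["IR galaxies"]},
    {"pref": "Galaxy clusters", "alt": ["Clusters of galaxies"]},
    {"pref": "Stellar flares", "alt": []},
]

def search(query, concepts):
    """Case-insensitive partial match on preferred and alternate labels.
    Returns (preferred label, matching label with the match in [brackets])
    for each matching label."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    for c in concepts:
        for label in [c["pref"], *c["alt"]]:
            if pattern.search(label):
                highlighted = pattern.sub(lambda m: f"[{m.group(0)}]", label)
                results.append((c["pref"], highlighted))
    return results

for pref, shown in search("galax", concepts):
    print(pref, "->", shown)
```

Escaping the query with `re.escape` keeps user input such as a parenthesis from being read as regular-expression syntax.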
Since the UAT was officially adopted as the source for keywords in AAS journals in 2019, we have taken a snapshot every few months of the articles in the Astrophysics Data System (ADS) that have UAT concepts attached. Together, these snapshots begin to provide statistics and insight into how the UAT has been adopted.
The first snapshot of UAT usage data was taken in December of 2019, a bit less than 6 months after the AAS began requiring new manuscripts to be submitted with UAT keywords. At that time, we found 863 articles that had UAT concepts attached to them. Across all these articles, 881 unique concepts had been used to describe papers within AAS journals, which is about one third of all UAT concepts. To put that into additional context, the previous Astronomy Subject Keywords system only had about 600 total concepts. After six months of use, astronomers had selected more unique concepts from the UAT than were even available to them before.
Over the following year and a half, I took additional snapshots and examined the usage statistics. Each snapshot is cumulative, including the data from previous snapshots along with any new usage added by the date of the new snapshot. By April 2020, about half of all UAT concepts had been used at least once to describe AAS papers. The September 2020 snapshot showed a further dramatic increase in the number of unique concepts, followed by a smaller rise in December. The most recent snapshot as of this writing was taken on May 26, 2021, and by then authors had selected over three quarters of all UAT concepts at least once to describe manuscripts. When given access to more concepts, the astronomy community makes full use of the deeper and more detailed UAT, and the steady growth of concept usage over time demonstrates the value that the UAT provides.
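The coverage figures above come from counting the distinct concepts in each cumulative snapshot. The Python sketch below shows that calculation. It uses made-up toy data, not the real snapshots, and a vocabulary size of 10 as a placeholder.

```python
# Made-up toy data: each snapshot lists the concept sets of all articles indexed so far.
# Snapshots are cumulative, so later ones include every earlier article.
TOTAL_CONCEPTS = 10  # placeholder vocabulary size

snapshots = {
    "2019-12": [{"A", "B"}, {"B", "C"}],
    "2020-04": [{"A", "B"}, {"B", "C"}, {"D", "E"}],
}

def coverage(articles, total):
    """Number of distinct concepts used at least once, and their share of the vocabulary."""
    used = set().union(*articles)
    return len(used), len(used) / total

for date, articles in snapshots.items():
    n, share = coverage(articles, TOTAL_CONCEPTS)
    print(date, n, f"{share:.0%}")
```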
When evaluating how the UAT has been used to describe published papers over the last couple of years, it is also useful to examine the selected concepts themselves. Figure 2 shows the top 15 concepts for each of the snapshots discussed above. Again, these numbers are cumulative, so the 44 uses of Interstellar Medium in the December 2019 snapshot are included in the 126 uses in the April 2020 snapshot. Although we do not yet have enough data to discover topic trends in astronomy over time, or to compare top concepts between different time periods, we can start to glean some useful information. In some ways, the most interesting concept in these lists is “Magnetohydrodynamics,” found in the last three snapshots.
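Ranking the concepts in a snapshot amounts to counting how many articles use each one. The minimal Python sketch below uses invented article keyword lists, not the real snapshot data. Converting each list to a set ensures that an article counts at most once for each concept.

```python
from collections import Counter

# Invented article keyword lists (not real snapshot data).
articles = [
    ["Interstellar medium", "Star formation"],
    ["Interstellar medium", "Magnetohydrodynamics"],
    ["Exoplanets", "Exoplanet atmospheres"],
    ["Exoplanets", "Interstellar medium"],
]

def top_concepts(articles, n):
    """The n concepts attached to the most articles, with their article counts."""
    counts = Counter(concept for keywords in articles for concept in set(keywords))
    return counts.most_common(n)

print(top_concepts(articles, 2))  # [('Interstellar medium', 3), ('Exoplanets', 2)]
```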
Every concept in these five lists, except Magnetohydrodynamics, is a parent concept: a concept that has narrower, more specific concepts attached to it. This makes sense. Authors who use these broader concepts to describe their papers are probably also selecting more specific concepts to describe their topic more narrowly. By contrast, Magnetohydrodynamics is a leaf concept, found at the end of its branch with no child concepts. The fact that so many papers use this particular concept might indicate that the UAT needs more specific and detailed concepts in this area. It is still too early to make decisions based on this information, but this is the sort of trend that the Curator and Steering Committee will want to track over time.
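To check whether a frequently used concept is a leaf, we only need to look up its children. The child lists below are illustrative assumptions, except that Magnetohydrodynamics has no children, as stated above.

```python
# Illustrative child lists; only the childless Magnetohydrodynamics comes from the text.
children = {
    "Interstellar medium": ["Interstellar dust", "Molecular clouds"],
    "Exoplanets": ["Exoplanet atmospheres", "Habitable exoplanets"],
    "Magnetohydrodynamics": [],
}

def is_leaf(concept, children):
    """True if the concept has no child concepts; raises KeyError for unknown concepts."""
    return len(children[concept]) == 0

top = ["Interstellar medium", "Exoplanets", "Magnetohydrodynamics"]
print([c for c in top if is_leaf(c, children)])  # ['Magnetohydrodynamics']
```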
Another useful way to examine usage data is to look at the frequency of concept pairs, that is, how often certain concepts are selected together. Concept pairs can give more precise information about the topics trending in astronomy. For example, the December 2019 concept pairs list (Figure 3) reveals some nuance in the topics being written about. Instead of just “Exoplanets,” we see “Exoplanets & Exoplanet atmospheres” and “Exoplanets & Habitable exoplanets.” These two concept pairs paint very different pictures of the papers in each group, even though all of these articles are broadly about exoplanets. As the sample grows and covers a longer period, concept pairs can give real insight into trending topics and areas of study in astronomy.
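Counting concept pairs means looking at every unordered pair of concepts within each article. In the Python sketch below, the article data is invented. Sorting each article's concepts before forming pairs ensures that (A, B) and (B, A) are counted as the same pair.

```python
from collections import Counter
from itertools import combinations

# Invented concept sets for three articles.
articles = [
    {"Exoplanets", "Exoplanet atmospheres"},
    {"Exoplanets", "Exoplanet atmospheres", "Transits"},
    {"Exoplanets", "Habitable exoplanets"},
]

def pair_counts(articles):
    """Count how many articles contain each unordered pair of concepts."""
    counts = Counter()
    for keywords in articles:
        counts.update(combinations(sorted(keywords), 2))
    return counts

for pair, n in pair_counts(articles).most_common(2):
    print(" & ".join(pair), n)
```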
Since we now have another year and a half of data, we can also look at the concept pairs from the most recent snapshot, taken in May 2021 (Figure 4). Of particular interest are the highlighted concepts. They are not in the top 15 concept lists (compare with the rightmost column of Figure 2), yet they are almost always paired with one of those most frequently used concepts. These highlighted concepts have been selected less often overall, but they co-occur very frequently with common concepts. Moreover, a large subset of these less-used concepts are leaf concepts, found at the end of their branches. This shows that authors use both general and narrow concepts as keywords, pairing very specific concepts with broader ones to describe a research paper more accurately. Of course, many of these papers have three, four, or even more concepts attached, all of which help to paint a detailed picture of the manuscript’s content.
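The selection of highlighted concepts can be expressed as a small filter. It takes the concepts outside the top N and keeps those that appear alongside a top-N concept in most of the articles that use them. The Python sketch below uses invented data. The threshold and the function name are hypothetical choices, not the procedure used to produce Figure 4.

```python
from collections import Counter

# Invented keyword lists standing in for one usage snapshot.
articles = [
    ["Exoplanets", "Hot Jupiters"],
    ["Exoplanets", "Hot Jupiters"],
    ["Exoplanets", "Exoplanet atmospheres"],
    ["Interstellar medium", "Molecular clouds"],
    ["Interstellar medium", "Molecular clouds"],
    ["Interstellar medium", "Star formation"],
    ["Star formation"],
]

def often_paired_with_top(articles, top_n, min_share):
    """Concepts outside the top_n most used that co-occur with a top_n concept
    in at least min_share of the articles where they appear."""
    counts = Counter(c for keywords in articles for c in set(keywords))
    top = {c for c, _ in counts.most_common(top_n)}
    with_top = Counter()
    for keywords in articles:
        keywords = set(keywords)
        if keywords & top:
            with_top.update(keywords - top)
    return sorted(c for c in counts
                  if c not in top and with_top[c] / counts[c] >= min_share)

print(often_paired_with_top(articles, top_n=2, min_share=0.9))
# ['Exoplanet atmospheres', 'Hot Jupiters', 'Molecular clouds']
```

Here “Star formation” is excluded because it appears without a top concept in one of its two articles.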
The UAT is a community project that thrives on user interest, input, feedback, and usage. The more the UAT is integrated and used in the community, the more relevant and current it will be, and in turn the more useful it will become to the entire community. As a first step, it was formally adopted by the American Astronomical Society in June 2019. Since then, other groups, such as the Space Telescope Science Institute and the International Virtual Observatory Alliance, have also taken steps to integrate the UAT into their processes and systems. The UAT is updated yearly by adding new concepts, removing deprecated concepts, and sourcing definitions. Usage over the last couple of years has shown that it is a robust vocabulary capable of supporting many varied use cases throughout the astronomical community.
The author wishes to thank the American Astronomical Society for stewarding and supporting the Unified Astronomy Thesaurus, and eJournalPress for developing the Concept Selection widget. The author also wishes to thank the UAT Steering Committee for their continued work to improve and promote the thesaurus.
The UAT usage snapshots have been included as supporting data, which can be downloaded here: https://doi.org/10.5281/zenodo.6362926