Project PHaEDRA is an ongoing initiative to preserve handwritten notebooks used at the Harvard College Observatory. In 2020, a new crowdsourcing element was added to the project in order to link the notebooks back to their source material: 500,000 glass plate photographs.
Project PHaEDRA (Preserving Harvard’s Early Data and Research in Astronomy) is a collaborative initiative led by Wolbach Library to digitize, transcribe, and enhance the metadata of over 2,500 notebooks and logbooks created from the mid-18th through early 20th centuries by the Harvard Computers and other researchers at the Harvard College Observatory. Digitization of the notebooks began in 2016, and was completed in 2019. As the notebooks were digitized, they were made available online through the SAO/NASA Astrophysics Data System (ADS). At the same time, the digital scans were sent to the Smithsonian Transcription Center for full transcription by volunteers. While the full transcriptions will prove useful to researchers when completed, they do not provide a mechanism for linking the PHaEDRA notebooks back to their original source material: over 500,000 glass plate photographs identified by unique plate numbers. To remedy this, PHaEDRA team members are now working with volunteers on the Zooniverse platform to identify and transcribe the plate numbers found in the notebooks. This new component of Project PHaEDRA is called Star Notes and will eventually allow for deep linking between the notebooks and the digitized glass plates. As other observatories and astronomy libraries consider what to do with their glass plate collections and associated logbooks, Project PHaEDRA may be used as a model to make our shared astronomical heritage accessible for future generations.
From the mid-18th century through the 20th century, the staff of the Harvard College Observatory used physical notebooks to record their astronomical research. This collection of notebooks was re-discovered by staff of the Wolbach Library at the Harvard-Smithsonian Center for Astrophysics in 2016. A new initiative called Project PHaEDRA was formed to share this research more widely through a digitization and transcription effort previously described at LISA VIII. Since then, Project PHaEDRA has completed digitization of the materials and started focusing on phase two goals such as outreach, education, and the creation of enhanced metadata that will someday allow researchers to connect the notebooks back to their original source material: over five hundred thousand glass plate photographs. These new efforts were multi-faceted, but the introduction of a new initiative on the Zooniverse platform called Star Notes was a major driver for this next phase of Project PHaEDRA.
In the late 19th century, the Harvard College Observatory was at the forefront of using photographs in astronomical research. This took off with the advent of the dry plate photographic process, which was significantly more practical than previous processes and allowed for high resolution photographs of the sky to be recorded onto large glass plates. By the 1890s, the observatory was producing thousands of glass plate photographs every year taken at multiple locations across the world. In order to process and catalog the vast amount of data they were collecting, the observatory’s then-director, Edward Pickering, employed a large team of women now known as the “Harvard Computers.” Pickering hired women to do this work because there were many qualified women with college degrees whom he could pay less than equally qualified male assistants. The Computers and other Harvard researchers used logbooks and notebooks to record their findings using the glass plates as their source material. While the collection of glass plates continued to be used and maintained by a series of curators and researchers over the years, and is being digitized through the DASCH project¹, the accompanying notebooks were largely forgotten until Project PHaEDRA. It is our goal to make these notebooks more accessible to the public, as they hold important information about who these astronomers were, how they worked and collaborated, and the methods they used to make important contributions to astronomy.
At the heart of Project PHaEDRA is a collection of 2,518 physical notebooks and logbooks used by employees of the Harvard College Observatory from the mid-18th century through the 20th century. These notebooks lived in boxes at the Harvard Depository, Harvard’s off-site archival storage facility. When Project PHaEDRA was just getting started in 2016, the team was prepared to start the long process of archival description for every notebook in this collection. Luckily, the team found out that most of that work had already been done by previous employees at the Center for Astrophysics (CfA), first in a hand-written catalog starting in the 1970s, then a type-written version, and finally an Excel spreadsheet. The discovery of this previous archival work gave the project a dramatic “jump start.” The information was transferred to a SQL database and an online finding aid. The database was used to track the digitization process, which was completed in collaboration with colleagues at Harvard Library Imaging Services. This work was started in 2016 and completed in late 2019. The Wolbach Library is fortunate to share space in the CfA with the Astrophysics Data System (ADS). The ADS team offered to serve all the notebook images online in multiple formats, while also providing access to the notebooks through the ADS. For example, researchers can search for Annie Jump Cannon in the author field on ADS and find all of her original notebooks in the PHaEDRA collection as full PDFs. Two and a half years of tremendous work and coordination went into digitizing the full collection of notebooks and making them available online. This effort was made possible by great partners at Harvard Library Imaging Services and the ADS.
Having the collection described, digitized and online sets the stage for the next phase of this project: full transcription of all the notebooks on the Smithsonian Transcription Center (on-going) and page-level metadata to provide deep linking to the glass plates.
To create the page-level metadata necessary for deep linking of the notebooks to the glass plates, the Project PHaEDRA team created a new initiative called Star Notes² on the Zooniverse research platform. Volunteers on Star Notes identify and transcribe plate numbers, the unique identifiers that tie the notebooks to their original source material. Each time a notebook page is presented to a volunteer on Star Notes, they are asked whether they see any plate numbers on the page. If they answer yes, they are then asked to draw a box around each plate number and transcribe it. To make sure the transcriptions are accurate, seven different volunteers are asked to look at each notebook page. Once a notebook has been fully transcribed, the Project PHaEDRA team can download the volunteer data and use data analysis and cleanup scripts to arrive at a single consensus answer. While the Zooniverse includes an easy-to-use project builder for the front-end, and some basic tools for the back-end work, there are still many prerequisites and challenges to think through before launching a project on the platform.
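The step from seven independent transcriptions to a single consensus answer can be illustrated with a simple vote count. The following is a minimal sketch only, not the Notes from Nature scripts or the PHaEDRA team's own pipeline: the function names, the normalization rule, the `min_votes` threshold, and the sample plate numbers are all assumptions made for illustration.

```python
from collections import Counter


def normalize(plate: str) -> str:
    """Uppercase and remove whitespace so 'b 567' and 'B567' count as the same answer."""
    return "".join(plate.split()).upper()


def consensus_plates(classifications, min_votes=4):
    """Return the plate numbers reported by at least `min_votes` volunteers.

    `classifications` holds one list of transcribed plate numbers per
    volunteer for a single notebook page (hypothetical input shape).
    """
    votes = Counter()
    for transcriptions in classifications:
        # Count each plate once per volunteer, even if they boxed it twice.
        votes.update({normalize(t) for t in transcriptions})
    return sorted(plate for plate, count in votes.items() if count >= min_votes)


# Seven hypothetical volunteers looking at the same page.
page = [
    ["A1234", "B 567"],
    ["a1234", "B567"],
    ["A1234"],
    ["A1234", "B567", "X99"],
    ["A1234", "B567"],
    [],
    ["A1284", "B567"],
]
print(consensus_plates(page))
```

With a threshold of four out of seven, a single misreading such as "A1284" or a lone false positive such as "X99" is discarded, but a wrong answer shared by a majority of volunteers would still pass, which is why the quality of individual transcriptions matters.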
Certain prerequisites needed to be met by the Project PHaEDRA team before launching Star Notes on the Zooniverse in January 2020. While some details may be specific to Star Notes, this list provides a sense of the steps a team needs to take before launching any kind of project similar to Star Notes on the Zooniverse platform.
1. Clear research questions that can be answered through the Zooniverse and a community of volunteers (e.g. Which pages have plate numbers? What are those plate numbers?)
2. Item-level descriptive metadata for the collection
3. High-resolution scans of every page in the collection
4. Hosting of the scans with unique, persistent URLs for each notebook page in a web-ready image format (e.g. JPEG)
5. A SQL database of the URLs from #4 cross-referenced with the metadata from #2
6. Batches of URLs and other metadata in CSV format using the database from #5
7. Project details, workflows, a field guide, and task-specific documentation
8. Data analysis and data cleanup pipeline (in the case of Star Notes, this includes both Python scripts provided by another Zooniverse project called Notes from Nature³ and scripts written by the PHaEDRA team)
9. Dedicated staff time for the length of the project for: ongoing engagement with the volunteers, social media, education, project maintenance, data analysis/cleanup
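The two database-related prerequisites, cross-referencing page URLs with notebook metadata and exporting batches as CSV, might look roughly like the sketch below. It uses an in-memory SQLite database for convenience; the table names, column names, URLs, and batch size are hypothetical and do not describe the actual PHaEDRA database.

```python
import csv
import io
import sqlite3


def build_batches(conn, batch_size):
    """Join page URLs to notebook metadata and yield CSV text, one batch at a time."""
    rows = conn.execute(
        """
        SELECT p.url AS url, p.page_number AS page_number,
               n.notebook_id AS notebook_id, n.title AS title, n.author AS author
        FROM pages AS p
        JOIN notebooks AS n ON n.notebook_id = p.notebook_id
        ORDER BY n.notebook_id, p.page_number
        """
    )
    header = [column[0] for column in rows.description]
    while True:
        batch = rows.fetchmany(batch_size)
        if not batch:
            break
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)  # every batch file carries its own header row
        writer.writerows(batch)
        yield buffer.getvalue()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notebooks (notebook_id TEXT PRIMARY KEY, title TEXT, author TEXT);
    CREATE TABLE pages (url TEXT PRIMARY KEY, notebook_id TEXT, page_number INTEGER);
    INSERT INTO notebooks VALUES ('nb0001', 'Example notebook', 'Example Author');
    INSERT INTO pages VALUES
        ('https://example.org/nb0001/0001.jpg', 'nb0001', 1),
        ('https://example.org/nb0001/0002.jpg', 'nb0001', 2),
        ('https://example.org/nb0001/0003.jpg', 'nb0001', 3);
    """
)
batches = list(build_batches(conn, batch_size=2))
print(batches[0])
```

Keeping the notebook metadata on every row means each page carries its own context when volunteer data is later downloaded and matched back to the collection.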
Identification of plate numbers is not always straightforward. There is a high likelihood that volunteers will miss plate numbers and identify false positives. For this reason, we have seven volunteers look at each page and use an algorithm to find the consensus answers. However, even with this system in place, the transcriptions from the volunteers must be of high quality, as there still may be consensus around wrong answers. Numerous potential challenges can impede volunteers in reaching correct answers, including illegible handwriting, shorthand, and mistaking a series of letters and numbers that looks like a plate number for an actual plate number. To try to head off these challenges, the Project PHaEDRA team has employed a number of tactics. In the ‘field guide’ for the project, there are dozens of examples of handwriting, correct and incorrect plate numbers, and other ‘gotchas’ that volunteers may encounter. The field guide cannot be comprehensive since the project spans hundreds of thousands of pages written by hundreds of individuals. For individual cases where volunteers are confused, they can mark a page so that it starts a new thread in the Star Notes ‘Talk’ section, which is an online message board that any logged-in volunteer can contribute to. While this feature cannot be used to go back and make corrections, it is very useful for alerting the Project PHaEDRA team to potential issues that we can address through the field guide or the ‘need some help with this task?’ section of each workflow. Another challenge of this project is that many astronomers neglected to note the series letter(s) when marking plate numbers in their notebooks. Without these series letters, the metadata is useless. Volunteers are asked to mark potential plate numbers with a missing series letter with an asterisk (e.g., *2345) so that we can subset this data. 
As a follow-up initiative, the subset of plate numbers missing series letters will need to be addressed, perhaps with a new workflow on the Zooniverse.
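Because volunteers mark plate numbers that lack a series letter with a leading asterisk, that subset can be separated from the rest with a simple filter. The sketch below is one possible approach rather than the team's actual cleanup script; the function name and the sample values are hypothetical.

```python
def split_by_series(plates):
    """Separate complete plate numbers from those flagged with a leading asterisk.

    Returns two lists: plate numbers as transcribed, and asterisk-flagged
    numbers (asterisk removed) that still need a series letter.
    """
    complete, missing_series = [], []
    for plate in plates:
        cleaned = plate.strip()
        if cleaned.startswith("*"):
            missing_series.append(cleaned.lstrip("*"))
        else:
            complete.append(cleaned)
    return complete, missing_series


# Hypothetical consensus transcriptions from one notebook.
complete, missing = split_by_series(["A1234", "*2345", " MC567 ", "*88"])
print(complete)
print(missing)
```

The second list is the input a follow-up workflow would need, since each entry still has to be matched to a series before it can point to a specific plate.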
Launching Star Notes on the Zooniverse allowed Project PHaEDRA to reach many new people. While finding plate numbers in the notebooks does not require any knowledge of astronomy or the historical background of the materials, many volunteers were clearly interested in learning more about both the astronomical history and the history of the women hired as computers at HCO. In fact, in a survey of Star Notes volunteers, a majority (86%) indicated they chose to volunteer due to an interest in astronomy and space history, and nearly half (41%) of respondents indicated an interest in women’s history (Fig. 2). The Project PHaEDRA team, especially the Assistant Community Coordinator, Sam Correia, fostered volunteers’ interest in the history of Project PHaEDRA by engaging with the community on the Zooniverse, social media, and through virtual office hours.
Every project on the Zooniverse features its own ‘Talk’ section (message board) that can be customized to meet the needs of that particular project. The message board’s integration into the platform greatly encourages participation. Since Star Notes started nineteen months ago, 7,766 comments have been made by over 600 unique volunteers. That is an average of over 400 comments per month. Many of these comments are simple observations or questions about a particular page in a notebook. Other comments are several paragraphs long and demonstrate volunteers’ willingness to delve deeper and perform hours of independent research to answer other volunteers’ questions. Volunteers have also used the Zooniverse feature of tagging pages to create a folksonomy of useful keywords describing the contents of the notebooks, such as “#star_map” to indicate the presence of a drawing of stars (Fig. 3). The Project PHaEDRA team intends to make these volunteer contributions accessible to researchers, and has other plans for how they may be used that will be described in the Future Directions section of this paper.
Another way Zooniverse volunteers have helped Project PHaEDRA is through outreach on social media. On the Zooniverse message board, volunteers are encouraged to share interesting things they find in the notebooks with this prompt: “We love science, you love science. Want to talk about science with the Zooniverse community? Saw something really awesome in the notebooks? Mention it here! We might just post the image to our social media pages.” Dozens of interesting images have been gathered this way, and many have been posted on the Project PHaEDRA social media pages to highlight the work of volunteers. Social media is also used to promote virtual events, connect with partnering institutions, and answer questions from community members.
Since November 2020, the Project PHaEDRA team has hosted a free virtual office hour on the second Tuesday of each month via the Zoom web conferencing software. This has allowed some of our most dedicated volunteers to meet each other virtually, participate in question and answer sessions, and learn from guest presenters. From June through August 2021, artists inspired by Project PHaEDRA and/or the Harvard Computers were invited to speak about their work and artistic processes, ranging from textiles to books and stage plays. The next three months (September through November 2021) will feature authors including Dava Sobel (The Glass Universe) and George Johnson (Miss Leavitt’s Stars).
Project PHaEDRA has a number of different components. Some, like the identification of all plate numbers through Star Notes on the Zooniverse, should wrap up in a year or two, while others, like the full transcription of every notebook on the Smithsonian Transcription Center, will take much longer to complete. Throughout the process of metadata creation and transcription, the Project PHaEDRA team plans to continue engaging with the community of volunteers by offering new opportunities for education and interaction with scholars and historians. Long-term goals for Project PHaEDRA include the creation of an integrated access portal that will link the notebooks to the scans of the glass plates done through the DASCH project. Another potential project is to train a machine learning model to identify figures in the notebooks.
Wolbach Library is very thankful for our wonderful partners at the SAO/NASA Astrophysics Data System, the Smithsonian Transcription Center, the Zooniverse, and Harvard University Library’s Imaging Services, Archives, and Preservation Services groups. The Project PHaEDRA team extends our heartfelt gratitude to the thousands of volunteers who have given their time to the project on the Smithsonian Transcription Center and on the Zooniverse.