
Pilot study for the autonomous discovery of unknown unknowns in photometric timeseries data

Presentation #127.07 in the session “Computation, Data Handling, Image Analysis”.

Published on Jan 11, 2021

The rate at which astronomical data are collected has risen rapidly and now far exceeds what can be inspected manually. Unique and as-yet-undiscovered systems may be buried among systems whose parameters are familiar and well understood, so methods of analysis must evolve to pick these systems out for further study. Artificial intelligence is a natural fit for this need: neural networks in particular excel at pattern recognition and scale well enough to be incorporated into larger data pipelines. EBAI (Prša et al. 2008) demonstrated that eclipsing binary systems (EBs) could be classified using neural networks, achieving 90% accuracy on samples from the OGLE survey and nearly 100% on samples from the CALEB survey. We seek to build on the results of EBAI in two ways: expanding the network to include all types of stellar systems in all publicly available time-series catalogs, and pushing the clustering capacity to 5σ; that is, 99.99997% of stellar systems would be classified into a known subtype of stellar system, leaving only the most unusual outliers for further study. This project serves as a proof of concept for that larger goal, performing classification on a smaller number of stellar systems while accepting lower classification accuracy. Using the dataset from the Kepler Mission (Borucki et al. 2010), we apply a wavelet transformation to each light curve to project the time series into time/frequency space. The wavelet data are then down-projected to three dimensions using UMAP (McInnes, Healy, Melville 2018) and clustered using HDBSCAN. The process identifies clusters of similar systems and leaves outliers that fall between clusters ungrouped. Although we are unable to achieve the proposed outlier identification accuracy, we show that down-projection and clustering are promising methods that could be incorporated into a larger pipeline to achieve the desired accuracy.
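The three stages described above (wavelet transform, down-projection with UMAP, clustering with HDBSCAN) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline. The abstract does not say which wavelet was used, so a simple Haar decomposition written in plain Python stands in for it. The function names `haar_energies` and `embed_and_cluster`, the number of decomposition levels, and `min_cluster_size` are all hypothetical. The second function assumes the third-party packages `numpy`, `umap-learn`, and `hdbscan` are installed.

```python
import math


def haar_energies(flux, levels=4):
    """Summarise one light curve as the energy in each Haar wavelet level.

    Stand-in for the (unspecified) wavelet transform in the abstract:
    level 1 captures the fastest variations, later levels slower ones.
    """
    # Centre and scale the flux so energies are comparable between stars.
    mean = sum(flux) / len(flux)
    std = math.sqrt(sum((f - mean) ** 2 for f in flux) / len(flux)) or 1.0
    approx = [(f - mean) / std for f in flux]

    energies = []
    for _ in range(levels):
        # Pair up neighbouring samples; an unpaired last sample is dropped.
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    return energies


def embed_and_cluster(features, min_cluster_size=5):
    """Project feature vectors to 3-D with UMAP, then cluster with HDBSCAN.

    Returns (embedding, labels). HDBSCAN labels points it cannot assign
    to any cluster as -1; those are the candidate outliers.
    """
    # Imported here so the wavelet step works without these packages.
    import numpy as np
    import umap      # provided by the umap-learn package
    import hdbscan

    X = np.asarray(features, dtype=float)
    embedding = umap.UMAP(n_components=3, random_state=0).fit_transform(X)
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)
    return embedding, labels
```

In use, each light curve would be reduced to a feature vector with `haar_energies`, and the stacked vectors passed to `embed_and_cluster`. Systems whose label is -1 are the ones left ungrouped between clusters. A full time/frequency representation, such as a continuous wavelet transform, would retain far more information than these per-level energies.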

