
Unsupervised Machine Learning for the Classification of Astrophysical X-ray Sources

Presentation #108.17 in the session “Missions and Instruments (Poster)”.

Published on Apr 01, 2022

The Chandra Source Catalog version 2.0 (CSC 2.0) is the ultimate repository of all the X-ray sources detected by the Chandra X-ray Observatory throughout its history. It represents fertile ground for discovery in X-ray astrophysics, since many of the sources it contains have not been studied in detail and only a small fraction of them have been classified. The potentially paradigm-changing sources that could be found in Chandra data include compact object mergers, extrasolar planet transits, and tidal disruption events, among many others. To conduct a thorough investigation of the CSC sources, we need to increase the number of sources that are associated with an astrophysical type. In this work we propose an unsupervised learning approach to classify as many CSC 2.0 sources as possible, using both X-ray and optical data (when available) as the input features of our algorithm. Unsupervised learning is particularly suitable given the difficulty of constructing a representative training set of independently classified sources. By clustering CSC 2.0 sources according to their properties (hardness ratios, variability probabilities, etc.), and then associating the identified clusters with previously classified objects, we aim to provide a new methodology that yields a probabilistic classification of X-ray sources in general. We employ two unsupervised learning methods, K-means and Gaussian mixture models, and apply them to a list of X-ray properties to probabilistically classify sources in CSC 2.0. We achieve this by associating specific clusters with those CSC objects that have a classification in the SIMBAD database, and then assigning probabilistic classes by association to the unclassified objects in each cluster with an algorithm based on the Mahalanobis distance.
We successfully identify clusters of previously classified objects that likely belong to the same class, and even within groups dominated by a single type of source, we find sub-classes related to their distinctive variability and spectral properties. The result of this exercise is a robust probabilistic classification (i.e. a probability for each candidate class) for 10090 CSC sources. Our methodology provides probabilistic class assignments for many CSC 2.0 sources, and can be generalized to other X-ray catalogs. We present our pipeline and the full catalog of probabilistic classes.
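
To make the class-assignment step concrete, here is a minimal sketch of one way a Mahalanobis-distance rule could turn the classified members of a single cluster into class probabilities for an unclassified source. It is an illustration under stated assumptions, not the pipeline described above: the function name `class_probabilities`, the two-feature synthetic data, the use of the cluster's pooled covariance, and the Gaussian-style weighting exp(-d²/2) are all hypothetical choices. Cluster membership is assumed to have been determined already (for example by K-means or a Gaussian mixture).

```python
import numpy as np


def class_probabilities(x, features, labels, ridge=1e-6):
    """Soft class weights for one unclassified source inside one cluster.

    x        : (d,) feature vector of the unclassified source
    features : (n, d) features of the cluster members that already have a class
    labels   : length-n sequence of class names for those members

    ASSUMPTION: distances are measured to each class centroid using the
    covariance of all classified members of the cluster, and converted to
    weights with exp(-d^2 / 2). The source does not specify these details.
    """
    x = np.asarray(x, dtype=float)
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Pooled covariance of the classified members; the small ridge term keeps
    # the matrix invertible when features are nearly collinear.
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    cov_inv = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))

    classes = np.unique(labels)
    d2 = np.empty(len(classes))
    for i, c in enumerate(classes):
        diff = x - features[labels == c].mean(axis=0)
        d2[i] = diff @ cov_inv @ diff  # squared Mahalanobis distance

    # Subtract the minimum before exponentiating for numerical stability;
    # this does not change the normalized weights.
    weights = np.exp(-0.5 * (d2 - d2.min()))
    weights /= weights.sum()
    return dict(zip(classes.tolist(), weights.tolist()))


# Synthetic stand-in for one cluster: two hypothetical features per source
# (e.g. a hardness ratio and a variability probability).
rng = np.random.default_rng(0)
agn = rng.normal([0.5, 0.8], 0.1, size=(30, 2))
star = rng.normal([-0.5, 0.2], 0.1, size=(30, 2))
features = np.vstack([agn, star])
labels = ["AGN"] * 30 + ["Star"] * 30

# An unclassified source lying close to the "AGN" members.
probs = class_probabilities([0.45, 0.75], features, labels)
print(probs)
```

The weights sum to one by construction, so they can be read as relative class probabilities within the cluster. Whether they are well calibrated depends on how closely each class resembles a Gaussian in the chosen feature space.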

