Presentation #224.05 in the session “AGN and Quasars II”.
Most general-purpose classification methods, such as support vector machines and random forests, fail to account for a distinctive characteristic of astronomical data: known measurement error. In many astronomical settings, measurement uncertainties are reported with the data, but this information is often discarded because popular classification methods cannot incorporate it. We propose a model-agnostic approach that incorporates heteroscedastic measurement error into any existing classification method to better quantify uncertainty in astronomical classification problems. The proposed method first simulates perturbed realizations of the data from the Bayesian posterior predictive distribution of a measurement error model. A classifier is then re-fit to each simulated data set. The variation of any quantity of interest across the simulations reflects the uncertainty propagated from the measurement errors in both the training and test sets. We demonstrate the approach in two numerical studies. The first is a thorough simulation study applying the proposed procedure to a support vector machine and a random forest, as these are well-known hard and soft classifiers, respectively. The second is a realistic classification problem: identifying high-z (2.9 ≤ z ≤ 5.1) quasars. The data were obtained from merged catalogs of the Sloan Digital Sky Survey, the Spitzer IRAC Equatorial Survey, and the Spitzer-HETDEX Exploratory Large-area survey. Of the 10,520 high-z quasars identified by a random forest that does not incorporate measurement error, the proposed method flags 2,273 as potential misclassifications. In addition, 765 new high-z quasar candidates are identified.
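The simulate-then-refit loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several assumptions that do not come from the abstract. First, each perturbed realization is drawn as an independent Gaussian centered on the observed value with the reported standard deviation, which is a simplified stand-in for the Bayesian posterior predictive distribution of the measurement error model. Second, a toy nearest-centroid classifier takes the place of the support vector machine or random forest. Third, all function names and the small data set are hypothetical.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Toy stand-in classifier (assumption): store the per-class feature means."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict(centroids, X):
    """Assign each row to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist2(row, centroids[c])) for row in X]

def perturb(X, S, rng):
    """Draw one realization: Gaussian noise with each value's own reported
    standard deviation (simplified stand-in for the posterior predictive)."""
    return [[rng.gauss(x, s) for x, s in zip(row, srow)]
            for row, srow in zip(X, S)]

def class_fractions(X_train, S_train, y_train, X_test, S_test,
                    n_sims=200, seed=0):
    """Re-fit on each perturbed training set, predict each perturbed test
    set, and return per-object fractions of simulations voting each class."""
    rng = random.Random(seed)
    votes = [Counter() for _ in X_test]
    for _ in range(n_sims):
        model = fit_centroids(perturb(X_train, S_train, rng), y_train)
        labels = predict(model, perturb(X_test, S_test, rng))
        for vote, label in zip(votes, labels):
            vote[label] += 1
    return [{c: n / n_sims for c, n in vote.items()} for vote in votes]

# Hypothetical two-feature data: class 0 near (0, 0), class 1 near (4, 4).
X_train = [[0.1, -0.2], [-0.3, 0.2], [0.2, 0.1],
           [3.9, 4.1], [4.2, 3.8], [4.0, 4.3]]
S_train = [[0.2, 0.2] for _ in X_train]   # reported 1-sigma errors
y_train = [0, 0, 0, 1, 1, 1]

# One well-measured test object and one with large reported errors.
X_test = [[4.2, 3.9], [2.0, 2.0]]
S_test = [[0.1, 0.1], [1.5, 1.5]]

fractions = class_fractions(X_train, S_train, y_train, X_test, S_test)
for i, frac in enumerate(fractions):
    print(f"test object {i}: {frac}")
```

In this sketch, the spread of the per-object class fractions plays the role of the "quantity of interest": an object whose label changes across simulations is a candidate for the kind of potential misclassification the abstract describes. Because the classifier is only called through `fit_centroids` and `predict`, any other classifier with a fit and predict step could be substituted in principle.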