Analyzing The Uncertainty of Sloan Digital Sky Survey Data

Presentation #142.07 in the session “Sun and Solar System”.

Published on Jan 11, 2021

With over 200,000 photometric asteroid observations, the Sloan Digital Sky Survey (SDSS) is heavily relied upon by researchers seeking to understand population characteristics of asteroids. The success of those research projects depends on the accuracy of the SDSS dataset and the validity of the uncertainty interval associated with each data point. This research returns to the fundamental problem of quantifying the uncertainty of SDSS photometric asteroid data in an effort to understand true population distributions. Since reflectance data reveals information about the composition of asteroids, a photometric dataset of 220,000 asteroids was chosen for analysis. With a dataset this large, it becomes possible to frame questions in the language of basic calculus, such as our research question: what are the photometric characteristics of the population density in the limit of zero error? Although the initial population density resembles a Gaussian, it is not immediately apparent to what extent true population features may be hidden by uncertainties in the dataset. To obtain a more accurate representation of this distribution, the dataset was partitioned by the size of the uncertainty in each data point. Histogram plots of the partial datasets revealed that high-uncertainty data corresponded to a wider distribution. By selecting data with ever-decreasing uncertainty, a more realistic representation of the data was obtained. Indeed, two of the photometric bands studied revealed unsurprising double-peaked Gaussians corresponding to C- and S-type asteroids. The sheer size of the SDSS dataset allowed us to approximate the population density by finding the distribution to which subsets of ever-decreasing error converge. From this, the data was modeled as a sum of Gaussian curves, allowing the mean and standard deviation of each curve to be obtained.
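
The partition-then-fit procedure described above can be illustrated with a minimal sketch. It does not use SDSS data: the `simulate` function generates synthetic observations from two hypothetical sub-populations (standing in for C- and S-type asteroids), and every numerical value in it is an assumption chosen for illustration. The helper names `simulate`, `select`, and `fit_two_gaussians` are likewise hypothetical, and the expectation-maximization fit is one common way to model data as a sum of two Gaussians, not necessarily the method used in this work.

```python
import math
import random
import statistics


def simulate(n, seed=0):
    """Return synthetic (observed_value, uncertainty) pairs.

    All numbers here are made up for illustration, not taken from SDSS.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Two hypothetical sub-populations with equal weight.
        mu, sd = (0.0, 0.05) if rng.random() < 0.5 else (0.3, 0.05)
        true_value = rng.gauss(mu, sd)
        err = rng.uniform(0.01, 0.30)  # assumed reported 1-sigma uncertainty
        data.append((true_value + rng.gauss(0.0, err), err))
    return data


def select(data, max_err):
    """Keep observed values whose reported uncertainty is below max_err."""
    return [v for v, e in data if e < max_err]


def fit_two_gaussians(values, iters=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns a list of (mean, std, weight) tuples sorted by mean.
    """
    s = sorted(values)
    mu = [s[len(s) // 4], s[3 * len(s) // 4]]  # start at the quartiles
    sd = [statistics.pstdev(values)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: probability that each point belongs to component 0.
        # The 1/sqrt(2*pi) factor is omitted because it cancels in the ratio.
        resp = []
        for x in values:
            p = [w[k] / sd[k] * math.exp(-0.5 * ((x - mu[k]) / sd[k]) ** 2)
                 for k in (0, 1)]
            resp.append(p[0] / (p[0] + p[1]))
        # M-step: update weight, mean, and standard deviation per component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - a for a in resp]
            total = sum(r)
            mu[k] = sum(ri * x for ri, x in zip(r, values)) / total
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, values)) / total
            sd[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
            w[k] = total / len(values)
    return sorted(zip(mu, sd, w))


if __name__ == "__main__":
    data = simulate(20000)
    # Tighter uncertainty cuts should give narrower observed distributions.
    for cut in (0.30, 0.15, 0.03):
        subset = select(data, cut)
        print(cut, len(subset), round(statistics.pstdev(subset), 4))
    # Fit the lowest-uncertainty subset as a sum of two Gaussians.
    for mean, std, weight in fit_two_gaussians(select(data, 0.03)):
        print(round(mean, 3), round(std, 3), round(weight, 3))
```

In this toy setup, each observed value is the true value plus independent Gaussian noise, so the observed width of a subset shrinks toward the intrinsic width as the uncertainty cut is tightened. The fit on the tightest cut then gives a mean and standard deviation for each component.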
