Presentation #110.06 in the session “Asteroid Surveys in the Era of LSST”.
In this work, we present a general-purpose, computationally efficient Python framework for creating simulated data products at LSST scales (>= 1e9 observations). Creating simulated databases of Solar System object observations from raw simulations (Cornwall et al.; Berres et al.) is computationally challenging. The raw inputs consist of hundreds of gigabytes of CSV files, with information about individual objects spread across multiple files. In addition, many object properties are not directly simulated but must be calculated on the fly from input quantities. The catalog creation system presented here provides an extensible yet computationally efficient framework for creating such large datasets. We map each table in the database to a corresponding Python class. The schema of the table is represented as class attributes, each annotated with a type compatible with the required SQL data type. The table class has a registry system that allows an externally defined function to be associated with each attribute (table column), making it possible to compute additional quantities on the fly and to add further computations over time. The code includes custom I/O routines for extremely efficient file reading, writing, and memory management. Because our code minimizes memory usage, we are able to launch a large number of parallel processes, fully utilizing all of our CPU cores. On a modestly sized machine, we are able to generate the final simulated tables in a matter of seconds to minutes, in contrast to other techniques, which were projected to take multiple days. This quick turnaround allows us to iterate many times while catching bugs or adding columns to the output. Though motivated by LSST, the code is general purpose and may be useful to other missions expected to deliver datasets at similar scales (e.g., NEOSM).
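The table-to-class mapping and per-column registry described above can be illustrated with a minimal sketch. This is not the framework's actual code: the names `Table`, `SQL_TYPES`, `register`, `build_row`, and the example `SSObject` columns are all hypothetical, chosen only to show how annotated class attributes can define a schema and how an externally defined function can be attached to a column.

```python
from typing import get_type_hints

# Hypothetical mapping from Python annotation types to SQL column types.
SQL_TYPES = {int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}


class Table:
    """Base class: annotated class attributes define the table schema."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each table class gets its own registry of column name -> function.
        cls._registry = {}

    @classmethod
    def register(cls, column):
        """Decorator that associates an external function with a column."""
        if column not in get_type_hints(cls):
            raise KeyError(f"{cls.__name__} has no column {column!r}")

        def decorator(func):
            cls._registry[column] = func
            return func

        return decorator

    @classmethod
    def schema(cls):
        """Return column name -> SQL type, derived from the annotations."""
        return {name: SQL_TYPES[tp] for name, tp in get_type_hints(cls).items()}

    @classmethod
    def build_row(cls, raw):
        """Build one output row from a dict of raw input values."""
        row = {}
        for name, tp in get_type_hints(cls).items():
            if name in cls._registry:
                # Column is not in the raw input: compute it on the fly.
                value = cls._registry[name](raw)
            else:
                value = raw[name]
            row[name] = tp(value)  # coerce to the annotated type
        return row


# Hypothetical table with three input columns and one derived column.
class SSObject(Table):
    ssObjectId: int
    a: float  # semi-major axis
    e: float  # eccentricity
    q: float  # perihelion distance, not present in the raw input


@SSObject.register("q")
def perihelion_distance(raw):
    # q = a * (1 - e), computed from the input quantities
    return float(raw["a"]) * (1.0 - float(raw["e"]))


# Raw values arrive as strings, as they would from a CSV reader.
row = SSObject.build_row({"ssObjectId": "42", "a": "2.5", "e": "0.2"})
print(SSObject.schema())
print(row)
```

Under this design, adding a new derived column would only require a new annotated attribute and a registered function; the row-building loop itself stays unchanged.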