Presentation #401.03 in the session Extrasolar Planets.
The field of exoplanet detection has expanded rapidly in recent years, with over 4000 planets already discovered. However, given the large datasets obtained through current NASA missions, manual inspection has become tedious, and a large portion of the data remains unanalyzed. In this study, I develop a novel machine-learning model that analyzes transit data accurately and efficiently for both candidate validity and habitability. Light curves for exoplanet candidates were obtained from the NASA Exoplanet Archive, then folded and processed to remove noise. Feature extraction was performed, and data on exoplanet mass, radius, flux, equilibrium temperature, and orbital period were gathered. I trained four models (SVM, KNN, Random Forest, and LightGBM) in conjunction with a Quantile Regression Model (QRM), tuning hyperparameters and controlling for overfitting to improve performance. Validation results suggest that LightGBM is the highest-performing model, with an accuracy of 88% and an AUC of 95%. Notably, the model determined that Kepler ID 7376983, which scored 97.01%, has a high probability of being a valid exoplanet, especially given its correlation with the three other candidates in the same stellar system, which scored 94.94%, 93.17%, and 88.56%, respectively. In the test for habitability, Kepler ID 11462341 yielded the highest score, 81.15%, with a radius and flux similar to Earth's. These results are significant because they substantiate the effectiveness of machine-learning methods for exoplanet analysis through transit light curves, aiding the process of exoplanet detection and analysis.
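The folding and noise-reduction step mentioned above can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so everything here is an assumption for illustration: the function names `fold_light_curve` and `bin_folded` are hypothetical, median binning stands in for whatever denoising was actually used, and the light curve is synthetic rather than taken from the NASA Exoplanet Archive.

```python
import numpy as np

def fold_light_curve(time, flux, period, epoch=0.0):
    """Fold a light curve onto orbital phase in [-0.5, 0.5), transit at phase 0.

    `period` and `epoch` (time of a transit midpoint) are assumed to be known,
    e.g. from a catalog entry for the candidate.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    phase = ((time - epoch) / period + 0.5) % 1.0 - 0.5
    order = np.argsort(phase)  # sort so the folded curve is ordered by phase
    return phase[order], flux[order]

def bin_folded(phase, flux, n_bins=50):
    """Median-bin the folded curve to suppress point-to-point noise."""
    edges = np.linspace(-0.5, 0.5, n_bins + 1)
    idx = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    binned = np.full(n_bins, np.nan)  # empty bins stay NaN
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            binned[b] = np.median(flux[in_bin])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, binned

# Synthetic example (assumed values): 3-day period, 1% deep box-shaped transit.
rng = np.random.default_rng(0)
period, epoch = 3.0, 1.0
time = np.arange(0.0, 30.0, 0.02)
true_phase = ((time - epoch) / period + 0.5) % 1.0 - 0.5
flux = 1.0 - 0.01 * (np.abs(true_phase) < 0.02)
flux = flux + rng.normal(0.0, 0.001, size=time.size)

phase, folded_flux = fold_light_curve(time, flux, period, epoch)
centers, binned = bin_folded(phase, folded_flux, n_bins=50)
```

After folding, all ten transits in the synthetic series stack at phase 0, so the binned curve shows a single dip that is much cleaner than any individual transit. Features such as transit depth and duration could then be measured from `binned` before being passed to a classifier.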