Unsupervised Clustering of SETI Radio Spectrograms using Convolutional Autoencoders

Presentation #429.02 in the session “Astrobiology 2”.

Published on Jan 11, 2021

Using radio spectrograms from the Breakthrough Listen instrument at the Green Bank Telescope (GBT), our goal is to search for technosignatures, as a proxy for extraterrestrial intelligence, using unsupervised machine learning and clustering techniques. We aim to learn the essential features of the spectrograms and group them into classes of signals that share characteristics such as morphology and bandwidth. We run an Energy Detection algorithm on GBT high-spectral-resolution (~3 Hz) and medium-resolution (~3 kHz) data from the public SETI (Search for Extraterrestrial Intelligence) data archive. This algorithm detects any signal whose energy deviates from the background noise, allowing us to build a database of signals, including radio frequency interference (RFI). We split this output into training and validation sets, which serve as input to a deep convolutional autoencoder. The autoencoder lets us extract the meaningful features of each input from the latent vectors of its bottleneck layer. We then use a self-organizing map to reduce the latent vectors to a similarity map, on which we apply a clustering layer to determine the classes present in the data. To optimize the clustering, we use the Improved Deep Embedded Clustering (IDEC) algorithm, a computationally efficient method from the literature that simultaneously learns deep embedded feature representations and performs clustering. Finally, we apply a soft clustering threshold and use the t-SNE algorithm to visualize the clustering results. We are able to group visually similar signals together and identify anomalous ones. However, some classes still contain a mix of signal morphologies, which we are working to address. These groupings can be used to eliminate classes of RFI and narrow our search for technosignatures to interesting candidate signals.
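The energy-detection step can be illustrated with a minimal sketch. It assumes a single 1-D power spectrum, non-overlapping frequency windows, and a median/MAD estimate of the background noise. The function name `energy_detect`, the window size, and the threshold are hypothetical choices for illustration. This is not the Breakthrough Listen or BL@Scale implementation, which may differ substantially, for example by working on 2-D time-frequency spectrograms.

```python
import numpy as np


def energy_detect(spectrum, window=64, threshold=5.0):
    """Return indices of frequency windows whose summed power exceeds the noise floor.

    Hypothetical parameters: `window` (channels per block) and `threshold`
    (in robust standard deviations) are illustrative, not pipeline values.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    n_windows = spectrum.size // window
    # Sum power over non-overlapping windows of `window` fine channels.
    energies = spectrum[: n_windows * window].reshape(n_windows, window).sum(axis=1)
    # Robust background estimate: median and median absolute deviation (MAD),
    # so a few bright signals do not inflate the estimated noise level.
    median = np.median(energies)
    sigma = 1.4826 * np.median(np.abs(energies - median))  # MAD -> Gaussian std
    if sigma == 0.0:
        return np.array([], dtype=int)
    # Windows whose energy sits more than `threshold` sigma above the background.
    return np.flatnonzero((energies - median) / sigma > threshold)


# Usage: a weakly rippled noise floor with one narrowband tone at channel 1000.
channels = np.arange(4096)
spectrum = 1.0 + 0.1 * np.sin(channels)
spectrum[1000] += 50.0
print(energy_detect(spectrum))  # window 1000 // 64 = 15 is flagged
```

The sketch uses the median and MAD rather than the mean and standard deviation. This keeps a handful of strong RFI spikes from raising the estimated noise floor and masking weaker signals.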
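The self-organizing-map step can be sketched as a small rectangular SOM trained on autoencoder latent vectors. It is a generic textbook SOM with a linearly decaying learning rate and a shrinking Gaussian neighborhood. The grid size, schedules, helper names (`train_som`, `best_matching_units`), and the 8-dimensional toy latents are assumptions, not the configuration used in this work.

```python
import numpy as np


def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, seed=0):
    """Train a minimal rectangular self-organizing map on the row vectors of `data`."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid position of every node, shape (rows * cols, 2).
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights with randomly drawn training vectors.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighborhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma**2))  # Gaussian neighborhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_units(data, weights):
    """Index of the closest SOM node for every row of `data`."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


# Usage: two well-separated groups of hypothetical 8-D latent vectors.
rng = np.random.default_rng(1)
latents = np.vstack([rng.normal(0.0, 0.1, (20, 8)), rng.normal(3.0, 0.1, (20, 8))])
weights = train_som(latents)
nodes = best_matching_units(latents, weights)
```

Nodes that are close on the grid end up with similar weight vectors. As a result, signals mapped to neighboring nodes are also similar in latent space, and this 2-D layout is what lets the trained map serve as a similarity map for a subsequent clustering step.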
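In DEC/IDEC-style methods, the clustering layer assigns each embedding z_i to cluster centre μ_j with a Student's t kernel, q_ij ∝ (1 + ||z_i - μ_j||²/α)^(-(α+1)/2). It then trains toward a sharpened target p_ij ∝ q_ij²/f_j, where f_j = Σ_i q_ij. IDEC adds this KL(P‖Q) clustering term, weighted by γ, to the autoencoder's reconstruction loss so that the embedding keeps its local structure.

The sketch below only evaluates these quantities in NumPy. It omits the gradient updates of the encoder, decoder, and centres. The values α = 1, γ = 0.1, the 0.8 threshold, and all helper names are illustrative assumptions. Marking low-confidence signals with -1 is one plausible reading of the abstract's soft clustering threshold, not a confirmed detail of this work.

```python
import numpy as np


def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij of embedding z_i to cluster centre mu_j."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij**2 / f_j, with f_j = sum_i q_ij."""
    weight = q**2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def idec_loss(x, x_recon, q, p, gamma=0.1):
    """Reconstruction MSE plus gamma * KL(P || Q), an IDEC-style objective."""
    reconstruction = np.mean((x - x_recon) ** 2)
    clustering = np.sum(p * np.log(p / q))
    return reconstruction + gamma * clustering


def thresholded_labels(q, threshold=0.8):
    """Hard labels from soft assignments; -1 marks signals no cluster claims confidently."""
    labels = q.argmax(axis=1)
    labels[q.max(axis=1) < threshold] = -1
    return labels


# Usage with hypothetical 2-D embeddings and two fixed cluster centres.
z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [2.5, 2.5]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
q = soft_assign(z, centroids)
p = target_distribution(q)
print(thresholded_labels(q))  # the equidistant point [2.5, 2.5] gets -1
```

Squaring q_ij in the target emphasizes high-confidence assignments. Dividing by the soft cluster frequency f_j keeps large clusters from absorbing everything, so minimizing KL(P‖Q) gradually sharpens the clusters.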
Our methods can be applied to a range of astronomical data that is unlabelled or too large for conventional classification and outlier-detection methods. We conducted this work as part of the BL@Scale team at the Berkeley SETI Research Center under the Breakthrough Listen Initiative. BL@Scale aims to create radio technosignature search pipelines by deploying search algorithms and scaling resources dynamically on a cloud-based infrastructure accessible through a web interface.
