
Detecting Sequences in Complex Datasets using Machine Learning

Presentation #301.05 in the session “Machine Learning in Astronomy: Data Compression, Representation and Visualization (Meeting-in-a-Meeting)”.

Published on Jun 18, 2021

Scientists aim to extract simplicity from observations of a complex world. This process usually includes exploring data in search of new trends. One-dimensional trends, also called sequences, are of particular interest to scientific research: they are often expected in the natural world, and detecting them can offer insight into simple underlying phenomena. However, sequences are challenging to detect because they may be expressed in complex ways in a dataset. In this talk I will present the Sequencer, an algorithm designed to identify the main trend in a dataset in a generic way. The Sequencer constructs graphs that describe the similarities between pairs of observations, using a set of distance metrics over a range of scales. The algorithm is unsupervised and assigns a score to the extracted sequence. The Sequencer uses this score to optimize its own hyper-parameters, which makes it parameter-free. I will present applications of the Sequencer to simulated and real-world datasets from astronomy and geology, and compare its performance to that of two popular dimensionality-reduction algorithms, t-SNE and UMAP. I will finish by presenting several scientific discoveries we made using the Sequencer.
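The abstract does not spell out how the similarity graphs are built or scored. As a rough illustration of the general idea only, here is a minimal pure-Python sketch under several assumptions: (i) two example metrics, Euclidean and Manhattan; (ii) "scale" means averaging a metric over contiguous chunks of each observation; (iii) each distance matrix is summarized by a minimum spanning tree; and (iv) the score is a hypothetical elongation measure, the fraction of nodes lying on the tree's longest path. All function names (`sequence`, `order_and_score`, `bump`, and so on) are hypothetical; this is not the Sequencer's implementation.

```python
import math
from collections import deque
from itertools import product


# Two example metrics (an assumption; the Sequencer's actual metric set may differ).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


METRICS = {"euclidean": euclidean, "manhattan": manhattan}


def chunked_distance(a, b, metric, n_chunks):
    """Hypothetical notion of 'scale': average the metric over contiguous chunks."""
    size = max(1, len(a) // n_chunks)
    parts = [metric(a[i:i + size], b[i:i + size]) for i in range(0, len(a), size)]
    return sum(parts) / len(parts)


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns an adjacency list."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        # Attach the closest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return adj


def bfs_hops(adj, start):
    """Number of tree edges between `start` and every node."""
    hops = [-1] * len(adj)
    hops[start] = 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if hops[v] < 0:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def order_and_score(adj):
    """Order nodes from one end of the tree's longest path; score its elongation."""
    h0 = bfs_hops(adj, 0)
    end = max(range(len(adj)), key=lambda i: h0[i])  # an endpoint of the longest path
    hops = bfs_hops(adj, end)
    order = sorted(range(len(adj)), key=lambda i: (hops[i], i))
    # Hypothetical elongation score: fraction of nodes on the longest path (1.0 = a chain).
    score = (max(hops) + 1) / len(adj)
    return order, score


def sequence(data, scales=(1, 2, 4)):
    """Try every (metric, scale) pair and keep the highest-scoring ordering."""
    best = None
    for (name, metric), n_chunks in product(METRICS.items(), scales):
        dist = [[chunked_distance(a, b, metric, n_chunks) for b in data] for a in data]
        order, score = order_and_score(minimum_spanning_tree(dist))
        if best is None or score > best[0]:
            best = (score, name, n_chunks, order)
    return best


def bump(center, length=30, sigma=1.5):
    """Toy observation: a Gaussian peak whose position encodes the hidden sequence."""
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(length)]


if __name__ == "__main__":
    centers = [11, 5, 21, 7, 17, 13, 23, 9, 19, 15]  # shuffled hidden ordering
    score, metric, n_chunks, order = sequence([bump(c) for c in centers])
    # The recovered centers come out in monotonic order.
    print(score, metric, n_chunks, [centers[i] for i in order])
```

Scoring every (metric, scale) pair and keeping the best is how this sketch mimics the abstract's point that the score lets the algorithm choose its own hyper-parameters. On this toy input a pure chain reaches the maximum score of 1.0 with the first configuration tried, and ties keep that first configuration.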

