We explore the use of unsupervised machine learning to classify a sample of >9000 nearby galaxies of all types from the SDSS MaNGA survey. Our aim is to find a classification that correlates with physical properties and that arises naturally when combining maps of the average properties of the stellar populations.
We use the SimCLR (Simple Contrastive Learning of visual Representations; Chen et al. 2020) framework to obtain meaningful representations of the MaNGA integral field spectroscopy and then perform hierarchical clustering to divide the galaxies into classes. The classification should be independent of non-physical information, such as the size of the IFUs or the orientation of the galaxies. To mitigate these undesired dependencies, we adapt the SimCLR algorithm to MaNGA data. In this framework, the machine is trained to learn similarities between two images that were generated from the same base image but were subjected to different augmentations before being contrasted. In our case, we choose transformations that discourage the algorithm from learning features that are not intrinsic to the observed galaxies: random rotation, image resizing, noise perturbation, and Gaussian blur. Finally, we define criteria to determine whether the classification correlates reliably with physical properties or with non-physical information known to be present in the data.
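As an illustration of the contrastive step described above, the following is a minimal NumPy sketch, not the pipeline used in this work. It assumes the standard normalized temperature-scaled cross-entropy (NT-Xent) loss of SimCLR. The function names `augment` and `nt_xent_loss`, the temperature, the noise level, and the restriction of rotations to multiples of 90 degrees are all illustrative assumptions; resizing, Gaussian blur, and the encoder network are omitted.

```python
import numpy as np

def augment(image, rng, noise_sigma=0.05):
    """Return a randomly rotated, noise-perturbed copy of a 2D map.

    Assumption: rotations are limited to multiples of 90 degrees to keep
    the sketch dependency-free; noise_sigma is a placeholder value.
    """
    k = int(rng.integers(0, 4))  # number of 90-degree turns
    rotated = np.rot90(image, k)
    return rotated + rng.normal(0.0, noise_sigma, size=rotated.shape)

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]) of embeddings."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors
    sim = z @ z.T / temperature                       # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    # Row i's positive partner is the other view of the same base image.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), positives] - log_norm
    return float(-log_prob.mean())

# Toy check with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
aligned = nt_xent_loss(base, base + 0.01 * rng.normal(size=base.shape))
shuffled = nt_xent_loss(base, rng.normal(size=base.shape))
```

In this toy check, `aligned` should come out smaller than `shuffled`: the loss rewards embeddings in which two views of the same base image lie closer to each other than to views of other images. This is the mechanism that discourages the representation from encoding the applied augmentations.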
We find that SimCLR reduces the undesired dependence on image size compared to more standard dimensionality-reduction algorithms such as PCA and UMAP. Preliminary results indicate that the most relevant classification correlates strongly with velocity dispersion, splitting the sample into slow- and fast-rotating galaxies. The class with lower velocity dispersion is then subdivided into further groups that are linked to metallicity and age, as well as velocity dispersion. One of the classes found in this second-order division matches the main sequence of star-forming galaxies. Our unsupervised approach recovers classes of galaxies that are extensively studied in the literature, although not all of our classes have such a counterpart. We aim to study the physical properties of the clusters found to understand what makes them meaningful to our classifier.