We describe work in progress developing a modular, generative framework for photometric redshift estimation. If all galaxy spectral energy distributions (SEDs) were of the same (nontrivial) shape, and if SEDs were not altered by intervening dust and gas, and if multiband photometry had no measurement errors, galaxy redshifts could be estimated precisely with a few bands of photometric data. Photometric redshift estimation is challenging because these conditions do not hold. We address this by building a hierarchical probabilistic model for galaxy spectra and photometry, with components that account for galaxy SED diversity, extinction, measurement error, and selection effects.
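The idealized case described above can be illustrated with a toy calculation. The sketch below is not part of the framework: the SED shape, the five top-hat bands, and the grid search are all hypothetical choices made only for illustration. It assumes a single known SED with a smooth break, no dust, and noise-free photometry, and recovers the redshift from band flux ratios alone. A pure power-law SED would fail here, because its flux ratios do not change with redshift, which is why the shape must be nontrivial.

```python
import numpy as np

def sed(wave_rest):
    """Toy rest-frame SED (hypothetical): power law with a smooth break near 4000 A."""
    step = 0.3 + 0.7 / (1.0 + np.exp(-(wave_rest - 4000.0) / 100.0))
    return step * (wave_rest / 5000.0) ** -0.5

# Hypothetical top-hat bands, given as (lower, upper) observed wavelengths in Angstrom.
BANDS = [(3000, 4000), (4000, 5500), (5500, 7000), (7000, 8500), (8500, 10000)]

def band_fluxes(z, n=200):
    """Noise-free band-averaged fluxes at redshift z, normalized to unit sum."""
    fluxes = []
    for lo, hi in BANDS:
        wave_obs = np.linspace(lo, hi, n)
        # Observed wavelength w corresponds to rest wavelength w / (1 + z).
        fluxes.append(np.mean(sed(wave_obs / (1.0 + z))))
    fluxes = np.array(fluxes)
    # Normalizing removes the unknown overall brightness, leaving only flux ratios.
    return fluxes / fluxes.sum()

z_true = 0.73
obs = band_fluxes(z_true)

# Grid search: pick the redshift whose predicted flux ratios best match the data.
z_grid = np.linspace(0.0, 1.5, 1501)
mismatch = np.array([np.sum((band_fluxes(z) - obs) ** 2) for z in z_grid])
z_hat = z_grid[int(np.argmin(mismatch))]
print(f"true z = {z_true}, recovered z = {z_hat:.3f}")
```

Relaxing any one of the three assumptions (a single SED shape, no extinction, no noise) turns this sharp minimum into a broad or multimodal one, which is what motivates a probabilistic treatment.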
Modeling SED diversity is particularly challenging. This presentation focuses on our work modeling probability distributions over SEDs—i.e., over functions of wavelength—drawing on tools from functional data analysis (FDA, statistics for populations of continuous functions) and machine learning. We use ~800,000 galaxy spectra with known redshifts from SDSS as a training sample. We apply a nonlinear manifold learning algorithm to this sample to find a low-dimensional parameterization capturing the key structure of rest-frame SEDs, essentially defining a smoothly parameterized family of “template” SEDs. Such algorithms provide a mapping from SEDs to low-dimensional coordinates that capture SED similarity (so SEDs that are similar, viewed as functions of wavelength, are assigned similar coordinates). But most such methods do not provide an inverse map (from manifold coordinates to SEDs), which we need to produce a generative model for galaxy SEDs and photometry. We use functional regression techniques from FDA to construct the inverse map.
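As a rough illustration of the forward and inverse maps described above, the following sketch embeds a toy one-parameter family of SEDs with a diffusion map and then maps coordinates back to SEDs by kernel regression. These are stand-ins chosen for brevity: the abstract does not specify which manifold learning algorithm is used, and Nadaraya-Watson smoothing is only a simple substitute for the functional regression techniques mentioned. The toy SED family, the kernel scale, and the bandwidth are all assumptions.

```python
import numpy as np

def diffusion_map(X, n_coords=1):
    """Embed the rows of X (one SED per row) into n_coords diffusion coordinates."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    eps = np.median(d2)                 # kernel scale: an assumed heuristic
    W = np.exp(-d2 / eps)               # Gaussian similarity weights
    deg = W.sum(axis=1)
    # The symmetric matrix S has the same eigenvalues as the Markov matrix D^-1 W.
    S = W / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]
    psi = vecs / np.sqrt(deg)[:, None]  # right eigenvectors of D^-1 W
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]

def inverse_map(coords_train, seds_train, coords_new, bandwidth):
    """Nadaraya-Watson regression from manifold coordinates back to SEDs."""
    d2 = np.sum((coords_new[:, None, :] - coords_train[None, :, :]) ** 2, axis=2)
    K = np.exp(-0.5 * d2 / bandwidth ** 2)
    K /= K.sum(axis=1, keepdims=True)
    return K @ seds_train

# Toy rest-frame SEDs: continuum slope and line strength both vary with a hidden t.
rng = np.random.default_rng(0)
wave = np.linspace(3000.0, 9000.0, 200)
t = rng.uniform(0.0, 1.0, 300)
continuum = (wave[None, :] / 6000.0) ** (-1.0 + 2.0 * t[:, None])
line = (1.0 - t[:, None]) ** 2 * np.exp(-0.5 * ((wave[None, :] - 5000.0) / 100.0) ** 2)
seds = continuum + line + 0.01 * rng.standard_normal((t.size, wave.size))

coords = diffusion_map(seds, n_coords=1)

# Fit the inverse map on half the sample and predict the held-out half.
train, held_out = np.arange(0, t.size, 2), np.arange(1, t.size, 2)
recon = inverse_map(coords[train], seds[train], coords[held_out],
                    bandwidth=0.05 * coords.std())

rms_recon = np.sqrt(np.mean((recon - seds[held_out]) ** 2))
rms_mean = np.sqrt(np.mean((seds[held_out] - seds[train].mean(axis=0)) ** 2))
print(f"held-out RMS error: {rms_recon:.4f} (mean-SED baseline: {rms_mean:.4f})")
```

Here the single recovered coordinate plays the role of the low-dimensional template parameter, and `inverse_map` evaluated at any coordinate value yields a template SED, which is the piece a generative model needs.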
Manifold learning requires building a graph describing the pairwise similarity of rest-frame SEDs in the training sample (with nodes corresponding to SEDs, and weighted edges quantifying the similarity of pairs of SEDs). Once observed SEDs are shifted to the rest frame, pairs of SEDs may have limited wavelength overlap, and the shifted SEDs are sampled at different rest-frame wavelengths, which significantly complicates construction of the SED similarity graph. We describe a hierarchical “splines-and-lines” SED model that enables SED comparisons with quantified uncertainties, accounting for unaligned sampling, limited overlap, missing data, and measurement errors.
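To make the difficulty concrete, the sketch below builds a small similarity graph from toy spectra observed on one wavelength grid but at different redshifts. It is a deliberately crude stand-in, not the splines-and-lines model: spectra are linearly interpolated onto a common grid over their rest-frame overlap, compared with an error-weighted mean squared difference, and left unconnected when the overlap is too short. The toy spectra, the 500 Angstrom minimum overlap, and the edge-weight scale are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
WAVE_OBS = np.linspace(3800.0, 9200.0, 400)  # toy observed-frame grid (Angstrom)
SIGMA = 0.05                                 # toy per-pixel flux uncertainty

def toy_spectrum(z, slope):
    """Noisy power-law spectrum at redshift z, returned on rest-frame wavelengths."""
    wave_rest = WAVE_OBS / (1.0 + z)
    flux = (wave_rest / 5000.0) ** slope + SIGMA * rng.standard_normal(wave_rest.size)
    return wave_rest, flux, np.full(wave_rest.size, SIGMA)

def overlap_distance(w1, f1, s1, w2, f2, s2, n_grid=100, min_overlap=500.0):
    """Error-weighted mean squared difference over the shared rest-frame range."""
    lo, hi = max(w1[0], w2[0]), min(w1[-1], w2[-1])
    if hi - lo < min_overlap:
        return np.nan                        # too little overlap to compare
    grid = np.linspace(lo, hi, n_grid)
    g1, g2 = np.interp(grid, w1, f1), np.interp(grid, w2, f2)
    # Crude variance estimate: interpolated errors, ignoring pixel correlations.
    var = np.interp(grid, w1, s1) ** 2 + np.interp(grid, w2, s2) ** 2
    return np.mean((g1 - g2) ** 2 / var)

# Two spectra with the same SED, one with a different SED, one at high redshift.
spectra = [toy_spectrum(0.05, -1.0), toy_spectrum(0.30, -1.0),
           toy_spectrum(0.10, 1.0), toy_spectrum(2.00, -1.0)]

n = len(spectra)
W = np.zeros((n, n))                         # weighted adjacency matrix
for i in range(n):
    for j in range(i + 1, n):
        d = overlap_distance(*spectra[i], *spectra[j])
        if np.isfinite(d):
            W[i, j] = W[j, i] = np.exp(-d / 10.0)  # assumed edge-weight scale
print(np.round(W, 3))
```

In this toy graph the high-redshift spectrum ends up with no edges at all, and every surviving comparison uses a different wavelength range and an approximate uncertainty. Handling those cases with properly propagated uncertainties is the role the abstract assigns to the hierarchical SED model.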