To extract cosmological information from observations of the millimeter and submillimeter sky, foreground components must first be removed to produce an estimate of the cosmic microwave background (CMB). In particular, fluctuations in diffuse Galactic emission are many times brighter than the CMB anisotropy over large swaths of the sky, so this contamination must be effectively removed from the observed sky maps. We developed a machine learning approach to this foreground removal for full-sky temperature maps of the millimeter and submillimeter sky. We constructed a Bayesian spherical convolutional neural network architecture to produce a model that captures both the spectral and the morphological aspects of the foregrounds. The model was then trained on simulations that incorporated the knowledge of these foreground components available at the time of the launch of the Planck satellite. Once validated on the simulations, the model was applied to Planck observations to produce a foreground-cleaned CMB map.