Diffusion probabilistic models (DPMs) have achieved remarkable image generation quality that rivals that of GANs. But unlike GANs, DPMs use a set of latent variables that lack semantic meaning and cannot serve as a useful representation for other tasks. This paper explores the possibility of learning semantically meaningful representations from DPMs. We propose Diffusion Autoencoders (Diff-AE), which are DPMs that use an encoder to capture high-level semantics and a decoder to generate diverse images conditioned on those semantics. We show that (i) the encoder can deterministically map any input image into a 512-dimensional semantic encoding, (ii) the decoder can condition on this encoding to generate diverse high-quality images that share the same semantics, and (iii) the resulting semantic representation is useful for downstream tasks such as real image manipulation and interpolation. Our approach is simple: we train an encoder that maps an image to a semantic representation, together with a conditional DPM decoder that learns to reverse the forward diffusion process and reconstruct the image from this representation. We show that both can be trained jointly by optimizing a simple reconstruction loss on ImageNet. Our code and models are available at https://github.com/phizaz/diffae.
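To make the joint training objective concrete, the sketch below shows one possible instantiation in PyTorch: a toy semantic encoder that produces a 512-dimensional code, and a toy conditional noise-prediction network trained with the standard DPM denoising ("simple" reconstruction) loss. All names here (`SemanticEncoder`, `ConditionalDenoiser`, `diffusion_loss`) and the architectures are illustrative assumptions, not the API of the official repository, which uses a conditional DDIM with a UNet backbone and differs in many details.

```python
# A minimal, self-contained sketch of the joint training step described above,
# under simplifying assumptions; not the official Diff-AE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticEncoder(nn.Module):
    """Deterministically maps an image to a 512-d semantic code z_sem (toy stand-in)."""

    def __init__(self, z_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, z_dim),
        )

    def forward(self, x):
        return self.net(x)


class ConditionalDenoiser(nn.Module):
    """Toy noise-prediction network conditioned on (t, z_sem); the paper uses a UNet."""

    def __init__(self, z_dim: int = 512, img_ch: int = 3, res: int = 64):
        super().__init__()
        self.img_ch, self.res = img_ch, res
        self.cond = nn.Linear(z_dim + 1, img_ch * res * res)
        self.net = nn.Sequential(
            nn.Conv2d(img_ch * 2, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, img_ch, 3, padding=1),
        )

    def forward(self, x_t, t, z_sem):
        # Broadcast (t, z_sem) into an image-shaped map and concatenate with x_t.
        cond = self.cond(torch.cat([z_sem, t[:, None].float()], dim=1))
        cond = cond.view(-1, self.img_ch, self.res, self.res)
        return self.net(torch.cat([x_t, cond], dim=1))


def diffusion_loss(encoder, denoiser, x0, alphas_cumprod):
    """Standard DPM noise-prediction loss, with the denoiser conditioned on z_sem."""
    b = x0.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # forward diffusion q(x_t | x_0)
    z_sem = encoder(x0)                                   # 512-d semantic code
    return F.mse_loss(denoiser(x_t, t, z_sem), eps)


if __name__ == "__main__":
    # Encoder and decoder share one optimizer, i.e. they are trained jointly.
    T = 1000
    alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)
    enc, dec = SemanticEncoder(), ConditionalDenoiser()
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
    x0 = torch.randn(4, 3, 64, 64)  # stand-in for a batch of training images
    loss = diffusion_loss(enc, dec, x0, alphas_cumprod)
    loss.backward()
    opt.step()
    print(float(loss))
```

At sampling time, the same conditional network would be run in reverse from noise while holding z_sem fixed, which is what lets the decoder produce diverse images that share the input's semantics.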