Explainer Series #7 - AI and Creativity - How AI Generates Art, Music, and Content
BIGPURPLECLOUDS PUBLICATIONS
Artificial intelligence has reached an inflection point in its ability to generate creative works such as music, art, and literature. In this article, we take a deeper dive into the technical architectures behind algorithmic creativity, using DALL-E 2 (images), MuseNet (music), and GPT-3 (text) as examples in each area.
Image Generation with Diffusion Models
DALL-E 2 produces remarkably vivid and varied images from text captions. Under the bonnet, it uses a deep neural network architecture called a diffusion model. Diffusion models are trained on massive image datasets, learning to convert random noise into realistic images through a repeated process of noising and text-conditioned denoising.
More specifically, DALL-E 2 learns a mapping between sampled noise vectors and image data, mediated through text captions. At each training step:
1. The model starts with a real training image and its corresponding caption.
2. It adds noise to degrade the image - essentially diffusing it. This distorted image is the input.
3. The model attempts to reverse the diffusion and restore the original image. It conditions this denoising on the associated text caption, learning which words map to which visual features.
Iterating this process across the training data teaches the model to translate text into images through successive denoising of noise into sharper outputs.
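The training loop above can be sketched in a few lines of numpy. This is a simplified, DDPM-style illustration, not DALL-E 2's actual implementation: the noise schedule values, the `diffuse` and `training_step` helpers, and the stand-in `model` callable are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise schedule: alpha_bar[t] is the fraction of original signal
# remaining after t diffusion steps (cumulative product of 1 - beta).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def diffuse(image, t):
    """Forward process: mix the image with Gaussian noise at step t."""
    noise = rng.standard_normal(image.shape)
    noisy = np.sqrt(alpha_bar[t]) * image + np.sqrt(1.0 - alpha_bar[t]) * noise
    return noisy, noise

def training_step(image, caption_embedding, model):
    """One denoising-training step: the model sees a noisy image plus the
    caption embedding and is scored on how well it predicts the noise."""
    t = rng.integers(0, T)                      # pick a random noise level
    noisy, noise = diffuse(image, t)            # step 2: degrade the image
    predicted_noise = model(noisy, t, caption_embedding)  # step 3: denoise
    loss = np.mean((predicted_noise - noise) ** 2)        # simple MSE objective
    return loss
```

In practice the `model` is a large neural network and the loss is minimised by gradient descent over millions of image-caption pairs; the sketch only shows the shape of a single training step.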
After extensive training, DALL-E 2 can generate images from any caption by reversing its learned text-to-image diffusion mapping. Given a caption, the model begins with random noise and sequentially adds finer details guided by the text prompt. After hundreds of denoising steps, this procedure produces a novel image matching the description.
Crucially, because the model learned associations between language and image patterns, it can render appropriate scenery, lighting, poses, styles, compositions, and contexts for each caption. Variations in the initial noise also enable unique outputs.
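The generation process described above - start from noise, denoise step by step under the guidance of the caption - can be sketched as a reverse loop. Again this is an illustrative DDPM-style sampler under assumed parameters, not DALL-E 2's real code; `model` stands in for the trained denoising network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same assumed noise schedule as used during training.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def generate(model, caption_embedding, shape=(8, 8)):
    """Reverse process: start from pure noise and denoise step by step,
    conditioning each prediction on the caption embedding."""
    x = rng.standard_normal(shape)  # initial random noise
    for t in range(T - 1, -1, -1):
        predicted_noise = model(x, t, caption_embedding)
        # Remove the predicted noise component (DDPM update rule).
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * predicted_noise) \
            / np.sqrt(alphas[t])
        if t > 0:  # inject a little fresh noise on all but the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x
```

Note that the starting noise is random: seeding the generator differently yields different images for the same caption, which is exactly the source of the output variation mentioned above.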