We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language. DALL·E is a 12-billion-parameter version of GPT-3 trained on a dataset of text–image pairs to generate images from text descriptions. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and
![DALL·E: Creating images from text](https://cdn-ak-scissors.b.st-hatena.com/image/square/831817a52c264ea65765776f7294b35197080c7a/height=288;version=1;width=512/https%3A%2F%2Fimages.openai.com%2Fblob%2Fed21faee-ce44-4d91-a70f-26538ad66d5b%2Fdall-e.jpg%3Ftrim%3D0%252C0%252C0%252C0%26width%3D1000%26quality%3D80)