HISTORY OF AI - THE NEW TOOLS: DALL-E AND MIDJOURNEY
While AI as an artistic tool has gained significant attention in recent years, it's important to note that the concept is not new. One of the earliest AI art systems, AARON, was developed in the 1970s by Harold Cohen. AARON created paintings and drawings based on a set of rules and decision-making processes that Cohen programmed into the software. Since AARON's development, many other AI-based tools have been created to aid artists in their work. These tools have contributed to the field of AI art and have made it possible for artists to explore new forms of creativity and expression. In this article, we will discuss two recent AI tools that emerged in the 21st century and are widely used in the art world: DALL-E and Midjourney.
DALL-E
The San Francisco-based company OpenAI not only developed ChatGPT, which we discussed in one of our previous articles, but also the image-generating AI model DALL-E. DALL-E is based on deep learning technology and generates digital images from natural-language descriptions. Its name is a blend of the famous surrealist artist Salvador Dalí and WALL-E, the animated robot from the Pixar movie of the same name.
Prior to developing DALL-E, OpenAI was experimenting with language-processing AI. In 2019, the company released GPT-2, which had 1.5 billion parameters and was trained on a large dataset of 8 million web pages. The model was designed to predict the next word in a given text and was also capable of tasks such as question answering, text summarization, and translation. The next iteration, GPT-3, has far more parameters (175 billion) and performs even more impressive natural language processing tasks.
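To make the idea of next-word prediction concrete, here is a deliberately tiny sketch in Python: a bigram model that simply counts which word most often follows each word in a toy corpus. GPT-2 does conceptually the same job, but with 1.5 billion learned parameters instead of a frequency table (the corpus and function names here are invented for illustration).

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat" ("cat" follows "the" twice)
```

A large language model replaces the frequency table with a neural network, which lets it generalize to word sequences it has never seen verbatim.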
DALL-E was developed on top of this model, which provides some of the project's underlying technology. In addition to GPT-3, the developers also utilized a GAN (Generative Adversarial Network) to bring DALL-E to life.
A GAN consists of two neural networks: a "generator" that is trained on a specific dataset (such as images of flowers or other objects) until it can produce new images of its own, and a "discriminator" that has been trained to distinguish real images from generated ones and evaluates the generator's attempts. After this feedback has passed back and forth many thousands of times, the generator produces better and better images. Besides GANs, DALL-E also uses a combination of other deep learning techniques, such as transformers, autoencoders, reinforcement learning, and attention mechanisms.
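The adversarial back-and-forth can be sketched in a few lines of Python. In this toy version (every number and learning rate is invented for illustration), the "real data" is just the number 5.0, the generator is a single learnable value, and the discriminator is a one-parameter logistic classifier; the alternating gradient updates are the same scheme that drives real GAN training:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

REAL = 5.0          # the "dataset": a single real value
theta = 0.0         # generator output, starts far from the real data
w, c = 0.0, 0.0     # discriminator D(x) = sigmoid(w*x + c)
lr_d, lr_g = 0.1, 0.02

for step in range(500):
    fake = theta
    s_real, s_fake = sigmoid(w * REAL + c), sigmoid(w * fake + c)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)),
    # i.e. learn to tell the real value and the fake apart.
    w += lr_d * ((1 - s_real) * REAL - s_fake * fake)
    c += lr_d * ((1 - s_real) - s_fake)

    # Generator step: ascend log D(fake),
    # i.e. move to wherever the discriminator currently says "real".
    s_fake = sigmoid(w * fake + c)
    theta += lr_g * (1 - s_fake) * w

print(round(theta, 2))  # theta has drifted from 0.0 toward the real value 5.0
```

In a real GAN, `theta` is replaced by the millions of weights of an image-producing network, but the training loop has exactly this two-step, adversarial shape.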
While DALL-E is not the first AI technology capable of generating images, it is the first designed specifically to create images from textual inputs, which sets it apart from other models. Previous image-generating technologies, such as DeepDream and StyleGAN, could generate images, but the results were often blurry, less realistic, and lacking in detail.
In contrast, DALL-E is capable of creating images that resemble real photographs, with a high level of realism and detail. Like OpenAI's other product, ChatGPT, DALL-E is accessible and understandable to a wide audience. It has democratized AI-generated art, allowing anyone to collaborate with artificial intelligence to create unique images.
To use DALL-E, the user provides a text prompt that describes the objects, people, or styles they want to see in an image. DALL-E then breaks down the text into discrete units called tokens — short pieces of text, often individual words, isolated from the original prompt. These tokens are fed into a language model that understands the context and generates a semantic representation of the text. Using this representation, DALL-E creates an initial image by passing it through an encoder network, which produces a low-dimensional vector representation of the image. After generating the initial image, DALL-E refines it multiple times to make it more realistic. To do this, the image is passed through a series of decoder networks that gradually improve it. During this process, a discriminator network evaluates the realism of the image and guides the refinement. The refinement continues until the image is sufficiently realistic, at which point it is presented to the user as output.
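The stages described above can be caricatured in a short, runnable sketch. Every function below is an invented one-line stand-in for what, in DALL-E, is a large neural network; only the overall flow — tokenize, embed, encode, then refine under a discriminator's guidance — mirrors the description:

```python
def tokenize(prompt):
    return prompt.lower().split()                    # words as "tokens"

def embed(tokens):
    return [sum(map(ord, t)) % 256 for t in tokens]  # toy semantic vector

def initial_image(vec, size=4):
    return [vec[i % len(vec)] for i in range(size)]  # toy low-dim "image"

def refine(img):
    return [v // 2 for v in img]                     # toy decoder pass

def discriminator_score(img):
    return 1.0 - sum(img) / (256.0 * len(img))       # toy "realism" score

def generate(prompt, rounds=3):
    img = initial_image(embed(tokenize(prompt)))
    for _ in range(rounds):
        candidate = refine(img)
        # keep a refinement only if the "discriminator" rates it higher
        if discriminator_score(candidate) >= discriminator_score(img):
            img = candidate
    return img

print(generate("an armchair shaped like an avocado"))
```

The point of the sketch is the control flow: text becomes tokens, tokens become a vector, the vector seeds an image, and a critic repeatedly judges each refinement before it is accepted.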
DALL-E was first introduced in 2021. One year later, the company released an updated version, DALL-E 2. The new product delivers better correspondence between the text and the visuals and returns results in a few seconds. DALL-E 2 implements a diffusion technique that begins with a random pattern of dots and systematically transforms it into a picture by identifying specific characteristics of the image. The output images are more realistic and detailed, and boast a higher resolution than ever before. Alongside these enhancements, DALL-E 2 introduces a tool known as "variations": the user provides the image generator with a simple image, and the system produces as many variants of it as the user needs. It is also possible to mix in another image, cross-pollinating the two and blending the most important parts of each.
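The diffusion idea — start from random dots and step toward a picture — can be illustrated with a toy loop. Here the "model" cheats by knowing the target pattern outright (a hypothetical five-pixel image); a real diffusion model instead learns to predict which noise to remove at each step:

```python
import random

random.seed(0)  # reproducible noise

target = [0.0, 0.5, 1.0, 0.5, 0.0]         # the pattern to recover
image = [random.random() for _ in target]   # start: a random pattern of dots

for step in range(50):
    # Nudge every pixel 10% of the way toward the target each step;
    # a trained model would *predict* this correction rather than know it.
    image = [px + 0.1 * (t - px) for px, t in zip(image, target)]

print([round(px, 2) for px in image])  # close to the target pattern
```

After 50 such steps the remaining error shrinks by a factor of 0.9 per step, so the random dots have all but converged to the pattern — the same gradual noise-to-image trajectory that diffusion models follow.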
Midjourney
Midjourney is an independent research lab that has gained recognition for its text-to-image AI program of the same name. Similar to DALL-E, the program is designed to create images from textual descriptions. Users type a word or phrase at the input prompt and receive a compelling image on-screen within about a minute of computation. Midjourney has developed its own distinctive style that has caught the attention of many in recent years.
The idea for the program came from David Holz, co-founder of Leap Motion, a company that produces motion-sensing technology for computers and virtual reality headsets. In 2020, Holz and a small team began working on Midjourney, recognizing the potential of AI technology. In particular, OpenAI's CLIP technology sparked his interest in creating high-quality images from AI models using text input. Midjourney was first released in March 2022, and the team has been improving it ever since, releasing new versions every few months.
Midjourney's main distinction from other AI tools is its emphasis on painterly aesthetics rather than photorealism. One of Midjourney's notable strengths is its ability to adapt real art styles and apply them to any combination of elements the user wants. It is particularly good at generating visually stunning environments, including fantasy and sci-fi scenes.
Midjourney utilizes a machine learning algorithm to process the user's description. Although not much is known about the specifics, it is believed that Midjourney employs a form of latent diffusion model, the same technology that powers Stable Diffusion. After analyzing the user's input, Midjourney generates the image that best matches the description, then applies the desired art style(s) and blends everything into a single, seamless result.
Midjourney is unusual among AI image tools in that it doesn't offer a website or mobile application for users to access its services. Instead, users must join Midjourney's Discord server. Discord is a platform that enables people to communicate through voice and text messages and to share media content such as images and videos; it works on various operating systems, including Windows, Android, iOS, iPadOS, and Linux, and can also be accessed through web browsers.
After creating a Discord account, users can visit the Midjourney website and click "Join the Beta" to get started. Upon accepting the beta invite, they gain access to the platform. However, the free package is limited to 25 images and only allows access to the public chat, which may result in longer wait times. To use Midjourney, users type “/imagine” in the public chat and provide a detailed prompt that includes preferred art styles, moods, and subjects. Once the input is analyzed, Midjourney generates a grid of four images for the user to choose from.
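For example, a request in the public chat might look like this (the subject and style words are invented for illustration; Midjourney also accepts optional parameters after the prompt text, such as --ar for the aspect ratio):

```text
/imagine prompt: a lighthouse on a rocky cliff at sunset, oil painting, dramatic lighting --ar 16:9
```

The more concrete the description of subject, style, and mood, the closer the four generated candidates tend to be to what the user has in mind.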
Images requested in a public Discord channel are generated in view of all users within about a minute, which contributes to a sense of community. There are also paid subscription options. Subscribers can send their text prompts to the Midjourney bot in a private Direct Message in the Discord app and receive images in response without interacting with other users in a public channel. Even so, any images generated by Midjourney are publicly visible by default.
Although the platform is a consumer product, around 30-50% of its users are professionals who use it for concept work in commercial art projects. Midjourney can generate many creative ideas quickly, helping stakeholders converge on the one they want. The platform can also give artists more confidence in areas where they feel less sure of themselves, such as color, composition, and backgrounds.
Midjourney has gained popularity and attracted attention from various industries. The British magazine “The Economist” and the leading Italian newspaper “Corriere della Sera” have used Midjourney to create covers and comics. In 2022, a Midjourney-generated image, “Théâtre d'Opéra Spatial”, won first place in a digital art competition at the Colorado State Fair. The program has also been used to create illustrations for an AI-generated children's book called “Alice and Sparkle”, whose creator, Ammaar Reeshi, spent hours selecting the best results from hundreds of generated images.