What is AI Generated Image?
What is AI Generated Image? https://www.visualstorytell.com/wp-content/uploads/2022/09/AI-Greek-Lunch-thumb.png 366 222 Shlomi Ron Shlomi Ron https://secure.gravatar.com/avatar/906bcce31d9695cb030087534b5f0f6e?s=96&d=mm&r=g
What is AI Generated Image?
“In a nutshell, it’s an image created by artificial intelligence (AI) algorithms. In recent developments, the AI learns the visual structure, style, and text labels for a lot of images. Then upon receiving a text prompt can generate a new image.” – Shlomi Ron
It sure looks like in the past two years, several emerging visual technologies have made their big splashes into the mainstream.
Back in April of 2021, NFTs exploded with the sale of First 5000 Days by the artist Mike Winkelmann (AKA Beeple) for $69M.
Fast forward to October 2021 – coinciding with Facebook rebranding to Meta – ushered the Metaverse, Web3, and immersive storytelling.
I covered both events as they happened (links above) by trying out myself the new platforms and providing you my own perspectives and tips.
Well, it looks like we’re in for another splash reaching a boiling level with AI Generated Images.
What are we talking about in terms of size?
The global generative art market is expected to reach $397.49 Million by 2025 (Global NewsWire), but hey this is just warming up the engines.
An early ricochet that sparked this goldrush was back in October 2018, when a Portrait of Edmond de Belamy, an algorithm-generated print, was sold for $432,500.
To simplify all this action, I’ll break down the AI Generated Image story by using the classic 3-act structure.
Act 1: Setting
What is AI Generated Image?
In a nutshell, it’s an image created by artificial intelligence (AI) algorithms. In recent developments, the AI learns the visual structure, style, and text labels for a lot of images. Then upon receiving a text prompt can generate a new image.
The logic is simple. If a machine can recognize an object such as a cat, it should ideally also be able to recreate a cat.
The first AI-Generated Image system was AARON, developed by Harold Cohen beginning in the late 1960s.
In the last decade, there has been dramatic progress in an approach in machine learning known as deep learning. This allows to learn complex structure with an algorithm that has multiple layers of computation.
Although the main approach has been around for years, recent successes are due in part to the strong computing power in GPUs and the availability of large-scale image data sets.
Back in 2014, Ian Goodfellow and their team developed General Adversarial Networks (GANs), and recent iterations have become even more impressive in producing realistic images.
How does GAN work?
In GANs, generative comes from the idea of generating images, such as cats. Adversarial is due to a trick, in which one neural network tries to generate (fake) cat images that are similar to the database of (real) cat images.
Another neural network tries to tell apart the real cats in the actual database from the fake generated cat images. When that second network can no longer tell apart the real from the AI generated fake cat images, then the image generation is successful.
The AI can now generate cat images that look realistic, but they are not identical to the original cat images (they have similar statistics according to the model).
Next, DeepDream was released by Google in 2015.
In the same year, another generative technique called Neural Style Transfer was developed by Matthias Bethge and colleagues. It allows you not only to generate images of a cat but also to choose a style (say Picasso).
I’ve been experimenting with this technique quite a bit using an app called Prisma. For example, I uploaded this photo of a typical Sunday brunch in Miami Beach and added a style filter – to make it dream-like epic 🙂
Forget uploading your image just describe it!
Unlike DeepDream, where you are basically adding a filter to an existing photo you uploaded, a new crop of Text-to-image AI companies released AI image generators that allows you to create net new images based only on text prompts.
Just in case you want to get extra techie, this new approach of generating images based on natural language descriptions, is based on a version of Generative Pre-trained Transformer 3 (GPT3) – “an autoregressive language model that uses deep learning to produce human-like text (Wikipedia).”
Most notably, in January 2021, San Francisco-based Open AI, released DALL-E. And as recent as April of this year, released their newest system, DALL·E 2, that generates more realistic and accurate images with 4x greater resolution.
In May 2022, Google released its own text-to-image generator, Imagen for researchers.
What does it all mean?
Early this month, DALL-E introduced “Outpainting” allowing you to discover what may lie outside the original frame of masterpieces like Edvard Munch, The Scream (1893).
There are fee-based AI art generation programs like DALL·E 2 or Midjourney (both give you a free trial) or the recently launched free open source programs like Stability AI’s Stable Diffusion.
You’ll find many other generators popping up every day.
Read carefully the license agreement of each generator to understand how you can use the images as terms vary.
My experience kicking the tires
Here are two of the more interesting examples of what they came up with:
Granted, “Visual Storytelling Newsletter” is too short and not descriptive enough, so in my next attempt I added location, characters and style.
With Midjourney’s trial on Discord, you will get 4 options to choose from, then you can ask to upscale and add variations.
That’s what I got for this attempt using an unlikely combinations story:
“Medieval illuminated script styled co-working space with people hard working on their computers with big windows showing urban view of rainy day outside”
And when I took it up a notch, adding a my favorite artist name, using Stable Diffusion:
“A couple having lunch on a Greek island with Pieter Bruegel style with rich blue background.”
The rush to create and monetize AI-generated visual content, has gone beyond individual images, to development of entire graphic novels.
Australian freelance art director, Jordan Booker published The Terrible Misfortunes of an Intergalactic Traveler, a full narrative digital comic featuring illustrations created entirely with Midjourney.
Worth noting is also Loab that freaked out the internet with its nightmarish results.
Act 3: Conflict
I bet you already know this, you don’t have a story if there is no conflict, right?
AI-generated images have sparked several high-stakes concerns:
Abuse: When you can write anything to generate an image, this opens up risks for misinformation and polarization, including deepfakes.
Open AI tried to reduce racial and gender biases in its training data, whereas Stability AI’s founder says that other than removing illegal content it is up to the user discretion.
Ethical and Copyright issues: These AI generators may threaten designer jobs. They also use publicly available images, which may be comprised of copyrighted original artworks made by artists that never gave their permission to use them.
Some online art communities have started banning AI art and there was recent backlash when an AI-generated artwork won a first prize award at the Colorado State Art Fair.
Kotaku’s Luke Plunkett calls AI image generators as “washing machine of intellectual property” as they present another use case where tech is advancing much faster than legal safeguards.
Act 3: Possible Resolutions
Taking a clue from the past, it’s clear that photography didn’t ruin painters’ careers, cinema didn’t kill the theater, and TV didn’t abolish radio.
Each new technology didn’t take away from its predecessor but instead allowed us a new avenue to express our art.
You call this art?
This begs the philosophical question, would you consider an AI-Generated Image – that was created by simply typing a few keywords – as legit art?
I guess it boils down to how you define artistic talent?
By tools: performed using classic tools from a paintbrush, going up to today’s Photoshop, Canva and now, AI image generators.
By mixing, not stealing: Generative art that leverages original images databases in a way is like DJ mixing of original soundtracks to create one continuous original new track. You might argue the same logic also happens in culinary combinations like Asian Fusion.
In all these cases there is a human hand that proactively mush-ups authentic source content into a fresh new result.
However, mixing and learning patterns from existing styles is different from inventing a whole new Picasso style (the AI learns from what you feed it!).
By obstacles to reach proficiency: If you’re like me – when I first heard about generative images – and think, boy that’s easy, anyone can do it!
Well, technically speaking, yes. But the process is very tedious, quite addictive and iterative.
No jokes but an AI image generator is indeed like that “box of chocolates, you never know what you’re going to get.”
Audio-visual artist Christen Bach aptly sums up his process creating Entering the Data Core, an AI-illustrated comic:
“The process comes with a cost though. You will have to view the A.I. as a collaborator. You can notch it in a direction, but you don’t have 100% control, so you need to keep a flexible mind. I generated hundreds of images for these 10 pages, and constantly had to compromise on the images and sequences I had locked in my mind while writing it. Still, I was able to churn everything out in about 3 workdays (ca. 24 hours collected) from nothing to final result.” – Cartoon Brew
Final thoughts
We’re still in the novelty stage, but as AI-generators gain wider adoption and become even more sophisticated (e.g., text-to-video or text-to-3D virtual/physical objects), new provenance, content screening, legal and compensation standards will have to emerge – to minimize havoc.
One way could be for developers of AI generators to first include only images that their creators opted in and can prove provenance. Second, tokenize all images in their database as NFTs. Heck! Even turn this whole organization into a DAO.
This way all future proceeds from generated artworks will be proportionally shared with the original artist, AI artist and the AI generator – the new derivative art gang!
Short-term implications
Until then, short-term I can see how these AI visual generators could threaten a sector like stock image databases such as Pixabay or Pexels.
Why settle for the results my search query returns if I can specifically describe objects, emotions and style to create what I’m exactly looking for?
No doubt, AI generators give you a visual storytelling tool, that in a way forces you to first envision what is the visual outcome you want, then accurately use text to describe the story to trigger the visual and lastly embark on an iterative process of refinement.
Another use case is advertising. Kraft Heinz, is experimenting with DALL·E 2.
But one of its primary applications is still grabbing a growing share of the giant $65B global art market.
For now, if you give it a try, it would like feel like collaborating with an over eager designer that most of the time wants to show off crazy ideas instead of giving you exactly what you asked for 😉
What’s your take? How would you use AI generated images? Feel free to drop your comments below.
This was a sneak preview from my weekly Visual Storytelling Newsletter.
Subscribe now and also receive a FREE copy of my Top 20 Visual Storytelling Tips Picture Book
- Posted In:
- Design
- Images
- Photography
- Story Visualizing
- Visual Storytelling
Shlomi Ron
Shlomi Ron is the founder and CEO of the Visual Storytelling Institute, a Miami-based think tank with a mission to bring the gospel of visual storytelling from the world of art to more human-centric and purpose-driven marketing. A digital marketing veteran with over 20 years of experience working both on the agency and brand sides for Fortune 100/500 brands such as Nokia, IBM, and American Express. He started VSI to combine his marketing expertise with his passion for visual stories stemming from his interests in classic Italian cinema and managing the estate of video art pioneer, Buky Schwartz. At VSI, he helps brands rise above the communication noise through visual storytelling consulting, training, and thought leadership. Select clients include Estée Lauder, Microsoft, and Cable & Wireless – to name a few. He currently teaches Brand Storytelling at the University of Miami’s Business School. Thought leader and speaker at key marketing conferences. He is also the host of the Visual Storytelling Today podcast, which ranks in the top 10 best business storytelling podcasts on the Web. His book: Total Acuity: Tales with Marketing Morals to Help You Create Richer Visual Brand Stories. Outside work, he is a nascent bread baker, The Moth fan, and longtime fedora wearer likely to jive with his classic Italian cinema interest.
All stories by: Shlomi RonYou might also like
1 comment
This site uses Akismet to reduce spam. Learn how your comment data is processed.
Leave a Reply