Art in the Age of AI | Visual Case Study
Dive into the AI-driven creation of the "Art in the Age of AI Production" video trailer. Using ComfyUI, Midjourney, and LumaLabs' Dream Machine, this case study demonstrates prompt engineering, AI animation, and rapid prototyping in content creation.
"Every person is now potentially an AI artist."
This realization hit me hard as I explored the rapidly evolving landscape of AI-generated art.
What does this mean for creativity, authenticity, and the very essence of art itself?
In my latest article, "Art in the Age of AI Production," I dive into these questions and more. To complement it, I've created a video trailer using AI tools: a meta-commentary on the very subject I'm exploring.
Key questions include:
How is AI challenging our concept of creativity?
What happens when algorithms can mimic masters?
Is the future of art human-AI collaboration?
In this case study, I aim to break down my thinking and process and showcase the ideas, tools, and techniques that helped this come together insanely fast. And it was really fun.
The Process
Creating the Article Imagery: Promptcraft
Time Spent: 6-8 hours
After completing the final draft of the essay, I knew that including images would strengthen the presentation and engagement when published online. I saw an opportunity to turn this into an exercise, and developed a few additional workflows to help. I started by creating a suite of images using ComfyUI, Midjourney, and Photoshop, carefully crafting prompts and inputs to capture the essence of each section of the essay. Below are the final images I created for inline use in the article. Clicking on them opens a detail view including the text prompts used (hover with a mouse, or tap the small white dot in the bottom right on mobile):
This was done by looking at each section of the article and deriving key themes, which were then developed into complementary visual concepts. I asked myself whether any immediate symbolism or imagery came to mind around the identified themes, and started to craft sketches, prompts, or references for each section. For example, from the moment I started developing the artwork, I knew I wanted to reinterpret the theme, concept, and iconography that Leonardo's Vitruvian Man represents. This gave me a conceptual starting point for what I wanted to achieve with each image. I also used images I had already created as stylistic references for later generations, which helped ensure consistent output where I wanted it.
ComfyUI Workflow
ComfyUI is an incredibly powerful open-source development environment specialized for working with AI models and computer vision applications. I use it as my main prototyping environment, as I find it very fast for experimenting with different ideas while developing a workflow. Its visual, node-based nature also helps reveal the inner workings of an AI system, and where and how artists can take control.
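While I built and iterated on everything in the graph editor itself, a finished graph can also be queued programmatically, since ComfyUI exposes a small HTTP API. Here's a minimal sketch, assuming a workflow exported with "Save (API Format)" and a locally running instance on the default port; the filename, node ID, and prompt text are placeholders, not the actual graph used for this piece:

```python
import json
import uuid
import urllib.request

# Load a workflow previously exported from ComfyUI via "Save (API Format)"
with open("vitruvian_remix_workflow.json") as f:  # hypothetical filename
    workflow = json.load(f)

# Tweak a node's input, e.g. the positive prompt text on node "6"
# (node IDs depend entirely on your own graph)
workflow["6"]["inputs"]["text"] = "a reinterpretation of the Vitruvian Man, circuitry and ink"

# Queue the job on a locally running ComfyUI instance (default port 8188)
payload = {"prompt": workflow, "client_id": str(uuid.uuid4())}
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # returns a prompt_id you can poll for the finished images
```

This kind of scripting becomes handy once a workflow stabilizes and you want to batch out prompt variations without clicking through the graph each time.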
Polishing the Images
If you are familiar with producing images with AI systems, you already know that a central drawback is that they produce relatively low-resolution images compared to professional standards. There are lots of different systems I like to use for creative upscaling, but for simplicity this time I used Magnific.ai. This web app gets really great results without making you deal with the technical details. You can click on the images below for a detailed view as well as the prompts used.
~1024x1024 pixels is the optimal size at which Stable Diffusion, Midjourney, DALL·E, and others produce coherent images. That's roughly an eighth of the pixel count of a 4K frame. At this point, I know and expect this going into any creative production involving AI, so I mentally prepare myself for the creative act of upscaling. Yes, creative upscaling is an important discipline with AI systems, and it can deliver broad, holistic improvements to output quality in ways that simple image enlargement wouldn't allow. The big breakthrough here is tiled upscaling via the ControlNet Tile model, which separates an image into connected, contiguous tiles and then re-generates each tile at a larger resolution.
As the guide linked above mentions, "It upscales by hallucinating new details. You can use a text prompt (and negative prompt), to guide the generation of details towards your desired image." This means that each tile of the input image gets a detailed pass through a new generation before being stitched back together, now at a higher overall resolution and with much more detail. As a bonus, you have the opportunity to add additional prompts or image references at this stage, which can dramatically transform the image content during upscaling.
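To make the mechanics concrete, here's a minimal sketch of that Tile-conditioned regeneration pass using Hugging Face's diffusers library. This isn't the exact workflow I ran (and the actual splitting into tiles is usually handled by a dedicated upscale node or extension); it only illustrates the core idea of re-generating an enlarged image while the Tile ControlNet holds it to the original's structure. The model IDs are real public checkpoints, but the filename and prompts are illustrative:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# ControlNet "Tile": conditions generation on the (naively enlarged) source image itself
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

source = load_image("article_image.png")              # hypothetical ~1024x1024 source
condition = source.resize((source.width * 2, source.height * 2))  # naive 2x enlargement

result = pipe(
    prompt="highly detailed illustration, fine linework",  # guides the hallucinated detail
    negative_prompt="blurry, artifacts",
    image=condition,          # init image for img2img
    control_image=condition,  # Tile ControlNet condition keeps structure anchored
    strength=0.4,             # how far the re-generation may drift from the enlargement
    num_inference_steps=30,
).images[0]
result.save("article_image_2x.png")
```

The strength value is the creative dial here: low values mostly sharpen what's there, higher values let the model invent noticeably new detail within each region.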
Making the Video Trailer
Storyboarding
Time Spent: 2 hours
Honestly, I did not set out with the original goal of producing a trailer for the article. But after I finished the article images, I was very motivated to see how they might work together as a teaser for the article. I had also been looking for an opportunity to try the recently released LumaLabs Dream Machine as a kind of brute-force animator. This ended up being a pivotal choice, allowing me to develop the animations dramatically faster than a traditional animation process would.
I started by rethinking the article images in a storyboard format, and realized there were a few gaps I could close with some additional images, plus a few I would replace to tighten the stylistic coherence, since inconsistencies would be more jarring in a video format. Here are all the images from the "storyboard phase":
Animating the Storyboard Images
Time Spent: 4 hours
I used these storyboard frames as starting and ending frames in LumaLabs' Dream Machine to generate the animations, sometimes adding prompts to guide the movements in specific ways. This required a lot of iteration, typically several attempts per shot before I got something that felt like it was working.
Having created a lot of animated content in the past through more traditional means, I definitely felt the pain of the limited control, but for quick prototyping this is pretty amazing, and honestly it was one of the most fun phases of the project.
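For anyone who would rather script this step than work in the web UI (which is what I did), LumaLabs also offers a Dream Machine API that accepts start and end keyframes. The sketch below is only illustrative: the endpoint, field names, and image URLs are assumptions to check against the current docs, not what was used for this trailer:

```python
import os
import requests

# Rough sketch: queue one shot with a start and end keyframe.
# Endpoint and payload shape are assumptions based on LumaLabs' Dream Machine API;
# verify them against the current documentation before relying on this.
API_URL = "https://api.lumalabs.ai/dream-machine/v1/generations"
headers = {"Authorization": f"Bearer {os.environ['LUMAAI_API_KEY']}"}

payload = {
    "prompt": "slow push-in, figure dissolving into drifting particles",  # guides the motion
    "keyframes": {
        "frame0": {"type": "image", "url": "https://example.com/storyboard_07_start.png"},  # hypothetical URLs
        "frame1": {"type": "image", "url": "https://example.com/storyboard_07_end.png"},
    },
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json())  # returns a generation id to poll until the clip is ready
```

Scripted or not, the workflow is the same: pin down the first and last frame, describe the motion in a short prompt, and iterate until a take feels right.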
Voiceover, Audio, and Editing
Time Spent: 3 hours
Finally, I wrote a voiceover script, recorded and edited it in Descript, and rendered animated captions (also in Descript).
Deciding to just go all in on AI production, I used Suno to create a soundtrack. The prompt was “pizzicato strings and subtle deep glitch drums soundtrack for a video essay, soothing yet ominous”. I ended up running 10 generations before choosing the first one it created:
Then I laid out all the media in Premiere Pro, got a tight edit together, and voilà: