If you have found yourself swept up in the whirlwind of generative AI news over the past year - from December 2022 to December 2023 - you're in good company. It's been a rapid journey, turning someone who hadn't even heard of ChatGPT a year ago into someone who now uses LLM chatbots and Photoshop's Generative Fill like second nature. The creative sphere continues to witness breakthrough after breakthrough, leaving some brimming with excitement and others a bit uneasy about the future. Personally, I found myself somewhere in the middle.
In this ever-evolving landscape, my perspective synced with tech thought leader Michael Kammes: generative artificial intelligence is becoming an essential tool across the video creation pipeline. It's a reality we can't ignore. While it won't single-handedly steal your career, being slow to adapt might pose some challenges.
The real hurdle for me was navigating the rapid pace of this change. Why start to learn tools that would be obsolete in months? My outlook shifted after trading a series of emails with Caleb Ward of Curious Refuge, a friend at the forefront of the AI filmmaking movement. Rather than remaining on the sidelines, I realized it might be more prudent to dive in now, gain a firsthand understanding from the inside and roll with the punches as these transformative tools continue to develop. So, that's exactly what I did.
An AI-Generated Commercial
Most of what I had seen in the AI filmmaking space revolved around sci-fi and art themes. That isn't exactly helpful for brand films unless you're selling NFTs on the Starship Enterprise. I was curious whether the technology was at a place where it could be used to create what you might call a "typical" commercial: just a straightforward, product-facing TV spot without any fancy graphics or gimmicks. I wanted to push the realism of the medium to see how close to broadcast-ready the tech might be.
There were a few caveats though… At this time, generating AI content featuring humans or heavy camera movement proves problematic (extra limbs and smeared faces aren't exactly appealing). So I chose to stay away from those things, and I decided that any voiceover used needed to be generated, not recorded.
Initial Thoughts
The overall process of creating this ad was honestly tiresome. Crafting images via text to get exact framing and the necessary details just isn't there yet, and iterating on these images over and over to achieve something close to your vision becomes tedious. The resulting video quality from image-to-video generation also isn't as crisp as I had hoped on close inspection, and it often produced wild, unrealistic results.
On the flip side, that is likely where I failed. Rather than attempting to use this medium to create an ad that looked normal, I now feel I should have embraced the experimental nature of it and overlooked the abnormalities AI generation currently causes.
Either way, I learned a lot through this process and am excited to see where we will be at the end of 2024. Until then, here’s a look at how this ad came to be.
Good Spots Need Good Scripts
This commercial needed a solid foundation, and that begins with the core concept and a creative script. Keeping in mind the caveats above, I searched Spec Bank for a script that was product-forward and required no human actors. Eventually a script for Fender Guitars came into view that ticked all of my AI filmmaking checkboxes.
Enhancing Scripts with ChatGPT
It was early November when this entire AI exploration idea came to me. Naturally the holidays were on my mind, so I wanted to give the Fender spot a Christmas spin. The script's original environments easily lent themselves to the addition of simple Christmas flourishes, but I also wanted to push it a bit further and include a lofty Christmas voiceover. In the spirit of AI filmmaking, I headed to ChatGPT for that loftiness.
After feeding ChatGPT the Christmas-embellished script, I briefed the bot on my ideas for the vibe of the narration:
“Provide voiceover suggestions that would run under each scene in the commercial. The voice will sound like actor Billy Bob Thornton and the voiceover lines should be short with warm sentiment and leave a short beat of silence between scenes. The voiceover should not narrate what is happening in each scene, but instead be one cohesive thought that culminates to a warm conclusion.”
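If you'd rather script this step than paste into the chat window, the same briefing could be run through the OpenAI Python SDK. The sketch below is only illustrative; the model choice, file name, and wiring are my own assumptions, since the actual work here happened in the ChatGPT interface:

```python
# A minimal sketch of running the same briefing through the OpenAI Python SDK.
# I used the ChatGPT web interface for this project; the model name and file
# name below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

briefing = (
    "Provide voiceover suggestions that would run under each scene in the "
    "commercial. The voice will sound like actor Billy Bob Thornton and the "
    "voiceover lines should be short with warm sentiment and leave a short "
    "beat of silence between scenes. The voiceover should not narrate what "
    "is happening in each scene, but instead be one cohesive thought that "
    "culminates to a warm conclusion."
)

with open("fender_christmas_script.txt") as f:  # hypothetical script file
    script = f.read()

response = client.chat.completions.create(
    model="gpt-4",  # assumed model choice
    messages=[{"role": "user", "content": f"{script}\n\n{briefing}"}],
)

print(response.choices[0].message.content)  # the suggested VO lines
```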
The resulting script came out almost too easily. It was spooky how well it worked, but what proved even spookier was turning that script into an actual voiceover.
Voiceover Without Actors
Creating the voiceover for the commercial may have been the simplest part of this entire adventure. ElevenLabs produces AI-generated voiceovers that are nothing short of mind-blowing. In truth, I spent more time searching for samples to create a cloned voice than actually creating and generating the voiceover.
Once I had landed on the Fender script, I felt the distinct character and pacing of Billy Bob Thornton's voice would lend itself nicely to the story, and his rockstar heritage only seemed fitting. After sourcing a few select interviews with Thornton, I fed them into ElevenLabs' VoiceLab. Within a minute or so the voice was ready for use, and I dropped in the ChatGPT-generated VO script.
According to the company, the current way to dictate pacing and enunciation during speech synthesis relies heavily on context and punctuation, among other tricks. Words inside quotation marks receive heavier emphasis, and commas, ellipses, and periods help create the intended flow.
What was interesting was that the voiceover could be generated multiple times without changing a single character of the provided text, and the outcomes would still vary. Ultimately I generated and saved about six "takes." Those "reads" were then combined while editing, much as if a real actor were involved.
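If you wanted to batch those takes rather than clicking generate over and over, ElevenLabs also exposes a text-to-speech REST API. Here's a rough sketch of that workflow; I did mine in the web app, so the voice ID, model ID, voice settings, and sample line below are placeholders and assumptions:

```python
# A rough sketch of batching multiple "takes" through the ElevenLabs
# text-to-speech REST API. Voice ID, model ID, and settings are placeholders.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"
VOICE_ID = "YOUR_CLONED_VOICE_ID"  # the VoiceLab clone built from the interview samples
URL = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

# Illustrative line only (not the actual spot), using quotes for emphasis
# in the spirit of the punctuation tricks described above.
vo_script = 'Some gifts get opened... "this one" gets turned up.'

for take in range(1, 7):  # six takes, matching the workflow described above
    response = requests.post(
        URL,
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "text": vo_script,
            "model_id": "eleven_multilingual_v2",  # assumed model
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
        },
        timeout=120,
    )
    response.raise_for_status()
    with open(f"vo_take_{take:02d}.mp3", "wb") as f:
        f.write(response.content)  # same text, yet each render reads slightly differently
```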
The Elephant in the Room
Now seems like a good time to briefly address the prodigious pachyderm that’s pacing in the corner: I did not receive permission from Billy Bob Thornton to use his voice in this commercial. This spec spot was created only for my learning purposes and not intended for commercial use. Moving on…
Creating Something from Nothing
Imagery was my next mountain to climb after sourcing a script and generating VO. For that I was stuck between diving into Midjourney or toying with the relative newcomer, Leonardo.ai. After watching way too many YouTube videos comparing the quality of each, I opted for Midjourney because of its track record. That said, Leonardo.ai is making impressive strides, and their user-friendly GUI seems more intuitive than Midjourney's text-only prompting.
In case you weren't already aware, Midjourney actually runs as a bot inside of Discord. After joining the Midjourney server, I followed this guide to create my own private Discord server where the Midjourney bot and I could chat it up. This was only so I could keep my image generations organized and avoid searching through thousands of other users' generations on the public server. The images themselves are still publicly viewable unless you opt for a Pro Plan, which enables stealth mode. With nothing to hide, I opted for the Standard Plan.
Learning to Prompt in Midjourney
Before getting started, I knew one of my challenges with AI image generation was going to be creating a consistent look. Geeky Animals on Medium was my sensei in this regard, and I took most of my prompting cues from them. I also perused the images on Midjourney's Explore page to see what other users were creating. If I found an image that I liked, I would jot down its prompt and adapt it to my needs.
Ultimately, my vision for this commercial was to produce something that looked like it had been shot with a camera in the real world (or at least the cinematic version of the real world), so it needed to be as cinematic and photorealistic as possible. I would experiment with example images along with text to generate my images, but I leaned heavily on the prompts below to give a consistent look across everything that was generated:
“long shot wide angle, cinematic, natural light, 1975, color film photograph, ultra realistic, UHD, ektar --ar 16:9 --style raw”
“long shot wide angle, hyper detailed photography, photorealistic, style by Canon EF 18mm f/2.8L III USM lens on a Canon EOS 5D Mark IV camera --ar 16:9”
Once the look was established, it also helped to keep Midjourney's seed the same while prompting new images. Revealing an image's seed in Midjourney is as easy as sending the bot an envelope emoji after an image is generated. The seed is then added as a parameter while prompting, like this: "--seed 2924332163".
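So a prompt for a new scene in the same look might read something like this (the scene description here is purely illustrative, and the seed is just the example value above):

“cozy attic filled with string lights, vintage guitar case in the corner, long shot wide angle, cinematic, natural light, 1975, color film photograph, ultra realistic, UHD, ektar --ar 16:9 --style raw --seed 2924332163”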
Frustrating Aspects
In all, I generated over 450 individual images for this 7-scene commercial, so not the best efficiency ratio. In my opinion, the creative strength of Midjourney is its ability to generate random imagery with minimal user direction. The downside is that we are still not at a level where the bot can hit ultra-specific direction. Midjourney currently struggles with counting and sizing objects appropriately, and direct object placement is less than ideal. In the example below, I begged and pleaded with the bot to have a guitar sticking out from underneath the bed.
Through hundreds of iterations, I succeeded only a fraction of the time and never to my satisfaction. Eventually I decided to use Photoshop to composite some of the guitars into the shots where I needed them. These guitars were either generated by Midjourney or pulled from the web.
Compositing the Details
My naivety really showed here. I spent hours not only compositing guitars into scenes, but also adding Fender logos and other minute details to make these stills look great.
Then the shots were moved into Runway where all of that Fender-y detail was promptly decimated. Womp womp.
Making Images Move
Runway is doing incredible things with AI. Last year I used their service to instantly remove the background from interviews while cutting Netflix's Handcarved Cinema. Now they're not only in the text-to-image game, but also leading the image-to-video charge. Runway generates motion from stills and even allows you to direct how that motion behaves. Keeping in line with my caveats, I mostly kept the camera moving with a slow zoom in. You can see most of the shots have a nice parallax effect, all thanks to Runway.
I will say, though, that after diving into this project, image-to-video isn't ready for the broadcast market - yet. By now we have all seen stunning shots created with AI, but I truly feel those videos or scenes are highly curated - not the norm. On top of that, a close inspection of rendered media reveals unsettling issues like warping, distortion, and a lot of flickering lighting. Then there's the problem of humanoid beings morphing into existence from nowhere. Take this shot for instance. Yikes!
All of that said, the magic Runway is producing before our eyes - on our screens, even our phones - in mere minutes is truly astonishing. The recent addition (Fall 2023) of Motion Brush in particular continues to push those boundaries to unimaginable heights.
High Resolution Image-to-Video
Now Runway has its own options for generating video with enhanced outputs and frame interpolation, but the results I witnessed were typically lackluster. Enter Topaz Labs.
Topaz Labs Video AI is the Swiss Army knife of video restoration. It can slow down footage, interpolate frame rates, upscale, denoise, renoise, and so much more, with great results. I was an early adopter of their software a few years ago, and the most recent version, v4, is a simple-to-use lifesaver in the documentary space.
For this AI filmmaking experiment, Topaz Labs was the natural choice to upscale and sharpen the outputs from Runway. I used Video AI to upscale the source media to 5K and interpolate the footage to a true 23.98fps, while also sharpening and adding a slight grain. The final results never cleaned up as much as I had hoped, but they did take the Runway outputs to a more refined level.
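For anyone who wants to script that step, Topaz ships its own ffmpeg build with the Video AI filters baked in. The sketch below is only a loose approximation of the upscale-and-retime pass; the install path, filter names, model IDs, and parameters are my assumptions and change between versions, so treat it as illustrative rather than copy-paste:

```python
# A loose sketch of driving Topaz Video AI's bundled ffmpeg from a script.
# The path, filter names (tvai_up, tvai_fi), model IDs, and parameters are
# assumptions; verify them against your own installation before relying on this.
import subprocess

TOPAZ_FFMPEG = "/Applications/Topaz Video AI.app/Contents/MacOS/ffmpeg"  # assumed macOS path

filters = ",".join([
    "tvai_up=model=prob-3:w=5120:h=2880",  # upscale to roughly 5K (assumed model/params)
    "tvai_fi=model=chr-2:fps=23.976",      # interpolate to a true 23.98fps (assumed model/params)
])

subprocess.run(
    [
        TOPAZ_FFMPEG,
        "-i", "runway_output_scene01.mp4",       # hypothetical Runway render
        "-vf", filters,
        "-c:v", "prores_ks", "-profile:v", "3",  # ProRes 422 HQ intermediate for the edit
        "scene01_5k_2398.mov",
    ],
    check=True,
)
```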
The Edit
Considering the minimal amount of footage that was generated from the hundreds of still images, the edit was simple and fast. Voiceover takes were combined into the best single read and then paced to the music to hit a broadcast :30. Then each AI-generated shot was dropped in, most crossfading from one to the next. A few shots, like the attic, the storefront, and the guitar-neck close-up, were enhanced with dust, snow, and volumetric lighting. The final touch was relying on the Fender brand guide for the correct product logo and super at the end.
Final Thoughts
AI filmmaking has firmly established its presence, yet it's not on the brink of replacing anyone's job - at least not for now. These tools, put simply, are just that: tools. Throughout this journey, though, I found myself missing the human touch that brings a unique essence to the video-making process. Nonetheless, I can't ignore the substantial potential these AI tools hold for all of us in creative fields, not least editorial.
Despite the breakneck pace at which this new medium is evolving, I no longer feel as daunted as I did before delving into these tools. While AI is certainly changing how things operate in this industry, change itself is nothing unfamiliar in the world of post-production or production. So to all of you filmmakers out there, remember, Keep Calm and Carry On.