"The Name Of The Game Is Power" - Music Video by Mark DK Berry

6 months ago

"The Name Of The Game Is Power" - AI Music Video Creation Process:

100% AI visuals, 100% human-made music

Music: "Gone In 60 Seconds" by Mark DK Berry (available from https://markdkberry.bandcamp.com/trac... )

Introduction
The release of the Wan 2.1 i2v (image-to-video) model for Comfyui in early March 2025 opened up a whole new world of creative possibility. The track "Gone In 60 Seconds" seemed like a good choice for my next music video to test it out.

Hardware & Setup

**Equipment**: RTX 3060 (12GB VRAM), Windows 10, 32GB system RAM
**Required Installations**: Sage attention, Triton, and Teacache
**Note**: My first sage attention install killed my Comfyui installation, which I then had to rebuild (a blessing in disguise, as my setup had become bloated)

Workflow Overview
I spent a few days researching workflows then picked the best I could find. The workflow is available here: https://comfyworkflows.com/workflows/...

I ended up using the same i2v workflow almost exclusively because it included interpolation and upscaling, which cut down post-production time. When making previous AI music videos, I limited myself to 5 days, but with image-to-video I wanted better quality and more accurate depictions, so I gave it more time. I aimed for 10 days but managed to complete this version in 8.

Image Creation Process
1. Used Flux-dev-fp8 in Comfyui to create base images (1344 x 768)
2. Used Krita with ACLY AI plugin (SDXL and Flux model) for segmenting, inpainting, and tweaking
3. Created my own Lora in Flux (3-hour process) to avoid copyright issues after previous experiences
4. Used Shotcut (or Topaz) to interpolate from 16 to 24 fps (I didn't use the Topaz enhancement features because, after testing, I didn't feel they helped much; they just improved the clarity of the underlying digital gremlins).
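
The 16 to 24 fps retiming in step 4 comes down to simple frame arithmetic. The sketch below assumes a clip of exactly 3 seconds; the function names are made up for illustration and are not part of Shotcut or Topaz, which handle this internally.

```python
from fractions import Fraction


def frame_count(duration_s, fps):
    """Number of frames in a clip of the given length at the given frame rate."""
    return int(Fraction(duration_s) * fps)


def source_position(target_index, source_fps=16, target_fps=24):
    """Position of an output frame on the source timeline, in source-frame units."""
    return Fraction(target_index * source_fps, target_fps)


def needs_synthesis(target_index, source_fps=16, target_fps=24):
    """True when the output frame falls between two source frames."""
    return source_position(target_index, source_fps, target_fps).denominator != 1


if __name__ == "__main__":
    print(frame_count(3, 16), "source frames")
    print(frame_count(3, 24), "output frames")
    for i in range(6):
        print(i, source_position(i), needs_synthesis(i))
```

Under these assumptions, every third output frame lines up with an original frame and the two in between have to be interpolated, which is why any artefacts in the source frames carry through to the retimed clip.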

Prompt Engineering
Once input images were ready, I used AI assistants (Claude, Grok, or ChatGPT) to generate prompts based on criteria from a prompt-extender developed by Wan developers. This worked much better than simple 3-sentence requests (see the workflow link for that info https://comfyworkflows.com/workflows/... ).

Video Generation

Used Wan 2.1 i2v workflow to create 3-second video clips (16fps with upscale and interpolation)
Most clips required 3-4 attempts, but some (especially car & motorcycle acceleration shots) took 20+ attempts
Tried multiple Wan models but all struggled with moving vehicles while keeping other elements stationary
Seed changes resolved most issues
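
The retry-by-seed routine described above can be sketched as a simple loop. This is only an illustration: `render` is a hypothetical stand-in for queuing the i2v workflow with a given seed, and `accept` stands in for watching the clip and deciding whether it is usable. Neither is a real Comfyui or Wan function.

```python
import random
from typing import Callable, Optional, Tuple


def find_usable_seed(
    render: Callable[[int], str],
    accept: Callable[[str], bool],
    max_attempts: int = 20,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[int, int]]:
    """Try fresh random seeds until a clip is accepted.

    Returns (seed, attempt_number) for the first accepted clip,
    or None if every attempt was rejected.
    """
    rng = rng or random.Random()
    for attempt in range(1, max_attempts + 1):
        seed = rng.randrange(2**32)  # new seed per attempt, prompt and image unchanged
        clip = render(seed)          # hypothetical: run the workflow with this seed
        if accept(clip):             # hypothetical: a human judgement in practice
            return seed, attempt
    return None
```

Recording the seed of each accepted clip makes it possible to regenerate the same shot later, for example at a different resolution.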

Post-Production

Assembled clips in Davinci Resolve
Spent time replacing unusable shots
Acceptance & drawing a line in the sand

Lessons Learned

**Organization**: This process required tracking much more information than before, including shot-naming conventions and multiple CSV sheets to manage good/bad takes. When lipsync and ambient audio come along, this is going to be challenging work for one person.
**Flexibility**: The story I began with wasn't the story I ended up creating - it's best to start with a plan but allow AI to redirect it
**Quality vs. Storytelling**: Sometimes sacrificing visual quality for storytelling works better, especially given hardware limitations
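
As an illustration of the organization point, a take log can be kept in a single CSV file. The sketch below is a minimal example, not the sheets actually used for this video: the column names, the `good`/`bad` status values, and the `shotNNN_takeNN.mp4` naming pattern are all assumptions.

```python
import csv
import io

# Hypothetical columns for a take log
FIELDS = ["shot", "take", "seed", "status", "notes"]


def clip_name(shot, take):
    """Build a file name from a made-up shot/take naming convention."""
    return f"shot{shot:03d}_take{take:02d}.mp4"


def write_takes(rows, handle):
    """Write take records (a list of dicts) as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def best_takes(handle):
    """Return {shot: file name} using the first take marked 'good' per shot."""
    chosen = {}
    for row in csv.DictReader(handle):
        if row["status"] == "good" and row["shot"] not in chosen:
            chosen[row["shot"]] = clip_name(int(row["shot"]), int(row["take"]))
    return chosen


if __name__ == "__main__":
    rows = [
        {"shot": 1, "take": 1, "seed": 111, "status": "bad", "notes": "car does not move"},
        {"shot": 1, "take": 2, "seed": 222, "status": "good", "notes": ""},
        {"shot": 2, "take": 1, "seed": 333, "status": "good", "notes": ""},
    ]
    buffer = io.StringIO()  # stands in for a real file on disk
    write_takes(rows, buffer)
    buffer.seek(0)
    print(best_takes(buffer))
```

Keeping the seed next to each take means a rejected shot can be revisited without guessing how it was made, and the list of chosen file names can be used as a checklist when assembling the edit.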

Final Thoughts
This AI model really feels like a leap ahead into a new dawn in visual storytelling. Just like in the 1920s when silent movies emerged, we are in the AI equivalent in 2025. Perhaps independent artists and activists will even stand a chance of challenging the Hollywood & Netflix narratives within a year or two.

Follow and support me on my social media accounts as I continue this journey into AI music video creation. I hope you enjoyed this one.

https://www.markdkberry.com
https://markdkberry.bandcamp.com/
IG @ markdkberry
X @ markdkberry

#aimusicvideo #ai #music #musicvideo #markdkberry #wanx #comfyui
