Users experience significant variability even with good prompts and references, especially with complex or asymmetric characters. There is a need for more predictable and controllable output from AI video generators, moving beyond trial-and-error.
I decided to leave the video game industry and become a K-Punk music producer. YouTube video here: https://lnkd.in/ek2nrZcb

Meet Kiri, my new star.

More seriously, I had been curious for a while about what today’s AI video generators are really capable of. The trigger was being offered two weeks of access to the Runway platform. The best way to evaluate a tool is to build a real project with it.

To be honest, this was not something I had been planning for a long time. It was more a case of “I need to try something, I need an idea quickly.” I already had the music for my next dome game, generated with Suno, so creating a music video felt like a great opportunity. That meant building a persona and a short story from scratch.

Keep in mind, I am a programmer with zero experience in making this kind of content. I still invested a lot of time and energy into it. I worked on this for days. I learned a lot along the way, but it definitely takes time to create something like this, even if from the outside it may look like “just AI.” I could have polished it further, but I wanted to stay below a $200 USD budget.

Tech stack:
• Lyrics: ChatGPT
• Music: Suno
• Keyframe generation: Nano Banana and Kling Image
• Video generation: Grok, Kling 3, Veo 3.1, Runway 4.5
• Lip sync: Wan and LTX for close-ups, Pixverse for the rest
• Editing: manually in DaVinci Resolve
• Subtitles: custom Unity code

What I learned:

There is no silver bullet. Every video generator has its strengths and weaknesses.

Grok is surprisingly good. I do not understand why more people are not talking about it. It is cheap, very fast, and the results are exceptional. For me, it provides the best frame-to-video quality.

Kling is not as strong as Grok in raw output, but it supports end frames and references, which is extremely useful for character consistency. I probably should have used it more. It is slower and more expensive, so I only used it when other tools could not achieve the shot.
Character consistency could definitely have been better. Using Kling more often would have helped. Using Kling Image instead of Nano Banana for keyframes would probably also have improved stability.

Sometimes the difficulty is not where you expect it. Some shots worked on the first try. Others that looked simple on paper turned into a nightmare.

Frame-to-video is less magical than I initially hoped. If your character is too small on screen, consistency quickly breaks. For example, Kiri swimming sometimes turns into a mermaid after a few frames. If the character is too close, important parts of the outfit go off screen, and the model has to guess what is missing. It often guesses wrong.

End of message in the first comment.