When Code Isn't Enough

/ CONTENTS

/ METADATA

DATE:
8.12.2025
AUTHORS:
Shivansh Sharma
READING TIME:
4 MIN

/ ARTICLE

When Code Isn't Enough

As a developer, you're used to wrestling with edge cases. You build something that works - almost always - and then you hit that one weird scenario that breaks the illusion of perfection. This is the story of how I tackled one such case while building a smart cropping tool for podcast videos, and how the real breakthrough didn't come from more code - but from conversation.

I didn't just debug the issue.

I talked it out - with AI.

The Problem: One Face, Then Another

The project was straightforward on paper: take podcast clips (in landscape), detect the speaker's face, and crop the video to a vertical 9:16 format - perfect for YouTube Shorts and Reels.

Since these videos usually feature one person speaking, centering the crop around the detected face seemed logical. And it worked beautifully… until the host turned to their guest.
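To make the setup concrete, here's a minimal sketch of that first version's core idea - keep the full frame height, derive a 9:16-wide window, and center it on the detected face's x-coordinate. The helper name and signature are illustrative, not the project's actual code:

```python
def crop_window_9_16(frame_w, frame_h, face_cx):
    """Return a 9:16 crop rectangle (x, y, w, h) centered on a face.

    Keeps the full frame height and centers the crop horizontally on
    face_cx (the detected face's x-center in pixels), clamping so the
    window never leaves the frame. Illustrative sketch only.
    """
    crop_h = frame_h
    crop_w = int(crop_h * 9 / 16)
    # Center on the face, then clamp to the frame edges.
    x = int(face_cx - crop_w / 2)
    x = max(0, min(x, frame_w - crop_w))
    return x, 0, crop_w, crop_h
```

For a 1920x1080 clip this yields a 607x1080 window that slides horizontally to follow the face - which is exactly why it breaks down the moment the "right" face changes.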

That's when the issues started.

The crop would stay centered on the original speaker - even when the camera had clearly shifted focus. The result felt robotic. Off. As if the editor had fallen asleep.

It was a subtle bug, but it undermined the polish I wanted the final videos to have.

Enter ChatGPT: Not a Fix, a Partner

I opened ChatGPT, not for a quick fix, but for a dialogue. I wanted to explain the nuance of the problem and see what kind of reasoning it could offer back.

To my surprise, the conversation felt less like a Q&A session and more like a design jam. We debated approaches:

  • Should I recalculate face position dynamically on every frame?
  • Should I base the crop on who's speaking most?
  • What if I maintained separate tracking for each face?
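The first option - recalculating on every frame - has an obvious side effect we discussed: raw detections jitter, and a crop that tracks them exactly shakes the whole video. A common mitigation (shown here as an illustrative sketch, with an arbitrary smoothing factor) is to exponentially smooth the crop center:

```python
def smooth_center(prev_cx, new_cx, alpha=0.2):
    """Exponentially smooth the crop's x-center between frames.

    A small alpha keeps the crop stable against per-frame detection
    jitter; a large alpha makes it react faster. The value 0.2 is
    illustrative, not a tuned constant from the project.
    """
    return (1 - alpha) * prev_cx + alpha * new_cx
```

Smoothing fixes the shake, but not the real problem: no amount of filtering tells you *which* face to follow.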

We didn't settle for the first idea that sounded good. Instead, we iterated - proposing, testing, and evaluating every concept for its side effects and scalability.

The Loop: Discuss → Implement → Fail → Improve

At one point, I implemented a scene change detection approach we had discussed. It seemed promising. But after running a few test clips, it became clear that it didn't generalize well across all types of camera switches.
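For context, a typical scene-change detector - roughly the shape of what we discussed - compares intensity histograms of consecutive frames and flags a cut when they diverge sharply. This is a self-contained sketch with an illustrative threshold, not the project's implementation:

```python
def _histogram(gray_pixels, bins=32):
    """Normalized intensity histogram of a flat list of 0-255 pixel values."""
    counts = [0] * bins
    for v in gray_pixels:
        counts[min(v * bins // 256, bins - 1)] += 1
    total = len(gray_pixels) or 1
    return [c / total for c in counts]

def is_scene_change(prev_gray, cur_gray, threshold=0.4):
    """Flag a hard cut when consecutive frames' histograms diverge.

    Uses the L1 distance between normalized histograms (range 0..2).
    The threshold is illustrative; real footage needs tuning.
    """
    h1, h2 = _histogram(prev_gray), _histogram(cur_gray)
    diff = sum(abs(a - b) for a, b in zip(h1, h2))
    return diff > threshold
```

A fixed threshold like this tends to miss switches between visually similar shots - two talking heads framed the same way - which hints at why the approach didn't generalize across camera switches.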

Instead of blindly tweaking it, I went back to the conversation.

ChatGPT suggested a different perspective: what if I didn't treat the video as a single stream of frames but as a stream of faces - each with their own behavior, their own patterns?

That small shift in framing changed everything.

We brainstormed a way to detect and track faces as persistent entities across frames, even when they momentarily disappeared. We built logic around visibility, thresholds, and clean handoffs from one speaker to another - mimicking how a human editor might follow a conversation naturally.
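The logic above can be sketched in a few lines. This is a hypothetical simplification of the idea - persistent per-face tracks with visibility counters, and a handoff rule that only switches speakers after the current one has been gone for a while and a candidate has been stably visible. Class names and thresholds are illustrative:

```python
class FaceTrack:
    """One persistent face, updated every frame (illustrative sketch)."""

    def __init__(self, cx):
        self.cx = cx              # last known horizontal center
        self.visible_frames = 0   # how long this face has been seen
        self.missed_frames = 0    # consecutive frames without a detection

    def update(self, cx=None):
        if cx is not None:
            self.cx = cx
            self.visible_frames += 1
            self.missed_frames = 0
        else:
            self.missed_frames += 1

def pick_active_track(tracks, active, min_visible=10, max_missed=15):
    """Decide which face the crop should follow this frame.

    Sticks with the current speaker through brief dropouts, and hands
    off only to a candidate that has been stably visible - mimicking a
    patient human editor. Thresholds are illustrative.
    """
    if active and active.missed_frames <= max_missed:
        return active
    stable = [t for t in tracks
              if t.visible_frames >= min_visible and t.missed_frames == 0]
    return max(stable, key=lambda t: t.visible_frames, default=active)
```

The key design choice is hysteresis: neither a single missed detection nor a single new detection triggers a switch, so the crop glides between speakers instead of flickering.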

It Wasn't Just Technical

What struck me most was how collaborative the process felt. I wasn't just copying code - I was talking through decisions, thinking aloud, and having those thoughts challenged or improved by the AI.

Here's what made that powerful:

  • It asked clarifying questions when my intent wasn't clear.
  • It reminded me of trade-offs I hadn't considered.
  • It helped me validate why an approach might work - or fail - before even writing code.
  • It encouraged me to stay flexible and approach the problem like a designer, not just a developer.

This wasn't a prompt-and-dump interaction. It was a real feedback loop.

The Result: Something Thoughtful

By the end, I had a system that didn't just technically work - it felt smart.

It behaved the way you'd expect an attentive editor to behave: fluid, aware, and human.

But more importantly, the process of getting there wasn't lonely. It felt like solving a puzzle with a creative partner - one who never got tired, never lost patience, and never judged the dumb questions.

Lessons from the Journey

A few things I took away from this experience:

  • Great solutions don't always start with implementation. Sometimes the real fix is upstream - in how you frame the problem.
  • AI isn't just an answer machine. It's a mirror for your thought process, a sounding board, and occasionally, a source of inspiration.
  • Collaboration is not limited to humans. When you use AI to reason with you - not just for you - the quality of your work improves.
  • Being open to wrong turns leads to better insights. We abandoned ideas mid-way, but each iteration revealed a better path.

Final Thoughts

In a world where generative AI is often reduced to "prompt in, answer out," this experience reminded me of its deeper potential.

It's not about asking for code.

It's about asking the right questions, getting a response that makes you think differently, and building something better together.

This project was never just about cropping video.

It was about designing intelligence - not artificial intelligence, but shared intelligence - between human intention and machine reasoning.

And that made all the difference.

/ NEWSLETTER

Stay updated with our latest articles and insights. Subscribe to our newsletter.

We respect your privacy. Unsubscribe at any time.