Stop Building AI Agents Until You Watch This (Avoid Failure)
Host, AI with Arun Show
Struggling with AI agents that hallucinate or ignore your rules? Most enterprises are paralyzed by "pre-launch anxiety" because they can't guarantee reliability. In this AI with Arun Show episode, host Arun Trivedi sits down with Harsh Nigam to break down the exact framework for production-ready AI agents: why you must define what an agent should not do before what it should, why you should never let a model talk directly to users, and why AI engineering still needs classic engineering discipline.
Walk away knowing how to apply it
What the conversation covers
Chatbot vs agent: tools, data, and memory as the moving parts
Why manual testing fails once an agent is used at scale
Hallucinations vs guardrails in production, and how each is handled
AI evals: a test suite that mirrors real happy, edge, and adversarial cases
Pre-launch anxiety in fintech and healthcare, and how to move past it
Map what the agent should NOT do first, then build the features
From prototype to production: the accuracy that is actually good enough
Continuous testing and model drift after launch, with PII and compliance from day one
The agent harness: enforce the rules in code instead of trusting the model
How the roles of QA and engineering blur over the next five years
The ideas worth remembering
Define the guardrails, what the agent must not do, before the features
Do not trust the model to follow your rules, force them in code with an agent harness
Evaluate on many signals, not just accuracy, and keep testing for model drift
AI engineering still requires classic engineering discipline
Don't stop being a great engineer just because you're using AI.— Harsh Nigam
Who you'll hear from
Harsh Nigam
Arun Trivedi
Host, AI with Arun Show
See ContextQA in action
Go from watching to doing — spin up an AI agent and watch it test, self-heal, and report for you.