Building Trust with the FAQ Bot: Confidence, Fallbacks and Human Handoff

If Sprint 3 was about putting a heartbeat into the platform with bookings, Sprint 4 was about giving the system a voice. My aim was simple: create a lightweight FAQ bot that could handle common questions and, crucially, know when not to answer.

That last part matters more than people often realise. In an enterprise setting, a bot that confidently gives the wrong answer does more harm than one that admits, “I am not sure.” My challenge in this sprint was to get the bot talking, but also to build in the guardrails that stop it from bluffing.

Teaching the Bot to Measure Its Confidence

I started with a curated list of FAQs stored in a YAML file, the digital equivalent of an index card box. Each question was turned into an embedding, a sort of mathematical fingerprint that captures its meaning. When a user asks something, the system compares the new fingerprint against the stored ones to see which is most similar.
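
To make that concrete, here is a rough sketch of the lookup step. The file name, the sentence-transformers model, and the function names are my own illustrative choices rather than the project's exact code:

```python
# Rough sketch of the FAQ matching step.
# faqs.yaml, the embedding model and best_match() are illustrative assumptions.
import numpy as np
import yaml
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

with open("faqs.yaml") as f:
    faqs = yaml.safe_load(f)  # e.g. a list of {"question": ..., "answer": ...}

questions = [item["question"] for item in faqs]
faq_vectors = model.encode(questions, normalize_embeddings=True)

def best_match(user_query: str):
    """Return the closest FAQ entry and its cosine similarity."""
    query_vec = model.encode([user_query], normalize_embeddings=True)[0]
    scores = faq_vectors @ query_vec  # cosine similarity (vectors are normalised)
    idx = int(np.argmax(scores))
    return faqs[idx], float(scores[idx])
```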

That similarity on its own is not enough. You would not trust someone who just shrugs and says, “I think it is about 70% right.” So I introduced a confidence score, a simple 0–1 scale that shows how sure the bot is of its match. This gave me a dial to adjust. For now, I have set the threshold at 0.6. If the score is below that, the bot holds back.
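
Building on the sketch above, the gate itself is only a few lines. The constant name and response shape are hypothetical; the 0.6 value is the threshold described here:

```python
# Minimal sketch of the confidence gate.
FAQ_CONFIDENCE_THRESHOLD = 0.6  # current dial setting

def answer_query(user_query: str):
    entry, score = best_match(user_query)
    if score < FAQ_CONFIDENCE_THRESHOLD:
        # Below the threshold the bot holds back instead of guessing
        return {"answer": None, "confidence": score, "handoff": True}
    return {"answer": entry["answer"], "confidence": score, "handoff": False}
```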

Knowing When to Hand Over to a Human

The next step was teaching the bot what to do when it was not confident enough. In those cases, it now triggers what I call a “human handoff.” Instead of guessing, it flags the question for review.

That is where email comes in. The bot collects the original query, the closest match it found, and the confidence score, then sends that information via SMTP. In development I used MailHog so I could see the messages without firing anything into the real world, a bit like testing a fire alarm without evacuating the building.
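
A minimal version of that handoff email, assuming MailHog's default SMTP port of 1025 and placeholder addresses and subject line, might look like this:

```python
# Sketch of the handoff notification sent when confidence is too low.
import smtplib
from email.message import EmailMessage

def send_handoff_email(user_query: str, closest_question: str, score: float,
                       host: str = "localhost", port: int = 1025):
    # MailHog listens on localhost:1025 by default in development
    msg = EmailMessage()
    msg["Subject"] = "FAQ bot handoff: low-confidence query"
    msg["From"] = "faq-bot@example.com"      # placeholder address
    msg["To"] = "support@example.com"        # placeholder address
    msg.set_content(
        f"Original query: {user_query}\n"
        f"Closest match: {closest_question}\n"
        f"Confidence: {score:.2f}\n"
    )
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```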

This handoff closes the loop. It ensures that questions outside the bot’s comfort zone do not just disappear into the void. Someone gets notified, and the user is not left with a half-baked answer.

Making the System Explainable

One thing I have learned on projects like this is that the technology is only half the battle. For an AI assistant to be trusted in an enterprise, people need to understand how it behaves.

That is why I updated the documentation to explain not just what the bot does, but why. I wanted anyone, whether they are technical or not, to be able to see the logic:

  • The bot answers when it is confident.
  • It stays silent when it is not.
  • In those cases, it makes sure a human knows.

That clarity builds trust, not only in the system itself but in the delivery process.

Hitting the Bumps Along the Way

Of course, no sprint is without its headaches, and this one had its share:

  • Import errors: At one point, the FAQ router could not find functions it needed. It turned out to be a packaging issue, a small but painful reminder that in Python, file structure matters.
  • Email in tests: My first test runs tried to send real emails. Not ideal. I fixed this by mocking the email sender so nothing could leak beyond the test environment.
  • Changing libraries: The test client broke when httpx updated its API. Switching to the new ASGITransport fixed it, but it was another example of how fast-moving dependencies can trip you up. The sketch after this list shows how both fixes come together in a test.
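
For the curious, here is roughly what the repaired test setup looks like. The module paths, route, and helper names are assumptions based on the description above, and the async marker assumes the anyio pytest plugin is installed:

```python
# Sketch of a test using a mocked email sender and httpx's ASGITransport.
# app.main, app.faq and the /faq/ask route are hypothetical names.
from unittest.mock import patch

import pytest
from httpx import ASGITransport, AsyncClient

from app.main import app  # hypothetical FastAPI application

@pytest.mark.anyio
async def test_low_confidence_triggers_handoff():
    # Patch the email sender so no real message leaves the test environment
    with patch("app.faq.send_handoff_email") as mock_send:
        transport = ASGITransport(app=app)  # replaces the old AsyncClient(app=...) shortcut
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            response = await client.post("/faq/ask", json={"question": "Something obscure"})
        assert response.status_code == 200
        mock_send.assert_called_once()
```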

Each bump slowed me down temporarily, but solving them strengthened the platform.

What Would Be Different at Enterprise Scale?

Building this solo meant I could make quick decisions, change file structures in minutes, and mock out services like SMTP with little ceremony. In a large-scale enterprise, the same work would look very different:

  • Data ownership: Instead of a simple YAML file, FAQs would likely be stored in a knowledge management system with version control, permissions, and approvals.
  • Email handoff: Rather than using MailHog, integration would be with corporate SMTP or ticketing systems like ServiceNow, often requiring security reviews and approvals.
  • Governance: A confidence threshold would not just be a number in code. It would be documented, reviewed with stakeholders, and potentially signed off as part of risk management.
  • Team dynamics: Instead of one person resolving errors, there would be specialist teams (DevOps, security, QA) handling their areas, with change requests logged and tracked formally.

The delivery specialist’s role in that environment is not just to get the bot working, but to align these moving parts so the solution is safe, explainable, and ready for scale.

Wrapping Up

By the end of Sprint 4, the platform had taken a big step forward. I now have:

  • A working FAQ bot that answers curated questions.
  • A confidence scoring system with a configurable threshold.
  • A safe fallback path that routes low-confidence queries to humans.
  • Documentation that explains the logic in plain language.

This was not about flashy features. It was about building trust: teaching the platform when to speak, when to stay quiet, and when to pass the baton. In enterprise AI delivery, that balance is what makes the difference between a proof of concept and something organisations can actually rely on.

Sneak Peek: What’s Next

Next up in Sprint 5, I will move from backend plumbing to the first visible user experience. The focus will be on scaffolding a frontend with React, Tailwind, and shadcn/ui, then wiring it to a simple sentiment dashboard. Initially, this will show a basic green and red indicator using dummy data, but the intent is larger: to demonstrate how an AI assistant can not only respond to questions but also measure and reflect sentiment in real time.

I will also introduce Cypress for end-to-end testing, so the full user journey gets the same level of rigour as the backend. This is when the platform will start to feel alive, moving from APIs and test fixtures to something you can see and click.
