Vivid Bot Website: Custom RAG & AI Chat Support Agent Integration
Customer service operations are undergoing a rapid AI transformation. However, generic out-of-the-box chatbots often hallucinate, lack access to internal inventory data, and frustrate customers. This case study details how Bhalli Software Solutions designed and integrated a custom Retrieval-Augmented Generation (RAG) pipeline that resolved 85% of support queries autonomously and cut support ticket volume by 50%.
1. The Challenge: Customer Support Overload & Context Deficit
Our client, Vivid Bot Alliance, supported a catalog of over 20,000 unique inventory configurations, manuals, and FAQs.
Their customer support team struggled to handle incoming inquiries, leading to:
- High Ticket Volume: Over 2,500 tickets per week, primarily consisting of simple questions like "Is Part X compatible with System Y?".
- Slow Response Times: Average customer wait times spiked to 14 hours during peak seasons.
- Hallucination Risks: Early tests with standard ChatGPT prompts produced incorrect answers about product specifications, creating liability risks.
2. The Solution: Structured RAG Pipeline with Gemini & pgvector
We built a custom knowledge engine that retrieves relevant context from a vector database before prompting the LLM, so that every answer is grounded in the client's official manuals.
The Technical Architecture
```
[User Query] ──► [FastAPI Middleware] ──► [Generate Query Embedding]
                                                            │
                                                            ▼
[Response] ◄── [Gemini LLM] ◄── [Inject Context] ◄── [pgvector Search]
```
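To make the flow in the diagram concrete, here is a minimal Python sketch of the orchestration step the middleware performs. It is illustrative rather than the production code: `answer_query`, the `FALLBACK` message, and the three injected callables (`embed`, `search`, `generate`) are hypothetical stand-ins for the embedding model, the pgvector query, and the Gemini call. Handing off to a human agent when retrieval returns nothing is also an assumption of this sketch.

```python
from typing import Callable

# Hypothetical signatures for the three external services in the diagram.
Embedder = Callable[[str], list[float]]             # text -> query embedding
Searcher = Callable[[list[float], int], list[str]]  # embedding, limit -> grounded chunks
Generator = Callable[[str, str], str]               # context, question -> answer

FALLBACK = (
    "I couldn't find this in the official documentation. "
    "Let me connect you with a support agent."
)


def answer_query(
    query: str,
    embed: Embedder,
    search: Searcher,
    generate: Generator,
    limit: int = 3,
) -> str:
    """Embed the query, retrieve grounding chunks, then ask the LLM to answer from them."""
    query_embedding = embed(query)           # [Generate Query Embedding]
    chunks = search(query_embedding, limit)  # [pgvector Search]
    if not chunks:
        # Nothing cleared the similarity threshold: hand off rather than let the LLM guess.
        return FALLBACK
    context = "\n\n".join(chunks)            # [Inject Context]
    return generate(context, query)          # [Gemini LLM] -> [Response]


if __name__ == "__main__":
    # Toy stand-ins so the flow can be exercised without any external service.
    reply = answer_query(
        "Is Part X compatible with System Y?",
        embed=lambda text: [float(len(text)), 1.0],
        search=lambda emb, k: ["Part X is compatible with System Y."][:k],
        generate=lambda ctx, q: f"According to the manual: {ctx}",
    )
    print(reply)
```

Injecting the services as callables keeps the retrieval-then-generate order explicit and lets each stage be tested with fakes, as the `__main__` block does.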
pgvector Semantic Context Search
We parsed the client's PDF manuals and catalog database into text chunks, converted them into 1536-dimensional embeddings, and stored them in a PostgreSQL database using pgvector.
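As a rough sketch of the ingestion side, the snippet below splits extracted manual text into overlapping character windows and shows a matching pgvector schema. The chunk size, overlap, `chunk_text` name, `id` column, and index name are assumptions for illustration; only the `document_chunks` table, its `content` and `embedding` columns, and the 1536 dimensions come from this case study. PDF text extraction and the embedding call are left out.

```python
# Illustrative schema: table and column names match the search query in this
# article; the id column and index name are assumptions.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS document_chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536) NOT NULL
);

-- HNSW index with the cosine operator class, matching ORDER BY embedding <=> ...
CREATE INDEX IF NOT EXISTS document_chunks_embedding_hnsw
    ON document_chunks USING hnsw (embedding vector_cosine_ops);
"""


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        window = text[start:start + size].strip()
        if window:  # skip windows that contain only whitespace
            chunks.append(window)
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    manual = "Part X is compatible with System Y. " * 50  # placeholder manual text
    pieces = chunk_text(manual)
    print(len(pieces), "chunks; first chunk starts:", pieces[0][:40])
```

The overlap means a compatibility note that straddles a chunk boundary still appears whole in at least one chunk. The HNSW index with `vector_cosine_ops` matches the `<=>` ordering used by the search endpoint and requires pgvector 0.5.0 or later.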
Below is the FastAPI backend endpoint we implemented to perform cosine similarity searches:
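```python
# app/services/vector_search.py
from fastapi import APIRouter, HTTPException

from app.db import database  # project-level `databases.Database` instance (assumed)

router = APIRouter()

SIMILARITY_THRESHOLD = 0.78

# `<=>` is pgvector's cosine-distance operator, so 1 - distance = cosine similarity.
# The explicit CAST tells Postgres to parse the bound string as a vector literal.
SEARCH_QUERY = """
    SELECT content, 1 - (embedding <=> CAST(:emb AS vector)) AS similarity
    FROM document_chunks
    ORDER BY embedding <=> CAST(:emb AS vector)
    LIMIT :limit
"""


@router.post("/query-context")
async def query_context(query_embedding: list[float], limit: int = 3) -> list[str]:
    """Return up to `limit` manual chunks semantically close to the query embedding."""
    try:
        # Pass the raw SQL string: `databases` wraps it in text() and binds the
        # named parameters itself (a pre-built text() clause cannot take `values`).
        rows = await database.fetch_all(
            query=SEARCH_QUERY,
            values={"emb": str(query_embedding), "limit": limit},
        )
    except Exception as e:
        # Don't leak database internals to API clients.
        raise HTTPException(status_code=500, detail="Vector search failed") from e
    # Drop chunks below the similarity threshold so weak matches never reach the LLM.
    return [row["content"] for row in rows if row["similarity"] > SIMILARITY_THRESHOLD]
```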
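To show what the SQL computes, here is a dependency-free Python sketch of the same ranking and filtering: pgvector's `<=>` returns cosine distance (1 minus cosine similarity), the query keeps the `limit` nearest chunks, and the endpoint then discards anything at or below 0.78 similarity. The function names and the in-memory `dict` of toy two-dimensional embeddings are illustrative only; in production the ranking happens inside PostgreSQL.

```python
import math

SIMILARITY_THRESHOLD = 0.78  # same cutoff as the endpoint


def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> returns: 1 - cosine similarity (assumes non-zero vectors)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k_grounded(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[str]:
    """ORDER BY distance LIMIT k, then keep only chunks above the similarity threshold."""
    ranked = sorted(chunks.items(), key=lambda item: cosine_distance(query, item[1]))[:k]
    return [
        text for text, emb in ranked
        if 1.0 - cosine_distance(query, emb) > SIMILARITY_THRESHOLD
    ]


if __name__ == "__main__":
    toy_chunks = {
        "exact match": [1.0, 0.0],
        "close match": [0.9, 0.1],
        "unrelated": [0.0, 1.0],
    }
    print(top_k_grounded([1.0, 0.0], toy_chunks))
```

Because the threshold is applied after `LIMIT`, the endpoint can return fewer than `limit` chunks, or none at all; an empty result signals that the manuals do not cover the question.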
The retrieved chunks are then injected into the system prompt sent to the Gemini API, grounding each response in the official manuals.
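A minimal sketch of that injection step, assuming the chunks returned by the endpoint are numbered and appended to a system instruction that tells the model to answer only from them. The instruction wording, `GROUNDING_RULES`, and `build_system_prompt` are hypothetical; the case study does not publish the actual prompt, and the Gemini request itself is omitted.

```python
GROUNDING_RULES = (
    "You are a customer support assistant. Answer only from the documentation "
    "excerpts below. If they do not contain the answer, say so and offer to "
    "connect the customer with a human agent."
)


def build_system_prompt(chunks: list[str]) -> str:
    """Prepend the grounding rules and number each chunk so the model can refer to it."""
    excerpts = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"{GROUNDING_RULES}\n\nDocumentation excerpts:\n{excerpts}"


if __name__ == "__main__":
    print(build_system_prompt(["Part X is compatible with System Y."]))
```

The resulting string would then be supplied as the system instruction of the Gemini request (for example, the `system_instruction` field of `GenerateContentConfig` in the `google-genai` Python SDK), with the customer's question sent as the user content.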
3. Results & Business Impact
The implementation of the Vivid Bot RAG chat platform resulted in:
- 85% Autonomous Resolution: The chatbot successfully resolved 85% of incoming technical queries without requiring a human agent handoff.
- 50% Ticket Reduction: Ticket volume fell by half, clearing the support queues and freeing agents to focus on complex, high-value account issues.
- High Answer Precision: The pgvector similarity guardrail restricted the model's context to verified manual content, eliminating hallucinations so that only answers grounded in the official manuals were served to users.
4. Let's Build Your Intelligent Knowledge Base
Are you looking to implement Generative AI, custom chat agents, or Retrieval-Augmented Generation (RAG) to automate operations and improve customer satisfaction? Work with a Bhalli generative AI development specialist to design secure, reliable systems.
Book a Free AI Strategy Session with BhalliSoft to analyze your documentation structure and discuss custom LLM configurations for your business.