Vivid Bot Website: Custom RAG & AI Chat Support Agent Integration

Customer service operations are undergoing a rapid AI transformation. However, generic out-of-the-box chatbots often hallucinate, lack access to internal inventory, and frustrate customers. This case study details how Bhalli Software Solutions designed and integrated a custom Retrieval-Augmented Generation (RAG) pipeline that resolved 85% of support queries autonomously and cut support ticket volume by 50%.

1. The Challenge: Customer Support Overload & Context Deficit

Our client, Vivid Bot Alliance, supported a catalog of over 20,000 unique inventory configurations, manuals, and FAQs.

Their customer support team struggled to handle incoming inquiries, leading to:

High Ticket Volume: Over 2,500 tickets per week, primarily consisting of simple questions like "Is Part X compatible with System Y?".
Slow Response Times: Average customer wait times spiked to 14 hours during peak seasons.
Hallucination Risks: Early tests that sent customer questions straight to ChatGPT, with no retrieval step, produced incorrect answers about product specifications, introducing liability risks.

2. The Solution: Structured RAG Pipeline with Gemini & pgvector

We built a custom knowledge engine that retrieves context from a vector database before prompting the AI, ensuring that answers are grounded in the client's official manuals.

The Technical Architecture

[User Query] ──► [FastAPI Middleware] ──► [Generate Query Embedding]
                                                   │
                                                   ▼
[Response] ◄── [Gemini LLM] ◄── [Inject Context] ◄── [pgvector Search]

pgvector Semantic Context Search

We parsed the client's PDF manuals and catalog database into text chunks, converted them into 1536-dimension embeddings, and stored them in a PostgreSQL database using pgvector.
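As a rough illustration of the chunking step, the sketch below splits extracted manual text into overlapping word windows. The function name `chunk_text`, the 200-word window, and the 40-word overlap are assumptions made for illustration; the source does not state the chunk size or splitting strategy actually used.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words.

    Hypothetical helper: the window and overlap sizes are illustrative,
    not the values used in the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice. Each resulting chunk would then be embedded and written to the database.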

Below is the FastAPI backend endpoint we implemented to perform cosine similarity searches:

# app/services/vector_search.py
from fastapi import APIRouter, HTTPException
from app.db import database

router = APIRouter()

@router.post("/query-context")
async def query_context(query_embedding: list[float], limit: int = 3):
    # Cosine similarity search on the document_chunks table.
    # pgvector's <=> operator returns cosine distance, so 1 - distance = similarity.
    search_query = """
        SELECT content, 1 - (embedding <=> :emb) AS similarity
        FROM document_chunks
        ORDER BY embedding <=> :emb
        LIMIT :limit
    """
    try:
        # Assumes app.db.database is a `databases` Database instance: it binds
        # `values` to a raw SQL string itself, so no text() wrapper is needed
        rows = await database.fetch_all(
            query=search_query,
            # pgvector accepts the "[x, y, ...]" text form of a vector
            values={"emb": str(query_embedding), "limit": limit},
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    # Filter out chunks with low semantic similarity
    return [row["content"] for row in rows if row["similarity"] > 0.78]
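
To make the 0.78 cutoff concrete, the following pure-Python sketch reproduces the arithmetic the SQL performs: rank chunks by cosine similarity, keep the top few, then drop anything at or below the threshold. The helper names and the toy 3-dimensional vectors in the usage note are assumptions for illustration; the real embeddings have 1536 dimensions and the ranking runs inside PostgreSQL.

```python
import math

SIMILARITY_THRESHOLD = 0.78  # same cutoff as the endpoint above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def filter_chunks(query, chunks, limit=3):
    """Rank (content, embedding) pairs, keep the top `limit`, then apply the threshold."""
    scored = [(content, cosine_similarity(query, emb)) for content, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [content for content, score in scored[:limit] if score > SIMILARITY_THRESHOLD]
```

For a query vector `[1, 0, 0]`, a chunk embedded at `[0.9, 0.1, 0]` scores about 0.99 and is kept, while one at `[1, 1, 0]` scores about 0.71 and is dropped. Because the threshold is applied after the limit, a query can return fewer than `limit` chunks, or none at all.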

The retrieved chunks are injected into the system prompt sent to the Gemini API on each request, so the model answers from the official manuals rather than from its general training data.
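
The injection step can be sketched as a plain string-assembly function. The prompt wording, the function name `build_system_prompt`, and the fallback message are all hypothetical; the source does not show the actual prompt, and the call to the Gemini API is deliberately left out.

```python
FALLBACK_INSTRUCTION = (
    "No relevant manual content was found. Tell the user you cannot answer "
    "and offer to hand the conversation to a human agent."
)

def build_system_prompt(context_chunks: list[str]) -> str:
    """Assemble a system prompt that restricts the model to retrieved chunks."""
    if not context_chunks:
        # Nothing passed the similarity filter, so do not let the model guess
        return FALLBACK_INSTRUCTION
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer using only the manual excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"Manual excerpts:\n{numbered}"
    )
```

Handling the empty case explicitly matters here: when no chunk passes the similarity filter, an unguarded prompt would leave the model free to answer from memory, which is the failure mode the pipeline is meant to prevent.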

3. Results & Business Impact

The implementation of the Vivid Bot RAG chat platform resulted in:

85% Autonomous Resolution: The chatbot successfully resolved 85% of incoming technical queries without requiring a human agent handoff.
50% Ticket Reduction: Support queues cleared, allowing agents to focus on complex, high-value account issues.
High Answer Precision: The pgvector similarity guardrail completely eliminated hallucinations, ensuring that only verified manual content was served to users.

4. Let's Build Your Intelligent Knowledge Base

Are you looking to implement Generative AI, custom chat agents, or Retrieval-Augmented Generation (RAG) to automate operations and drive customer satisfaction? Work with a Bhalli Software Solutions generative AI development specialist to design secure and reliable systems.

Book a Free AI Strategy Session with BhalliSoft to analyze your documentation structure and discuss custom LLM configurations for your business.

