A German B2B fintech company operating in the financial services sector partnered with Perfsys to improve the performance and reliability of its AWS-based AI assistant through AI model optimization. The client, a small team of fewer than 10 employees, operates a well-known digital platform that aggregates verified customer reviews of financial advisors, banks, and insurers, helping consumers make informed financial decisions.
The client’s vision was to build a reliable, knowledge-based AI assistant capable of answering complex user queries, referencing verified data, and maintaining context during long user interactions.
Background
The client had already implemented a serverless AWS architecture consisting of S3-based knowledge banks, Bedrock LLM agents, Lambda orchestration, and API Gateway endpoints.
Each AI “agent” represented a unique financial advisor persona sharing access to a centralized knowledge base stored in S3.
Initial AWS serverless architecture used by the client, including S3-based knowledge banks, Bedrock LLM agents, Lambda orchestration, and API Gateway endpoints.
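The sketch below illustrates how a Lambda orchestration layer of this kind might forward a user query to a Bedrock agent. The agent and alias IDs, session handling, and response parsing are illustrative assumptions rather than the client's actual implementation.

```python
import json
import boto3

# bedrock-agent-runtime exposes InvokeAgent for agents backed by an S3 knowledge base
bedrock_agent = boto3.client("bedrock-agent-runtime")

def lambda_handler(event, context):
    """API Gateway -> Lambda -> Bedrock agent (illustrative sketch only)."""
    body = json.loads(event["body"])

    # Agent and alias IDs are placeholders; sessionId keeps multi-turn context on the agent side
    response = bedrock_agent.invoke_agent(
        agentId="AGENT_ID",
        agentAliasId="AGENT_ALIAS_ID",
        sessionId=body["session_id"],
        inputText=body["question"],
    )

    # InvokeAgent streams the answer back as chunked events
    answer = "".join(
        chunk_event["chunk"]["bytes"].decode("utf-8")
        for chunk_event in response["completion"]
        if "chunk" in chunk_event
    )
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```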
Despite this advanced setup, the agents were inconsistent, prone to hallucination, and often ignored the knowledge base, which compromised reliability. The client engaged us to quantify, diagnose, and systematically improve agent performance.
The Challenge
While the infrastructure was functional, the core challenge lay in AI quality and consistency:
Agents forgot their personalities or initial instructions during extended conversations
Context retention dropped significantly after 3–4 exchanges
Agents produced hallucinated or incorrect answers, sometimes ignoring KB data
No automated evaluation existed to track answer accuracy or reference validity
The client’s main goal was clear:
“Ensure the AI agent provides accurate, reference-backed answers from the knowledge base, with measurable and repeatable quality metrics.”
Our Approach to AI Model Optimization
Perfsys designed a three-phase improvement strategy combining evaluation automation, model experimentation, and AI model optimization at the prompt level.
Baseline Evaluation Pipeline
We began by developing a custom Evaluation Pipeline based on the DeepEval framework. This pipeline allowed automatic testing of hundreds of AI interactions to measure:
Faithfulness score (accuracy of KB reference usage)
Response consistency
Invocation time (latency)
The evaluation pipeline enabled:
Running 500+ automated test cases across multiple sessions
Establishing quantitative baselines for each tested model
Reproducing real user interaction patterns
This became the foundation for systematic AI model optimization across all candidate models.
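The snippet below is a minimal sketch of what one iteration of such a pipeline can look like with DeepEval. The `ask_agent` helper, sample question, and threshold are illustrative assumptions; the actual pipeline replays far larger sets of recorded user interactions.

```python
import time

from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# ask_agent() is a hypothetical helper that calls the deployed Bedrock agent and
# returns the generated answer plus the KB passages it retrieved.
from agent_client import ask_agent

questions = [
    "Which fees does this advisor charge for a portfolio review?",
    # ... hundreds more, replayed from real user sessions
]

test_cases, latencies = [], []
for question in questions:
    start = time.perf_counter()
    answer, retrieved_passages = ask_agent(question)
    latencies.append(time.perf_counter() - start)  # invocation time per query

    test_cases.append(
        LLMTestCase(
            input=question,
            actual_output=answer,
            retrieval_context=retrieved_passages,  # KB chunks the agent actually used
        )
    )

# Faithfulness: how well each answer is grounded in the retrieved KB context
evaluate(test_cases=test_cases, metrics=[FaithfulnessMetric(threshold=0.8)])

print(f"Average invocation time: {sum(latencies) / len(latencies):.2f}s")
```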
Multi-phase Agent Evaluation Journey showing how Perfsys refined the testing pipeline, validated model performance, and selected Amazon Nova Pro.
Model Comparison & AI Model Optimization Strategy
As part of our AI model optimization work, we tested three different models within AWS Bedrock:
| Model | KB Reference Failure Rate | Invocation Time | Cost/Performance Notes |
| --- | --- | --- | --- |
| Claude 3 Haiku | ❌ 80% | 5.6 sec | Fast, but unreliable KB referencing |
| Claude 3.7 Sonnet | ✅ 15% | 6.8 sec | Accurate, but higher cost |
| Amazon Nova Pro | ✅ 19% | 7.7 sec | Best balance of speed, cost, and accuracy |
The testing revealed that Claude 3 Haiku, the client’s initial choice, failed to reference the KB correctly in 80% of cases.
While Claude 3.7 Sonnet showed slightly better accuracy, Amazon Nova Pro offered the best performance-to-cost ratio and superior consistency within Bedrock’s ecosystem.
Model Comparison Testing results showing KB reference failure rates for Claude 3 Haiku, Claude 3.7 Sonnet, and Amazon Nova Pro.
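A comparison of this kind can be scripted against Bedrock’s RetrieveAndGenerate API, as in the hedged sketch below. The knowledge base ID, model identifiers, and failure criterion (an answer returned without a single citation) are assumptions for illustration, not the exact harness used in the project.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent-runtime")

# Candidate models from the comparison; identifiers are illustrative and region-dependent
CANDIDATE_MODELS = {
    "Claude 3 Haiku": "anthropic.claude-3-haiku-20240307-v1:0",
    "Claude 3.7 Sonnet": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "Amazon Nova Pro": "amazon.nova-pro-v1:0",
}

def kb_failure_rate(model_id: str, kb_id: str, questions: list[str]) -> float:
    """Share of answers that come back without any knowledge base citation."""
    failures = 0
    for question in questions:
        response = bedrock_agent.retrieve_and_generate(
            input={"text": question},
            retrieveAndGenerateConfiguration={
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": kb_id,
                    "modelArn": model_id,
                },
            },
        )
        # No retrieved references in the citations means the model ignored the KB
        if not any(c.get("retrievedReferences") for c in response.get("citations", [])):
            failures += 1
    return failures / len(questions)
```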
Need to improve the accuracy and stability of your AI agent?
Perfsys specializes in evaluation automation and AI model optimization on AWS. Contact us to discuss how we can help strengthen your AI workflows.
Results
Within six weeks, Perfsys successfully delivered a measurable improvement in AI performance and consistency through targeted AI model optimization.
Key Quantitative Outcomes
100% of critical issues resolved (language, fallback, KB consistency, and hallucination handling)
Faithfulness score improved from 80% → 95%, ensuring nearly all answers are KB-based
Evaluation automation reduced manual QA time by 70%, validating 500+ test cases per iteration
Impact Summary
Consistency of valid answers and reliability of KB usage improved significantly
Invocation latency remained stable (~7 seconds average)
Maintenance simplified through automated evaluation cycles
Conclusion & Next Steps
Through systematic testing, evaluation automation, and Bedrock-native AI model optimization, we helped the client transform a poorly performing AI assistant into a reliable, measurable, and scalable knowledge-based agent.
Next steps include:
Expanding multilingual testing (DE, EN, FR)
Integrating new agent personalities for domain-specific advisory roles
Deploying the evaluation pipeline to monitor new model updates automatically
FAQ
What is AI model optimization?
AI model optimization is the process of improving an AI model so it delivers more accurate, consistent, and reliable results in real-world use. It includes testing different models, refining prompts, adjusting knowledge-base retrieval, and measuring accuracy and latency. In short, it helps the AI use the right information, avoid hallucinations, and perform efficiently in production.
Why is knowledge base accuracy important for an AI assistant?
Knowledge base accuracy ensures the AI agent provides answers grounded in verified information rather than hallucinating or guessing. High KB accuracy directly improves answer quality, consistency, and user trust—especially in industries like fintech.
How does AWS Bedrock support AI model optimization?
AWS Bedrock provides a unified platform for comparing LLMs, integrating vector-based knowledge retrieval, and measuring performance metrics. This enables repeatable evaluations, reduced hallucinations, and optimized AI behavior without managing complex infrastructure.
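As a small illustration of that unified platform (not part of the client’s pipeline), the foundation models available in a region can be listed programmatically before any comparison run:

```python
import boto3

bedrock = boto3.client("bedrock")

# List foundation models that generate text output in the current region
models = bedrock.list_foundation_models(byOutputModality="TEXT")
for summary in models["modelSummaries"]:
    print(summary["modelId"], "-", summary["providerName"])
```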
What was the primary cause of low-quality responses?
The root cause was the model’s limited faithfulness to the knowledge base. The initial Claude 3 Haiku model often generated answers from its own reasoning instead of referencing verified KB data stored in S3.
What is a “faithfulness score”?
A faithfulness score measures how closely an AI’s answer aligns with the knowledge base. A high score indicates the response is grounded in approved information rather than invented or hallucinated content.
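As a rough worked example, using the claim-based definition common to evaluation frameworks such as DeepEval (an assumption here, not a project detail):

```python
# Faithfulness ≈ claims supported by the knowledge base / total claims in the answer
supported_claims = 9
total_claims = 10
faithfulness_score = supported_claims / total_claims  # 0.9 -> answer is well grounded in the KB
```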
Why was Amazon Nova Pro chosen over Sonnet 3.7?
Amazon Nova Pro provided the best balance of cost, speed, and accuracy for this use case. While Sonnet 3.7 showed slightly higher faithfulness, Nova Pro integrated more efficiently with AWS Bedrock and delivered faster, more consistent performance at a better cost-to-quality ratio.
How did Perfsys ensure measurable improvement?
Perfsys implemented a repeatable evaluation pipeline, running 500+ automated test cases to compare model performance before and after changes. This allowed us to track accuracy, latency, KB consistency, and overall improvements with quantifiable metrics.
What tools and frameworks were used?
AWS Bedrock – LLM hosting and knowledge base retrieval