Perfsys Logo

AI Model Optimization for a Fintech Virtual Advisor

100%

Critical issues resolved

95%

Faithfulness score achieved

70%

Reduction in QA time

Key Insights

Location

Germany

Project duration

6 weeks

Industry

Fintech

Technologies used

AWS Bedrock, Amazon Nova Pro, DeepEval, Amazon S3, PostgreSQL, AWS Lambda, API Gateway

Solutions

Introduction

A German B2B fintech company operating in the financial services partnered with Perfsys to improve the performance and reliability of its AWS-based AI assistant through AI model optimization. The client, a small team of under 10 employees, operates a well-known digital platform that aggregates verified customer reviews of financial advisors, banks, and insurers — helping consumers make informed financial decisions.

The client's vision was to build a reliable, knowledge-based AI assistant capable of answering complex user queries, referencing verified data, and maintaining context during long user interactions.

Background

The client had already implemented a serverless AWS architecture consisting of:

  • Amazon Bedrock for AI inference
  • Amazon S3 as a knowledge base repository
  • AWS Lambda and API Gateway for orchestration
  • A web UI for the frontend interface

Each AI "agent" represented a unique financial advisor persona sharing access to a centralized knowledge base stored in S3.

Initial AWS architecture for the fintech AI assistant before AI model optimization.
Initial AWS serverless architecture used by the client, including S3-based knowledge banks, Bedrock LLM agents, Lambda orchestration, and API Gateway endpoints.

Despite this advanced setup, the agents were inconsistent, prone to hallucination, and often ignored the knowledge base, which compromised reliability. The client engaged us to quantify, diagnose, and systematically improve agent performance.

The Challenge

While the infrastructure was functional, the core challenge lay in AI quality and consistency:

  • Agents forgot their personalities or initial instructions during extended conversations
  • Context retention dropped significantly after 3–4 exchanges
  • Agents produced hallucinated or incorrect answers, sometimes ignoring KB data
  • No automated evaluation existed to track answer accuracy or reference validity

The client's main goal was clear:

"Ensure the AI agent provides accurate, reference-backed answers from the knowledge base, with measurable and repeatable quality metrics."

Our Approach to AI Model Optimization

Perfsys designed a three-phase improvement strategy combining evaluation automation, model experimentation, and AI model optimization at the prompt level.

Baseline Evaluation Pipeline

We began by developing a custom Evaluation Pipeline based on the DeepEval framework . This pipeline allowed automatic testing of hundreds of AI interactions to measure:

  • Faithfulness score (accuracy of KB reference usage)
  • Response consistency
  • Invocation time (latency)

The evaluation pipeline enabled:

  • Running 500+ automated test cases across multiple sessions
  • Establishing quantitative baselines for each tested model
  • Reproducing real user interaction patterns

This became the foundation for systematic AI model optimization across all candidate models.

Agent evaluation journey showing the phases leading to AI model optimization.
Multi-phase Agent Evaluation Journey showing how Perfsys refined the testing pipeline, validated model performance, and selected Amazon Nova Pro.

Model Comparison & AI Model Optimization Strategy

As part of our AI model optimization work, we tested three different models within AWS Bedrock:

Model
Claude 3 Haiku
Claude 3 Sonnet 3.7
Amazon Nova Pro
KB Reference Failure Rate
80% failure
15% failure
19% failure
Invocation Time
5.6 sec
6.8 sec
7.7 sec
Cost/Performance Notes
Fast, but unreliable KB referencing
Accurate, but higher cost
Best balance of speed, cost, and accuracy

The testing revealed that Claude 3 Haiku, the client's initial choice, failed to reference the KB correctly in 80% of cases.

While Sonnet 3.7 had better accuracy, Amazon Nova Pro offered optimal performance-to-cost ratio and superior consistency within Bedrock's ecosystem.

Model comparison results used for AI model optimization across Haiku, Sonnet, and Nova Pro.
Model Comparison Testing results showing KB reference failure rates for Haiku 3, Sonnet 3.7, and Amazon Nova Pro.

Post-Migration Issue Resolution

After migrating to Amazon Nova Pro , we identified and resolved several system-level issues:

Issue
Language Support
Contact Info Retrieval
KB File Conflicts
Out-of-Scope Handling
Description
Agent defaulted to German only
Failed to provide consultant details
Duplicate answers in S3 files
Agents generated irrelevant or invented answers
Resolution
Updated system prompt
Added structured fallback prompts
Logic updated to prefer most recent version
Improved fallback strategy
Status
Resolved
Resolved
Resolved
Resolved
Need to improve the accuracy and stability of your AI agent?

Need to improve the accuracy and stability of your AI agent?

Perfsys specializes in evaluation automation and AI model optimization on AWS. Contact us to discuss how we can help strengthen your AI workflows.

Contact Us
Chevron right

Results

Within six weeks, Perfsys successfully delivered a measurable improvement in AI performance and consistency through targeted AI model optimization.

Key Quantitative Outcomes

  • 100% of critical issues resolved (language, fallback, KB consistency, and hallucination handling)
  • Faithfulness score improved from 80% → 95%, ensuring nearly all answers are KB-based
  • Evaluation automation reduced manual QA time by 70%, validating 500+ test cases per iteration

Impact Summary

  • Valid answer consistency and reliability of KB usage significantly improved
  • Invocation latency remained stable (~7 seconds average)
  • Maintenance simplified through automated evaluation cycles

Conclusion & Next Steps

Through systematic testing, evaluation automation, and Bedrock-native AI model optimization, we helped the client transform a poorly performing AI assistant into a reliable, measurable, and scalable knowledge-based agent.

Next steps include:

  • Expanding multilingual testing (DE, EN, FR)
  • Integrating new agent personalities for domain-specific advisory roles
  • Deploying the evaluation pipeline to monitor new model updates automatically

FAQ

Eugene Orlovsky

Eugene Orlovsky

CEO & Founder | Serverless architect with 10+ years of hands-on experience designing cloud-native architectures on AWS, backed by multiple AWS certifications. He is writing bridges deep technical expertise with real-world business strategy, covering topics from AWS best practices to scaling tech-driven organizations.

Recommended for You

View All News
Chevron right

AWS Experts, On-Demand

Need to move fast? Our cloud team is ready to scale, secure, and optimize your systems. Get serverless expertise, 24/7 support, and seamless CI/CD pipelines when you need it most.