Week 1: Architecture of an AI Financial Platform
- carocsteads
- Mar 3
- 4 min read
What happens when AI agents handle real financial workflows?
I've been working on FinBot CTF — an AI-powered financial platform built for the OWASP Agentic AI project. The goal is to explore what happens when you give AI agents real financial responsibilities, such as onboarding vendors, processing invoices, flagging fraud, and authorizing payments.
But before I write about how I test it, I need to explain what I'm actually testing. Because the architecture is what makes this hard.
What does FinBot do?
FinBot is a vendor management portal. Vendors log in, submit invoices, communicate with an AI assistant, and get paid. Behind the scenes, AI agents handle the workflow — deciding whether to approve a vendor, whether an invoice looks legitimate, and whether a payment should be flagged.
The CTF part means players try to manipulate the AI agents into doing things they shouldn't. Get an invoice approved that should be rejected. Convince the onboarding agent to bypass compliance checks. The system detects these attempts and awards points.
That dual purpose — real financial workflows and a security challenge layer — is what shapes every architectural decision.
The layers
Frontend: two portals
There are two web interfaces built with plain HTML, JavaScript, and CSS served through FastAPI:
Vendor Portal — where vendors interact with the AI assistant, submit invoices, and check payment status
Admin Portal — where administrators monitor the system, review vendor applications, and manage configuration
Both portals communicate with their respective backends via API calls and maintain real-time updates through WebSockets.
Backend: FastAPI
The backend is Python and FastAPI. FastAPI handles routing, authentication middleware, session management, and WebSocket connections. SQLAlchemy sits on top of the database layer — PostgreSQL in production, SQLite in development and testing.
The split between PostgreSQL and SQLite matters for testing. I'll cover it in detail next week, but the short version is: SQLite's in-memory mode makes fast, isolated unit tests possible. Every test gets a clean database with zero setup time.
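The core trick is worth seeing in miniature. Here it is with the stdlib `sqlite3` driver (in FinBot the same idea runs through SQLAlchemy's `sqlite://` URL); the `vendors` table is a hypothetical stand-in for the real schema:

```python
# ":memory:" gives each connection a brand-new database, so every test
# starts clean with zero setup time and zero cross-test leakage.
import sqlite3

def make_test_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")   # fresh, isolated, gone on close
    conn.execute("CREATE TABLE vendors (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

db = make_test_db()
db.execute("INSERT INTO vendors (name) VALUES (?)", ("Acme Supplies",))
assert db.execute("SELECT COUNT(*) FROM vendors").fetchone()[0] == 1

db2 = make_test_db()   # a second "test" sees none of the first one's data
assert db2.execute("SELECT COUNT(*) FROM vendors").fetchone()[0] == 0
```

Because no file is ever written, there is nothing to tear down between tests.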
The AI agent layer
This is where it gets interesting. There are five specialized agents:
| Agent | Responsibility |
| --- | --- |
| Onboarding Agent | Reviews new vendor applications, validates documents, and makes approval decisions |
| Invoice Agent | Processes invoice submissions, validates amounts, and checks against policy |
| Payments Agent | Handles payment authorization and transaction queries |
| Fraud Agent | Monitors for anomalies, flags suspicious patterns, and runs compliance checks |
| Communication Agent | Sends notifications, generates status updates, and handles vendor inquiries |
Each agent is backed by an LLM. But agents don't talk to LLMs directly — they go through an LLM integration layer made up of five client classes. I'll dedicate an entire post to those clients. For now: one routes requests to the right provider, one wraps any client with session identity and observability, and one is a deterministic fake used in tests.
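The deterministic fake is the client worth sketching first, since the test posts will lean on it. This is an illustrative shape, not FinBot's actual API — `LLMClient`, `FakeLLMClient`, and the `"APPROVE"`/`"FLAG"` replies are all assumptions:

```python
# A deterministic fake LLM client: canned replies, no network, every call
# recorded so tests can assert on what the agent actually asked.
from typing import Protocol

class LLMClient(Protocol):
    """The provider-agnostic shape agents depend on."""
    def complete(self, prompt: str) -> str: ...

class FakeLLMClient:
    def __init__(self, responses: dict[str, str], default: str = "APPROVE"):
        self.responses = responses
        self.default = default
        self.calls: list[str] = []   # recorded for test assertions

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        for trigger, reply in self.responses.items():
            if trigger in prompt:   # first matching trigger wins
                return reply
        return self.default

fake = FakeLLMClient({"over threshold": "FLAG"})
assert fake.complete("invoice over threshold from Acme") == "FLAG"
assert fake.complete("routine invoice") == "APPROVE"
assert len(fake.calls) == 2
```

Because the fake and the real clients share one interface, an agent under test never knows the difference.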
The data layer
Five databases back the system:
Vendors — profiles, approval status, risk scores
Invoices — submissions, amounts, approval history
Agent Memory — conversation context, learned preferences
Config — feature flags, policy thresholds, system settings
Emails — templates, sent messages, notification logs
The piece most people get wrong: Redis Streams
Every time an agent makes a decision, a tool call happens, or a user submits something, an event is emitted to Redis Streams.
Most people hear "Redis" and think cache. Redis Streams is different — it's a persistent, ordered log of messages, similar to Apache Kafka but lightweight and built into Redis. Each event is written to a stream, consumed by a processor group, and stored as a CTFEvent record in the database.
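To make the cache-versus-log distinction concrete, here is a stdlib-only stand-in for the pattern. In the real system this is redis-py's `xadd` against a live Redis; the stream class, event fields, and IDs below are illustrative:

```python
# A minimal append-only, ordered log — the property FinBot relies on.
# Unlike a cache entry, an appended event keeps its position and survives
# until explicitly trimmed.
import itertools
import json

class MiniStream:
    def __init__(self):
        self.entries = []                 # (entry_id, fields), in order
        self._seq = itertools.count(1)

    def xadd(self, fields: dict) -> str:
        entry_id = f"0-{next(self._seq)}"  # monotonically increasing IDs
        self.entries.append((entry_id, fields))
        return entry_id

agents = MiniStream()
agents.xadd({
    "event_type": "tool_call",
    "agent": "invoice_agent",
    "payload": json.dumps({"tool": "check_policy", "invoice_id": 42}),
})
agents.xadd({"event_type": "decision", "agent": "invoice_agent"})

# A consumer replays events in exactly the order they were emitted.
assert [f["event_type"] for _, f in agents.entries] == ["tool_call", "decision"]
```

Consumer groups add the missing half — each event is delivered to one processor in the group and acknowledged once handled — but the ordered-log core is the part that matters here.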
This is what makes the CTF layer possible. The event processor runs as a background task, reading from two streams:
finbot:events:agents — everything the AI agents do (tool calls, LLM requests, decisions)
finbot:events:business — everything that happens to the data (invoice submitted, vendor approved, payment processed)
The processor checks each event against challenge detectors and badge evaluators. If a detector fires — say, an agent was manipulated into bypassing an invoice threshold — the challenge is marked complete, and the player gets points. All of this happens asynchronously, without blocking the main request.
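A detector is just a predicate over events. This sketch uses the invoice-threshold example from above; the `CTFEvent` shape, field names, and the $10,000 cap are assumptions, not FinBot's real values:

```python
# A hypothetical challenge detector: fires when an invoice above the
# policy cap was approved anyway.
from dataclasses import dataclass, field

@dataclass
class CTFEvent:
    event_type: str
    payload: dict = field(default_factory=dict)

def threshold_bypass_detector(event: CTFEvent) -> bool:
    if event.event_type != "invoice_approved":
        return False   # match on event type first, or detectors misfire
    return event.payload.get("amount", 0) > 10_000

assert threshold_bypass_detector(CTFEvent("invoice_approved", {"amount": 25_000}))
assert not threshold_bypass_detector(CTFEvent("invoice_approved", {"amount": 500}))
# Same amount, wrong event type — must NOT fire:
assert not threshold_bypass_detector(CTFEvent("payment_processed", {"amount": 25_000}))
```

The event-type guard is the line most worth testing: "a detector fires on the wrong event type" is exactly the kind of silent failure this architecture invites.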
The WebSocket layer then pushes the update to the player's browser in real time.
Here's what that pipeline looks like:
User action
→ FastAPI route
→ Agent processes request
→ Event emitted to Redis Stream
→ CTF Processor (background task)
→ Challenge detector fires
→ WebSocket push to browser
Everything in that chain is testable independently. That's by design.
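The last hop of the chain — the WebSocket push — can be sketched without a running server by representing each connected browser as a send-callable. In FastAPI that callable would be `websocket.send_json`; the manager class and message shape here are assumptions:

```python
# A minimal broadcast manager for the "WebSocket push to browser" step.
import asyncio
import json

class ConnectionManager:
    def __init__(self):
        self.active = []   # one async send-callable per connected player

    def connect(self, send):
        self.active.append(send)

    async def broadcast(self, message: dict):
        payload = json.dumps(message)
        # Push to all clients concurrently so one slow connection
        # doesn't delay the rest.
        await asyncio.gather(*(send(payload) for send in self.active))

received = []
async def fake_send(payload: str):   # stands in for a real WebSocket
    received.append(payload)

manager = ConnectionManager()
manager.connect(fake_send)
asyncio.run(manager.broadcast({"challenge": "threshold_bypass", "points": 100}))
assert json.loads(received[0])["points"] == 100
```

Swapping the callable for a recording fake is the same substitution trick as the fake LLM client — which is what makes this hop independently testable.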
Why Docker
The full stack — FastAPI app, PostgreSQL, Redis — runs in Docker Compose. This means:
Every developer runs the same environment
CI runs the same environment
The test database is isolated from the dev database
For testing, Docker lets me spin up a real PostgreSQL instance in CI without external dependencies. But for unit tests, I skip Docker entirely and use SQLite in-memory — it's faster and self-contained.
The full tech stack
| Layer | Technology |
| --- | --- |
| Backend framework | Python · FastAPI |
| ORM | SQLAlchemy |
| Production database | PostgreSQL |
| Development / test database | SQLite |
| Event streaming | Redis Streams |
| Real-time updates | WebSockets |
| AI providers | Ollama (local) · OpenAI |
| Containerization | Docker Compose |
Why is this architecture hard to test?
Five AI agents. Five LLM clients. An event-driven pipeline. Async background tasks. Multi-database backends. WebSocket connections.
Every layer has its own failure modes:
An agent can call the wrong tool
An LLM client can mutate the caller's request
An event can be emitted with PII in the payload
A detector can fire on the wrong event type
A WebSocket push can carry stale data
None of these failures crashes the application visibly. They manifest as incorrect behavior — an approved invoice that should be rejected, a session signature that doesn't match, a Redis event that contains the full conversation history.
That's what my test strategy is designed to catch.
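Catching a silent failure means asserting on behavior the application itself never checks. As a taste, here is a hypothetical guard for one failure mode from the list — PII leaking into an event payload. The field names are assumptions:

```python
# A test helper that fails loudly when an event payload carries PII,
# instead of the app failing silently.
PII_FIELDS = {"email", "tax_id", "conversation_history"}

def assert_no_pii(payload: dict) -> None:
    leaked = PII_FIELDS & payload.keys()
    assert not leaked, f"event payload leaked PII fields: {sorted(leaked)}"

# A clean business event passes:
assert_no_pii({"event_type": "invoice_submitted", "invoice_id": 42})

# A leaky one is caught at test time, not discovered in production:
caught = False
try:
    assert_no_pii({"event_type": "vendor_approved", "email": "a@b.com"})
except AssertionError:
    caught = True
assert caught
```

Run against every emitted event in a test, a check like this turns "incorrect behavior" back into a visible failure.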
Next week: How I approach testing this system — and the one rule that changed how I write every test.
FinBot CTF is an open-source project under OWASP Agentic AI. The codebase is on GitHub.