Week 3: Testing the Challenge Detection Layer
- carocsteads
- Mar 27
- 3 min read
Most test suites verify that a system does the right thing when inputs are normal. FinBot adds a harder requirement: verify that the system notices when something wrong is happening, even when the wrong thing looks normal on the surface.
That is what the CTF detector layer does. And testing it requires a different way of thinking.
What Detectors Do
Every event that flows through Redis Streams gets evaluated against a set of detectors. Each detector answers one question: does this event, combined with the session history, indicate that an AI agent was manipulated?
Examples of what a detector might look for:
- An invoice was approved that exceeded the policy threshold and the approval came from an AI agent, not a human
- A vendor onboarding completed without the standard document validation step
- A payment was authorized within seconds of a new vendor being added — a timing pattern consistent with social engineering
- An agent called a tool it has never called before in this session type, after receiving an unusually long message
None of these reduces to a simple field check. Each one is a pattern, and the detector must evaluate context, not just the event in isolation.
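A detector in this style can be sketched as a small predicate over an event plus its session context. Everything below is a hypothetical illustration, not FinBot's actual schema: the `Event` fields, the `matches` method name, and the 10,000 threshold are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Event:
    type: str                 # e.g. "invoice.approved" (assumed naming)
    actor: str                # "ai_agent" or "human"
    data: dict[str, Any]

@dataclass
class Session:
    history: list[Event] = field(default_factory=list)

class InvoiceThresholdDetector:
    """Fires when an AI agent approves an invoice above the policy threshold."""

    THRESHOLD = 10_000  # illustrative policy limit, not a real FinBot value

    def matches(self, event: Event, session: Session) -> bool:
        # The session is part of the signature even though this particular
        # detector only needs the event: stateful detectors use the history.
        return (
            event.type == "invoice.approved"
            and event.actor == "ai_agent"
            and event.data.get("amount", 0) > self.THRESHOLD
        )
```

Keeping the session in every detector's signature, even when unused, means stateless and stateful detectors share one interface and the pipeline can treat them uniformly.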
What Makes This Hard to Test
The challenge with testing detectors is that the failure modes are asymmetric:
A false negative means an attack goes undetected. The CTF player completes a challenge that should have been caught. The system looks like it is working because no error occurred — but it missed the point entirely.
A false positive means a legitimate action gets flagged. A vendor submits a normal invoice, the detector fires, and the player gets points for something they did not do. The scoring system is now unreliable.
Both failures are silent. Neither crashes anything.
The tests for the detector layer cover three categories:
First, detectors fire on the right events.
For each detector, there is a test that constructs exactly the event sequence that should trigger it — the right event type, the right data shape, the right session context — and asserts the detector returns a match.
Second, detectors do not fire on similar but different events.
An invoice approval detector should fire when an AI approves an oversized invoice. It should not fire when a human approves the same invoice, or when an AI approves a correctly sized one. These negative cases are as important as the positive ones.
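The first two categories pair naturally: one test that constructs exactly the triggering pattern, and near-miss negatives that differ in a single attribute. The stand-in detector below, with an assumed 10,000 threshold and assumed field names, is only a sketch of that shape:

```python
# Stand-in detector: fires only when an AI agent approves an invoice
# over a (hypothetical) 10,000 policy threshold.
def invoice_detector(event: dict) -> bool:
    return (
        event.get("type") == "invoice.approved"
        and event.get("actor") == "ai_agent"
        and event.get("amount", 0) > 10_000
    )

# Positive case: exactly the pattern the detector should catch.
assert invoice_detector(
    {"type": "invoice.approved", "actor": "ai_agent", "amount": 25_000}
)

# Near-miss negatives: the same invoice approved by a human, and the same
# agent approving a compliant amount. Neither should fire.
assert not invoice_detector(
    {"type": "invoice.approved", "actor": "human", "amount": 25_000}
)
assert not invoice_detector(
    {"type": "invoice.approved", "actor": "ai_agent", "amount": 500}
)
```

Each negative case varies exactly one attribute from the positive case, which is what makes a failure diagnostic: you know which condition the detector got wrong.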
Third, detectors handle missing or malformed event data without crashing.
Events arriving from Redis may be missing fields, have unexpected types, or carry data from a different schema version. A detector that crashes on a malformed event takes down the entire scoring pipeline for every player.
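One defensive pattern, sketched here with the same hypothetical invoice detector, is to treat any event that cannot be parsed as "no match" rather than letting the exception escape into the pipeline:

```python
def safe_invoice_detector(event: dict) -> bool:
    # A malformed amount (missing, None, wrong type, unparseable string)
    # must not raise -- it simply means the detector declines to match.
    try:
        amount = float(event.get("amount", 0))
    except (TypeError, ValueError):
        return False
    return (
        event.get("type") == "invoice.approved"
        and event.get("actor") == "ai_agent"
        and amount > 10_000
    )

# Well-formed events still match.
assert safe_invoice_detector(
    {"type": "invoice.approved", "actor": "ai_agent", "amount": 25_000}
)

# Malformed events must not crash, only fail to match.
assert not safe_invoice_detector({})
assert not safe_invoice_detector(
    {"type": "invoice.approved", "actor": "ai_agent", "amount": "not-a-number"}
)
assert not safe_invoice_detector(
    {"type": "invoice.approved", "actor": "ai_agent", "amount": None}
)
```

The tests assert two things at once: that no exception propagates, and that the malformed event is scored as a non-match rather than a false positive.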
Why the Order of Events Matters
Several detectors are stateful — they look at a sequence of events, not a single event. Testing these requires constructing an event history in the right order and asserting the detector only fires when the sequence is complete.
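A minimal sketch of such a test, using the vendor-then-payment timing pattern from earlier. The 60-second window, event names, and `feed` interface are all assumptions for illustration:

```python
class FastPaymentDetector:
    """Fires when a payment is authorized within 60 seconds of a new
    vendor being added in the same session (hypothetical threshold)."""

    WINDOW_SECONDS = 60.0

    def __init__(self) -> None:
        self.vendor_added_at: float | None = None

    def feed(self, event: dict) -> bool:
        if event["type"] == "vendor.added":
            self.vendor_added_at = event["ts"]
            return False
        if event["type"] == "payment.authorized" and self.vendor_added_at is not None:
            return event["ts"] - self.vendor_added_at <= self.WINDOW_SECONDS
        return False

det = FastPaymentDetector()

# A payment with no prior vendor event: the sequence is incomplete, no match.
assert not det.feed({"type": "payment.authorized", "ts": 100.0})

# Vendor added, then a payment 5 seconds later: the full pattern fires.
assert not det.feed({"type": "vendor.added", "ts": 200.0})
assert det.feed({"type": "payment.authorized", "ts": 205.0})
```

The first assertion is the one that encodes the ordering requirement: the same payment event that fires at the end must not fire when it arrives before the vendor event.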
This is the kind of test that almost never gets written because it requires understanding the attack pattern well enough to reproduce it in a controlled way. Writing the test forces you to articulate exactly what the attack looks like — which is, by itself, useful.
The Coverage Metric That Matters Here
For detector tests, line coverage is the wrong metric. A detector can have 100 percent line coverage and still miss every real attack if the test inputs are too similar to each other.
The metric that matters is: how many distinct attack patterns are covered? How many edge cases around each trigger condition? How many negative cases confirm the detector does not over-fire?
That framing — coverage of the attack surface, not coverage of the code — is what shapes the test suite for this layer.
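One way to make that coverage visible is to enumerate attack patterns by name as test cases, so the suite itself is a checklist of the attack surface. A sketch, again with an assumed detector and threshold:

```python
# Each case names a distinct attack pattern or a near-miss that must NOT fire.
# Reviewing this list answers "which attacks are covered?" directly.
CASES = [
    ("ai_over_threshold", {"actor": "ai_agent", "amount": 25_000}, True),
    ("boundary_amount",   {"actor": "ai_agent", "amount": 10_000}, False),  # at, not above
    ("human_over",        {"actor": "human",    "amount": 25_000}, False),
]

def detector(event: dict) -> bool:
    # Stand-in with a hypothetical 10,000 threshold.
    return event["actor"] == "ai_agent" and event["amount"] > 10_000

for name, event, expected in CASES:
    assert detector(event) is expected, name
```

A gap in the attack surface shows up as a missing row, which is far easier to spot in review than a missing branch in a coverage report.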
Next: Testing the tool layer — the functions that AI agents actually call when they make a decision.