Inside Vwala’s Savings Engine

Building in Public #008: On onions, Swiss cheese, and the layers that create a moat

Mar 12, 2026

At Vwala, the goal is simple: help families save money.

We started with a focus on energy. It was a good wedge for multiple reasons. But our mission is broader: help families save money across the board, not just on one bill. To do that, we need a much deeper understanding of their financial lives. As we prepare for that next launch, we’re building the systems that make it possible.

Getting access to bank transaction data is necessary, but it’s not the hard part. The hard part is turning noisy, inconsistent transaction data into actual understanding, and then building the layers on top that can reason over it and turn it into savings.

That’s what this post is about. It’s the first in a series on the architecture of what I currently call Vwala’s Savings Engine (I’m still hoping to come up with a better name). In this first piece, I’ll walk through the four layers and why they exist.

Future posts will go deeper on each layer individually.

Let’s get into it!

🧅 Onions

The Savings Engine is structured as four stages. I think of them like onion layers: each stage adds meaning on top of the one before it, and each deeper layer depends on the integrity and output of the previous one.

┌─────────────────────────────────────────────────────────────────┐
│   STAGE 1: INGEST                                               │
│   Ingest raw bank data into our database                        │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│   STAGE 2: ENRICH                                               │
│   Understand what each transaction actually means               │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│   STAGE 3: ANALYZE                                              │
│   Reason over the enriched data to build a financial picture    │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│   STAGE 4: ACT                                                  │
│   Turn insights into real savings.                              │
└─────────────────────────────────────────────────────────────────┘

Two design principles matter here.

First, dependencies only flow one way. Analyze can’t reason over data that hasn’t been enriched. Act can’t create savings without analysis.
Second, the layers should stay decoupled. Enrichment should not care where raw data came from. Ingestion should not care how enrichment works. Each layer has its own constraints, failure modes, and system design, and we want those concerns mostly isolated.

📥 Ingest

Ingest is the entry point. Its job is simple: get raw bank data into our system reliably and securely.

The Second Payment Services Directive (PSD2) is a European Union directive that requires banks to open their data to third-party providers (TPPs). It is powerful leverage because it forces banks to make account data accessible, with user consent, to regulated third parties.

But to access the data directly, you need to be a regulated Account Information Service Provider (AISP), which brings a lot of operational and compliance overhead. So we use an aggregator: we partnered with Tink, owned by Visa, which gives us access to thousands of banks across Europe out of the box.

From a systems perspective, there are two sync modes:

First-time sync: when a user connects a bank, we pull up to 12 months of historical transactions, often hundreds or thousands at once.
Continuous sync: every 4 hours, we pull new transactions incrementally.

Both flows publish domain events that trigger the next stage, enrichment, without tightly coupling the two layers.
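
To make the decoupling concrete, here is a minimal sketch of the idea using an in-process event bus. The names `TransactionsSynced` and `EventBus` are hypothetical and chosen for illustration; a production setup would more likely sit on a queue or message broker.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Tuple


@dataclass(frozen=True)
class TransactionsSynced:
    """Hypothetical domain event emitted by Ingest after any sync."""
    account_id: str
    transaction_ids: Tuple[str, ...]
    first_time: bool  # True for the historical backfill, False for incremental


class EventBus:
    """Tiny in-process publish/subscribe bus, for illustration only."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


bus = EventBus()
enrich_queue: List[str] = []

# Enrich subscribes to the event; Ingest never imports or calls Enrich.
bus.subscribe(TransactionsSynced, lambda e: enrich_queue.extend(e.transaction_ids))

# Both sync modes publish the same event type.
bus.publish(TransactionsSynced("acc-1", ("tx-1", "tx-2"), first_time=True))
bus.publish(TransactionsSynced("acc-1", ("tx-3",), first_time=False))
```

Ingest only knows the event shape, so enrichment can be rewritten or replayed without touching the sync code.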

There’s no real moat here. This is mostly an integration problem. But it still needs solid engineering:

Background processing
Idempotent upserts because the same data could get synced repeatedly
Retries, backoff, and recovery when bank connectors fail
Consent lifecycle management, including expiry, revocation, …
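
As an illustration of the idempotent upsert item, here is a small sketch using SQLite's `ON CONFLICT ... DO UPDATE` (available in SQLite 3.24 and later). The table layout and the `provider_tx_id` column are assumptions for the example, not our actual schema; the key point is that a stable transaction id from the aggregator makes replays harmless.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        provider_tx_id TEXT PRIMARY KEY,  -- assumed stable id from the aggregator
        description    TEXT NOT NULL,
        amount_cents   INTEGER NOT NULL,
        booked_on      TEXT NOT NULL
    )
    """
)


def upsert_transactions(conn, rows):
    """Insert new rows and overwrite existing ones, keyed on provider_tx_id."""
    conn.executemany(
        """
        INSERT INTO transactions (provider_tx_id, description, amount_cents, booked_on)
        VALUES (:provider_tx_id, :description, :amount_cents, :booked_on)
        ON CONFLICT(provider_tx_id) DO UPDATE SET
            description  = excluded.description,
            amount_cents = excluded.amount_cents,
            booked_on    = excluded.booked_on
        """,
        rows,
    )
    conn.commit()


batch = [
    {"provider_tx_id": "tx-1", "description": "TOTAL NB000624 TURNHOU",
     "amount_cents": -6743, "booked_on": "2026-01-14"},
]
upsert_transactions(conn, batch)
upsert_transactions(conn, batch)  # replaying the same sync does not duplicate rows
row_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

Syncing the same batch twice leaves exactly one row, which is the property the continuous sync relies on.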

✨ Enrich

If Ingest is the foundation, Enrich is where the system starts becoming interesting, and where the quality of everything downstream is largely determined.

Here’s the problem in concrete terms. This is what a raw bank transaction might look like when it lands in our database:

description:  "Purchase - Total Nb000624 Turnhou - 2300 Turnhout Be - 04/10/25 16:54 - Kaart XXXX XXXX XXXX XXXX - Hellemans Nick"
counterParty: TOTAL NB000624 TURNHOU
amount:       -67.43
date:         2026-01-14
direction:    DEBIT

And this is what we want it to look like after enrichment:

p2p:          false                                  (confidence=1)
merchant:     TotalEnergies                          (confidence=0.96)
category:     transport.car_and_fuel.fuel            (confidence=0.98)
recurring:    false                                  (confidence=1)
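
As a sketch, that enriched record could be modeled like this. The class names `Scored` and `Enrichment` are hypothetical; the idea is simply that every field carries its own confidence.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Scored(Generic[T]):
    """A classifier output paired with its confidence in [0, 1]."""
    value: T
    confidence: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


@dataclass(frozen=True)
class Enrichment:
    p2p: Scored[bool]
    merchant: Scored[str]
    category: Scored[str]
    recurring: Scored[bool]


enriched = Enrichment(
    p2p=Scored(False, 1.0),
    merchant=Scored("TotalEnergies", 0.96),
    category=Scored("transport.car_and_fuel.fuel", 0.98),
    recurring=Scored(False, 1.0),
)

# The dotted category path makes roll-ups cheap: "transport" here.
top_level_category = enriched.category.value.split(".")[0]
```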

That transformation, from a cryptic bank string to semantic meaning, is what makes everything downstream possible. Without it, you can tell a user they spent €67.43 on January 14. With it, you can tell them they spent €150 on fuel last month, and that this is 18% above their usual baseline.

That’s the difference between data access and actual product intelligence.
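
To show how a statement like "18% above baseline" falls out of enriched data, here is a minimal sketch. The monthly figures are made up for illustration, and defining the baseline as the mean of prior months is an assumption; a real baseline would likely be more robust.

```python
from collections import defaultdict
from statistics import mean

FUEL = "transport.car_and_fuel.fuel"

# (month, category, amount in euros); illustrative numbers only
spend = [
    ("2025-10", FUEL, 125.00),
    ("2025-11", FUEL, 127.00),
    ("2025-12", FUEL, 129.36),
    ("2026-01", FUEL, 82.57),
    ("2026-01", FUEL, 67.43),
    ("2026-01", "food.groceries", 412.10),  # other categories are ignored
]


def monthly_totals(rows, category):
    """Sum spend per month for a single category."""
    totals = defaultdict(float)
    for month, cat, amount in rows:
        if cat == category:
            totals[month] += amount
    return dict(totals)


def pct_above_baseline(totals, month):
    """Percent difference between `month` and the mean of all earlier months."""
    baseline = mean(v for m, v in totals.items() if m < month)
    return (totals[month] - baseline) / baseline * 100


fuel_totals = monthly_totals(spend, FUEL)
delta = pct_above_baseline(fuel_totals, "2026-01")
```

None of this is possible without the category label: the raw strings give you no way to group the two January fuel stops together.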

To get there, we run three core classifiers on every transaction:

Peer-to-peer (P2P) detection: is this an internal transfer between your own accounts, or a real external payment?
Merchant detection: who did you actually pay?
Categorization: what kind of transaction is this?

This is the most complex stage in the system, especially from a data and modeling perspective. It’s also a classic Pareto domain: with relatively modest effort, you can cover a large share of real-world cases, but the last 20% gets disproportionately hard.

The Swiss cheese model

Each classifier runs through a waterfall of strategies, also called the Swiss cheese model: no single strategy catches everything, but if you stack enough layers together, very little slips through.

Strategy 1                          ████░████████░██████
Strategy 2                          ██████░██████████░██
Strategy 3                          █████████░██████████
Strategy 4                          ████████████████████
                                    ────────────────────
Combined coverage                   ████████████████████

The order matters. Cheap and deterministic first; expensive and probabilistic last.

    Transaction
         │
         ▼
┌──────────────────┐
│ Deterministic    │  Fast, cheapest
│ (NER, rules, ...)│  Known merchants (aliases), IBAN matching
└────────┬─────────┘
         │ low confidence?
         ▼
┌──────────────────┐
│ Small LLM        │  Still cheap
│                  │  Ambiguous descriptions, unknown merchants
└────────┬─────────┘
         │ still low confidence?
         ▼
┌──────────────────┐
│ Frontier LLM     │  Expensive
│ (web grounding)  │  Maximum capability for maximum difficulty
└──────────────────┘

A known merchant alias should never need to hit an LLM. The LLM tier exists for genuinely ambiguous cases.

That’s why every strategy returns a confidence score. We define escalation thresholds in the pipeline, and only move to the next strategy when the current one is not confident enough.
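
A minimal sketch of that escalation logic, assuming each strategy is a function that returns a value plus a confidence. The alias table, the thresholds, and the two LLM stand-ins are hypothetical; real model calls are replaced by fixed answers so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    value: Optional[str]
    confidence: float
    strategy: str


# Hypothetical alias table for the deterministic tier.
MERCHANT_ALIASES = {"TOTAL NB": "TotalEnergies"}


def alias_lookup(counterparty: str) -> Result:
    for prefix, merchant in MERCHANT_ALIASES.items():
        if counterparty.upper().startswith(prefix):
            return Result(merchant, 1.0, "alias")
    return Result(None, 0.0, "alias")


def small_llm(counterparty: str) -> Result:
    # Stand-in for a cheap model call; always a low-confidence guess here.
    return Result(counterparty.title(), 0.6, "small_llm")


def frontier_llm(counterparty: str) -> Result:
    # Stand-in for the expensive tier.
    return Result(counterparty.title(), 0.9, "frontier_llm")


# (strategy, threshold): stop as soon as confidence >= threshold.
# The last tier has threshold 0.0, so its answer is always accepted.
WATERFALL = [(alias_lookup, 0.95), (small_llm, 0.8), (frontier_llm, 0.0)]


def detect_merchant(counterparty: str) -> Result:
    result = Result(None, 0.0, "none")
    for strategy, threshold in WATERFALL:
        result = strategy(counterparty)
        if result.confidence >= threshold:
            break
    return result


known = detect_merchant("TOTAL NB000624 TURNHOU")
unknown = detect_merchant("BAKKERIJ JANSSENS")  # made-up counterparty
```

The known alias stops at the first tier and never touches a model; the unknown counterparty falls through both thresholds and lands on the expensive tier.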

The Flywheel

We want the enrichment pipeline to get cheaper, faster, and more accurate over time.

In the beginning, you start relatively cold. So the system needs a deliberate learning loop:

Start with a small hand-labeled golden set. A relatively small set of high-confidence merchant and category labels already covers a surprising amount of everyday spend: supermarkets, utilities, insurance, telecom, entertainment, and so on.
Use a high-quality oracle as your downstream strategy. Early on, cost matters less than correctness. Frontier models, web grounding, and other expensive strategies are acceptable if they give us strong labels on hard cases.
Use that oracle to label a much larger transaction set.

                LLM calls per transaction
               ▲
          1.0  │ ██
               │ ██ ██
               │ ██ ██ ██
               │ ██ ██ ██ ▓▓
               │ ██ ██ ██ ▓▓ ░░
               │ ██ ██ ██ ▓▓ ░░ ░░ ░░ ░░
               └──────────────────────────→ Month
                 1  2  3  4  5  6  7  8

Over time, expensive inference should happen less and less often. As the deterministic layer learns more merchants, aliases, and patterns, fewer transactions need to escalate to the LLM tier at all.
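
Here is a toy sketch of that loop, assuming high-confidence oracle labels are promoted straight into the deterministic alias table. The `MerchantResolver` class, the promotion threshold, and the fake oracle are all hypothetical; in practice promotion would probably need review or several agreeing labels.

```python
class MerchantResolver:
    PROMOTE_AT = 0.95  # assumed confidence needed to learn an alias

    def __init__(self, oracle):
        self.oracle = oracle
        self.aliases = {}      # learned deterministic layer
        self.oracle_calls = 0  # how often we paid for expensive inference

    def resolve(self, counterparty):
        if counterparty in self.aliases:
            return self.aliases[counterparty]
        self.oracle_calls += 1
        merchant, confidence = self.oracle(counterparty)
        if confidence >= self.PROMOTE_AT:
            self.aliases[counterparty] = merchant
        return merchant

    def calls_per_transaction(self, batch):
        before = self.oracle_calls
        for counterparty in batch:
            self.resolve(counterparty)
        return (self.oracle_calls - before) / len(batch)


def fake_oracle(counterparty):
    # Stand-in for a frontier model: first word, title-cased, high confidence.
    return counterparty.split()[0].title(), 0.97


resolver = MerchantResolver(fake_oracle)
month_1 = ["DELHAIZE GENT", "PROXIMUS NV", "DELHAIZE GENT", "NETFLIX COM"]
month_2 = ["DELHAIZE GENT", "NETFLIX COM", "PROXIMUS NV", "SPOTIFY AB"]
rate_1 = resolver.calls_per_transaction(month_1)
rate_2 = resolver.calls_per_transaction(month_2)
```

In this toy run the oracle is called for three of four transactions in the first batch and for only one of four in the second, because everything already seen is answered deterministically.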

🔍 Analyze

Analyze is where the data starts to speak.

The job here is to reason over the enriched transaction stream and turn it into a financial picture: recurring expenses, subscriptions, income patterns, spending shifts, category trends, and eventually household-level benchmarks.

None of this works on raw bank strings. Even something as seemingly simple as detecting a recurring subscription depends on prior enrichment: you need merchant resolution, category context, and enough consistency across time and amount to distinguish a real subscription from noise. Garbage in, garbage out.
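
To illustrate why merchant resolution comes first, here is a rough heuristic for spotting a monthly subscription among the charges of one already-resolved merchant. The tolerances and the function name are assumptions for the sketch, not our production logic.

```python
from datetime import date
from statistics import median


def looks_recurring(charges, period_days=30, day_tolerance=4,
                    amount_tolerance=0.10, min_occurrences=3):
    """charges: list of (date, amount) for ONE resolved merchant."""
    if len(charges) < min_occurrences:
        return False
    charges = sorted(charges)
    dates = [d for d, _ in charges]
    amounts = [abs(a) for _, a in charges]
    # Consistency across time: gaps between charges are close to one period.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    regular = all(abs(gap - period_days) <= day_tolerance for gap in gaps)
    # Consistency across amount: every charge is near the median amount.
    typical = median(amounts)
    stable = all(abs(a - typical) <= amount_tolerance * typical for a in amounts)
    return regular and stable


# Illustrative data: a streaming subscription versus irregular fuel stops.
streaming = [(date(2025, 11, 3), -13.99), (date(2025, 12, 3), -13.99),
             (date(2026, 1, 3), -13.99)]
fuel = [(date(2025, 11, 2), -58.10), (date(2025, 11, 19), -71.25),
        (date(2026, 1, 14), -67.43)]

is_subscription = looks_recurring(streaming)
is_fuel_recurring = looks_recurring(fuel)
```

The grouping by merchant is the part raw bank strings cannot give you: the same subscription often shows up under slightly different descriptions each month.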

We’re also building something we call “families like me”: a way to compare your spending and budget patterns against households with similar profiles. Not against generic averages, but against peers whose context actually resembles your life.
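
A very small sketch of the comparison idea, assuming peers are matched on household composition only. The `Household` fields and the matching rule are hypothetical; a real profile would use many more dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Household:
    adults: int
    children: int
    monthly_groceries: float  # euros


def peers_of(me, households):
    """Households with the same composition as `me`, excluding `me` itself."""
    return [h for h in households
            if h is not me and (h.adults, h.children) == (me.adults, me.children)]


def share_of_peers_spending_less(me, households):
    peers = peers_of(me, households)
    if not peers:
        return None  # no comparable households yet
    cheaper = sum(p.monthly_groceries < me.monthly_groceries for p in peers)
    return cheaper / len(peers)


# Illustrative data only.
me = Household(2, 2, 900.0)
everyone = [
    me,
    Household(2, 2, 700.0),
    Household(2, 2, 800.0),
    Household(2, 2, 950.0),
    Household(2, 2, 1000.0),
    Household(1, 0, 300.0),  # different profile, not a peer
    Household(2, 0, 500.0),  # different profile, not a peer
]
share = share_of_peers_spending_less(me, everyone)
```

Against a generic average the single-person household would drag the comparison down; against actual peers, this family sits right in the middle.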

⚡ Act

Act is where the loop closes. It’s also where everything ties back to our North Star Metric: money saved for our users.

Insight without action is just interesting analytics. The point of this layer is to turn understanding into changed behavior and real savings. That can happen in a few different ways:

Notifications when subscriptions are about to renew, so you can cancel before the charge hits
Education: a lot of financial well-being is also a knowledge problem. Weekly quizzes, explainers, and similar tools are easy to gamify and can be surprisingly effective.
Fixed cost detection: if you’ve been with the same energy provider for two years, the odds are good that switching would save you money. We want to make that obvious and the path to acting on it frictionless.
Leaky buckets: small, recurring costs that quietly accumulate and that users often don’t even realize they’re paying
And many, many more!
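
As one example, the leaky buckets idea can be sketched as: take charges already flagged as recurring, keep the small ones, and show what they add up to per year. The threshold, the function name, and the sample data are assumptions for illustration; amounts are in cents to avoid float rounding.

```python
def leaky_buckets(recurring_monthly_cents, max_monthly_cents=1500):
    """Return (merchant, yearly_cents) for small recurring charges, largest first.

    recurring_monthly_cents: dict of merchant -> monthly amount in cents,
    assumed to contain only charges already detected as recurring.
    """
    small = {m: c for m, c in recurring_monthly_cents.items()
             if c <= max_monthly_cents}
    ranked = sorted(small.items(), key=lambda item: item[1], reverse=True)
    return [(merchant, cents * 12) for merchant, cents in ranked]


# Illustrative data only.
recurring = {
    "Cloud storage": 299,
    "Streaming": 1399,
    "Gym": 3900,  # above the threshold, so not a "leaky bucket"
    "App subscription": 499,
}
buckets = leaky_buckets(recurring)
yearly_total_cents = sum(cents for _, cents in buckets)
```

Three charges that each look negligible add up to €263.64 a year in this example, which is the kind of number worth putting in front of a user.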

This is the layer users actually feel. If we do the previous layers well, this is where that work turns into euros saved.

That’s the overview. Four layers, one direction of data flow, and a system designed so each layer can evolve independently. Future posts will go deeper on each layer individually.

But the more important point is this: access to raw bank data is table stakes. The defensibility starts when you can turn messy transactions into structured understanding, and it compounds in everything you build on top of that: analysis, interventions, and ultimately money saved.

PS: We’re looking for a Founding Engineer to join us. If these are the kinds of problems you want to be in the middle of, reach out. I’d love to talk.

Nick’s Substack
