Fraud Signals at Scale: Lessons from Nordic Banking

May 17, 2021 Astrid Holm

Before I joined Morildsen, I spent several years building the fraud detection infrastructure for a Nordic banking group — a system processing roughly €2B in daily transaction volume across retail, corporate, and payment card channels. The scale forced clarity about which approaches actually work under adversarial conditions and which ones look good in research papers but fail in production.

When I look at insurance fraud detection technology now, I see the industry at a stage the banking sector passed through roughly a decade ago. The specific lessons are transferable. The specific gaps are predictable. And the companies that will win are already making the right architectural choices — whether they realize it or not.

The Latency Problem Is Different in Insurance

In card fraud detection, the latency constraint is brutal: you have 150–300 milliseconds to score a transaction before the authorization decision times out. Everything about the system design — feature store architecture, model serving infrastructure, fallback rules for system degradation — is organized around that constraint. You cannot afford a model that is 2% more accurate but adds 200ms of inference time.

Insurance operates on a fundamentally different timescale. A motor claim may take 2–3 weeks from first notice of loss (FNOL) to payment. Property claims routinely run longer. The adversarial pressure is different as well: organized fraud rings have time to craft convincing claim narratives, stage supporting documentation, and coach multiple participants in a coordinated story. The model is not responding in milliseconds; it is reasoning about a claims file that the fraudster has had days to prepare.

This changes which signals matter. In card fraud, real-time behavioral signals (velocity, geographic anomaly, device fingerprint) dominate. In insurance, the temporal structure of the claim matters: when certain documentation is submitted relative to the FNOL date, how quickly the policyholder escalated from an initial report to a formal claim, the sequence of contacts with the carrier. These are slower signals, but they are the ones that separate opportunistic exaggeration from organized fraud.
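To make the idea of temporal structure concrete, here is a minimal sketch of how timing features might be derived from a claim's event log. The event kinds ("fnol", "formal_claim", "document", "contact"), the ClaimEvent type, and the feature names are all hypothetical, chosen for illustration rather than taken from any particular carrier's schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ClaimEvent:
    kind: str      # hypothetical vocabulary: "fnol", "formal_claim", "document", "contact"
    at: datetime   # when the event was recorded


def temporal_features(events: list[ClaimEvent]) -> dict[str, Optional[float]]:
    """Derive timing features measured in days from the FNOL event.

    Assumes the log contains at least one "fnol" event; the earliest one is used.
    """
    ordered = sorted(events, key=lambda e: e.at)
    fnol_at = next(e.at for e in ordered if e.kind == "fnol")

    def days_since_fnol(event: ClaimEvent) -> float:
        return (event.at - fnol_at).total_seconds() / 86400.0

    documents = [days_since_fnol(e) for e in ordered if e.kind == "document"]
    formal = [days_since_fnol(e) for e in ordered if e.kind == "formal_claim"]
    contacts = [e for e in ordered if e.kind == "contact"]

    return {
        # None marks a feature that cannot be computed yet, not a zero delay
        "days_to_first_document": min(documents) if documents else None,
        "days_to_formal_claim": min(formal) if formal else None,
        "contact_count": float(len(contacts)),
    }


example_events = [
    ClaimEvent("fnol", datetime(2021, 3, 1, 9, 0)),
    ClaimEvent("document", datetime(2021, 3, 1, 15, 0)),
    ClaimEvent("contact", datetime(2021, 3, 2, 10, 0)),
    ClaimEvent("formal_claim", datetime(2021, 3, 3, 9, 0)),
    ClaimEvent("contact", datetime(2021, 3, 4, 11, 0)),
]
example_features = temporal_features(example_events)
```

Whether an unusually short or an unusually long delay is suspicious will depend on the product line, so features like these are inputs to a model, not rules in themselves.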

What Insurance Data Pipelines Are Still Missing

The banking fraud infrastructure I worked with had one significant advantage over comparable insurance systems: the underlying data was already digital, already structured, and already arriving in a format amenable to ML processing. Card transactions are timestamped, geocoded, amount-denominated, merchant-categorized events. The feature engineering challenge is substantial, but the raw data quality is high.

Insurance claims data is a different landscape. A meaningful fraction of claims data in European carriers still arrives via PDF, email, and paper. The claimant-provided narrative — the loss description — is unstructured text in multiple languages, with varying conventions across brokers and direct-sales channels. Supporting documentation (repair estimates, medical reports, police records) arrives as scanned images with variable quality. The ML system is not scoring clean structured data; it is reasoning about a heterogeneous document set.

The companies that are addressing this at the data pipeline layer — building extraction, standardization, and normalization infrastructure before the fraud scoring layer — are solving the real problem. The ones that have built sophisticated scoring models on top of poorly structured input data have created a system that will degrade unpredictably as input quality varies across carriers and product lines.

The Label Contamination Problem at Depth

In banking fraud detection, ground truth is expensive but achievable. You can contact the cardholder. The transaction either was or was not authorized. Chargeback resolution provides confirmed fraud labels, imperfect but usable. The label quality problem is manageable at scale.

Insurance fraud labels are structurally worse. A claim is marked as confirmed fraud only when the carrier pursues formal investigation and succeeds in establishing fraud — a combination that requires sufficient expected recovery to justify legal cost, strong enough evidence to prevail, and willingness to absorb the reputational friction of challenging a claimant. Carriers routinely settle claims they suspect are inflated because the investigation cost exceeds the recovery value. Those settled claims enter the training data labeled as legitimate.

The practical consequence is that standard supervised learning approaches train on data that understates the fraud rate, and consequently learn an overly conservative decision boundary that flags too few claims. The model is calibrated to a universe where fraud is rarer than it actually is.
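The size of this effect can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that only a fixed fraction of fraudulent claims is ever confirmed and that confirmation is independent of claim characteristics. That second assumption is unlikely to hold in practice, since carriers pursue the cases with the highest expected recovery, so treat this as a way to reason about direction and magnitude, not as a correction method. The function names and the numbers are hypothetical.

```python
def observed_fraud_rate(true_rate: float, confirm_rate: float) -> float:
    """Fraud rate visible in the labels when only some fraud is ever confirmed."""
    return true_rate * confirm_rate


def rescale_score(score_on_observed_labels: float, confirm_rate: float) -> float:
    """Rescale a probability learned on contaminated labels.

    Valid only under the idealized assumption that confirmation is independent
    of claim features; the result is capped at 1.0.
    """
    return min(1.0, score_on_observed_labels / confirm_rate)


# Hypothetical portfolio: 5% of claims are fraudulent, 1 in 5 of those is confirmed.
label_rate = observed_fraud_rate(true_rate=0.05, confirm_rate=0.2)

# A claim the contaminated model scores at 8% would look very different
# if the unconfirmed fraud were accounted for.
adjusted = rescale_score(0.08, confirm_rate=0.2)
```

Under these made-up numbers the labels show roughly one fifth of the true fraud rate, which is the gap a model trained on them would inherit.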

The approaches that handle this correctly use carrier-specific calibration sessions — working with claims managers to identify a subset of historical claims where the team has high confidence in the ground truth label, even if that subset is small. You start with a high-quality-but-small labeled set and use semi-supervised methods to extend signal to the larger unlabeled corpus. This is slower and more expensive than training on whatever data you have. It is also the difference between a system that works in production and one that performs impressively on historical test sets.
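As one illustration of what "semi-supervised" can mean here, the sketch below uses self-training with a nearest-centroid rule: start from the small trusted set, label only the unlabeled claims the current model is confident about, and repeat. This is a deliberately simple stand-in, not a description of any specific vendor's method. The function names, the margin parameter, and the two-dimensional feature vectors are assumptions made for the example.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def self_train(labeled, unlabeled, margin=0.5, max_rounds=10):
    """Extend a small trusted label set by self-training.

    labeled:   list of (features, label) pairs, label in {0, 1};
               assumed to contain at least one example of each class.
    unlabeled: list of feature tuples.
    margin:    how decisively one centroid must be closer, as a fraction
               of the distance between the two centroids.
    Returns (extended labeled list, claims that stayed unlabeled).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        c0 = centroid([x for x, y in labeled if y == 0])
        c1 = centroid([x for x, y in labeled if y == 1])
        threshold = margin * distance(c0, c1)
        confident, rest = [], []
        for x in pool:
            d0, d1 = distance(x, c0), distance(x, c1)
            if abs(d0 - d1) >= threshold:
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                rest.append(x)  # ambiguous claims stay unlabeled
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return labeled, pool


trusted = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
corpus = [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
extended, still_unlabeled = self_train(trusted, corpus)
```

The design choice worth noting is that ambiguous claims are left unlabeled instead of being forced into a class, which avoids reintroducing the label contamination the trusted set was meant to remove.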

Scale Changes What You Can Learn

The most important lesson from banking fraud detection at scale is one that sounds obvious but has real implications for how insurtech companies should think about data partnerships: the signal you cannot see at the carrier level becomes visible at the portfolio level.

A single mid-size carrier processing 50,000 claims per year may see a specific fraud pattern — organized whiplash claims in a particular regional cluster, coordinated property claims linked to a specific contractor network — appearing in 30–40 claims per year. That is too sparse to train a reliable detection model from that carrier's data alone. The same pattern appearing across six carriers in the same geography becomes a statistically robust training signal.
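A rough way to see why pooling helps is to look at how noisy a rate estimate is at each scale. The sketch below assumes the pattern's yearly count behaves roughly like a Poisson count, that each of the six carriers sees about 35 cases (the midpoint of the range above), and that the cases do not overlap across carriers. All three are simplifying assumptions, and estimation noise is only a proxy for whether a detection model can be trained.

```python
import math


def relative_std_error(event_count: int) -> float:
    """Approximate relative standard error of a rate estimated from a Poisson count."""
    return 1.0 / math.sqrt(event_count)


single_carrier = relative_std_error(35)      # one carrier, about 35 cases per year
six_carriers = relative_std_error(6 * 35)    # pooled across six similar carriers

print(f"single carrier: {single_carrier:.1%}")
print(f"six carriers:   {six_carriers:.1%}")
```

Under these assumptions the relative error falls by a factor of the square root of six, from roughly 17% to roughly 7%.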

The insurance industry has historically been poor at data collaboration precisely because carriers view claims experience as proprietary competitive data. The companies building privacy-preserving consortium architectures — federated learning approaches or tokenized claim pattern sharing — are addressing a genuine structural gap that the banking sector addressed through industry data utilities.
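One simple form of tokenized pattern sharing can be sketched with keyed hashing: each carrier replaces an identifier, such as a contractor name, with an HMAC computed under a key shared by the consortium, and only the tokens are compared. The function names, the normalization rule, and the example data below are hypothetical. Keyed hashing of this kind is generally treated as pseudonymization rather than anonymization, so it would not by itself settle the GDPR questions.

```python
import hashlib
import hmac


def tokenize(identifier: str, shared_key: bytes) -> str:
    """Replace an identifier with a keyed hash so raw values are never exchanged."""
    # Light normalization so trivial formatting differences still match
    normalized = " ".join(identifier.lower().split())
    return hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def shared_tokens(carrier_a: list[str], carrier_b: list[str], shared_key: bytes) -> set[str]:
    """Tokens that appear in both carriers' lists, without revealing the rest."""
    tokens_a = {tokenize(name, shared_key) for name in carrier_a}
    tokens_b = {tokenize(name, shared_key) for name in carrier_b}
    return tokens_a & tokens_b


# Hypothetical key and contractor names, for illustration only
consortium_key = b"example-key-managed-outside-this-sketch"
carrier_a_contractors = ["Nordic Auto Repair AS", "Fjord Roofing"]
carrier_b_contractors = ["  nordic auto  repair as ", "Harbour Glass"]

overlap = shared_tokens(carrier_a_contractors, carrier_b_contractors, consortium_key)
```

In this example only the one contractor both carriers have seen produces a matching token; key management and governance, which are the hard parts, are outside the sketch.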

We are not saying the consortium approach is technically straightforward. The regulatory requirements around policyholder data sharing in GDPR jurisdictions add real complexity. But the insurtech teams who have figured out a compliant mechanism for cross-carrier pattern sharing have built something worth significantly more than their technology alone.

The Operations Layer Matters as Much as the Model

The final lesson from banking fraud infrastructure: the model is a small fraction of the value. The operations layer — how alerts are routed to investigators, how case management workflows are structured, how feedback from investigations is captured back into training data — determines whether the model's performance translates into actual leakage reduction.

A carrier that deploys a sophisticated fraud model but routes its output into an investigation team that processes alerts on a 3-week lag, without structured feedback capture, is not extracting the value the model is generating. The feedback loop is broken. The model cannot improve from production experience.
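Structured feedback capture does not need to be elaborate. The sketch below shows one possible record for an investigation outcome and how it might be turned into a training label and a lag measurement. The field names and the finding vocabulary ("confirmed_fraud", "cleared", "settled_unresolved") are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class InvestigationOutcome:
    claim_id: str
    alerted_on: date   # when the model's alert reached the investigation team
    closed_on: date    # when the investigator recorded a finding
    finding: str       # hypothetical values: "confirmed_fraud", "cleared", "settled_unresolved"


def training_label(outcome: InvestigationOutcome) -> Optional[int]:
    """Map an investigator finding to a label for retraining."""
    if outcome.finding == "confirmed_fraud":
        return 1
    if outcome.finding == "cleared":
        return 0
    # Unresolved settlements stay unlabeled instead of defaulting to legitimate
    return None


def feedback_lag_days(outcome: InvestigationOutcome) -> int:
    """Days between the alert and the recorded finding."""
    return (outcome.closed_on - outcome.alerted_on).days


example_outcome = InvestigationOutcome(
    claim_id="CLM-0001",
    alerted_on=date(2021, 3, 1),
    closed_on=date(2021, 3, 22),
    finding="settled_unresolved",
)
example_label = training_label(example_outcome)
example_lag = feedback_lag_days(example_outcome)
```

Keeping unresolved settlements as unlabeled is the point where operations and modeling meet: it prevents the workflow from feeding suspected-but-settled claims back into training as legitimate.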

The insurtech teams building end-to-end systems — model plus alert management plus investigator workflow plus outcome capture — are building something more durable than the ones who have optimized only the model component. That is the architecture we look for.