Risk APIs and the Platform Layer Emerging in Commercial Lines

7 October 2024 Astrid Holm

Personal lines insurance discovered, roughly between 2015 and 2020, that the companies best positioned to improve risk pricing were not necessarily those with the best actuarial models in-house — they were the companies with the best access to structured, real-time risk data. Telematics aggregators, weather data API providers, property characteristics data companies: the platform layer beneath personal lines underwriting models became a distinct and valuable category. Commercial lines are now replicating this structural pattern, roughly two years behind, and the platform companies that capture the commercial risk data layer will have network effects that individual carrier model development cannot replicate.

Why Commercial Lines Lagged

The lag is explainable, not accidental. Personal lines risk objects — individual vehicles, residential properties, individual lives — are more standardised and more numerous, which made building the data aggregation infrastructure more tractable and more commercially justifiable. Commercial lines risk objects are more heterogeneous: a €50M property policy for a chemical plant and a €50M property policy for a data centre have completely different risk profiles, different surveying requirements, and different data structures. Aggregating data across commercial risks at the level of granularity needed for ML feature engineering is genuinely harder.

The second issue is data access. Personal lines data flows through digital channels — telematics devices, IoT sensors, consumer apps — in a way that is relatively easy to access with carrier consent. Commercial risk data is embedded in broker submissions, engineering surveys, historical claims files, and inspection reports that are typically unstructured PDFs or paper documents managed in broker systems. Extracting structured features from these sources requires either carrier system integration (slow, expensive, carrier-specific) or document intelligence infrastructure (newer, faster, but still imperfect for complex risk documents).

The Three Data Layers That Are Crystallising

What we are watching in the commercial lines data market is three distinct platform layers beginning to form, each corresponding to a different data source and a different set of potential API customers.

Property risk data. Structured APIs for commercial property characteristics — building age, construction materials, occupancy type, proximity to flood zones, seismic exposure, fire protection class — aggregated from satellite imagery, building permits, and property registry sources. The commercial property insurers and reinsurers who can query this data at the point of quote generation, rather than after a manual survey, get a meaningful advantage in pricing speed and accuracy for the standard commercial property segment. Several European companies are building toward this; the one that achieves sufficient coverage of European commercial building stock first will have a genuinely defensible data position.
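As a rough illustration of what querying property characteristics at the point of quote might look like, the Python sketch below models a response record and flattens it into features for a pricing model. The `PropertyRiskProfile` type, its field names, the `to_features` helper and the 250 m flood threshold are all hypothetical. They mirror the characteristics listed above and are not taken from any real provider's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PropertyRiskProfile:
    """Hypothetical response record from a commercial property risk API."""
    building_year: int
    construction_material: str    # e.g. "steel_frame", "masonry", "timber"
    occupancy_type: str           # e.g. "data_centre", "chemical_plant"
    flood_zone_distance_m: float  # distance to the nearest mapped flood zone
    seismic_exposure: str         # "low", "medium" or "high"
    fire_protection_class: int    # illustrative scale: 1 (best) to 10 (worst)


def to_features(profile: PropertyRiskProfile, as_of: date) -> dict:
    """Flatten the profile into a feature dictionary for a pricing model."""
    seismic_rank = {"low": 0, "medium": 1, "high": 2}
    return {
        "building_age_years": as_of.year - profile.building_year,
        "is_timber": int(profile.construction_material == "timber"),
        # Assumed threshold: treat anything within 250 m as flood-exposed.
        "near_flood_zone": int(profile.flood_zone_distance_m < 250.0),
        "seismic_rank": seismic_rank[profile.seismic_exposure],
        "fire_protection_class": profile.fire_protection_class,
        "occupancy_type": profile.occupancy_type,
    }


# Example: a made-up data centre record, evaluated at quote time.
profile = PropertyRiskProfile(1998, "steel_frame", "data_centre", 120.0, "low", 3)
features = to_features(profile, as_of=date(2024, 10, 7))
```

The point of the sketch is the shape of the workflow: one structured lookup replaces the wait for a manual survey, and the output is already in a form a pricing model can consume.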

Fleet and telematics data. Commercial fleet insurance is the most advanced commercial lines segment for API data integration, largely because telematics hardware was deployed in commercial fleets (driven by insurance and fleet management cost pressure) before personal motor telematics became widespread. The risk APIs here are more mature — real-time vehicle location, driver behaviour scoring, maintenance compliance data. Zego's model, which we backed in 2021, is built around exactly this: using live fleet telematics to enable usage-based commercial motor pricing rather than annual flat-rate fleet cover. The platform effect in fleet data accrues to whoever holds the most coherent cross-fleet data set that can improve driver behaviour scoring at the individual and fleet level simultaneously.
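To make the contrast with annual flat-rate cover concrete, here is a minimal Python sketch of usage-based pricing for a fleet. It is not a description of Zego's actual pricing. The `VehicleUsage` record, the per-kilometre base rate and the behaviour loading rule are assumptions chosen only to show how distance and driver behaviour could feed a premium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleUsage:
    """Hypothetical monthly telematics summary for one fleet vehicle."""
    vehicle_id: str
    km_driven: float
    behaviour_score: float  # assumed scale: 0.0 (worst) to 1.0 (best)


def usage_based_premium(usage: VehicleUsage, base_rate_per_km: float = 0.04) -> float:
    """Price one vehicle-month from distance driven and driver behaviour.

    Assumed rule: the per-km rate is loaded by 50% at the worst
    behaviour score and discounted by 50% at the best.
    """
    if not 0.0 <= usage.behaviour_score <= 1.0:
        raise ValueError("behaviour_score must be between 0 and 1")
    behaviour_factor = 1.5 - usage.behaviour_score  # 1.5 at score 0, 0.5 at score 1
    return round(usage.km_driven * base_rate_per_km * behaviour_factor, 2)


def fleet_premium(usages: list[VehicleUsage]) -> float:
    """Sum the usage-based premiums across the whole fleet."""
    return round(sum(usage_based_premium(u) for u in usages), 2)


# Made-up fleet: a heavily used van with good driving, a lightly used average one.
fleet = [
    VehicleUsage("van-01", km_driven=2000.0, behaviour_score=0.9),
    VehicleUsage("van-02", km_driven=500.0, behaviour_score=0.5),
]
monthly_premium = fleet_premium(fleet)
```

Under these assumed numbers, the premium tracks both exposure (kilometres) and behaviour, which a flat annual rate per vehicle cannot do.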

Business operations risk data. This is the least mature and most interesting category. APIs that expose structured signals about a business's operational risk profile — cyber posture scores from network scanning APIs, supply chain dependency scores, employer practices risk indicators from HR platform integrations, environmental compliance scores from regulatory data feeds — are being built now. The potential customers are underwriters of cyber, management liability, and business interruption cover, where the risk factors are notoriously difficult to assess from a one-page broker submission. Cytora's platform is directly in this territory, building the structured data representation of commercial risk that underwriters need for ML-based assessment.

Platform Economics in Risk APIs

The commercial logic for building a risk data platform rather than building proprietary risk models for a single carrier is the same as for any B2B data business: you incur the data acquisition and structuring cost once, you sell the result to multiple carriers, and your data quality improves as you serve more customers (because feedback on model performance from carrier deployments provides signal for improving data quality and coverage). The cost of serving each additional carrier is substantially lower than the cost of the first deployment.
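The cost argument can be written as simple arithmetic. The Python sketch below spreads a one-off data acquisition and structuring cost across a growing number of carriers. The euro figures are invented purely for illustration and do not describe any real company.

```python
def cost_per_carrier(fixed_cost: float, marginal_cost: float, carriers: int) -> float:
    """Average cost of serving each carrier when the fixed cost is shared."""
    if carriers < 1:
        raise ValueError("carriers must be at least 1")
    return (fixed_cost + marginal_cost * carriers) / carriers


# Illustrative, made-up figures in euros.
FIXED_COST = 5_000_000.0   # one-off data acquisition and structuring
MARGINAL_COST = 200_000.0  # integration and support per carrier

single_carrier = cost_per_carrier(FIXED_COST, MARGINAL_COST, 1)
twelve_carriers = cost_per_carrier(FIXED_COST, MARGINAL_COST, 12)
```

With these assumed inputs, the average cost per carrier falls from 5.2 million with one customer to roughly 617,000 with twelve, before counting any quality gain from the feedback loop.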

The platform risk is also the same as for any B2B data business: a large carrier that decides to build equivalent data capabilities in-house becomes both a lost customer and a competitor. The durability of the platform position depends on whether the data network effects (the improvement that comes from aggregating data across multiple carrier deployments) are genuinely stronger than what any single carrier can achieve with its own data. In commercial lines, where individual carriers see a small fraction of the total market's risk events, the aggregation advantage is substantial. A carrier seeing 3% of commercial property claims in a given market cannot train a model to the accuracy achievable by a platform that aggregates 60% of commercial property claims data across 12 carriers.
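One way to put a rough number on the 3% versus 60% comparison is a back-of-envelope calculation in Python. It rests on a strong simplifying assumption that is not in the argument above: that estimation error shrinks with the square root of the number of claims observed, and that claim counts are proportional to market share. Real model accuracy also depends on data quality and on how heterogeneous the risks are, so treat the result as an order-of-magnitude guide only.

```python
import math


def standard_error_ratio(small_share: float, large_share: float) -> float:
    """Ratio SE_small / SE_large, assuming SE scales as 1 / sqrt(n)
    and n is proportional to the share of market claims observed."""
    if small_share <= 0 or large_share <= 0:
        raise ValueError("shares must be positive")
    return math.sqrt(large_share / small_share)


carrier_share = 0.03   # single carrier, from the example above
platform_share = 0.60  # platform aggregating 12 carriers

data_multiple = platform_share / carrier_share  # about 20x the claims
se_ratio = standard_error_ratio(carrier_share, platform_share)  # about 4.5
```

Under that assumption the platform sees about twenty times the claims, and its estimates carry roughly 4.5 times less sampling error than the single carrier's.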

What Founders Building Here Need

The seed-stage commercial risk API companies that are best positioned are those that have already solved the carrier data-sharing problem for at least two tier-1 carriers in a single line of business. Getting that first carrier data-sharing agreement is the hardest problem — it requires trust, clear data governance documentation, and a commercial structure that gives the carrier confidence their proprietary loss data is not being shared with competitors in identifiable form.

We are not saying the model-development work is trivial — it is not. But in this category, distribution authority and data partnerships are the founding team's primary constraint, not technical architecture. The technical architecture for a commercial risk data API is well-understood; the carrier relationships that give it data value are the non-replicable component. Founders who recognise this and invest accordingly — who hire carrier relationship builders and data licensing specialists early, not just ML engineers — tend to make faster progress than those who build technically first and negotiate data access second.