Using Agents to Scale Build's Geospatial Data Sources

Build's mission is to make an autonomous built world. To achieve this mission we must build the world's best geospatial dataset.

by Matthew Coombes, James SE – Apr 12, 2026

Build is an AI company for the built world. We accelerate the projects of our partners by building AI workers that can run complex institutional workflows. Underpinning all of it is the same requirement: high-quality structured geospatial data, covering the right constraints, in the right geography, kept current and constantly expanding.

That dependency is easy to state and hard to satisfy. The world's geospatial data is fragmented across hundreds of government portals, environmental agencies, and infrastructure operators, each publishing in different formats, at different cadences, with different authentication requirements. Getting it right, at scale, across multiple markets and use cases, is not a data problem. It's an engineering problem.

This piece is a deep dive into our journey with geospatial data and the system we built to support our continually growing offering.


The World's Geospatial Data Is a Mess

The quality of any development workflow – whether you're screening a data centre site, modelling flood exposure across a residential portfolio, running regulatory analysis on an industrial land bank, or assessing grid capacity for an energy asset – is only as good as the data behind it.

The world's geospatial information is published across hundreds of government portals, environmental agencies, energy regulators, and infrastructure operators. Each publishes in its own format: ArcGIS Feature Services, OGC Web Feature Services, EU INSPIRE directive services, OpenStreetMap Overpass queries. Each speaks a subtly different dialect of spatial data. A coordinate in Denmark might live in a different projection system to the one Portugal uses. Texas grid operators return data in proprietary schemas. German environmental data is governed at the federal state level – Bavaria and Brandenburg are effectively separate integration problems. And that's before you account for rate limits, authentication schemes, intermittent endpoint failures, and the silent killer of all geospatial work: the empty feature collection that looks like success but isn't.

When your query and the server's data are in different coordinate reference systems, you get zero results back – no exception, no error flag, just silence that passes downstream as a clean result. In a manual pipeline, this goes unnoticed until someone in a client meeting asks why the flood data for Lower Saxony is blank.
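The failure mode above can be sketched in plain Python. This is an illustrative mock, not a real WFS client: the layer is stored in EPSG:25832 (metres), the query bbox arrives in EPSG:4326 (degrees), so nothing intersects and the "result" is an empty collection with no error. The degree-magnitude heuristic in the guard is one simple way to catch it; all names here are hypothetical.

```python
# Feature bboxes stored in EPSG:25832 (UTM zone 32N, metres), e.g. Lower Saxony.
FEATURES = [
    ("flood_zone_a", (550_000.0, 5_790_000.0, 560_000.0, 5_800_000.0)),
]

def _intersects(a, b):
    # Axis-aligned bbox intersection test.
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def query(bbox):
    """Mimics a bbox query: wrong-CRS bbox yields an empty list, no error."""
    return [name for name, fb in FEATURES if _intersects(bbox, fb)]

def looks_like_degrees(bbox):
    """Heuristic: every coordinate fits inside the lon/lat value range."""
    return all(abs(v) <= 180.0 for v in bbox)

def guarded_query(bbox, layer_crs="EPSG:25832"):
    result = query(bbox)
    if not result and layer_crs != "EPSG:4326" and looks_like_degrees(bbox):
        # Empty result plus a degree-sized bbox against a metric CRS is
        # almost certainly a CRS mismatch, not a genuinely empty area.
        raise ValueError("probable CRS mismatch: bbox looks like EPSG:4326")
    return result

degrees_bbox = (9.6, 52.2, 9.9, 52.5)  # Lower Saxony, in the wrong CRS
assert query(degrees_bbox) == []       # silent "success" that isn't
```

The guard turns the silent empty collection into a loud failure before it can flow downstream as a clean result.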

The traditional approach to all of this is manual. A GIS analyst spends weeks hunting for the right government portal, resolving coordinate system conflicts, and checking whether the data actually covers the area they care about. Then they do it again for the next country. At the pace we're expanding – new geographies, new asset classes, new constraint layers for new client workflows – that approach makes the team's bandwidth the ceiling on growth. We built a different way.


From One Country to Fifteen

A year ago, expanding to a new country meant a researcher spending days sourcing the right endpoints, a developer spending days integrating them, and no systematic guarantee that any of it was still working a month later. Every new market was a ground-up exercise.

Today we cover 15+ countries across the US and Europe. Several of those go deeper than the national level: individual US energy markets, Germany's federal states, UK regional planning authorities – each with their own data sources, projection systems, and publishing conventions. Adding a new country or sub-national geography no longer takes weeks. It takes hours, running in the background, on the same scaffolding we've extended region by region since we started.

The Geospatial Factory is what made that shift possible.


The Geospatial Factory

The factory is an automated pipeline that discovers, validates, and deploys geospatial data sources across every country and sub-national geography we operate in. It's built on three layers.

Discovery. Every data source in our codebase is annotated with structured metadata directly in code. A registry scanner walks the entire codebase at startup, discovers every datasource, layer, and tool, and registers it in a central database – no configuration files, no manual registration. If the function exists, it's in the system. The gap between writing a new data source and it being discoverable, validated, and ready to deploy collapses to near zero.
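One way to implement that code-as-registry pattern is a decorator that records metadata at import time, so the registry is a by-product of the code existing. The sketch below uses hypothetical names and fields, not Build's actual schema; a real scanner would also walk packages with importlib/pkgutil to force the imports.

```python
REGISTRY: dict[str, dict] = {}

def datasource(*, country, layer, crs):
    """Decorator that registers a datasource function with its metadata."""
    def wrap(fn):
        REGISTRY[fn.__name__] = {
            "country": country, "layer": layer, "crs": crs, "fn": fn,
        }
        return fn
    return wrap

@datasource(country="PT", layer="natura2000", crs="EPSG:3763")
def fetch_natura2000_pt(bbox):
    # A real implementation would query the national WFS endpoint.
    return []

# The function exists, therefore it is in the system:
assert REGISTRY["fetch_natura2000_pt"]["country"] == "PT"
```

Because registration happens as a side effect of definition, there is no configuration file to drift out of sync with the code.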

Validation. Each of our 700 tools carries a validation test feature: a real-world feature we've confirmed actually contains data for that region. A substation in Fingrid's grid database. A Natura 2000 boundary in Portugal. A flood zone polygon in the Ruhr Valley. When we run validation, every tool is executed against its specific feature and checked for valid geometry. This catches CRS mismatches, schema changes, endpoint failures, and empty-collection errors – every class of silent failure that passes unnoticed through manual pipelines. If it's not validated against a real feature in that exact geography, it doesn't reach production.
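A per-tool validation check of this kind might look like the following. The tool shape and the geometry check are simplified stand-ins for the real pipeline, shown only to make the pass/fail logic concrete.

```python
def geometry_is_valid(geom):
    """Minimal check: a polygon ring with at least 4 points that closes."""
    return geom is not None and len(geom) >= 4 and geom[0] == geom[-1]

def validate_tool(tool):
    """Run a tool against its expected feature; return (ok, reason)."""
    features = tool["fn"](tool["expected_bbox"])
    if not features:
        return False, "empty feature collection"   # the silent killer
    if not all(geometry_is_valid(f) for f in features):
        return False, "invalid geometry"
    return True, "ok"

# A healthy tool: returns a closed ring for its expected feature.
healthy = {
    "expected_bbox": (-8.7, 38.5, -8.6, 38.6),  # e.g. a Natura 2000 site
    "fn": lambda bbox: [[(0, 0), (1, 0), (1, 1), (0, 0)]],
}
# A broken tool: the endpoint now returns nothing.
broken = {"expected_bbox": (-8.7, 38.5, -8.6, 38.6), "fn": lambda bbox: []}
```

Running `validate_tool` over every registered tool gives a single pass/fail signal per tool, which is what the health agents below consume.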

Deployment. Each new country or sub-national geography passes through structured phases: endpoint discovery, coordinate system normalisation, feature validation, and production registration. What previously took a human analyst a week is now a pipeline run.
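The phases above can be sketched as a linear pipeline where each phase takes and returns the accumulating deployment state. The phase bodies here are stubs under assumed interfaces; the real steps are far richer.

```python
def discover_endpoints(state):
    state["endpoints"] = ["https://example.invalid/wfs"]  # placeholder only
    return state

def normalise_crs(state):
    state["crs"] = "EPSG:4326"
    return state

def validate_features(state):
    state["validated"] = True
    return state

def register_production(state):
    # Registration only happens if validation passed.
    state["registered"] = state.get("validated", False)
    return state

PHASES = [discover_endpoints, normalise_crs, validate_features, register_production]

def deploy(geography):
    """Run a geography through every phase in order."""
    state = {"geography": geography}
    for phase in PHASES:
        state = phase(state)
    return state
```

Structuring deployment as explicit phases means a new geography is a pipeline run, not a bespoke project.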


The Agent Architecture

Eight agents – plus a Decision Agent role currently filled by a human – run the factory:

Tool Research Agent – Given a set of starting endpoints, probes and tests data sources, conducts further research to find gaps, validates against a base checklist, and saves findings to a Linear project.
Verification Agent – Runs structured cross-checks against each source before it goes near production: authority, coverage accuracy, data currency, and known failure modes.
Tool Building Agent – Passes verified sources through the Data Source & Tool Factory, builds functional tools, and defines the expected features used for ongoing health checks.
Layer Building Agent – Handles sources without an API or WFS: applies a priority system, converts GeoPackage files to GeoParquet, and stores them on S3 for querying.
Decision Agent (currently human) – Aggregates all pipeline signals, weighs source quality and consistency, and issues a pass/fail determination before any source is registered in production.
Deployment Agent – Manages the pipeline from Decision Agent approval through to production registration: endpoint discovery, CRS normalisation, feature validation, and deployment.
Layer Health Agent – Runs daily image-rendering tests against each layer, monitors response times, and on failure uploads the last known good state to S3.
Tool Health Agent – Runs daily validation against each tool's expected feature. Success requires no action. Failure creates a Linear ticket, triggers a Claude review, and escalates to human review if unresolved.
Dougie Agent – Once a source is in production, Dougie routes validated tools and layers into the client-facing workflows where they're actually used, bridging the factory to the product.
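The daily loop of the Tool Health Agent can be sketched as follows, with the external systems (Linear ticketing, the Claude review) stubbed as injected callables – those integration details are assumptions, not Build's actual interfaces.

```python
def daily_health_check(tools, validate, create_ticket):
    """Validate every tool; file a ticket per failure, none on success."""
    tickets = []
    for name, tool in tools.items():
        ok, reason = validate(tool)
        if ok:
            continue  # success requires no action
        # On failure, open a ticket; escalation happens downstream.
        tickets.append(create_ticket(f"[tool-health] {name}: {reason}"))
    return tickets

# Demo with stubs standing in for real tools and the ticketing system:
tools = {
    "fingrid_substations": {"healthy": True},
    "ruhr_flood_zones": {"healthy": False},
}
validate = lambda t: (t["healthy"], "ok" if t["healthy"] else "empty collection")
created = daily_health_check(tools, validate, create_ticket=lambda title: title)
```

The key property is that a healthy day produces no output at all: tickets exist only where something broke.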

Each agent owns a distinct job. The Dougie Agent is the last mile: the point at which data infrastructure becomes product intelligence, and a validated source becomes part of a live workflow for a client.

What Automation Isn't Replacing

The factory handles discovery and validation. It doesn't replace judgment.

Before any source touches a client deliverable, it passes through human review. Our team checks that the endpoint is authoritative, that the data is current, and that it covers what it claims to cover. The Decision Agent (currently a human) aggregates all pipeline signals and issues the final pass/fail/uncertain judgement before the Deployment Agent proceeds.

So whilst automation raises the ceiling on how much we can discover, monitor, and maintain, human review is currently what makes any of it trustworthy.


The Compounding Effect

Each country we add makes every other country more valuable. The CRS handling we built for Portugal transfers immediately to Spain. The ArcGIS fetcher that handles US federal data handles Australian state government services. The Danish WFS client – with its axis-order quirk – became a template for every Scandinavian government service we've integrated since.
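The Danish axis-order quirk is worth making concrete. WFS 2.0 servers advertising EPSG:4326 in its URN form return coordinates latitude-first, while most downstream tooling expects lon/lat. A normaliser of roughly this shape (hypothetical names, deliberately tiny CRS table) is the kind of component that transfers wholesale to every similar service:

```python
# CRS identifiers known to deliver coordinates latitude-first.
LAT_FIRST = {
    "urn:ogc:def:crs:EPSG::4326",   # WFS 2.0 geographic CRS: (lat, lon)
}

def normalise_axis_order(coords, crs):
    """Return coords as (lon, lat) / (x, y) pairs regardless of source order."""
    if crs in LAT_FIRST:
        return [(x, y) for (y, x) in coords]
    return coords

# Copenhagen, as a WFS 2.0 server would return it (latitude first):
assert normalise_axis_order([(55.676, 12.568)], "urn:ogc:def:crs:EPSG::4326") \
    == [(12.568, 55.676)]
```

Once the quirk is encoded in one place, every subsequent integration that speaks the same dialect gets the fix for free.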

Shared tooling. Shared validation patterns. Shared infrastructure. The marginal cost of the seventeenth country is a fraction of the first, and the coverage compounds in all directions: more geographies, more asset classes, more constraint domains – all built on the same validated foundation.


What This Changes for Our Clients

The workflows we run are only as credible as the data that powers them. When a client's team runs a site screening in Finland, a flood exposure analysis across a UK residential portfolio, or a regulatory constraint check on an industrial land bank in Germany, the answer they get back carries an implicit claim: that the underlying data was right.

That claim used to rest on assumption: that the government portal was still live, that the coordinate system hadn't shifted, that the endpoint returning an empty result actually meant there was nothing there. Assembling reliable data for any given analysis was itself weeks of work, and the confidence in the result reflected that uncertainty.

What the Geospatial Factory changes is where that confidence comes from. The constraint layers are already there before the question is asked – validated against a real feature in that exact geography, tested again that morning. When our workflows touch flood risk data in the Ruhr Valley, substation proximity in Texas, or planning zone classifications in Greater London, those sources have passed through the same pipeline: discovered, verified, approved, deployed, and checked daily.

Speed is part of it. But what our clients rely on is the assurance that the data behind the answer was confirmed, not assumed.


Build automates the development workflows that institutional real estate runs on – across planning, environmental, utilities, energy, and regulatory domains, across every country our clients operate in. To learn more, reach out to your Build contact or contact our team. If you're interested in working on the problems described here, check out our open roles.