
Site Selection Data: What to Pull, Where to Get It, and How to Use It

Site selection decisions fail when critical data layers are incomplete or assembled too slowly. This guide covers the six data categories every development team needs -- power availability, zoning, environmental risk, transportation, market demand, and parcel ownership -- with specific sources and flags for each. It also covers how AI-assisted workflows are changing the assembly process.

by Build Team · April 1, 2026 · 5 min read


The six data layers behind every development decision -- and how modern teams are assembling them faster.

The Data Problem in Site Selection

Site selection fails when teams work from incomplete data, stale data, or data layers that weren't assembled to talk to each other. A site that clears the power screen but fails on environmental review. A parcel that's correctly zoned but sits in a transmission-constrained subregion. A market with strong absorption but a labor pool that can't support the tenant's headcount.

Every one of these failures is a data failure -- not a judgment failure.

The good news: almost all of the critical data exists, and most of it is publicly available. The constraint has been assembly speed and integration. AI is changing both.

The Six Data Layers That Drive Site Selection

Layer 1: Power Availability

For energy-intensive development -- data centers, manufacturing, EV logistics, cold storage -- power is often the primary screen.

What to pull:

  • EIA Form 861 (annual utility sales and customer data by utility)

  • FERC OATT filings for available transfer capacity on the relevant transmission segments

  • RTO/ISO daily capacity reports (PJM, MISO, CAISO, ERCOT, NYISO, SPP)

  • Individual utility integrated resource plans (IRPs)

  • HIFLD (Homeland Infrastructure Foundation-Level Data) transmission line and substation dataset

What to look for: Available generation capacity within the relevant load zone, spare transformer capacity at substations within 2-5 miles, distance to the nearest 138kV or 230kV transmission line, and whether the utility's IRP reflects load growth that limits capacity for large new customers.

Many utilities publish their large-customer interconnection process and queue timelines publicly. Dominion Energy's, Duke Energy's, and ERCOT's data portals are relatively accessible. Others require direct utility engagement to get accurate numbers.
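The power screen above can be reduced to a simple pass/fail check once the numbers are in hand. A minimal sketch follows; the field names and the 5-mile threshold are illustrative assumptions drawn from the ranges in this section, not utility-published standards.

```python
# Coarse power-availability screen. Thresholds and field names are
# illustrative assumptions for this sketch, not utility standards.

def power_screen(site):
    """Return (passes, flags) for one candidate site.

    `site` is a dict with:
      tx_distance_mi -- miles to the nearest 138kV+ transmission line
      spare_mva      -- spare transformer capacity at nearby substations
      load_need_mva  -- projected load for the project
    """
    flags = []
    if site["tx_distance_mi"] > 5:
        flags.append("nearest 138kV+ line beyond 5 mi; line extension likely")
    if site["spare_mva"] < site["load_need_mva"]:
        flags.append("insufficient spare substation capacity; upgrade needed")
    return (not flags, flags)

sites = [
    {"name": "Parcel A", "tx_distance_mi": 1.8, "spare_mva": 60, "load_need_mva": 40},
    {"name": "Parcel B", "tx_distance_mi": 7.2, "spare_mva": 25, "load_need_mva": 40},
]
for s in sites:
    ok, flags = power_screen(s)
    print(s["name"], "PASS" if ok else "FAIL", flags)
```

In practice the inputs come from the HIFLD transmission layer and direct utility engagement; the value of encoding the screen is that every parcel gets the same thresholds.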

Layer 2: Zoning and Entitlement

Zoning is the most fragmented data layer -- managed at the county or municipal level, inconsistently digitized, and updated without centralized notification.

What to pull:

  • Municipal and county GIS portals (most major jurisdictions publish parcel-level zoning)

  • State GIS clearinghouse data (available through most state data portals)

  • PLSS (Public Land Survey System) data for rural sites

  • Historic use and brownfield designations (EPA ACRES database)

  • Overlay zone restrictions: floodplain, wetlands, airport approach zones, agricultural protection

What to watch: Many jurisdictions have interim zoning freezes, overlay amendments, or pending rezoning petitions that don't appear in the standard GIS layer. Development teams routinely screen a site as "correctly zoned" and discover 90 days later that a rezoning petition is pending or the parcel is subject to a development agreement from a prior transaction. There is no substitute for a direct call to the planning department before advancing a site.
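Most municipal GIS portals that publish parcel-level zoning expose it through an ArcGIS REST endpoint. A sketch of building the per-parcel query URL is below; the endpoint URL and the field names (PARCEL_ID, ZONING, OVERLAY) are placeholders, since every jurisdiction names these differently.

```python
# Build an ArcGIS FeatureServer query URL for one parcel's zoning record.
# Endpoint and field names are hypothetical; check the jurisdiction's
# service directory for the real layer and schema.
from urllib.parse import urlencode

def zoning_query_url(base_url, parcel_id, id_field="PARCEL_ID",
                     out_fields=("ZONING", "OVERLAY")):
    params = {
        "where": f"{id_field} = '{parcel_id}'",  # attribute filter
        "outFields": ",".join(out_fields),       # fields to return
        "returnGeometry": "false",               # attributes only
        "f": "json",
    }
    return f"{base_url}/query?{urlencode(params)}"

url = zoning_query_url(
    "https://gis.example.gov/arcgis/rest/services/Zoning/FeatureServer/0",
    "123-456-789")
print(url)
```

Remember the caveat above: whatever this query returns is only the published layer. Pending rezoning petitions and interim freezes still require a call to the planning department.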

Layer 3: Environmental Risk

Environmental risk screening catches deal-killers before significant capital is spent on formal due diligence.

What to pull:

  • EPA Envirofacts (Superfund, RCRA, CERCLIS, ECHO databases)

  • National Wetlands Inventory (NWI) -- U.S. Fish and Wildlife Service

  • FEMA National Flood Insurance Program maps (NFIP flood zone layer)

  • NLCD (National Land Cover Database) for current land cover and impervious surface

  • USFWS IPaC (Information for Planning and Consultation) for endangered species and critical habitat

  • State voluntary cleanup program registries

What to watch: FEMA flood maps are notoriously out of date in many jurisdictions. A parcel that reads Zone X (minimal flood hazard) in the official NFIP map may have experienced repeated flooding. Supplementing FEMA data with NOAA storm track data and USGS stream gauge records is standard practice for flood-sensitive sites.
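The FEMA cross-check described above can be expressed as a simple rule: distrust a clean Zone X reading when nearby gauge history says otherwise. The zone list is FEMA's high-risk designations; the three-exceedance threshold is an assumption for the sketch, not FEMA or USGS guidance.

```python
# Flood cross-check: combine the NFIP zone with gauge history.
# The exceedance threshold (3 in 10 years) is an illustrative assumption.

def flood_flag(fema_zone, gauge_exceedances_10yr):
    """Return a risk note for the parcel."""
    high_risk_zones = {"A", "AE", "AO", "AH", "V", "VE"}  # mapped SFHA zones
    if fema_zone in high_risk_zones:
        return "mapped floodplain -- formal study required"
    if gauge_exceedances_10yr >= 3:
        return "Zone X but repeated gauge exceedances -- map likely stale"
    return "no flood flag"

print(flood_flag("X", 4))   # stale-map case described in the text
print(flood_flag("AE", 0))  # mapped floodplain
```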

Layer 4: Transportation and Labor Market

For industrial, logistics and manufacturing development, proximity to interstate access, port access and labor market depth are often determinative.

What to pull:

  • FHWA Highway Performance Monitoring System (HPMS) for traffic counts and road classification

  • BLS Quarterly Census of Employment and Wages (QCEW) by county -- labor force by occupation

  • Census LEHD (Longitudinal Employer-Household Dynamics) for commute patterns

  • Port throughput data (Army Corps of Engineers Waterborne Commerce Statistics)

  • Rail network data (AAR, individual Class I railroad GIS layers)

What to watch: BLS labor market data lags by 12-18 months. For sites in tightening labor markets -- logistics corridors, EV manufacturing clusters -- current wage surveys from local economic development organizations are more reliable for underwriting labor cost assumptions.
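A labor-depth screen built from these sources amounts to comparing tenant headcount against a realistic capture of the commute-shed workforce. The sketch below uses a hypothetical capture share and a crude tightness discount; both knobs are underwriting assumptions, not BLS figures.

```python
# Labor-depth check: can a plausible capture of the commute-shed workforce
# cover the tenant's headcount? capture_share and the tightness discount
# are hypothetical underwriting knobs, not published benchmarks.

def labor_depth_ok(shed_workers, tenant_headcount, unemployment_rate,
                   capture_share=0.02):
    # In tight markets, assume the realistic capture share shrinks by half.
    effective_share = capture_share * (0.5 if unemployment_rate < 0.03 else 1.0)
    return shed_workers * effective_share >= tenant_headcount

# Same shed, same tenant: passes in a loose market, fails in a tight one.
print(labor_depth_ok(120_000, 1_500, unemployment_rate=0.045))
print(labor_depth_ok(120_000, 1_500, unemployment_rate=0.025))
```

The shed-worker input would come from LEHD commute patterns filtered to relevant occupations; the unemployment rate is where the current local surveys mentioned above matter, since the lagged BLS figure can misstate tightness.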

Layer 5: Market Demand

Demand data calibrates whether there's a tenant or buyer for what you're proposing to build.

What to pull:

  • CBRE, JLL and Cushman & Wakefield quarterly market reports (publicly available for major markets)

  • Real Capital Analytics (MSCI) for transaction volume and pricing benchmarks

  • Asset-class specific research: CBRE Data Center Solutions reports for data centers, Prologis Research for industrial and logistics

What to watch: Broker reports aggregate market-level data. Development decisions often require submarket-level analysis that broker reports don't provide at sufficient resolution. AI-assisted synthesis from multiple sources tends to outperform relying on a single broker report for granular submarket analysis -- particularly in emerging industrial corridors or secondary data center markets where coverage is thinner.

Layer 6: Parcel and Ownership Data

Identifying who owns a site and whether they're likely to transact is the bridge from screening to acquisition.

What to pull:

  • County assessor records (most now available via public API or web scraping)

  • Regrid, LightBox or similar parcel aggregation platforms for clean national coverage

  • Secretary of state entity records for LLC ownership chains

  • UCC filings and title plant data for encumbrance screening

What to watch: LLC anonymity layers make true ownership verification slow. For large acquisitions, a title search is required regardless of what assessor records show. The assessor record tells you who paid taxes last year. The title plant tells you who can actually sign a deed.
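Walking an LLC ownership chain through secretary-of-state records is mechanical once the filings are in hand. A sketch follows; `entity_records` and the entity names are stand-ins for scraped or API-sourced filings, and real chains need the title search noted above regardless.

```python
# Follow manager/member links from entity filings until reaching a
# non-entity owner, a cycle, or a depth limit. entity_records maps an
# entity name to its listed manager/member (stand-in data).

def trace_owner(name, entity_records, max_depth=10):
    seen = []
    while name in entity_records and name not in seen and len(seen) < max_depth:
        seen.append(name)          # record each entity layer traversed
        name = entity_records[name]
    return name, seen              # terminal owner of record, chain walked

entity_records = {
    "RIVERBEND HOLDINGS LLC": "OAKFIELD CAPITAL LLC",
    "OAKFIELD CAPITAL LLC": "J. SMITH FAMILY TRUST",
}
owner, chain = trace_owner("RIVERBEND HOLDINGS LLC", entity_records)
print(owner, len(chain))
```

The cycle and depth guards matter: circular LLC structures exist, and a chain that doesn't terminate in a natural person or trust is itself a flag for deeper diligence.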

How AI Changes the Assembly Process

The traditional approach: each data layer gets pulled by a different team member, formatted into a different spreadsheet, and manually cross-referenced. Two analysts, two weeks, for a 50-parcel screen.

Modern AI-assisted workflows run these layers in parallel. An agent ingests parcel boundaries, queries each relevant data source, scores each site against a defined criteria set, and returns a ranked shortlist with flagged issues. The analyst reviews the output rather than assembling it.
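The parallel screen described above can be sketched in a few lines. The layer checks here are toy rules standing in for real queries against the sources listed earlier; the scoring (count of clean layers) is one simple choice among many.

```python
# Parallel multi-layer screen: run each layer check over a parcel,
# flag failures, and rank sites by the number of clean layers.
# Layer logic is illustrative; real checks would query the sources
# listed in Layers 1-6.
from concurrent.futures import ThreadPoolExecutor

LAYER_CHECKS = {
    "power":  lambda p: p["tx_distance_mi"] <= 5,
    "zoning": lambda p: p["zoning"] in {"M-1", "M-2", "I"},
    "flood":  lambda p: p["fema_zone"] == "X",
}

def screen(parcel):
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, parcel)
                   for name, fn in LAYER_CHECKS.items()}
    flags = [name for name, fut in futures.items() if not fut.result()]
    return {"parcel": parcel["id"],
            "score": len(LAYER_CHECKS) - len(flags),
            "flags": flags}

parcels = [
    {"id": "P-01", "tx_distance_mi": 2.0, "zoning": "M-2", "fema_zone": "X"},
    {"id": "P-02", "tx_distance_mi": 8.0, "zoning": "A-1", "fema_zone": "AE"},
]
ranked = sorted((screen(p) for p in parcels), key=lambda r: -r["score"])
print(ranked[0]["parcel"], ranked[0]["score"])
```

The structure is the point: every parcel passes through the same checks, so nothing gets skipped or rationalized, and the analyst's time goes to reviewing the flags rather than assembling the inputs.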

The real value isn't speed alone -- it's consistency. Manual screens miss things. Power availability gets skipped for sites where the team has a strong relationship with the local economic development office. Environmental flags get rationalized. AI-driven screens apply the same criteria to every parcel.

Where human judgment stays critical: interpreting utility capacity signals (utilities often understate available capacity in formal queues), reading local political dynamics, and deciding whether a flagged risk is a dealbreaker or manageable. The sites development teams win are the ones where the data was assembled fast enough to act before competing buyers knew the site was available.