Site Selection Data: What to Pull, Where to Get It, and How to Use It
The six data layers behind every development decision -- and how modern teams are assembling them faster.
The Data Problem in Site Selection
Site selection fails when teams work from incomplete data, stale data, or data layers that weren't assembled to talk to each other. A site that clears the power screen but fails on environmental review. A parcel that's correctly zoned but sits in a transmission-constrained subregion. A market with strong absorption but a labor pool that can't support the tenant's headcount.
Every one of these failures is a data failure -- not a judgment failure.
The good news: almost all of the critical data exists, and most of it is publicly available. The constraint has been assembly speed and integration. AI is changing both.
The Six Data Layers That Drive Site Selection
Layer 1: Power Availability
For energy-intensive development -- data centers, manufacturing, EV logistics, cold storage -- power is often the primary screen.
What to pull:
EIA Form 861 (annual utility sales and customer data by utility)
FERC OATT filings for available transfer capacity on the relevant transmission segments
RTO/ISO daily capacity reports (PJM, MISO, CAISO, ERCOT, NYISO, SPP)
Individual utility integrated resource plans (IRPs)
HIFLD (Homeland Infrastructure Foundation-Level Data) transmission line and substation dataset
What to look for: Available generation capacity within the relevant load zone, spare transformer capacity at substations within 2-5 miles, distance to the nearest 138 kV or 230 kV transmission line, and whether the utility's IRP reflects load growth that limits capacity for large new customers.
Many utilities publish their large-customer interconnection process and queue timelines publicly; Dominion Energy's and Duke Energy's portals, and ERCOT's market data portal, are relatively accessible. Others require direct utility engagement to get accurate numbers.
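Once HIFLD substation records are in hand, the 2-5 mile proximity screen above reduces to a great-circle distance calculation. A minimal sketch, assuming substation records flattened to name/lat/lon dictionaries (the sample names and coordinates are illustrative, not real HIFLD entries):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    r = 3958.8  # mean Earth radius, miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_substation(parcel, substations):
    """Return (name, distance_miles) of the closest substation to a parcel centroid."""
    best = min(substations,
               key=lambda s: haversine_miles(parcel[0], parcel[1], s["lat"], s["lon"]))
    return best["name"], haversine_miles(parcel[0], parcel[1], best["lat"], best["lon"])

# Hypothetical records shaped like simplified HIFLD substation attributes
subs = [
    {"name": "Sub A", "lat": 39.10, "lon": -84.51},
    {"name": "Sub B", "lat": 39.05, "lon": -84.40},
]
name, dist = nearest_substation((39.09, -84.50), subs)
```

Centroid-to-point distance is a screening shortcut; a real power screen would measure to the parcel boundary and confirm spare transformer capacity with the utility.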
Layer 2: Zoning and Entitlement
Zoning is the most fragmented data layer -- managed at the county or municipal level, inconsistently digitized, and updated without centralized notification.
What to pull:
Municipal and county GIS portals (most major jurisdictions publish parcel-level zoning)
State GIS clearinghouse data (available through most state data portals)
PLSS (Public Land Survey System) data for rural sites
Historic use and brownfield designations (EPA ACRES database)
Overlay zone restrictions: floodplain, wetlands, airport approach zones, agricultural protection
What to watch: Many jurisdictions have interim zoning freezes, overlay amendments, or pending rezoning petitions that don't appear in the standard GIS layer. Development teams routinely screen a site as "correctly zoned" and discover 90 days later that a rezoning petition is pending or the parcel is subject to a development agreement from a prior transaction. There is no substitute for a direct call to the planning department before advancing a site.
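The caveat above can be encoded directly into a screen: a parcel that clears base zoning but carries an overlay or pending-petition flag should come back as "verify," not "pass." A minimal sketch with illustrative field names (not a standard GIS schema):

```python
def zoning_screen(parcel, allowed_districts):
    """Base district must match; any overlay or pending-petition flag
    downgrades the result from 'pass' to 'verify' (call the planning dept)."""
    if parcel["district"] not in allowed_districts:
        return "fail"
    flags = [k for k in ("floodplain_overlay", "airport_overlay", "pending_rezoning")
             if parcel.get(k)]
    return "verify" if flags else "pass"

# Hypothetical parcels keyed by assessor parcel number (APN)
sites = [
    {"apn": "001", "district": "M-2"},
    {"apn": "002", "district": "M-2", "pending_rezoning": True},
    {"apn": "003", "district": "R-1"},
]
results = {s["apn"]: zoning_screen(s, {"M-1", "M-2"}) for s in sites}
```

The three-state output matters: "verify" preserves the site in the funnel while forcing the direct planning-department call before capital is committed.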
Layer 3: Environmental Risk
Environmental risk screening catches deal-killers before significant capital is spent on formal due diligence.
What to pull:
EPA Envirofacts (Superfund/SEMS -- the successor to the retired CERCLIS -- plus RCRA data) and the EPA ECHO compliance database
National Wetlands Inventory (NWI) -- U.S. Fish and Wildlife Service
FEMA National Flood Insurance Program maps (NFIP flood zone layer)
NLCD (National Land Cover Database) for current land cover and impervious surface
USFWS IPaC (Information for Planning and Consultation) for endangered species and critical habitat
State voluntary cleanup program registries
What to watch: FEMA flood maps are notoriously out of date in many jurisdictions. A parcel that reads Zone X (minimal flood hazard) in the official NFIP map may have experienced repeated flooding. Supplementing FEMA data with NOAA storm track data and USGS stream gauge records is standard practice for flood-sensitive sites.
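The supplementing logic described above is simple to operationalize: trust high-hazard NFIP zones as-is, but escalate a Zone X reading when nearby gauge records disagree. A sketch under an assumed threshold (two flood-stage exceedances in the gauge record, which is illustrative, not a FEMA or USGS standard):

```python
def flood_flag(nfip_zone, gauge_exceedances):
    """Combine the NFIP zone with USGS gauge history.
    nfip_zone: FEMA flood zone code (e.g. 'AE', 'VE', 'X').
    gauge_exceedances: count of flood-stage peaks at nearby gauges."""
    if nfip_zone.startswith(("A", "V")):        # NFIP high-hazard zones
        return "high"
    if nfip_zone == "X" and gauge_exceedances >= 2:
        return "elevated"                        # map says minimal; gauges disagree
    return "minimal"
```

An "elevated" flag on a Zone X parcel is exactly the stale-map case the text describes: the official layer clears the site, the hydrology does not.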
Layer 4: Transportation and Labor Market
For industrial, logistics and manufacturing development, proximity to interstate access, port access and labor market depth are often determinative.
What to pull:
FHWA Highway Performance Monitoring System (HPMS) for traffic counts and road classification
BLS Quarterly Census of Employment and Wages (QCEW) by county -- employment and wages by industry (pair with BLS OEWS for occupation-level detail)
Census LEHD (Longitudinal Employer-Household Dynamics) for commute patterns
Port throughput data (Army Corps of Engineers Waterborne Commerce Statistics)
Rail network data (AAR, individual Class I railroad GIS layers)
What to watch: BLS labor market data lags by 12-18 months. For sites in tightening labor markets -- logistics corridors, EV manufacturing clusters -- current wage surveys from local economic development organizations are more reliable for underwriting labor cost assumptions.
Layer 5: Market Demand
Demand data calibrates whether there's a tenant or buyer for what you're proposing to build.
What to pull:
CBRE, JLL and Cushman & Wakefield quarterly market reports (publicly available for major markets)
Real Capital Analytics (MSCI) for transaction volume and pricing benchmarks
Asset-class specific research: CBRE Data Center Solutions reports for data centers, Prologis Research for industrial and logistics
What to watch: Broker reports aggregate market-level data. Development decisions often require submarket-level analysis that broker reports don't provide at sufficient resolution. AI-assisted synthesis from multiple sources tends to outperform relying on a single broker report for granular submarket analysis -- particularly in emerging industrial corridors or secondary data center markets where coverage is thinner.
Layer 6: Parcel and Ownership Data
Identifying who owns a site and whether they're likely to transact is the bridge from screening to acquisition.
What to pull:
County assessor records (most now available via public API or web scraping)
Regrid, LightBox or similar parcel aggregation platforms for clean national coverage
Secretary of state entity records for LLC ownership chains
UCC filings and title plant data for encumbrance screening
What to watch: LLC anonymity layers make true ownership verification slow. For large acquisitions, a title search is required regardless of what assessor records show. The assessor record tells you who paid taxes last year. The title plant tells you who can actually sign a deed.
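A first-pass triage before any title work is simply detecting whether the owner of record is an entity or an individual, since entity owners trigger the secretary-of-state lookup. A heuristic sketch; the suffix list is illustrative and would need tuning against real assessor data:

```python
import re

def looks_like_entity(owner_of_record):
    """Heuristic: does an assessor owner string name an entity (LLC, LP,
    trust, etc.) rather than an individual? Returns True for entity-like names."""
    return bool(re.search(
        r"\b(LLC|L\.L\.C\.|LP|LLP|INC|CORP|TRUST|HOLDINGS|PARTNERS)\b",
        owner_of_record.upper()))

# Hypothetical owner-of-record strings as they appear in assessor exports
owners = ["SMITH JOHN A", "BLUE RIDGE LAND HOLDINGS LLC", "OAK CREEK TRUST"]
entity_flags = [looks_like_entity(o) for o in owners]
```

This only routes the workflow; as the text notes, the title plant, not a string match, determines who can sign a deed.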
How AI Changes the Assembly Process
The traditional approach: each data layer gets pulled by a different team member, formatted into a different spreadsheet, and manually cross-referenced. Two analysts, two weeks, for a 50-parcel screen.
Modern AI-assisted workflows run these layers in parallel. An agent ingests parcel boundaries, queries each relevant data source, scores each site against a defined criteria set, and returns a ranked shortlist with flagged issues. The analyst reviews the output rather than assembling it.
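The parallel-layers pattern can be sketched without any agent framework: run each layer check concurrently per site, score, and rank. The layer checks below are stand-in stubs (field names, thresholds, and districts are all assumptions); real versions would query the sources listed in the layers above.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub layer checks -- real versions would hit HIFLD, county GIS, FEMA, etc.
def check_power(site):  return {"layer": "power",  "pass": site["miles_to_tx"] <= 5}
def check_zoning(site): return {"layer": "zoning", "pass": site["district"] in {"M-1", "M-2"}}
def check_flood(site):  return {"layer": "flood",  "pass": site["nfip_zone"] == "X"}

CHECKS = [check_power, check_zoning, check_flood]

def screen(site):
    """Run every layer check for one site in parallel, then score and flag."""
    with ThreadPoolExecutor(max_workers=len(CHECKS)) as pool:
        results = list(pool.map(lambda check: check(site), CHECKS))
    return {"apn": site["apn"],
            "score": sum(r["pass"] for r in results),
            "flags": [r["layer"] for r in results if not r["pass"]]}

sites = [
    {"apn": "001", "miles_to_tx": 2, "district": "M-2", "nfip_zone": "X"},
    {"apn": "002", "miles_to_tx": 9, "district": "M-2", "nfip_zone": "AE"},
]
shortlist = sorted((screen(s) for s in sites), key=lambda r: -r["score"])
```

The output is the ranked shortlist with flagged issues the text describes: the analyst reviews the flags on site 002 rather than assembling six spreadsheets to find them.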
The real value isn't speed alone -- it's consistency. Manual screens miss things. Power availability gets skipped for sites where the team has a strong relationship with the local economic development office. Environmental flags get rationalized. AI-driven screens apply the same criteria to every parcel.
Where human judgment stays critical: interpreting utility capacity signals (utilities often understate available capacity in formal queues), reading local political dynamics, and deciding whether a flagged risk is a dealbreaker or manageable. The sites development teams win are the ones where the data was assembled fastest and acted on before competing buyers knew the site was available.