Places Core Data Product¶
Places is the core output of the Replica pipeline. It simulates the complete activities and movements of residents, visitors, and commercial vehicles in a region for a typical day within a season. Places are delivered as megaregions, each covering between 10 and 50 million residents across multiple states. The output is a complete trip table, population table, and routing table for each modeled day.
Temporal Coverage¶
Seasons¶
A Places season is a 13-week modeling period. The output represents a typical day within that window, not a specific calendar date.
| Season | Months Covered | Day Types |
|---|---|---|
| Spring | March, April, May | Thursday (weekday), Saturday (weekend) |
| Fall | September, October, November | Thursday (weekday), Saturday (weekend) |
Available seasons¶
All megaregions have the following seasons:
- Fall 2019
- Fall 2021
- Fall 2022
- Spring 2023
- Fall 2023
- Spring 2024
- Fall 2024
Caveats:
- No 2020 seasons exist: Replica skipped 2020 due to COVID.
- No pre-2019 data: Fall 2019 is the earliest available season.
- Spring 2021 is no longer offered, but legacy Studies using it remain accessible.
What "typical day" means¶
Each season produces output for a typical Thursday (midweek) and a typical Saturday (weekend). This is distinct from AADT, which averages across all days of the year without distinguishing weekday from weekend or accounting for seasonal variation.
Megaregions¶
The country is divided into 12 megaregions: Alaska, Cal-Nev, Great Lakes, Hawaii, Mid-Atlantic, North Atlantic, North Central, Northeast, Northwest, South Atlantic, South Central, and Southwest.
Cross-megaregion trips¶
Limited support exists for travel between megaregions. Some trips have "out of region" origin/destination values. These may originate from bordering counties ("donut region") and include:
- Trips by border-county residents who work or attend school within the megaregion
- External-to-external trips using a road within the megaregion
Donut trips¶
In the cross-megaregion context, "core" refers to the interior of a megaregion and "donut" refers to counties whose boundaries adjoin a neighboring megaregion (border counties). Donut trips are specifically trips between one megaregion's core and the donut of an adjoining megaregion. These are distinct from generic cross-megaregion trips and are generated separately when building trip tables for border areas. People in the donut have person.residentType = 'DONUT' in the trip table.
Trips¶
Definition¶
A trip is a movement between places. A trip begins when a person leaves a location and ends when they stop to perform a non-travel activity.
Example: Walking from home to a cafe, sitting for coffee, then walking to work = two trips (home-to-cafe and cafe-to-work).
Multi-modal trips¶
A single trip can involve multiple modes (e.g., walk to bus stop, ride bus). These are modeled as separate trip segments but counted as one trip. Primary mode is assigned by this ranking:
- Public transit
- Driving / Auto passenger / Taxi / TNC
- Biking
- Walking
Transit-to-transit transfers are part of the same trip. However, unlinked transit legs will appear separately if you sum trips by transit submode or click on an individual station in a Study.
Caveat: All transit access in the model is via walking or driving — no bike, passenger drop-off, or for-hire vehicle access to transit is modeled.
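The primary-mode ranking above can be sketched as a small lookup. This is only an illustration: it assumes the list is ordered from highest to lowest priority, and it borrows the primaryTravelMode enum labels, with PRIVATE_AUTO standing in for both driving and auto passenger. The function name and the tie-breaking behavior are assumptions, not part of the Places pipeline.

```python
# Assumed ranking, highest priority first, mirroring the list above.
# PRIVATE_AUTO covers driving and auto passenger; ON_DEMAND_AUTO covers taxi/TNC.
MODE_PRIORITY = {
    "PUBLIC_TRANSIT": 0,
    "PRIVATE_AUTO": 1,
    "ON_DEMAND_AUTO": 1,
    "BIKING": 2,
    "WALKING": 3,
}


def primary_mode(segment_modes):
    """Return the highest-ranked mode among a trip's segment modes."""
    if not segment_modes:
        raise ValueError("a trip needs at least one segment")
    # min() keeps the first mode encountered when two modes share a rank.
    return min(segment_modes, key=lambda mode: MODE_PRIORITY[mode])


print(primary_mode(["WALKING", "PUBLIC_TRANSIT", "WALKING"]))  # PUBLIC_TRANSIT
print(primary_mode(["WALKING", "BIKING"]))                     # BIKING
```

A walk-bus-walk trip therefore counts once, as a public transit trip, even though two of its three segments are walking.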
Dwell time threshold¶
Trips are separated at stay points detected from composite data sources:
- Stays over 5 minutes are detected with >85% accuracy
- Stays of 15 minutes are detected at ~90% accuracy
- Very short stops (e.g., drive-through coffee) often cannot be detected and will not generate a separate trip
Transit wait time¶
Trip start times are optimized so there is no wait time before the first transit leg. For transfer wait times, compare end/start times of consecutive legs in BigQuery.
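As a rough illustration of that comparison, the sketch below computes the gap between consecutive legs once their start and end times have been exported from BigQuery. The function name and the sample times are hypothetical. If walking between stops is not listed as its own leg, treat each gap as an upper bound on the actual wait.

```python
from datetime import datetime


def transfer_waits(legs):
    """Return the gap in minutes between each pair of consecutive legs.

    `legs` is a list of (start, end) datetime pairs sorted by start time.
    """
    waits = []
    for (_, prev_end), (next_start, _) in zip(legs, legs[1:]):
        waits.append((next_start - prev_end).total_seconds() / 60)
    return waits


# Hypothetical two-leg transit trip: bus 08:00-08:20, then rail 08:27-08:50.
legs = [
    (datetime(2023, 10, 5, 8, 0), datetime(2023, 10, 5, 8, 20)),
    (datetime(2023, 10, 5, 8, 27), datetime(2023, 10, 5, 8, 50)),
]
print(transfer_waits(legs))  # [7.0]
```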
Trip-level fields¶
| Field | Description |
|---|---|
| Origin / Destination | Specific lat/lon points (building-footprint level, not zone-to-zone) |
| Origin / Destination land use | Land use category at each end |
| Trip distance | Total distance traveled |
| Trip duration | Total elapsed time |
| Start / End time | Departure and arrival timestamps |
| Trip mode | Private auto driver, private auto passenger, public transit, walking, biking, freight, TNC |
| Trip purpose | home, work, errands, eat, social, shop, recreation, commercial, school, region_departure |
| Routing | Complete network links and transit routes for each trip |
Trip purpose¶
Replica uses a location choice model (LCM) to determine purpose for discretionary activities (everything other than home/work/school). The model selects specific venues and POIs as destinations, proportional to their observed aggregate popularity by day-of-week and hour.
Caveat — purpose in dense areas: In areas with many nearby venues (e.g., a shopping plaza), individual trip destinations are randomized within proximity of observed locations, weighted by venue popularity. A real person's hairdresser visit may become a synthetic person's Target visit, while another synthetic person handles the hairdresser trip.
Caveat — "work" trips for unemployed people: A small percentage of unemployed people do take work trips. ACS employment status is binary and annual — "unemployed" people may have held temporary jobs, attended interviews, done day labor, or been students with part-time work.
Airport Trips¶
Spring 2024 and later¶
Both visitors and residents travel via airports, with boardings/alightings proportional to Bureau of Transportation Statistics counts. To isolate airport trips, filter to travelPurpose = 'REGION_DEPARTURE'.
Before Spring 2024¶
Only visitors could travel to airports. Resident airport trips were limited to those who work at the airport (purpose = "work"). Resident departures/arrivals by plane were not modeled.
"Other" mode near airports¶
"Other" trips near airports typically represent people arriving on flights. Replica removes most suspected airline flight movements, but some leak through. Do not analyze "Other" mode trips — they are not reliable.
Population¶
Available attributes¶
Each trip is linked to a synthetic person with these characteristics:
| Attribute | Notes |
|---|---|
| Age | |
| Sex | |
| Race and ethnicity | |
| Primary language | |
| Employment status | |
| Industry of employment | 2017 NAICS codes (see below) |
| Home location | |
| Work location | |
| Individual and household income | |
| Work-from-home status | |
| Vehicle ownership status | |
| Resident or visitor status |
Synthetic population construction¶
The population is built from Census data — PUMS, ACS, CTPP, and LEHD — to create a statistically representative synthetic population. It is calibrated against recent Census ACS estimates. The specific ACS version for a given season can be found in the Study data source details.
Visitors¶
Visitors are people who do not normally live or work in the megaregion and who either:
- Stayed overnight in the megaregion, or
- Entered and exited via a port of entry (usually an airport) on the same day
Caveat: Visitors have no demographic data (age, income, race, etc.). Filtering to visitors only will produce blank demographic breakdowns.
Trips with no person data¶
Trips lacking person attributes are either:
- Commercial (freight) trips, or
- Visitor trips
Minors¶
Minors are not represented in mobile data, since including them is legally prohibited. Instead, each minor is assigned a school based on age and proximity to home. Enrollment totals come from public enrollment data, and home locations are based on Census data. Factors like school choice programs are not modeled.
Underrepresented populations¶
Replica does not require seeing all mobile devices. Calibration to Census data means mobile coverage gaps do not directly translate to missing population. However, Census data itself has small margins of error for some groups.
Industry of Employment¶
Replica uses 2017 NAICS codes. Granularity varies by workplace:
- When a workplace maps to multiple NAICS codes, Replica selects the most confident level of detail
- Example: If three 6-digit NAICS codes share the same 5-digit prefix, the 5-digit code is used (100% confidence) rather than guessing which 6-digit code applies (33% confidence each)
- Some industries have codes with only 4 digits
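One way to read the "most confident level of detail" rule is as the longest prefix shared by every candidate code. The sketch below shows that reading only; it is not Replica's actual implementation. The function name is hypothetical and the three sample codes were chosen purely for illustration.

```python
import os


def most_confident_naics(candidates):
    """Return the longest prefix shared by all candidate NAICS codes."""
    if not candidates:
        raise ValueError("need at least one candidate code")
    # commonprefix compares strings character by character.
    return os.path.commonprefix(list(candidates))


# Three 6-digit candidates that agree on their first five digits.
print(most_confident_naics(["722511", "722513", "722515"]))  # 72251
# A single candidate is returned unchanged.
print(most_confident_naics(["541330"]))  # 541330
```

Candidates with no shared leading digits would yield an empty string, which a real pipeline would need to handle separately.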
Vehicle Notes¶
| Topic | Detail |
|---|---|
| Zero-vehicle households | Can still have private auto trips — ACS data shows some zero-vehicle households reporting auto commutes (borrowed, rented, or employer-provided cars) |
| Rental cars | Included in "private auto" trips |
| Vehicle occupancy | Not available |
| Motorcycles | Not broken out separately |
| BEV data | Geospatial distribution from third-party consumer marketing data; calibrated to state-level totals from registration/sales data |
Data Quality and Uncertainty¶
General guidance¶
- Larger sample sizes = higher certainty. Filtering to very small populations or geographies (e.g., a single tract with many demographic filters) increases noise.
- Samples are scaled to match true population at census block group level, but sparse-data areas may have more variance.
- Each completed Places model includes a Quality Report comparing outputs to ground truth data (transit counts, traffic counts).
Road network confidence¶
Confidence increases with road size and importance. Large arterials and highways have higher confidence due to greater volume and sensor availability. Smaller residential roads carry more uncertainty — the model may occasionally route drivers through shortcuts real drivers would avoid.
Transit boardings and alightings¶
Replica does not calibrate transit boardings/alightings to customer-supplied data. Line-level ridership is generally reliable, and boarding/alighting locations are realistic, but real passengers may choose stops differently than modeled (e.g., walking farther to a more comfortable stop).
AADT comparison¶
Replica volumes are averages for a typical mid-week or weekend day over a 13-week season. AADT averages over all days of the year. Direct comparison requires accounting for this difference.
Rural and sparse areas¶
Rural areas have coverage but lower density. Transit in rural areas is limited — Replica models transit agencies with a minimum threshold of 500 daily boardings per route, so smaller agencies in smaller cities often lack transit coverage.
Land Use as Input¶
Land use is a key upstream input that:
- Assigns buildings to home, work, and school locations (e.g., residents live in residential buildings, not single-use offices)
- Defines possible destinations for discretionary trips (restaurants, retail, stadiums, etc.)
Joining land use to trip data¶
Land use download files include a FIPS code field. Join land use to trip activity data through this field.
Data Sources¶
Replica uses five categories of third-party data:
- Mobile location data — LBS data, vehicle in-dash GPS, POI data
- Consumer/resident data — Demographic data from public and private sources
- Built environment data — Land use, building footprints, transportation networks
- Economic activity data — Credit, debit, cash transactions
- Ground truth data — Auto/freight volumes, transit ridership
Data Access¶
- BigQuery access may be available depending on subscription tier.
- Data download files use .gz (gzip) compression.
- Custom geographies can be uploaded as zipped shapefiles, KML, or GeoJSON.
BigQuery Schema¶
Places data lives in two CDS (Core Data Service) tables per region/season/day combination:
| CDS Table | Contents |
|---|---|
| cds-population | One row per synthetic person. Demographics, household attributes, home/work/school locations. Includes CORE, DONUT, and VISITOR resident types. Excludes OTHER_RESIDENT_TYPE (synthetic freight trip-takers). |
| cds-denormalized-trips | One row per trip. Pre-filtered to type = 'TRAVEL' and optionIndex = 'FIRST_OPTION'. Person attributes are denormalized into a nested person struct on each trip row. |
Table naming¶
Tables follow this naming convention:
Project: core-data-service-prod
Dataset: {region}_{year}_{season}
Table: {region}_{year}_{season}_{sample}_{day}_v{version}_trip
Example fully-qualified reference:
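A minimal sketch of how the naming convention composes into a fully qualified reference. Every component value below (region, year, season, sample, day, version) is a hypothetical placeholder, including its spelling and casing; it does not name a real published table. Use the lookup query in the next section to find actual table IDs.

```python
def trip_table_ref(region, year, season, sample, day, version,
                   project="core-data-service-prod"):
    """Assemble project.dataset.table from the naming convention above."""
    dataset = f"{region}_{year}_{season}"
    table = f"{dataset}_{sample}_{day}_v{version}_trip"
    return f"{project}.{dataset}.{table}"


# Hypothetical placeholder values, for illustration only.
ref = trip_table_ref(
    region="great_lakes",
    year=2023,
    season="fall",
    sample="100pct",
    day="thursday",
    version=1,
)
print(ref)
```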
Finding published tables¶
Look up the current production table IDs with:
SELECT *
FROM `model-159019.core_data.released_cdp_table_output`
WHERE output_name IN ('places_population', 'places_denormalized_trips')
AND tag = 'prod'
Use the latest release for each region/day/season combination.
Trip Table Columns¶
These are the key columns on the denormalized trip table.
| Column | Type | Description |
|---|---|---|
| id | STRING | Trip ID |
| travelPurpose | STRING | Trip purpose enum (see Enum Values below) |
| primaryTravelMode | STRING | Primary travel mode enum (see Enum Values below) |
| startPlace | STRUCT | Origin place. Access .id, .location.latitude, .location.longitude, .geometry |
| endPlace | STRUCT | Destination place. Same sub-fields as startPlace |
| distanceMeters | FLOAT | Trip distance in meters |
| startTime | TIMESTAMP | Trip departure time |
| endTime | TIMESTAMP | Trip arrival time |
| geometry | GEOGRAPHY | Full trip path (linestring) |
| travelSegments | REPEATED RECORD | Array of travel segments (see Travel Segments below) |
| person | STRUCT | Nested person demographics (see Person Struct below) |
| tours | REPEATED RECORD | Tour info; access tours[OFFSET(0)].type for tour type |
| land_use_l1 / land_use_l2 | STRING | Origin land use classification (level 1 and 2) |
| building_use_l1 / building_use_l2 | STRING | Origin building use classification |
| BLOCKID20_origin | STRING | 2020 Census Block FIPS at trip origin |
| BLOCKID20_destination | STRING | 2020 Census Block FIPS at trip destination |
| BLOCKID10_origin | STRING | 2010 Census Block FIPS at trip origin |
| BLOCKID10_destination | STRING | 2010 Census Block FIPS at trip destination |
| previousActivityType | STRING | Activity before this trip (e.g., HOME, WORK) |
| optionIndex | STRING | Always FIRST_OPTION in the trip table |
| type | STRING | Always TRAVEL in the trip table |
Person Struct¶
Person demographics are nested on each trip row under person.*. When residentType = 'VISITOR', all fields except id and residentType are NULL.
| Field | Type | Notes |
|---|---|---|
| person.age | FLOAT | Age in years |
| person.sex | STRING | MALE, FEMALE |
| person.race_ethnicity | STRING | e.g., WHITE_NOT_HISPANIC_OR_LATINO, BLACK_NOT_HISPANIC_OR_LATINO |
| person.employment | STRING | EMPLOYED, UNEMPLOYED, NOT_IN_LABOR_FORCE |
| person.income | FLOAT | Individual income |
| person.household.income | FLOAT | Household income |
| person.household.numVehicles | STRING | Vehicles available to the household |
| person.household.personIds | ARRAY<STRING> | Use ARRAY_LENGTH(person.household.personIds) for household size |
| person.wfh | STRING | Work-from-home status: REMOTE, IN_PERSON, EMPLOYED_NOT_WORKING, UNEMPLOYED_NOT_WORKING |
| person.residentType | STRING | CORE, DONUT, VISITOR (see Enum Values below) |
| person.commuteMode | STRING | ACS commute mode |
| person.industry_detailed | STRING | Detailed industry; fall back to person.industry if NULL |
| person.education | STRING | e.g., SOME_COLLEGE, BACHELORS_DEGREE |
| person.language | STRING | e.g., ENGLISH, SPANISH |
| person.building_type | STRING | e.g., SINGLE_FAMILY |
| person.tenure | STRING | e.g., OWNER, RENTER |
| person.school_grade_attending | STRING | e.g., NOT_ATTENDING_SCHOOL |
| person.person_fingerprint | STRING | Deterministic person ID for display |
| person.household_fingerprint | STRING | Deterministic household ID for display |
| person.household.housingUnit.geometry | GEOGRAPHY | Home location point |
| person.officeUnit.geometry | GEOGRAPHY | Work location point (NULL if no work/school location) |
Travel Segments¶
Each trip contains an array of travelSegments. Unnest them to access segment-level detail:
SELECT
activity.id AS trip_id,
segment.mode,
segment.transitRoute.id AS transit_route_id,
segment.transitRoute.operator AS transit_agency,
segment.vehicle.type AS vehicle_type
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
Key segment fields:
| Field (relative to segment) | Type | Description |
|---|---|---|
| mode | STRING | Segment-level mode (can differ from trip primaryTravelMode) |
| transitRoute.id | STRING | Transit route ID, e.g., "MTA New York City Transit:7" |
| transitRoute.operator | STRING | Transit agency name |
| transitRoute.type | STRING | Transit submode (e.g., SUBWAY, BUS) |
| transitRoute.line | STRING | Transit route/line name |
| vehicle.type | STRING | Vehicle type |
| vehicle.fuelType | STRING | Fuel type (e.g., ELECTRIC) |
| startPlace.id / endPlace.id | STRING | Segment origin/destination (e.g., transit stop IDs) |
| travelSegmentNetworkLinks | REPEATED RECORD | Network links traversed in this segment |
Network links require a double unnest:
CROSS JOIN UNNEST(travelSegments) segment
CROSS JOIN UNNEST(segment.travelSegmentNetworkLinks) tsnl
-- then use: tsnl.networkLink.id
Enum Values¶
primaryTravelMode¶
| Value | Description |
|---|---|
| PRIVATE_AUTO | Private vehicle (includes rental cars) |
| PUBLIC_TRANSIT | Bus, rail, ferry, etc. |
| ON_DEMAND_AUTO | TNC / taxi |
| COMMERCIAL | Freight / commercial vehicle |
| WALKING | Walking |
| BIKING | Biking |
travelPurpose¶
| Value | Description |
|---|---|
| HOME | Returning home |
| WORK | Commute to workplace |
| SCHOOL | Travel to school |
| SHOPPING | Shopping trip |
| EAT | Dining out |
| SOCIAL | Social visit |
| RECREATION | Recreation / leisure |
| ERRANDS | Errands |
| LODGING | Hotel / lodging |
| REGION_DEPARTURE | Airport / port-of-entry trip |
| COMMERCIAL | Commercial / freight purpose |
| WORK_FROM_HOME | WFH activity (not present in trips table; use population table) |
person.residentType¶
| Value | Description | Person data available? |
|---|---|---|
| CORE | Lives and/or works in the megaregion | Yes |
| DONUT | Lives in bordering counties | Yes |
| VISITOR | Overnight or same-day visitor | No; all demographics are NULL |
| OTHER_RESIDENT_TYPE | Synthetic freight trip-takers | Excluded from CDS tables entirely |
Note on OTHER_RESIDENT_TYPE: These synthetic people exist only to carry COMMERCIAL and stage trips. They are filtered out of both CDS tables. Trips by these people (freight) will have no person data in the denormalized trips table.
Essential Query Patterns¶
The fundamental trip filter¶
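Every trip query must include this filter. The raw table contains non-travel activities and alternative route options; the filter restricts results to actual trips. On the pre-filtered cds-denormalized-trips table it is redundant but harmless:
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'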
Link volumes¶
Count trips traversing each network link (road segment):
SELECT
tsnl.networkLink.id AS link_id,
COUNT(*) AS volume
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
CROSS JOIN UNNEST(segment.travelSegmentNetworkLinks) tsnl
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY link_id
Origin-destination flows¶
Count trips between geographic areas:
SELECT
origin_geo.raw_id AS origin,
dest_geo.raw_id AS destination,
COUNT(DISTINCT id) AS trips
FROM `{table}` activity
JOIN geos AS origin_geo
ON ST_COVERS(origin_geo.geom, ST_GEOGPOINT(startPlace.location.longitude, startPlace.location.latitude))
JOIN geos AS dest_geo
ON ST_COVERS(dest_geo.geom, ST_GEOGPOINT(endPlace.location.longitude, endPlace.location.latitude))
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY origin, destination
VMT (vehicle miles traveled)¶
SELECT SUM(segment.distanceMeters) / 1609.34 AS vmt_miles
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
AND segment.mode IN ('PRIVATE_AUTO', 'ON_DEMAND_AUTO', 'COMMERCIAL')
Time bucketing¶
Trips have timestamps in UTC. Convert to local time and bucket by hour or quarter-hour:
-- Hourly
EXTRACT(HOUR FROM startTime AT TIME ZONE 'America/New_York') AS hour
-- Quarter-hourly (0-95, i.e., 96 fifteen-minute intervals per day)
-- DIV performs integer division, so the bucket is an INT64 rather than a FLOAT64
DIV(
  EXTRACT(HOUR FROM startTime AT TIME ZONE 'America/New_York') * 60
  + EXTRACT(MINUTE FROM startTime AT TIME ZONE 'America/New_York'),
  15
) AS qtr_hour
Common timezone values: America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Anchorage, Pacific/Honolulu.
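The same bucketing can be reproduced outside BigQuery, for example to sanity-check exported rows. The sketch below is illustrative and the function name is hypothetical. It uses a fixed UTC-4 offset so that it runs without a timezone database; for real data, passing zoneinfo.ZoneInfo("America/New_York") as the zone handles daylight saving transitions.

```python
from datetime import datetime, timedelta, timezone


def local_buckets(start_utc, tz):
    """Return (hour, quarter_hour) for a UTC timestamp viewed in zone `tz`."""
    local = start_utc.astimezone(tz)
    minutes_since_midnight = local.hour * 60 + local.minute
    # Integer division yields a bucket index from 0 to 95.
    return local.hour, minutes_since_midnight // 15


# A fixed UTC-4 offset stands in for US Eastern daylight time.
eastern_daylight = timezone(timedelta(hours=-4))
start = datetime(2023, 10, 5, 13, 37, tzinfo=timezone.utc)
print(local_buckets(start, eastern_daylight))  # (9, 38)
```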
Trip counts by mode and purpose¶
SELECT
primaryTravelMode AS mode,
travelPurpose AS purpose,
person.residentType AS resident_type,
COUNT(DISTINCT id) AS trips
FROM `{table}`
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY mode, purpose, resident_type
Select link analysis¶
Select link analysis lets you pick specific roadway segments and see all trips that traverse them. This is especially useful for understanding the origin-destination profile of traffic on a particular corridor. For nationwide travel analysis, a "stitched" dataset exists that joins trip segments across megaregion boundaries, enabling cross-megaregion select link queries. Without stitching, trips that cross a megaregion border are truncated at the boundary. Use the stitched dataset when the selected links carry significant inter-regional traffic (e.g., interstate highways near megaregion borders).
POI visits (discretionary trips)¶
Count visits to specific points of interest, excluding non-discretionary purposes:
SELECT
endPlace.id AS place_id,
endPlace.name AS place_name,
primaryTravelMode AS mode,
travelPurpose AS purpose,
COUNT(DISTINCT id) AS visits
FROM `{table}`
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
AND travelPurpose NOT IN ('HOME', 'WORK', 'SCHOOL', 'LODGING', 'REGION_DEPARTURE')
GROUP BY place_id, place_name, mode, purpose
ORDER BY visits DESC
Transit boardings by stop¶
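Count boardings at transit stops, excluding transfers between routes of the same operator. The CTE also computes destination_stop_id and is_transfer_alighting; to count alightings instead, group by destination_stop_id and filter on NOT is_transfer_alighting: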
WITH transit_segments AS (
SELECT
segment.id,
segment.startPlace.id AS origin_stop_id,
segment.endPlace.id AS destination_stop_id,
IFNULL(
LAG(segment.transitRoute.operator) OVER (PARTITION BY activity.id ORDER BY segment.startTime)
= segment.transitRoute.operator,
FALSE
) AS is_transfer_boarding,
IFNULL(
LEAD(segment.transitRoute.operator) OVER (PARTITION BY activity.id ORDER BY segment.startTime)
= segment.transitRoute.operator,
FALSE
) AS is_transfer_alighting
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
AND primaryTravelMode = 'PUBLIC_TRANSIT'
AND segment.transitRoute.id IS NOT NULL
)
SELECT
origin_stop_id AS stop_id,
COUNT(DISTINCT id) AS boardings
FROM transit_segments
WHERE NOT is_transfer_boarding
GROUP BY stop_id
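The LAG-based transfer flag in the query above can be hard to read, so here is the same boarding logic as a plain loop. This is a sketch over hypothetical in-memory records, not a replacement for the query: each trip's transit segments are assumed to be already filtered and sorted by start time, and the dict keys are illustrative names.

```python
from collections import Counter


def count_boardings(trips):
    """Count non-transfer boardings per stop.

    `trips` maps a trip ID to its transit segments in time order; each
    segment is a dict with 'operator' and 'origin_stop_id' keys.
    """
    boardings = Counter()
    for segments in trips.values():
        prev_operator = None
        for seg in segments:
            # Same operator as the previous transit segment means a transfer.
            is_transfer = (prev_operator is not None
                           and seg["operator"] == prev_operator)
            if not is_transfer:
                boardings[seg["origin_stop_id"]] += 1
            prev_operator = seg["operator"]
    return boardings


# Hypothetical data: t1 transfers within Agency A at s2, then switches to Agency B.
trips = {
    "t1": [
        {"operator": "Agency A", "origin_stop_id": "s1"},
        {"operator": "Agency A", "origin_stop_id": "s2"},
        {"operator": "Agency B", "origin_stop_id": "s3"},
    ],
    "t2": [{"operator": "Agency B", "origin_stop_id": "s3"}],
}
result = count_boardings(trips)
print(result["s1"], result["s2"], result["s3"])  # 1 0 2
```

Note that a change of operator counts as a new boarding under this rule, so stop s3 receives a boarding from both trips.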
Partial run scaling¶
Development builds use a fraction of households (typically 10%). When querying these tables, scale counts by 1 / fraction_of_households. The fraction is encoded in the table name (e.g., 10pct = 0.1). For 100pct tables (production), no scaling is needed.
100% vs 10% runs: A "100% run" is the complete Places run of the full population for a megaregion — this is the production dataset. A "10% run" is a calibrated run using 10% of the population, used as a faster QC step before committing to the full run. The typical pipeline is: baseline run, then 10% calibrated run (quick QC), then 100% complete run (full QC). When querying 10% tables, multiply trip counts by 10 to approximate full-population totals.
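A minimal sketch of the scaling step, assuming the sample token appears in the table name as an underscore-delimited `<N>pct` segment. The function names and the sample table names are hypothetical.

```python
import re


def household_percent(table_name):
    """Parse the sample percentage from a token such as '10pct'."""
    match = re.search(r"_(\d+)pct(?:_|$)", table_name)
    if match is None:
        raise ValueError(f"no '<N>pct' token in {table_name!r}")
    percent = int(match.group(1))
    if not 0 < percent <= 100:
        raise ValueError(f"unexpected sample percentage: {percent}")
    return percent


def scale_count(raw_count, table_name):
    """Scale a raw count up to an approximate full-population total."""
    # Equivalent to raw_count / fraction_of_households.
    return raw_count * 100 / household_percent(table_name)


# Hypothetical table names, for illustration only.
print(scale_count(1234, "great_lakes_2023_fall_10pct_thursday_v1_trip"))   # 12340.0
print(scale_count(1234, "great_lakes_2023_fall_100pct_thursday_v1_trip"))  # 1234.0
```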
Data Config Hierarchy¶
Places configs follow a four-level hierarchy:
configs/
├── usa/{season}.yaml # Nationwide: CDP inputs, versions, season dates
├── regions/{region}.yaml # Region definition: counties, timezone, router settings
├── {region}/{season}/season.yaml # Region+season: inherits from usa/{season}.yaml
└── {region}/{season}/{day}.yaml # Day-level: Thursday (weekday) or Saturday (weekend)
- USA config — publishing, CDP input versions, season specification (start/end dates)
- Region config — counties, timezones, router infrastructure settings
- Season config — combines USA + region via prototype inheritance
- Day config — the build target for a full day of activity in a region
Templates for new regions/seasons are in configs/template/.
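The inheritance chain can be pictured as successive dictionary merges in which the more specific level overrides the more general one. The sketch below is an assumption about how such a hierarchy might resolve, not the actual config loader. All keys and values are hypothetical and do not reflect the real YAML schema.

```python
def merge_configs(parent, child):
    """Return a dict in which child values override parent values.

    Nested dicts are merged recursively; any other value is replaced.
    """
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical contents for each level of the hierarchy.
usa = {"season": {"start": "2023-09-01", "end": "2023-11-30"}, "publish": True}
region = {"timezone": "America/Chicago", "counties": ["17031"]}
season = merge_configs(merge_configs(usa, region), {"season": {"name": "fall_2023"}})
day = merge_configs(season, {"day": "thursday"})

print(day["season"]["name"], day["timezone"], day["day"])
```

Under this reading, the day config is the only level that needs to be loaded directly, because it already carries everything inherited from the levels above it.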
Building¶
Default baseline_fraction is 0.1 (10% sample for development builds).
Dependencies¶
Places depends on all upstream CDPs:
- Geos: region boundaries
- Land Use: places/POI definitions
- Population: synthetic households and persons
- Transportation Network: routers for trip routing