Places Core Data Product¶

Places is the core output of the Replica pipeline. It simulates the complete activities and movements of residents, visitors, and commercial vehicles in a region for a typical day within a season. Places are delivered as megaregions, each covering between 10 and 50 million residents across multiple states. The output is a complete trip table, population table, and routing table for each modeled day.

Temporal Coverage¶

Seasons¶

A Places season is a 13-week modeling period. The output represents a typical day within that window, not a specific calendar date.

Season	Months Covered	Day Types
Spring	March, April, May	Thursday (weekday), Saturday (weekend)
Fall	September, October, November	Thursday (weekday), Saturday (weekend)

Available seasons¶

All megaregions have the following seasons:

Fall 2019
Fall 2021
Fall 2022
Spring 2023
Fall 2023
Spring 2024
Fall 2024

Caveats:

- No 2020 seasons exist: Replica skipped 2020 due to COVID.
- No pre-2019 data: Fall 2019 is the earliest available season.
- Spring 2021 is no longer offered, but legacy Studies using it remain accessible.

What "typical day" means¶

Each season produces output for a typical Thursday (midweek) and a typical Saturday (weekend). This is distinct from AADT, which averages across all days of the year without distinguishing weekday from weekend or accounting for seasonal variation.

Megaregions¶

The country is divided into 12 megaregions: Alaska, Cal-Nev, Great Lakes, Hawaii, Mid-Atlantic, North Atlantic, North Central, Northeast, Northwest, South Atlantic, South Central, and Southwest.

Cross-megaregion trips¶

Limited support exists for travel between megaregions. Some trips have "out of region" origin/destination values. These may originate from bordering counties ("donut region") and include:

Trips by border-county residents who work or attend school within the megaregion
External-to-external trips using a road within the megaregion

Donut trips¶

In the cross-megaregion context, "core" refers to the interior of a megaregion and "donut" refers to counties whose boundaries adjoin a neighboring megaregion (border counties). Donut trips are specifically trips between one megaregion's core and the donut of an adjoining megaregion. These are distinct from generic cross-megaregion trips and are generated separately when building trip tables for border areas. People in the donut have person.residentType = 'DONUT' in the trip table.

Trips¶

Definition¶

A trip is a movement between places. A trip begins when a person leaves a location and ends when they stop to perform a non-travel activity.

Example: Walking from home to a cafe, sitting for coffee, then walking to work = two trips (home-to-cafe and cafe-to-work).

A single trip can involve multiple modes (e.g., walk to bus stop, ride bus). These are modeled as separate trip segments but counted as one trip. Primary mode is assigned by this ranking:

Public transit
Driving / Auto passenger / Taxi / TNC
Biking
Walking
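
The ranking above can be read as a simple priority lookup. The following is a minimal sketch, not Replica's implementation; the mode labels (including `AUTO_PASSENGER`) are hypothetical stand-ins for whatever segment-level values a table actually uses.

```python
# Ranking from the list above: lower number = higher priority.
# The mode labels are hypothetical; real segment-level values may differ.
MODE_RANK = {
    "PUBLIC_TRANSIT": 0,
    "PRIVATE_AUTO": 1,
    "AUTO_PASSENGER": 1,
    "ON_DEMAND_AUTO": 1,
    "BIKING": 2,
    "WALKING": 3,
}


def primary_mode(segment_modes):
    """Return the highest-priority mode among a trip's segments."""
    if not segment_modes:
        raise ValueError("a trip needs at least one segment")
    return min(segment_modes, key=MODE_RANK.__getitem__)


# Walk to the stop, ride the bus, walk to the door: still one transit trip.
print(primary_mode(["WALKING", "PUBLIC_TRANSIT", "WALKING"]))
```

Under this reading, a trip with any transit segment is a transit trip regardless of how far the person walked or drove to reach the stop.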

Transit-to-transit transfers are part of the same trip. However, unlinked transit legs will appear separately if you sum trips by transit submode or click on an individual station in a Study.

Caveat: All transit access in the model is via walking or driving — no bike, passenger drop-off, or for-hire vehicle access to transit is modeled.

Dwell time threshold¶

Trips are separated at stay points detected from composite data sources:

Stays over 5 minutes are detected with >85% accuracy
Stays of 15 minutes are detected at ~90% accuracy
Very short stops (e.g., drive-through coffee) often cannot be detected and will not generate a separate trip

Transit wait time¶

Trip start times are optimized so there is no wait time before the first transit leg. For transfer wait times, compare end/start times of consecutive legs in BigQuery.
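
As a rough illustration of the comparison described above, the sketch below computes the gap between consecutive legs once their start and end times have been exported. The `(start, end)` tuple layout and the sample timestamps are assumptions made for the example. If walking legs between stops are not in the list, the gap includes that walking time as well as waiting.

```python
from datetime import datetime, timedelta


def transfer_waits(legs):
    """Given legs as (start, end) datetimes ordered by start time,
    return the gap between each leg's end and the next leg's start."""
    return [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(legs, legs[1:])
    ]


# Illustrative values only: two transit legs with a 6-minute gap.
legs = [
    (datetime(2024, 10, 3, 8, 0), datetime(2024, 10, 3, 8, 20)),
    (datetime(2024, 10, 3, 8, 26), datetime(2024, 10, 3, 8, 45)),
]
waits = transfer_waits(legs)
print(waits)
```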

Trip-level fields¶

Field	Description
Origin / Destination	Specific lat/lon points (building-footprint level, not zone-to-zone)
Origin / Destination land use	Land use category at each end
Trip distance	Total distance traveled
Trip duration	Total elapsed time
Start / End time	Departure and arrival timestamps
Trip mode	Private auto driver, private auto passenger, public transit, walking, biking, freight, TNC
Trip purpose	home, work, errands, eat, social, shop, recreation, commercial, school, region_departure
Routing	Complete network links and transit routes for each trip

Trip purpose¶

Replica uses a location choice model (LCM) to determine purpose for discretionary activities (everything other than home/work/school). The model selects specific venues and POIs as destinations, proportional to their observed aggregate popularity by day-of-week and hour.
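
To make "proportional to popularity" concrete, here is a small sketch of popularity-weighted sampling. It only illustrates the general idea; the venue names and scores are invented, and the actual LCM is not described in enough detail here to reproduce.

```python
import random


def choose_venue(popularity, rng):
    """Pick one venue with probability proportional to its popularity."""
    venues = list(popularity)
    weights = [popularity[v] for v in venues]
    return rng.choices(venues, weights=weights, k=1)[0]


# Hypothetical popularity scores for one day-of-week and hour.
popularity = {"cafe": 120, "grocery": 300, "gym": 80}
rng = random.Random(0)  # seeded so the sketch is repeatable
picks = [choose_venue(popularity, rng) for _ in range(1000)]
print({v: picks.count(v) for v in popularity})
```

Across many synthetic people, venue totals track the observed popularity even though any individual assignment is random.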

Caveat — purpose in dense areas: In areas with many nearby venues (e.g., a shopping plaza), individual trip destinations are randomized within proximity of observed locations, weighted by venue popularity. A real person's hairdresser visit may become a synthetic person's Target visit, while another synthetic person handles the hairdresser trip.

Caveat — "work" trips for unemployed people: A small percentage of unemployed people do take work trips. ACS employment status is binary and annual — "unemployed" people may have held temporary jobs, attended interviews, done day labor, or been students with part-time work.

Airport Trips¶

Spring 2024 and later¶

Both visitors and residents travel via airports, with boardings/alightings proportional to Bureau of Transportation Statistics counts. Filter to purpose region_departure for airport trips.

Before Spring 2024¶

Only visitors could travel to airports. Resident airport trips were limited to those who work at the airport (purpose = "work"). Resident departures/arrivals by plane were not modeled.

"Other" mode near airports¶

"Other" trips near airports typically represent people arriving on flights. Replica removes most suspected airline flight movements, but some leak through. Do not analyze "Other" mode trips — they are not reliable.

Population¶

Available attributes¶

Each trip is linked to a synthetic person with these characteristics:

Attribute	Notes
Age
Sex
Race and ethnicity
Primary language
Employment status
Industry of employment	2017 NAICS codes (see below)
Home location
Work location
Individual and household income
Work-from-home status
Vehicle ownership status
Resident or visitor status

Synthetic population construction¶

The population is built from Census data — PUMS, ACS, CTPP, and LEHD — to create a statistically representative synthetic population. It is calibrated against recent Census ACS estimates. The specific ACS version for a given season can be found in the Study data source details.

Visitors¶

Visitors are people who do not normally live or work in the megaregion and who either:

- Stayed overnight in the megaregion, or
- Entered and exited via a port of entry (usually an airport) on the same day

Caveat: Visitors have no demographic data (age, income, race, etc.). Filtering to visitors only will produce blank demographic breakdowns.

Trips with no person data¶

Trips lacking person attributes are either:

- Commercial (freight) trips, or
- Visitor trips

Minors¶

Minors are not represented in mobile data (legally prohibited). They are assigned a school based on age and proximity to home. Enrollment totals come from public enrollment data. Home locations are based on Census data. Factors like school choice programs are not modeled.

Underrepresented populations¶

Replica does not require seeing all mobile devices. Calibration to Census data means mobile coverage gaps do not directly translate to missing population. However, Census data itself has larger margins of error for some groups.

Industry of Employment¶

Replica uses 2017 NAICS codes. Granularity varies by workplace:

When a workplace maps to multiple NAICS codes, Replica selects the most confident level of detail
Example: If three 6-digit NAICS codes share the same 5-digit prefix, the 5-digit code is used (100% confidence) rather than guessing which 6-digit code applies (33% confidence each)
Some industries have codes with only 4 digits
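
One way to picture the "most confident level of detail" rule is as a longest-common-prefix over the candidate codes. This is a sketch of that interpretation, not a description of Replica's code; the candidate codes are chosen only for illustration, and a real implementation would also need to confirm the shared prefix is itself a valid NAICS level.

```python
import os


def most_confident_naics(candidates):
    """Return the longest prefix shared by all candidate NAICS codes."""
    if not candidates:
        raise ValueError("need at least one candidate code")
    # commonprefix compares strings character by character.
    return os.path.commonprefix(list(candidates))


# Illustrative candidates that share a 5-digit prefix.
print(most_confident_naics(["722511", "722513", "722514"]))
```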

Vehicle Notes¶

Topic	Detail
Zero-vehicle households	Can still have private auto trips — ACS data shows some zero-vehicle households reporting auto commutes (borrowed, rented, or employer-provided cars)
Rental cars	Included in "private auto" trips
Vehicle occupancy	Not available
Motorcycles	Not broken out separately
BEV data	Geospatial distribution from third-party consumer marketing data; calibrated to state-level totals from registration/sales data

Data Quality and Uncertainty¶

General guidance¶

Larger sample sizes = higher certainty. Filtering to very small populations or geographies (e.g., a single tract with many demographic filters) increases noise.
Samples are scaled to match true population at census block group level, but sparse-data areas may have more variance.
Each completed Places model includes a Quality Report comparing outputs to ground truth data (transit counts, traffic counts).

Road network confidence¶

Confidence increases with road size and importance. Large arterials and highways have higher confidence due to greater volume and sensor availability. Smaller residential roads carry more uncertainty — the model may occasionally route drivers through shortcuts real drivers would avoid.

Transit boardings and alightings¶

Replica does not calibrate transit boardings/alightings to customer-supplied data. Line-level ridership is generally reliable, and boarding/alighting locations are realistic, but real passengers may choose stops differently than modeled (e.g., walking farther to a more comfortable stop).

AADT comparison¶

Replica volumes are averages for a typical mid-week or weekend day over a 13-week season. AADT averages over all days of the year. Direct comparison requires accounting for this difference.

Rural and sparse areas¶

Rural areas have coverage but lower density. Transit in rural areas is limited — Replica models transit agencies with a minimum threshold of 500 daily boardings per route, so smaller agencies in smaller cities often lack transit coverage.

Land Use as Input¶

Land use is a key upstream input that:

- Assigns buildings to home, work, and school locations (e.g., residents live in residential buildings, not single-use offices)
- Defines possible destinations for discretionary trips (restaurants, retail, stadiums, etc.)

Joining land use to trip data¶

Land use download files include a FIPS code field. Join land use to trip activity data through this field.

Data Sources¶

Replica uses five categories of third-party data:

Mobile location data — LBS data, vehicle in-dash GPS, POI data
Consumer/resident data — Demographic data from public and private sources
Built environment data — Land use, building footprints, transportation networks
Economic activity data — Credit, debit, cash transactions
Ground truth data — Auto/freight volumes, transit ridership

Data Access¶

BigQuery access may be available depending on subscription tier.
Data download files use .gz (gzip) compression.
Custom geographies can be uploaded as zipped shapefiles, KML, or GeoJSON.

BigQuery Schema¶

Places data lives in two CDS (Core Data Service) tables per region/season/day combination:

CDS Table	Contents
`cds-population`	One row per synthetic person. Demographics, household attributes, home/work/school locations. Includes `CORE`, `DONUT`, and `VISITOR` resident types. Excludes `OTHER_RESIDENT_TYPE` (synthetic freight trip-takers).
`cds-denormalized-trips`	One row per trip. Pre-filtered to `type = 'TRAVEL'` and `optionIndex = 'FIRST_OPTION'`. Person attributes are denormalized into a nested `person` struct on each trip row.

Table naming¶

Tables follow this naming convention:

Project:  core-data-service-prod
Dataset:  {region}_{year}_{season}
Table:    {region}_{year}_{season}_{sample}_{day}_v{version}_trip

Example fully-qualified reference:

`core-data-service-prod.north_atlantic_2021_Q4`.`north_atlantic_2021_Q4_100pct_thursday_v1_trip`
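
When many region/season/day combinations are queried, it can help to assemble the reference from its parts. The helper below is a hypothetical convenience function that follows the naming pattern above; its name and parameters are not part of any Replica tooling.

```python
def trip_table_ref(region, year, season, sample, day, version,
                   project="core-data-service-prod"):
    """Build a fully-qualified trip table reference from the naming pattern."""
    dataset = f"{region}_{year}_{season}"
    table = f"{dataset}_{sample}_{day}_v{version}_trip"
    return f"`{project}.{dataset}`.`{table}`"


# Reproduces the example reference shown above.
print(trip_table_ref("north_atlantic", 2021, "Q4", "100pct", "thursday", 1))
```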

Finding published tables¶

Look up the current production table IDs with:

SELECT *
FROM `model-159019.core_data.released_cdp_table_output`
WHERE output_name IN ('places_population', 'places_denormalized_trips')
  AND tag = 'prod'

Use the latest release for each region/day/season combination.

Trip Table Columns¶

These are the key columns on the denormalized trip table.

Column	Type	Description
`id`	STRING	Trip ID
`travelPurpose`	STRING	Trip purpose enum (see Enum Values below)
`primaryTravelMode`	STRING	Primary travel mode enum (see Enum Values below)
`startPlace`	STRUCT	Origin place. Access `.id`, `.location.latitude`, `.location.longitude`, `.geometry`
`endPlace`	STRUCT	Destination place. Same sub-fields as `startPlace`
`distanceMeters`	FLOAT	Trip distance in meters
`startTime`	TIMESTAMP	Trip departure time
`endTime`	TIMESTAMP	Trip arrival time
`geometry`	GEOGRAPHY	Full trip path (linestring)
`travelSegments`	REPEATED RECORD	Array of travel segments (see Travel Segments below)
`person`	STRUCT	Nested person demographics (see Person Struct below)
`tours`	REPEATED RECORD	Tour info; access `tours[OFFSET(0)].type` for tour type
`land_use_l1` / `land_use_l2`	STRING	Origin land use classification (level 1 and 2)
`building_use_l1` / `building_use_l2`	STRING	Origin building use classification
`BLOCKID20_origin`	STRING	2020 Census Block FIPS at trip origin
`BLOCKID20_destination`	STRING	2020 Census Block FIPS at trip destination
`BLOCKID10_origin`	STRING	2010 Census Block FIPS at trip origin
`BLOCKID10_destination`	STRING	2010 Census Block FIPS at trip destination
`previousActivityType`	STRING	Activity before this trip (e.g., `HOME`, `WORK`)
`optionIndex`	STRING	Always `FIRST_OPTION` in the trip table
`type`	STRING	Always `TRAVEL` in the trip table

Person Struct¶

Person demographics are nested on each trip row under person.*. When residentType = 'VISITOR', all fields except id and residentType are NULL.

Field	Type	Notes
`person.age`	FLOAT	Age in years
`person.sex`	STRING	`MALE`, `FEMALE`
`person.race_ethnicity`	STRING	e.g., `WHITE_NOT_HISPANIC_OR_LATINO`, `BLACK_NOT_HISPANIC_OR_LATINO`
`person.employment`	STRING	`EMPLOYED`, `UNEMPLOYED`, `NOT_IN_LABOR_FORCE`
`person.income`	FLOAT	Individual income
`person.household.income`	FLOAT	Household income
`person.household.numVehicles`	STRING	Vehicles available to the household
`person.household.personIds`	`ARRAY<STRING>`	Use `ARRAY_LENGTH(person.household.personIds)` for household size
`person.wfh`	STRING	Work-from-home status: `REMOTE`, `IN_PERSON`, `EMPLOYED_NOT_WORKING`, `UNEMPLOYED_NOT_WORKING`
`person.residentType`	STRING	`CORE`, `DONUT`, `VISITOR` (see Enum Values below)
`person.commuteMode`	STRING	ACS commute mode
`person.industry_detailed`	STRING	Detailed industry; fall back to `person.industry` if NULL
`person.education`	STRING	e.g., `SOME_COLLEGE`, `BACHELORS_DEGREE`
`person.language`	STRING	e.g., `ENGLISH`, `SPANISH`
`person.building_type`	STRING	e.g., `SINGLE_FAMILY`
`person.tenure`	STRING	e.g., `OWNER`, `RENTER`
`person.school_grade_attending`	STRING	e.g., `NOT_ATTENDING_SCHOOL`
`person.person_fingerprint`	STRING	Deterministic person ID for display
`person.household_fingerprint`	STRING	Deterministic household ID for display
`person.household.housingUnit.geometry`	GEOGRAPHY	Home location point
`person.officeUnit.geometry`	GEOGRAPHY	Work location point (NULL if no work/school location)

Travel Segments¶

Each trip contains an array of travelSegments. Unnest them to access segment-level detail:

SELECT
  activity.id AS trip_id,
  segment.mode,
  segment.transitRoute.id AS transit_route_id,
  segment.transitRoute.operator AS transit_agency,
  segment.vehicle.type AS vehicle_type
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'

Key segment fields:

Field (relative to `segment`)	Type	Description
`mode`	STRING	Segment-level mode (can differ from trip `primaryTravelMode`)
`transitRoute.id`	STRING	Transit route ID, e.g., `"MTA New York City Transit:7"`
`transitRoute.operator`	STRING	Transit agency name
`transitRoute.type`	STRING	Transit submode (e.g., `SUBWAY`, `BUS`)
`transitRoute.line`	STRING	Transit route/line name
`vehicle.type`	STRING	Vehicle type
`vehicle.fuelType`	STRING	Fuel type (e.g., `ELECTRIC`)
`startPlace.id` / `endPlace.id`	STRING	Segment origin/destination (e.g., transit stop IDs)
`travelSegmentNetworkLinks`	REPEATED RECORD	Network links traversed in this segment

Network links require a double unnest:

CROSS JOIN UNNEST(travelSegments) segment
CROSS JOIN UNNEST(segment.travelSegmentNetworkLinks) tsnl
-- then use: tsnl.networkLink.id

Enum Values¶

`primaryTravelMode`¶

Value	Description
`PRIVATE_AUTO`	Private vehicle (includes rental cars)
`PUBLIC_TRANSIT`	Bus, rail, ferry, etc.
`ON_DEMAND_AUTO`	TNC / taxi
`COMMERCIAL`	Freight / commercial vehicle
`WALKING`	Walking
`BIKING`	Biking

`travelPurpose`¶

Value	Description
`HOME`	Returning home
`WORK`	Commute to workplace
`SCHOOL`	Travel to school
`SHOPPING`	Shopping trip
`EAT`	Dining out
`SOCIAL`	Social visit
`RECREATION`	Recreation / leisure
`ERRANDS`	Errands
`LODGING`	Hotel / lodging
`REGION_DEPARTURE`	Airport / port-of-entry trip
`COMMERCIAL`	Commercial / freight purpose
`WORK_FROM_HOME`	WFH activity (not present in trips table -- use population table)

`person.residentType`¶

Value	Description	Person data available?
`CORE`	Lives and/or works in the megaregion	Yes
`DONUT`	Lives in bordering counties	Yes
`VISITOR`	Overnight or same-day visitor	No -- all demographics are NULL
`OTHER_RESIDENT_TYPE`	Synthetic freight trip-takers	Excluded from CDS tables entirely

Note on OTHER_RESIDENT_TYPE: These synthetic people exist only to carry COMMERCIAL and stage trips. They are filtered out of both CDS tables. Trips by these people (freight) will have no person data in the denormalized trips table.

Essential Query Patterns¶

The fundamental trip filter¶

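Include this filter in every trip query. The raw table contains non-travel activities and alternative route options; this restricts to actual trips. The denormalized trips table is already pre-filtered to these values, so the filter is redundant there but harmless: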

WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'

Link volumes¶

Count trips traversing each network link (road segment):

SELECT
  tsnl.networkLink.id AS link_id,
  COUNT(*) AS volume
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
CROSS JOIN UNNEST(segment.travelSegmentNetworkLinks) tsnl
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY link_id

Origin-destination flows¶

Count trips between geographic areas:

SELECT
  origin_geo.raw_id AS origin,
  dest_geo.raw_id   AS destination,
  COUNT(DISTINCT id) AS trips
FROM `{table}` activity
JOIN geos AS origin_geo
  ON ST_COVERS(origin_geo.geom, ST_GEOGPOINT(startPlace.location.longitude, startPlace.location.latitude))
JOIN geos AS dest_geo
  ON ST_COVERS(dest_geo.geom, ST_GEOGPOINT(endPlace.location.longitude, endPlace.location.latitude))
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY origin, destination

VMT (vehicle miles traveled)¶

SELECT SUM(segment.distanceMeters) / 1609.34 AS vmt_miles
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
  AND segment.mode IN ('PRIVATE_AUTO', 'ON_DEMAND_AUTO', 'COMMERCIAL')
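
The same calculation can be sketched outside BigQuery to show why the filter is applied per segment rather than per trip: non-vehicle legs of a driving trip do not count toward VMT. The dictionaries below are made-up rows that mimic the column names used in the query; they are not real output.

```python
METERS_PER_MILE = 1609.34  # same constant as the query above
VEHICLE_MODES = {"PRIVATE_AUTO", "ON_DEMAND_AUTO", "COMMERCIAL"}


def vmt_miles(trips):
    """Sum vehicle-mode segment distances across trips, in miles."""
    meters = sum(
        seg["distanceMeters"]
        for trip in trips
        for seg in trip["travelSegments"]
        if seg["mode"] in VEHICLE_MODES
    )
    return meters / METERS_PER_MILE


# Made-up trips: a drive that starts with a walk leg, and a transit trip.
trips = [
    {"travelSegments": [
        {"mode": "WALKING", "distanceMeters": 200.0},
        {"mode": "PRIVATE_AUTO", "distanceMeters": 16093.4},
    ]},
    {"travelSegments": [
        {"mode": "PUBLIC_TRANSIT", "distanceMeters": 5000.0},
    ]},
]
print(round(vmt_miles(trips), 2))
```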

Time bucketing¶

Trips have timestamps in UTC. Convert to local time and bucket by hour or quarter-hour:

-- Hourly
EXTRACT(HOUR FROM startTime AT TIME ZONE 'America/New_York') AS hour

-- Quarter-hourly (0-95, i.e., 96 fifteen-minute intervals per day)
TRUNC(
  (EXTRACT(HOUR FROM startTime AT TIME ZONE 'America/New_York') * 60
   + EXTRACT(MINUTE FROM startTime AT TIME ZONE 'America/New_York')) / 15
) AS qtr_hour

Common timezone values: America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Anchorage, Pacific/Honolulu.
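
For checking bucket boundaries by hand, the quarter-hour arithmetic can be reproduced in a few lines. This sketch uses a fixed UTC offset as a simplifying assumption; a named time zone, as in the SQL above, is what handles daylight saving transitions correctly.

```python
from datetime import datetime, timedelta, timezone


def quarter_hour_index(ts_utc, utc_offset_hours):
    """Convert a UTC timestamp to local time and return its 15-minute bucket (0-95)."""
    local = ts_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return (local.hour * 60 + local.minute) // 15


# 12:40 UTC with a fixed -4 h offset (an assumption standing in for
# Eastern daylight time) is 08:40 local time.
ts = datetime(2024, 10, 3, 12, 40, tzinfo=timezone.utc)
print(quarter_hour_index(ts, -4))
```

Bucket 0 covers 00:00 to 00:14 local time and bucket 95 covers 23:45 to 23:59.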

Trip counts by mode and purpose¶

SELECT
  primaryTravelMode AS mode,
  travelPurpose AS purpose,
  person.residentType AS resident_type,
  COUNT(DISTINCT id) AS trips
FROM `{table}`
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY mode, purpose, resident_type

Select link analysis¶

Select link analysis lets you pick specific roadway segments and see all trips that traverse them. This is especially useful for understanding the origin-destination profile of traffic on a particular corridor. For nationwide travel analysis, a "stitched" dataset exists that joins trip segments across megaregion boundaries, enabling cross-megaregion select link queries. Without stitching, trips that cross a megaregion border are truncated at the boundary. Use the stitched dataset when the selected links carry significant inter-regional traffic (e.g., interstate highways near megaregion borders).

POI visits (discretionary trips)¶

Count visits to specific points of interest, excluding non-discretionary purposes:

SELECT
  endPlace.id AS place_id,
  endPlace.name AS place_name,
  primaryTravelMode AS mode,
  travelPurpose AS purpose,
  COUNT(DISTINCT id) AS visits
FROM `{table}`
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
  AND travelPurpose NOT IN ('HOME', 'WORK', 'SCHOOL', 'LODGING', 'REGION_DEPARTURE')
GROUP BY place_id, place_name, mode, purpose
ORDER BY visits DESC

Transit boardings by stop¶

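Count boardings at transit stops, excluding transfers between routes of the same operator. The query below reports boardings only; for alightings, group by destination_stop_id and filter on NOT is_transfer_alighting instead: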

WITH transit_segments AS (
  SELECT
    segment.id,
    segment.startPlace.id AS origin_stop_id,
    segment.endPlace.id AS destination_stop_id,
    IFNULL(
      LAG(segment.transitRoute.operator) OVER (PARTITION BY activity.id ORDER BY segment.startTime)
        = segment.transitRoute.operator,
      FALSE
    ) AS is_transfer_boarding,
    IFNULL(
      LEAD(segment.transitRoute.operator) OVER (PARTITION BY activity.id ORDER BY segment.startTime)
        = segment.transitRoute.operator,
      FALSE
    ) AS is_transfer_alighting
  FROM `{table}` activity
  CROSS JOIN UNNEST(travelSegments) segment
  WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
    AND primaryTravelMode = 'PUBLIC_TRANSIT'
    AND segment.transitRoute.id IS NOT NULL
)
SELECT
  origin_stop_id AS stop_id,
  COUNT(DISTINCT id) AS boardings
FROM transit_segments
WHERE NOT is_transfer_boarding
GROUP BY stop_id
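
The transfer logic in the query (compare each segment's operator with the previous segment's operator within the same trip) can be sketched procedurally as follows. The input layout, key names, and agency names are assumptions made for the example; only the same-operator rule comes from the query.

```python
from collections import Counter


def boardings_by_stop(trips):
    """Count boardings per stop, skipping same-operator transfers.

    Each trip is a list of transit segments ordered by start time; each
    segment is a dict with 'operator' and 'origin_stop_id' keys.
    """
    counts = Counter()
    for segments in trips:
        prev_operator = None
        for seg in segments:
            # Mirrors LAG(...) = operator: the first leg has no previous operator.
            is_transfer = (prev_operator is not None
                           and seg["operator"] == prev_operator)
            if not is_transfer:
                counts[seg["origin_stop_id"]] += 1
            prev_operator = seg["operator"]
    return counts


# Made-up trips: the boarding at s2 is a same-operator transfer and is skipped.
trips = [
    [
        {"operator": "Agency A", "origin_stop_id": "s1"},
        {"operator": "Agency A", "origin_stop_id": "s2"},
        {"operator": "Agency B", "origin_stop_id": "s3"},
    ],
    [{"operator": "Agency B", "origin_stop_id": "s3"}],
]
counts = boardings_by_stop(trips)
print(dict(counts))
```

Note that a transfer between two different operators still counts as a new boarding under this rule.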

Partial run scaling¶

Development builds use a fraction of households (typically 10%). When querying these tables, scale counts by 1 / fraction_of_households. The fraction is encoded in the table name (e.g., 10pct = 0.1). For 100pct tables (production), no scaling is needed.

100% vs 10% runs: A "100% run" is the complete Places run of the full population for a megaregion — this is the production dataset. A "10% run" is a calibrated run using 10% of the population, used as a faster QC step before committing to the full run. The typical pipeline is: baseline run, then 10% calibrated run (quick QC), then 100% complete run (full QC). When querying 10% tables, multiply trip counts by 10 to approximate full-population totals.
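
The scaling rule can be applied mechanically by reading the sample token out of the table name. This is a minimal sketch; the helper names and the 10% table name are hypothetical, built from the naming pattern documented earlier.

```python
import re


def sample_percent(table_name):
    """Extract the household percentage from a token like '10pct' in a table name."""
    match = re.search(r"_(\d+)pct_", table_name)
    if match is None:
        raise ValueError(f"no sample token in {table_name!r}")
    return int(match.group(1))


def scale_count(raw_count, table_name):
    """Scale a raw count to an approximate full-population total."""
    return raw_count * 100 / sample_percent(table_name)


# Hypothetical 10% table name following the documented naming pattern.
name = "north_atlantic_2021_Q4_10pct_thursday_v1_trip"
print(scale_count(1250, name))
```

For a 100pct table the function returns the count unchanged, so the same code path can serve development and production tables.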

Data Config Hierarchy¶

Places configs follow a four-level hierarchy:

configs/
├── usa/{season}.yaml              # Nationwide: CDP inputs, versions, season dates
├── regions/{region}.yaml          # Region definition: counties, timezone, router settings
├── {region}/{season}/season.yaml  # Region+season: inherits from usa/{season}.yaml
└── {region}/{season}/{day}.yaml   # Day-level: Thursday (weekday) or Saturday (weekend)

USA config — publishing, CDP input versions, season specification (start/end dates)
Region config — counties, timezones, router infrastructure settings
Season config — combines USA + region via prototype inheritance
Day config — the build target for a full day of activity in a region

Templates for new regions/seasons are in configs/template/.
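
To see which files apply to a given build target, the hierarchy can be expanded into concrete paths. The function below is an illustrative helper, not part of the build tooling, and the region, season, and day values passed to it are placeholders rather than confirmed directory names.

```python
def config_chain(region, season, day, root="configs"):
    """List the four config files, broadest first, that apply to one build target."""
    return [
        f"{root}/usa/{season}.yaml",
        f"{root}/regions/{region}.yaml",
        f"{root}/{region}/{season}/season.yaml",
        f"{root}/{region}/{season}/{day}.yaml",
    ]


# Placeholder names only; substitute the real region, season, and day.
for path in config_chain("north_atlantic", "2021_Q4", "thursday"):
    print(path)
```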

Building¶

arti build places/configs/{region}/{season}/{day}.yaml

Default baseline_fraction is 0.1 (10% sample for development builds).

Dependencies¶

Places depends on all upstream CDPs:

- Geos: region boundaries
- Land Use: places/POI definitions
- Population: synthetic households and persons
- Transportation Network: routers for trip routing