Places Core Data Product

Places is the core output of the Replica pipeline. It simulates the complete activities and movements of residents, visitors, and commercial vehicles in a region for a typical day within a season. Places are delivered as megaregions, most of which cover between 10 and 50 million residents across multiple states. For each modeled day, the output is a complete trip table, population table, and routing table.


Temporal Coverage

Seasons

A Places season is a 13-week modeling period. The output represents a typical day within that window, not a specific calendar date.

Season | Months Covered | Day Types
Spring | March, April, May | Thursday (weekday), Saturday (weekend)
Fall | September, October, November | Thursday (weekday), Saturday (weekend)

Available seasons

All megaregions have the following seasons:

  • Fall 2019
  • Fall 2021
  • Fall 2022
  • Spring 2023
  • Fall 2023
  • Spring 2024
  • Fall 2024

Caveats:

  • No 2020 seasons exist: Replica skipped 2020 due to COVID.
  • No pre-2019 data: Fall 2019 is the earliest available season.
  • Spring 2021 is no longer offered, but legacy Studies using it remain accessible.

What "typical day" means

Each season produces output for a typical Thursday (midweek) and a typical Saturday (weekend). This is distinct from AADT, which averages across all days of the year without distinguishing weekday from weekend or accounting for seasonal variation.


Megaregions

The country is divided into 12 megaregions: Alaska, Cal-Nev, Great Lakes, Hawaii, Mid-Atlantic, North Atlantic, North Central, Northeast, Northwest, South Atlantic, South Central, and Southwest.

Cross-megaregion trips

Limited support exists for travel between megaregions. Some trips have "out of region" origin/destination values. These may originate from bordering counties ("donut region") and include:

  • Trips by border-county residents who work or attend school within the megaregion
  • External-to-external trips using a road within the megaregion

Donut trips

In the cross-megaregion context, "core" refers to the interior of a megaregion and "donut" refers to counties whose boundaries adjoin a neighboring megaregion (border counties). Donut trips are specifically trips between one megaregion's core and the donut of an adjoining megaregion. These are distinct from generic cross-megaregion trips and are generated separately when building trip tables for border areas. People in the donut have person.residentType = 'DONUT' in the trip table.


Trips

Definition

A trip is a movement between places. A trip begins when a person leaves a location and ends when they stop to perform a non-travel activity.

Example: Walking from home to a cafe, sitting for coffee, then walking to work = two trips (home-to-cafe and cafe-to-work).

Multi-modal trips

A single trip can involve multiple modes (e.g., walk to bus stop, ride bus). These are modeled as separate trip segments but counted as one trip. Primary mode is assigned by this ranking:

  1. Public transit
  2. Driving / Auto passenger / Taxi / TNC
  3. Biking
  4. Walking
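
The ranking above can be sketched in Python as follows. This is an illustration only, not Replica's implementation: the function name primary_mode is hypothetical, and the segment mode labels are assumed to match the primaryTravelMode enum values.

```python
# Rank follows the list above: a lower number wins.
# Assumption: segment mode labels match the primaryTravelMode enum values.
MODE_RANK = {
    "PUBLIC_TRANSIT": 1,
    "PRIVATE_AUTO": 2,
    "ON_DEMAND_AUTO": 2,
    "BIKING": 3,
    "WALKING": 4,
}


def primary_mode(segment_modes):
    """Return the highest-ranked mode among a trip's segment modes."""
    if not segment_modes:
        raise ValueError("a trip needs at least one segment")
    # min() keeps the first mode seen when two modes share a rank.
    return min(segment_modes, key=lambda mode: MODE_RANK[mode])


# Walk to the stop, ride the bus, walk to the door: one transit trip.
print(primary_mode(["WALKING", "PUBLIC_TRANSIT", "WALKING"]))  # PUBLIC_TRANSIT
```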

Transit-to-transit transfers are part of the same trip. However, unlinked transit legs will appear separately if you sum trips by transit submode or click on an individual station in a Study.

Caveat: All transit access in the model is via walking or driving — no bike, passenger drop-off, or for-hire vehicle access to transit is modeled.

Dwell time threshold

Trips are separated at stay points detected from composite data sources:

  • Stays over 5 minutes are detected with >85% accuracy
  • Stays of 15 minutes are detected at ~90% accuracy
  • Very short stops (e.g., drive-through coffee) often cannot be detected and will not generate a separate trip

Transit wait time

Trip start times are optimized so there is no wait time before the first transit leg. For transfer wait times, compare end/start times of consecutive legs in BigQuery.
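
Once leg times are exported, the comparison of consecutive legs might look like the sketch below. It assumes each leg carries a start and an end timestamp; the function name transfer_waits and the sample times are hypothetical.

```python
from datetime import datetime


def transfer_waits(legs):
    """Return the gap in minutes between each pair of consecutive legs.

    `legs` is a list of (start, end) datetimes for a single trip.
    """
    ordered = sorted(legs, key=lambda leg: leg[0])
    return [
        (nxt[0] - cur[1]).total_seconds() / 60
        for cur, nxt in zip(ordered, ordered[1:])
    ]


# Hypothetical trip: first leg ends 08:20, second leg starts 08:26.
legs = [
    (datetime(2024, 10, 3, 8, 0), datetime(2024, 10, 3, 8, 20)),
    (datetime(2024, 10, 3, 8, 26), datetime(2024, 10, 3, 8, 45)),
]
print(transfer_waits(legs))  # [6.0]
```

A trip with a single leg yields an empty list, since there is no transfer to measure.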

Trip-level fields

Field Description
Origin / Destination Specific lat/lon points (building-footprint level, not zone-to-zone)
Origin / Destination land use Land use category at each end
Trip distance Total distance traveled
Trip duration Total elapsed time
Start / End time Departure and arrival timestamps
Trip mode Private auto driver, private auto passenger, public transit, walking, biking, freight, TNC
Trip purpose home, work, errands, eat, social, shop, recreation, commercial, school, region_departure
Routing Complete network links and transit routes for each trip

Trip purpose

Replica uses a location choice model (LCM) to determine purpose for discretionary activities (everything other than home/work/school). The model selects specific venues and POIs as destinations, proportional to their observed aggregate popularity by day-of-week and hour.

Caveat — purpose in dense areas: In areas with many nearby venues (e.g., a shopping plaza), individual trip destinations are randomized within proximity of observed locations, weighted by venue popularity. A real person's hairdresser visit may become a synthetic person's Target visit, while another synthetic person handles the hairdresser trip.

Caveat — "work" trips for unemployed people: A small percentage of unemployed people do take work trips. ACS employment status is binary and annual — "unemployed" people may have held temporary jobs, attended interviews, done day labor, or been students with part-time work.


Airport Trips

Spring 2024 and later

Both visitors and residents travel via airports, with boardings/alightings proportional to Bureau of Transportation Statistics counts. Filter to purpose region_departure for airport trips.

Before Spring 2024

Only visitors could travel to airports. Resident airport trips were limited to those who work at the airport (purpose = "work"). Resident departures/arrivals by plane were not modeled.

"Other" mode near airports

"Other" trips near airports typically represent people arriving on flights. Replica removes most suspected airline flight movements, but some leak through. Do not analyze "Other" mode trips — they are not reliable.


Population

Available attributes

Each trip is linked to a synthetic person with these characteristics:

Attribute Notes
Age
Sex
Race and ethnicity
Primary language
Employment status
Industry of employment 2017 NAICS codes (see below)
Home location
Work location
Individual and household income
Work-from-home status
Vehicle ownership status
Resident or visitor status

Synthetic population construction

The population is built from Census data — PUMS, ACS, CTPP, and LEHD — to create a statistically representative synthetic population. It is calibrated against recent Census ACS estimates. The specific ACS version for a given season can be found in the Study data source details.

Visitors

Visitors are people who do not normally live or work in the megaregion and who either:

  • Stayed overnight in the megaregion, or
  • Entered and exited via a port of entry (usually an airport) the same day

Caveat: Visitors have no demographic data (age, income, race, etc.). Filtering to visitors only will produce blank demographic breakdowns.

Trips with no person data

Trips lacking person attributes are either:

  • Commercial (freight) trips, or
  • Visitor trips

Minors

Minors are not represented in mobile data (legally prohibited). They are assigned a school based on age and proximity to home. Enrollment totals come from public enrollment data. Home locations are based on Census data. Factors like school choice programs are not modeled.

Underrepresented populations

Replica does not require seeing all mobile devices. Because the model is calibrated to Census data, gaps in mobile coverage do not translate directly into missing population. However, Census data itself has larger margins of error for some groups.


Industry of Employment

Replica uses 2017 NAICS codes. Granularity varies by workplace:

  • When a workplace maps to multiple NAICS codes, Replica selects the most confident level of detail
  • Example: If three 6-digit NAICS codes share the same 5-digit prefix, the 5-digit code is used (100% confidence) rather than guessing which 6-digit code applies (33% confidence each)
  • Some industries have codes with only 4 digits
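
The prefix rule in the example above can be sketched as a longest-common-prefix calculation. This is one reading of the rule, not Replica's code; the function name most_confident_naics is hypothetical, and the sample codes are 2017 NAICS restaurant codes chosen for illustration.

```python
def most_confident_naics(candidates):
    """Return the longest NAICS prefix shared by all candidate codes."""
    if not candidates:
        raise ValueError("need at least one candidate code")
    prefix = candidates[0]
    for code in candidates[1:]:
        # Shorten the prefix until this candidate also starts with it.
        while not code.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


# Three 6-digit codes that share the 5-digit prefix 72251.
print(most_confident_naics(["722511", "722513", "722515"]))  # 72251
```

If the candidates share no leading digits at all, this sketch returns an empty string; how Replica handles that case is not stated here.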

Vehicle Notes

Topic | Detail
Zero-vehicle households | Can still have private auto trips: ACS data shows some zero-vehicle households reporting auto commutes (borrowed, rented, or employer-provided cars)
Rental cars | Included in "private auto" trips
Vehicle occupancy | Not available
Motorcycles | Not broken out separately
BEV data | Geospatial distribution from third-party consumer marketing data; calibrated to state-level totals from registration/sales data

Data Quality and Uncertainty

General guidance

  • Larger sample sizes = higher certainty. Filtering to very small populations or geographies (e.g., a single tract with many demographic filters) increases noise.
  • Samples are scaled to match true population at census block group level, but sparse-data areas may have more variance.
  • Each completed Places model includes a Quality Report comparing outputs to ground truth data (transit counts, traffic counts).

Road network confidence

Confidence increases with road size and importance. Large arterials and highways have higher confidence due to greater volume and sensor availability. Smaller residential roads carry more uncertainty — the model may occasionally route drivers through shortcuts real drivers would avoid.

Transit boardings and alightings

Replica does not calibrate transit boardings/alightings to customer-supplied data. Line-level ridership is generally reliable, and boarding/alighting locations are realistic, but real passengers may choose stops differently than modeled (e.g., walking farther to a more comfortable stop).

AADT comparison

Replica volumes are averages for a typical mid-week or weekend day over a 13-week season. AADT averages over all days of the year. Direct comparison requires accounting for this difference.

Rural and sparse areas

Rural areas have coverage but lower density. Transit in rural areas is limited — Replica models transit agencies with a minimum threshold of 500 daily boardings per route, so smaller agencies in smaller cities often lack transit coverage.


Land Use as Input

Land use is a key upstream input that:

  • Assigns buildings to home, work, and school locations (e.g., residents live in residential buildings, not single-use offices)
  • Defines possible destinations for discretionary trips (restaurants, retail, stadiums, etc.)

Joining land use to trip data

Land use download files include a FIPS code field. Join land use to trip activity data through this field.


Data Sources

Replica uses five categories of third-party data:

  1. Mobile location data — LBS data, vehicle in-dash GPS, POI data
  2. Consumer/resident data — Demographic data from public and private sources
  3. Built environment data — Land use, building footprints, transportation networks
  4. Economic activity data — Credit, debit, cash transactions
  5. Ground truth data — Auto/freight volumes, transit ridership

Data Access

  • BigQuery access may be available depending on subscription tier.
  • Data download files use .gz (gzip) compression.
  • Custom geographies can be uploaded as zipped shapefiles, KML, or GeoJSON.

BigQuery Schema

Places data lives in two CDS (Core Data Service) tables per region/season/day combination:

CDS Table | Contents
cds-population | One row per synthetic person. Demographics, household attributes, home/work/school locations. Includes CORE, DONUT, and VISITOR resident types. Excludes OTHER_RESIDENT_TYPE (synthetic freight trip-takers).
cds-denormalized-trips | One row per trip. Pre-filtered to type = 'TRAVEL' and optionIndex = 'FIRST_OPTION'. Person attributes are denormalized into a nested person struct on each trip row.

Table naming

Tables follow this naming convention:

Project:  core-data-service-prod
Dataset:  {region}_{year}_{season}
Table:    {region}_{year}_{season}_{sample}_{day}_v{version}_trip

Example fully-qualified reference:

`core-data-service-prod.north_atlantic_2021_Q4`.`north_atlantic_2021_Q4_100pct_thursday_v1_trip`
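
As a convenience, the naming convention can be assembled programmatically. The helper below is a hypothetical sketch (places_trip_table is not an existing function); it only concatenates the components listed above.

```python
def places_trip_table(region, year, season, sample, day, version,
                      project="core-data-service-prod"):
    """Build a fully-qualified trip table reference from its components."""
    dataset = f"{region}_{year}_{season}"
    table = f"{dataset}_{sample}_{day}_v{version}_trip"
    return f"`{project}.{dataset}`.`{table}`"


# Reproduces the example reference shown above.
print(places_trip_table("north_atlantic", 2021, "Q4", "100pct", "thursday", 1))
```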

Finding published tables

Look up the current production table IDs with:

SELECT *
FROM `model-159019.core_data.released_cdp_table_output`
WHERE output_name IN ('places_population', 'places_denormalized_trips')
  AND tag = 'prod'

Use the latest release for each region/day/season combination.


Trip Table Columns

These are the key columns on the denormalized trip table.

Column Type Description
id STRING Trip ID
travelPurpose STRING Trip purpose enum (see Enum Values below)
primaryTravelMode STRING Primary travel mode enum (see Enum Values below)
startPlace STRUCT Origin place. Access .id, .location.latitude, .location.longitude, .geometry
endPlace STRUCT Destination place. Same sub-fields as startPlace
distanceMeters FLOAT Trip distance in meters
startTime TIMESTAMP Trip departure time
endTime TIMESTAMP Trip arrival time
geometry GEOGRAPHY Full trip path (linestring)
travelSegments REPEATED RECORD Array of travel segments (see Travel Segments below)
person STRUCT Nested person demographics (see Person Struct below)
tours REPEATED RECORD Tour info; access tours[OFFSET(0)].type for tour type
land_use_l1 / land_use_l2 STRING Origin land use classification (level 1 and 2)
building_use_l1 / building_use_l2 STRING Origin building use classification
BLOCKID20_origin STRING 2020 Census Block FIPS at trip origin
BLOCKID20_destination STRING 2020 Census Block FIPS at trip destination
BLOCKID10_origin STRING 2010 Census Block FIPS at trip origin
BLOCKID10_destination STRING 2010 Census Block FIPS at trip destination
previousActivityType STRING Activity before this trip (e.g., HOME, WORK)
optionIndex STRING Always FIRST_OPTION in the trip table
type STRING Always TRAVEL in the trip table

Person Struct

Person demographics are nested on each trip row under person.*. When residentType = 'VISITOR', all fields except id and residentType are NULL.

Field Type Notes
person.age FLOAT Age in years
person.sex STRING MALE, FEMALE
person.race_ethnicity STRING e.g., WHITE_NOT_HISPANIC_OR_LATINO, BLACK_NOT_HISPANIC_OR_LATINO
person.employment STRING EMPLOYED, UNEMPLOYED, NOT_IN_LABOR_FORCE
person.income FLOAT Individual income
person.household.income FLOAT Household income
person.household.numVehicles STRING Vehicles available to the household
person.household.personIds ARRAY<STRING> Use ARRAY_LENGTH(person.household.personIds) for household size
person.wfh STRING Work-from-home status: REMOTE, IN_PERSON, EMPLOYED_NOT_WORKING, UNEMPLOYED_NOT_WORKING
person.residentType STRING CORE, DONUT, VISITOR (see Enum Values below)
person.commuteMode STRING ACS commute mode
person.industry_detailed STRING Detailed industry; fall back to person.industry if NULL
person.education STRING e.g., SOME_COLLEGE, BACHELORS_DEGREE
person.language STRING e.g., ENGLISH, SPANISH
person.building_type STRING e.g., SINGLE_FAMILY
person.tenure STRING e.g., OWNER, RENTER
person.school_grade_attending STRING e.g., NOT_ATTENDING_SCHOOL
person.person_fingerprint STRING Deterministic person ID for display
person.household_fingerprint STRING Deterministic household ID for display
person.household.housingUnit.geometry GEOGRAPHY Home location point
person.officeUnit.geometry GEOGRAPHY Work location point (NULL if no work/school location)

Travel Segments

Each trip contains an array of travelSegments. Unnest them to access segment-level detail:

SELECT
  activity.id AS trip_id,
  segment.mode,
  segment.transitRoute.id AS transit_route_id,
  segment.transitRoute.operator AS transit_agency,
  segment.vehicle.type AS vehicle_type
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'

Key segment fields:

Field (relative to segment) Type Description
mode STRING Segment-level mode (can differ from trip primaryTravelMode)
transitRoute.id STRING Transit route ID, e.g., "MTA New York City Transit:7"
transitRoute.operator STRING Transit agency name
transitRoute.type STRING Transit submode (e.g., SUBWAY, BUS)
transitRoute.line STRING Transit route/line name
vehicle.type STRING Vehicle type
vehicle.fuelType STRING Fuel type (e.g., ELECTRIC)
startPlace.id / endPlace.id STRING Segment origin/destination (e.g., transit stop IDs)
travelSegmentNetworkLinks REPEATED RECORD Network links traversed in this segment

Network links require a double unnest:

CROSS JOIN UNNEST(travelSegments) segment
CROSS JOIN UNNEST(segment.travelSegmentNetworkLinks) tsnl
-- then use: tsnl.networkLink.id

Enum Values

primaryTravelMode

Value Description
PRIVATE_AUTO Private vehicle (includes rental cars)
PUBLIC_TRANSIT Bus, rail, ferry, etc.
ON_DEMAND_AUTO TNC / taxi
COMMERCIAL Freight / commercial vehicle
WALKING Walking
BIKING Biking

travelPurpose

Value Description
HOME Returning home
WORK Commute to workplace
SCHOOL Travel to school
SHOPPING Shopping trip
EAT Dining out
SOCIAL Social visit
RECREATION Recreation / leisure
ERRANDS Errands
LODGING Hotel / lodging
REGION_DEPARTURE Airport / port-of-entry trip
COMMERCIAL Commercial / freight purpose
WORK_FROM_HOME WFH activity (not present in trips table -- use population table)

person.residentType

Value | Description | Person data available?
CORE | Lives and/or works in the megaregion | Yes
DONUT | Lives in bordering counties | Yes
VISITOR | Overnight or same-day visitor | No -- all demographics are NULL
OTHER_RESIDENT_TYPE | Synthetic freight trip-takers | Excluded from CDS tables entirely

Note on OTHER_RESIDENT_TYPE: These synthetic people exist only to carry COMMERCIAL and stage trips. They are filtered out of both CDS tables. Trips by these people (freight) will have no person data in the denormalized trips table.


Essential Query Patterns

The fundamental trip filter

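Every trip query should include this filter. Raw activity tables contain non-travel activities and alternative route options, and the filter restricts results to actual trips. The CDS denormalized trips table is already pre-filtered this way, so there the filter is redundant but harmless: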

WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'

Network link volumes

Count trips traversing each network link (road segment):

SELECT
  tsnl.networkLink.id AS link_id,
  COUNT(*) AS volume
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
CROSS JOIN UNNEST(segment.travelSegmentNetworkLinks) tsnl
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY link_id

Origin-destination flows

Count trips between geographic areas:

-- Assumes `geos` is a table or CTE of areas with columns raw_id and geom (GEOGRAPHY)
SELECT
  origin_geo.raw_id AS origin,
  dest_geo.raw_id   AS destination,
  COUNT(DISTINCT activity.id) AS trips  -- qualified so it cannot clash with a geos.id column
FROM `{table}` activity
JOIN geos AS origin_geo
  ON ST_COVERS(origin_geo.geom, ST_GEOGPOINT(activity.startPlace.location.longitude, activity.startPlace.location.latitude))
JOIN geos AS dest_geo
  ON ST_COVERS(dest_geo.geom, ST_GEOGPOINT(activity.endPlace.location.longitude, activity.endPlace.location.latitude))
WHERE activity.type = 'TRAVEL' AND activity.optionIndex = 'FIRST_OPTION'
GROUP BY origin, destination

VMT (vehicle miles traveled)

SELECT SUM(segment.distanceMeters) / 1609.34 AS vmt_miles
FROM `{table}` activity
CROSS JOIN UNNEST(travelSegments) segment
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
  AND segment.mode IN ('PRIVATE_AUTO', 'ON_DEMAND_AUTO', 'COMMERCIAL')

Time bucketing

Trips have timestamps in UTC. Convert to local time and bucket by hour or quarter-hour:

-- Hourly
EXTRACT(HOUR FROM startTime AT TIME ZONE 'America/New_York') AS hour

-- Quarter-hourly (0-95, i.e., 96 fifteen-minute intervals per day)
TRUNC(
  (EXTRACT(HOUR FROM startTime AT TIME ZONE 'America/New_York') * 60
   + EXTRACT(MINUTE FROM startTime AT TIME ZONE 'America/New_York')) / 15
) AS qtr_hour

Common timezone values: America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Anchorage, Pacific/Honolulu.
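
The same quarter-hour bucketing can be checked outside BigQuery with a short Python sketch. The function name quarter_hour_bucket is hypothetical, and the fixed UTC-4 offset is an assumption standing in for America/New_York during daylight time.

```python
from datetime import datetime, timedelta, timezone


def quarter_hour_bucket(start_utc, local_tz):
    """Map a timezone-aware timestamp to a local quarter-hour index in 0..95."""
    local = start_utc.astimezone(local_tz)
    return (local.hour * 60 + local.minute) // 15


# Assumption: a fixed UTC-4 offset approximates America/New_York in daylight time.
# Where tz data is installed, zoneinfo.ZoneInfo("America/New_York") can be passed instead.
eastern_daylight = timezone(timedelta(hours=-4))
start = datetime(2024, 10, 3, 12, 37, tzinfo=timezone.utc)
print(quarter_hour_bucket(start, eastern_daylight))  # 34 (08:37 local)
```

Integer division keeps the bucket an integer, which is convenient for grouping and joining.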

Trip counts by mode and purpose

SELECT
  primaryTravelMode AS mode,
  travelPurpose AS purpose,
  person.residentType AS resident_type,
  COUNT(DISTINCT id) AS trips
FROM `{table}`
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
GROUP BY mode, purpose, resident_type

Select link analysis

Select link analysis lets you pick specific roadway segments and see all trips that traverse them. This is especially useful for understanding the origin-destination profile of traffic on a particular corridor.

For nationwide travel analysis, a "stitched" dataset exists that joins trip segments across megaregion boundaries, enabling cross-megaregion select link queries. Without stitching, trips that cross a megaregion border are truncated at the boundary. Use the stitched dataset when the selected links carry significant inter-regional traffic (e.g., interstate highways near megaregion borders).

POI visits (discretionary trips)

Count visits to specific points of interest, excluding non-discretionary purposes:

SELECT
  endPlace.id AS place_id,
  endPlace.name AS place_name,
  primaryTravelMode AS mode,
  travelPurpose AS purpose,
  COUNT(DISTINCT id) AS visits
FROM `{table}`
WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
  AND travelPurpose NOT IN ('HOME', 'WORK', 'SCHOOL', 'LODGING', 'REGION_DEPARTURE')
GROUP BY place_id, place_name, mode, purpose
ORDER BY visits DESC

Transit boardings by stop

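Count boardings at transit stops, excluding same-operator transfers. The query also computes is_transfer_alighting; to count alightings, group by destination_stop_id and filter on NOT is_transfer_alighting instead: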

WITH transit_segments AS (
  SELECT
    segment.id,
    segment.startPlace.id AS origin_stop_id,
    segment.endPlace.id AS destination_stop_id,
    IFNULL(
      LAG(segment.transitRoute.operator) OVER (PARTITION BY activity.id ORDER BY segment.startTime)
        = segment.transitRoute.operator,
      FALSE
    ) AS is_transfer_boarding,
    IFNULL(
      LEAD(segment.transitRoute.operator) OVER (PARTITION BY activity.id ORDER BY segment.startTime)
        = segment.transitRoute.operator,
      FALSE
    ) AS is_transfer_alighting
  FROM `{table}` activity
  CROSS JOIN UNNEST(travelSegments) segment
  WHERE type = 'TRAVEL' AND optionIndex = 'FIRST_OPTION'
    AND primaryTravelMode = 'PUBLIC_TRANSIT'
    AND segment.transitRoute.id IS NOT NULL
)
SELECT
  origin_stop_id AS stop_id,
  COUNT(DISTINCT id) AS boardings
FROM transit_segments
WHERE NOT is_transfer_boarding
GROUP BY stop_id
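
The transfer logic in the query can be traced on a small example. The Python sketch below mirrors the LAG comparison under the assumption that each trip's transit segments are already in travel order; the function name count_boardings and the sample data are hypothetical.

```python
from collections import Counter


def count_boardings(trips):
    """Count boardings per stop, skipping same-operator transfers.

    `trips` maps a trip ID to its transit segments in travel order; each
    segment is a dict with "operator" and "origin_stop_id" keys.
    """
    boardings = Counter()
    for segments in trips.values():
        previous_operator = None
        for segment in segments:
            operator = segment["operator"]
            # A boarding is a transfer when the previous transit leg of the
            # same trip used the same operator (the LAG comparison in SQL).
            is_transfer = operator is not None and operator == previous_operator
            if not is_transfer:
                boardings[segment["origin_stop_id"]] += 1
            previous_operator = operator
    return boardings


trips = {
    "t1": [
        {"operator": "A", "origin_stop_id": "s1"},
        {"operator": "A", "origin_stop_id": "s2"},  # same operator: transfer
        {"operator": "B", "origin_stop_id": "s3"},  # new operator: counted
    ]
}
print(dict(count_boardings(trips)))  # {'s1': 1, 's3': 1}
```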

Partial run scaling

Development builds use a fraction of households (typically 10%). When querying these tables, scale counts by 1 / fraction_of_households. The fraction is encoded in the table name (e.g., 10pct = 0.1). For 100pct tables (production), no scaling is needed.

100% vs 10% runs: A "100% run" is the complete Places run of the full population for a megaregion — this is the production dataset. A "10% run" is a calibrated run using 10% of the population, used as a faster QC step before committing to the full run. The typical pipeline is: baseline run, then 10% calibrated run (quick QC), then 100% complete run (full QC). When querying 10% tables, multiply trip counts by 10 to approximate full-population totals.
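
A minimal sketch of the scaling step, assuming the sample token always appears as `_{N}pct_` in the table name. The helper names are hypothetical, and the 10pct table name is constructed from the naming convention rather than taken from a real release.

```python
import re


def sample_percent(table_name):
    """Parse the household sample percentage from a `{N}pct` token."""
    match = re.search(r"_(\d+)pct_", table_name)
    if match is None:
        raise ValueError(f"no sample token in {table_name!r}")
    return int(match.group(1))


def scale_count(raw_count, table_name):
    """Scale a raw count to a full-population estimate (1 / fraction)."""
    return raw_count * 100 / sample_percent(table_name)


print(scale_count(1200, "north_atlantic_2021_Q4_10pct_thursday_v1_trip"))   # 12000.0
print(scale_count(1200, "north_atlantic_2021_Q4_100pct_thursday_v1_trip"))  # 1200.0
```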


Data Config Hierarchy

Places configs follow a four-level hierarchy:

configs/
├── usa/{season}.yaml              # Nationwide: CDP inputs, versions, season dates
├── regions/{region}.yaml          # Region definition: counties, timezone, router settings
├── {region}/{season}/season.yaml  # Region+season: inherits from usa/{season}.yaml
└── {region}/{season}/{day}.yaml   # Day-level: Thursday (weekday) or Saturday (weekend)

  • USA config: publishing, CDP input versions, season specification (start/end dates)
  • Region config: counties, timezones, router infrastructure settings
  • Season config: combines USA + region via prototype inheritance
  • Day config: the build target for a full day of activity in a region
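
One way to picture the layering is a recursive merge in which the more specific config overrides the more general one. The sketch below is an assumption about how such inheritance could behave, not the actual config loader; every key and value in it is hypothetical, as is the idea that the day config layers on top of the season config.

```python
def inherit(parent, child):
    """Return a new config where values in `child` override `parent`."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = inherit(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # the child wins on conflicts
    return merged


# Hypothetical keys and values, for illustration only.
usa = {"baseline_fraction": 0.1, "season": {"name": "fall_2024"}}
region = {"timezone": "America/New_York"}
day = {"day": "thursday", "baseline_fraction": 1.0}

season = inherit(usa, region)        # season config combines USA + region
build_target = inherit(season, day)  # assumed: day layers on top of season
print(build_target["baseline_fraction"])  # 1.0
```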

Templates for new regions/seasons are in configs/template/.

Building

arti build places/configs/{region}/{season}/{day}.yaml

Default baseline_fraction is 0.1 (10% sample for development builds).

Dependencies

Places depends on all upstream CDPs:

  • Geos: region boundaries
  • Land Use: places/POI definitions
  • Population: synthetic households and persons
  • Transportation Network: routers for trip routing