Skip to content

Querying Replica Data

Projects & Datasets

Project Contents
model-159019 Primary data warehouse — all CDPs, customer tables, internal tables
model-inrix INRIX speed/volume profiles, OSM conflation tables
Dataset Contents
model-159019.landuse Land use tables (buildings, parcels, POIs, customer exports)
model-159019.populationgen Population tables (households, persons)
model-159019.geos Geographic boundaries and region definitions
model-159019.places Places pipeline outputs
model-159019.mobility Mobility metrics (VMT, AADT)
model-159019.mini_kc Test region data cutouts
model-inrix.speed_profiles INRIX speed data and OSM conflation
model-inrix.volume_profiles INRIX volume data

Table Naming

  • Customer-facing (CDS): customer_{product}_{season} (e.g., customer_landuse_parcels_2024_Q4)
  • Internal: {descriptive_name} (no strict convention)
  • Partitioned: some tables are date-partitioned (POIs) or clustered by key columns
  • Season encoding: encoded in table names as YYYY_QN; same season across products = same modeled period

Geostore (Universal Geography)

The geostore (model-159019.geos.geostore) is the canonical table of geographic boundaries used for spatial joins throughout the platform — land use aggregation, trip summarization, demographic rollups, and every product's geographic filtering. It is the single shared dimension table for all products.

A companion table geostore_by_id has the same schema (minus surface_point, area_sq_miles, and centroid) but is clustered on id instead of layer_id, for fast lookups when you already know the geo ID.

Columns

Column Type Description
id INTEGER Internal geo identifier (primary key)
src_id INTEGER Source ID native to the data provider (e.g., FIPS code for census geographies)
layer_id INTEGER Identifies the geographic level — see the layer table below
geometry GEOGRAPHY Full polygon boundary
name TEXT Human-readable name (e.g., "King County", "Washington")
surface_point GEOGRAPHY A point guaranteed to lie on the polygon surface
area_sq_miles FLOAT Area of the geometry in square miles
centroid GEOGRAPHY Centroid point of the geometry

Standard Geo Layers (layer_id values)

There are 18 standard layers spanning two census vintages (Tiger 2010 and Tiger 2020). Each layer_id is an integer FK into the layer table.

Geographic Level Layer Name 2010 layer_id 2020 layer_id
Block Group BGRP 43 5425
Census Tract TRCT 6 5418
County CTY 3 6634
State ST 4 5373
CBSA (metro area) CBSA 15 5429
Place (city/town) STPLC 14 6622
School District STSCHOOL 104 6663
TAZ (Traffic Analysis Zone) TAZ 138 --
ZCTA (ZIP Code Tabulation Area) ZCTA 7 6608
Country COUNTRY 34 --

TAZ and Country currently exist only in the 2010 vintage. All other levels have both 2010 and 2020 variants. Custom geo layers uploaded by users are assigned dynamically and do not appear in this table.

FIPS Codes and BLOCKID Matching

For census geographies that carry FIPS codes (Block Group, Tract, County, State), the src_id column stores the FIPS code directly. These FIPS codes follow the standard hierarchical BLOCKID structure:

BLOCKID:  SS CCC TTTTTT B GGGG
          |  |   |      | |
          |  |   |      | +-- Block Group (1 digit, positions 12)
          |  |   |      +---- Block suffix (not used in geostore)
          |  |   +----------- Tract (6 digits, positions 5-10)
          |  +--------------- County (3 digits, positions 2-4)
          +------------------ State (2 digits, positions 0-1)

FIPS prefix lengths:

Length Level
2 State
5 County
11 Census Tract
12 Block Group
15 Full Block ID (BLOCKID10)

In practice this means you can match geos across levels by taking substrings of the FIPS code:

-- Find the county containing a given tract
SELECT county.*
FROM `model-159019.geos.geostore` county
JOIN `model-159019.geos.geostore` tract
  ON county.layer_id = 3          -- CTY_2010
  AND tract.layer_id = 6          -- TRCT_2010
  AND CAST(county.src_id AS STRING) = SUBSTR(CAST(tract.src_id AS STRING), 1, 5)

For layers that do not carry FIPS codes (CBSA, Place, School District, ZCTA, TAZ, Country), containment relationships are resolved spatially via ST_CONTAINS or through the pre-computed contained_geos table.

Contained Geos (Hierarchy Resolution)

The contained_geos table stores pre-computed parent-child containment (parent_id, child_layer_id, child_id). Use it to find all tracts within a county, all block groups within a CBSA, etc.:

SELECT child_id FROM contained_geos
WHERE parent_id IN ({parent_geo_ids}) AND child_layer_id = {target_layer_id}

How Each Product Connects to Geostore

Product Join Strategy SQL Pattern
Places FIPS substr on BLOCKID columns (no spatial join needed for BGRP/TRCT/CTY/ST) substr(BLOCKID10_origin, 0, N) = geostore.src_id
Places (non-FIPS layers) Spatial intersection on place geometry ST_Intersects(geo.geometry, st_geogpoint(lng, lat))
Mobility Two-stage spatial filter: WKT cutout then geostore intersect Stage 1: ST_Intersects(WKT_polygon, segment.geometry) Stage 2: ST_Intersects(geostore.geometry, segment.geometry)
Trends (trip count) Pre-joined tables by layer, or point spatial join ST_Covers(geo.geometry, cutout.point) WHERE layer_id = N
Land Use Centroid containment ST_CONTAINS(geo.geometry, entity.centroid)

Cross-Product Join Keys

Join Key Connects Direction Notes
stable_edge_id Places network links <-> Mobility AADT/speeds/volumes Bidirectional AADT is bidirectional by default; use aadt_directional (v2+) to align with Places per-direction counts
geostore.id / layer_id Any product <-> geographic aggregations Any Universal; every product resolves to geostore IDs
FIPS codes / BLOCKID Trips <-> geostore (census layers only) Places -> Geo substr(BLOCKID10, 0, N) = src_id; fastest geo join, no spatial ops
land_use_l1 / land_use_l2 Places trips <-> Land Use categories Places -> Land Use Denormalized on trip table as origin_land_use_l1, destination_land_use_l1, etc.
Season (table name) Aligns temporal scope across products All Encoded in table names as YYYY_QN; same season = same modeled period
geo (integer ID) Trends metrics <-> Trends metrics Trends internal Spend, trip count, VMT, WFH all share geostore integer IDs

Dimensions (How Analysts Slice Data)

Trip Dimensions (Places)

Dimension Field Transform Valid Values
Primary Mode primaryTravelMode CARPOOL -> AUTO_PASSENGER PRIVATE_AUTO, PUBLIC_TRANSIT, ON_DEMAND_AUTO, COMMERCIAL, WALKING, BIKING, AUTO_PASSENGER
Trip Purpose travelPurpose WORK_FROM_HOME -> HOME HOME, WORK, SCHOOL, LODGING, REGION_DEPARTURE, COMMERCIAL, SHOPPING, EAT, SOCIAL, RECREATION, ERRANDS
Start Hour startTime format_timestamp("%H", startTime, '{tz}') 00-23
Resident Type person.residentType lower() CORE, DONUT, VISITOR (visitors have null demographics)
Tour Type tours[0].type -- COMMUTE, UNDIRECTED, NONE
Day Type table name thursday = weekday, saturday = weekend weekday, weekend
Season table name Q1, Q2, Q3, Q4 quarterly

Person Dimensions (Places)

Dimension Field Notes
Age person.age Continuous; under-16 get special employment status
Sex person.sex --
Race/Ethnicity person.race_ethnicity --
Income person.income (individual), person.household.income (household) Continuous
Employment person.employment IF(age < 16, 'UNDER_16', employment)
Industry person.industry_detailed / person.industry NAICS 2-digit extracted via regex
Household Size person.household.personIds ARRAY_LENGTH(personIds)
Vehicles person.household.numVehicles --
Building Type person.building_type --
Tenure person.tenure OWNER, RENTER

Geography Dimensions (All Products)

  • Layer level: block group through state/country (see Standard Layers table)
  • Custom uploaded geos: arbitrary polygons with assigned layer_id
  • Census vintage: 2010 (primary for FIPS joins) vs 2020

Network Dimensions (Mobility)

Dimension Field Values
Road Class highway motorway, trunk, primary, secondary, tertiary, unclassified, residential, *_link variants
Heading heading 0-359 degrees; convert with geos.compass_direction(heading)
Direction forward (boolean), bidirectional Speeds have forward/backward; AADT is bidirectional by default
Street Name street_name Free text
Day of Week day MON-SUN (three-letter); weekday = MON-FRI
Dimension Encoding Notes
Week date_trunc(date, WEEK(SATURDAY)) Weeks run Saturday-Friday
Day of Week Modeled days only Weekday = Thursday only; Weekend = Saturday only (except Spend, which has all days)
Month date_trunc(date, MONTH) Monthly averages divide by month_day_count lookup

Land Use Dimensions

L1 Categories (10 values): residential, commercial, mixed_use, civic_institutional, industrial, transportation_utilities, agriculture, open_space, other, unknown

L2 Categories: single_family, multi_family, retail, office, non_retail_attraction, education, healthcare, civic_institutional, industrial, transportation_utilities, open_space, agriculture, other, unknown

Parcels/buildings carry up to three L2 values (landuse_l2_primary, _secondary, _tertiary).

Measures (What Analysts Count)

Places Measures

Measure SQL Expression
Trip count count(id)
Person count count(DISTINCT person.id)
Trip distance (mi) round(distanceMeters * 0.000621371, 1)
Trip duration (min) div(timestamp_diff(endTime, startTime, SECOND), 60)
VMT (segment) sum(ts.distanceMeters * 0.000621371) WHERE ts.mode = 'PRIVATE_AUTO'
VMT (network link) count(trip_id) * link_distance per link, prorated at geo boundaries
Network link volume count(id) grouped by stableEdgeId
Boarding/alighting count(DISTINCT trip.id) per transit stop

Mobility Measures

Measure SQL Expression Notes
AADT aadt (bidirectional), aadt_directional (v2+) Always filter aadt IS NOT NULL
Hourly volume round(avg(volume), 1) Pivoted to 24 columns per day group
Avg speed round(avg(speed_mph), 1) Quarter-hourly; pivoted to 96 columns
Free-flow speed free_flow_speed_mph From annual speeds table
Speed percentiles speed_p50_mph, speed_p85_mph, speed_p95_mph v2+ only
TMC count count Flat rows per intersection/movement/hour/day
Measure SQL Expression Notes
Trip count sum(trip_count) By mode: sum(mode_unnest.trip_count) after UNNEST
Trip count per capita sum(trip_count) / sum(residents) LEFT JOIN geo_demographics
VMT sum(vmt) Point-level, spatially joined to target geo
WFH share sum(persons_wfh_count) / sum(employed_residents) INNER JOIN geo_demographics
Spend (home) sum(estimated_spend_home) Filterable by replica_industry, spend_channel
Spend (merchant) sum(estimated_spend_merchant) Filterable by replica_industry

Land Use Measures

Measure SQL Expression Notes
Parcel area sum(parcel_sqft) Split across L2 categories for non-mixed_use
Building area sum(building_sqft) Retail allocation logic for multi-use buildings
Dwelling units sum(dwelling_units) Parcel table; parsed into single/multi/mixed
Stories num_stories Building table; drives retail allocation threshold
Residential units sum(res_num_units) Building table

Query Recipes

Spatial filter by region

SELECT t.*
FROM `model-159019.{dataset}.{table}` t
JOIN `model-159019.geos.region` r
  ON r.id = '{region_id}'
  AND ST_COVERS(r.geom, t.geom)

Percent sample (for mini_kc nationwide representation)

SELECT * FROM `{table}` TABLESAMPLE SYSTEM ({percent} PERCENT)

Filter by segment ID using OSM table

WITH osm_seg_ids AS (
  SELECT seg_id AS segment_col FROM `{osm_cutout}`
  UNION ALL
  SELECT compseg_id FROM `{osm_cutout}` WHERE compseg_id IS NOT NULL
)
SELECT t.*
FROM `{table}` t
INNER JOIN osm_seg_ids USING (segment_col)

Join on stable_edge_id to compare activity-based model volumes against annual average counts:

WITH places_volumes AS (
    SELECT tsnl.networkLink.id AS stable_edge_id, count(*) AS trip_volume
    FROM trip_table,
         UNNEST(travelSegments) AS ts,
         UNNEST(ts.travelSegmentNetworkLinks) AS tsnl
    WHERE {trip_filters}
    GROUP BY stable_edge_id
),
mobility_aadt AS (
    SELECT stable_edge_id, aadt, street_name, highway
    FROM aadt_table
    WHERE aadt IS NOT NULL
)
SELECT m.stable_edge_id, m.street_name, m.highway,
       p.trip_volume, m.aadt
FROM mobility_aadt m
JOIN places_volumes p ON p.stable_edge_id = m.stable_edge_id

Use aadt_directional (v2+, 2024_Q4 onward) to align directionality with Places per-direction counts.

Trip Generation by Land Use Category

Land use L1/L2 are denormalized on the trip table. No spatial join needed:

SELECT origin_land_use_l1,
       CASE WHEN origin_land_use_l1 = 'mixed_use' THEN 'mixed_use'
            ELSE ifnull(origin_building_use_l2, 'unknown') END AS land_use,
       count(id) AS trips
FROM trip_table
WHERE {filters}
GROUP BY 1, 2
ORDER BY trips DESC

OD Flows at a Custom Geography Level

Use geostore spatial joins for both origin and destination:

WITH custom_geos AS (
    SELECT id, name, geometry FROM geostore WHERE layer_id = {custom_layer_id}
),
trips AS (
    SELECT id, startPlace.location.longitude AS o_lng, startPlace.location.latitude AS o_lat,
           endPlace.location.longitude AS d_lng, endPlace.location.latitude AS d_lat
    FROM trip_table WHERE {filters}
)
SELECT o.name AS origin, d.name AS destination, count(t.id) AS trips
FROM trips t
JOIN custom_geos o ON ST_Intersects(o.geometry, ST_GEOGPOINT(t.o_lng, t.o_lat))
JOIN custom_geos d ON ST_Intersects(d.geometry, ST_GEOGPOINT(t.d_lng, t.d_lat))
GROUP BY 1, 2
ORDER BY trips DESC

For standard census layers, prefer the faster FIPS join: substr(BLOCKID10_origin, 0, N) = geostore.src_id.

Trends provides weekly/monthly time series; Places provides a single-season detailed snapshot. Pair them by aligning on the same geography and season:

-- Trends: weekly trip count trend for a county
SELECT date_trunc(partition_key, WEEK(SATURDAY)) AS week, sum(trip_count) AS trips
FROM mobility_geo__layer_3__v1_0_0
WHERE origin IN ({county_geo_ids})
  AND partition_key BETWEEN '2024-01-01' AND '2024-12-31'
  AND EXTRACT(DAYOFWEEK FROM partition_key) = 5  -- Thursday only
GROUP BY week

-- Places: detailed mode/purpose breakdown for the same county, same season
SELECT IF(primaryTravelMode='CARPOOL','AUTO_PASSENGER',primaryTravelMode) AS mode,
       count(id) AS trips
FROM {region}_2024_Q2_thursday_trip_table
WHERE substr(BLOCKID10_origin, 0, 5) IN ({county_fips})
GROUP BY mode

The Trends query shows how volumes change over time; the Places query shows what those trips look like in a representative season.

Business Rules & Gotchas

Rule Detail
Visitor demographic nulling When residentType = 'VISITOR', all person demographic columns are NULL. Fuel type is also UNKNOWN_FUEL_TYPE for visitors.
OTHER_RESIDENT_TYPE exclusion Synthetic people for freight/commercial vehicles; excluded from cds-population. Their trips exist but have no demographics.
Fundamental trip filter type='TRAVEL' AND optionIndex='FIRST_OPTION' is pre-applied to the denormalized trips table. Do not re-apply it.
CARPOOL -> AUTO_PASSENGER Applied on primaryTravelMode and commuteMode in both filters and output.
WORK_FROM_HOME -> HOME Applied on travelPurpose and previousActivityType. WFH activities are NOT in the trips table.
AADT bidirectional by default The aadt column sums both directions. Directional columns only exist on v2+ (2024_Q4+).
Trends Saturday week truncation date_trunc(date, WEEK(SATURDAY)) -- weeks run Saturday through Friday.
Speed profile 2/3 coverage threshold Segments without sufficient GPS probe data are excluded from quarterly hourly speed results.
Trends models Thu/Sat only Trip count, VMT, WFH are modeled on representative days: weekday = Thursday (DAYOFWEEK=5), weekend = Saturday (DAYOFWEEK=7). Spend has all 7 days.
Monthly averaging Monthly aggregation divides by month_day_count lookup table, not simple sum.
UNNEST deduplication After UNNESTing travel segments or network links, GROUP BY trip.id, person.id to avoid double-counting.
Employment age override IF(person.age < 16, 'UNDER_16', person.employment)
VMT distance proration When a network link crosses a geo boundary, distance is prorated by ST_Length(intersection) / ST_Length(link).

GCS Paths

Published CDP artifacts are stored in Google Cloud Storage:

Bucket Contents
gs://core_data_products/{product}/{season}/{version}/{region}.yaml Published DataConfigs
gs://core_data_products/landuse/pois/{tag}/{region}.yaml Published POI configs
gs://core_data_products/population/ingestion/{version}/{region}.yaml Population ingestion configs
gs://core_data_products/landuse/ingestion/{version}/{region}.yaml Land use ingestion configs