Querying Replica Data¶
Projects & Datasets¶
| Project | Contents |
|---|---|
model-159019 |
Primary data warehouse — all CDPs, customer tables, internal tables |
model-inrix |
INRIX speed/volume profiles, OSM conflation tables |
| Dataset | Contents |
|---|---|
model-159019.landuse |
Land use tables (buildings, parcels, POIs, customer exports) |
model-159019.populationgen |
Population tables (households, persons) |
model-159019.geos |
Geographic boundaries and region definitions |
model-159019.places |
Places pipeline outputs |
model-159019.mobility |
Mobility metrics (VMT, AADT) |
model-159019.mini_kc |
Test region data cutouts |
model-inrix.speed_profiles |
INRIX speed data and OSM conflation |
model-inrix.volume_profiles |
INRIX volume data |
Table Naming¶
- Customer-facing (CDS):
customer_{product}_{season}(e.g.,customer_landuse_parcels_2024_Q4) - Internal:
{descriptive_name}(no strict convention) - Partitioned: some tables are date-partitioned (POIs) or clustered by key columns
- Season encoding: encoded in table names as
YYYY_QN; same season across products = same modeled period
Geostore (Universal Geography)¶
The geostore (model-159019.geos.geostore) is the canonical table of geographic boundaries used for spatial joins throughout the platform — land use aggregation, trip summarization, demographic rollups, and every product's geographic filtering. It is the single shared dimension table for all products.
A companion table geostore_by_id has the same schema (minus surface_point, area_sq_miles, and centroid) but is clustered on id instead of layer_id, for fast lookups when you already know the geo ID.
Columns¶
| Column | Type | Description |
|---|---|---|
id |
INTEGER | Internal geo identifier (primary key) |
src_id |
INTEGER | Source ID native to the data provider (e.g., FIPS code for census geographies) |
layer_id |
INTEGER | Identifies the geographic level — see the layer table below |
geometry |
GEOGRAPHY | Full polygon boundary |
name |
TEXT | Human-readable name (e.g., "King County", "Washington") |
surface_point |
GEOGRAPHY | A point guaranteed to lie on the polygon surface |
area_sq_miles |
FLOAT | Area of the geometry in square miles |
centroid |
GEOGRAPHY | Centroid point of the geometry |
Standard Geo Layers (layer_id values)¶
There are 18 standard layers spanning two census vintages (Tiger 2010 and Tiger 2020). Each layer_id is an integer FK into the layer table.
| Geographic Level | Layer Name | 2010 layer_id | 2020 layer_id |
|---|---|---|---|
| Block Group | BGRP |
43 | 5425 |
| Census Tract | TRCT |
6 | 5418 |
| County | CTY |
3 | 6634 |
| State | ST |
4 | 5373 |
| CBSA (metro area) | CBSA |
15 | 5429 |
| Place (city/town) | STPLC |
14 | 6622 |
| School District | STSCHOOL |
104 | 6663 |
| TAZ (Traffic Analysis Zone) | TAZ |
138 | -- |
| ZCTA (ZIP Code Tabulation Area) | ZCTA |
7 | 6608 |
| Country | COUNTRY |
34 | -- |
TAZ and Country currently exist only in the 2010 vintage. All other levels have both 2010 and 2020 variants. Custom geo layers uploaded by users are assigned dynamically and do not appear in this table.
FIPS Codes and BLOCKID Matching¶
For census geographies that carry FIPS codes (Block Group, Tract, County, State), the src_id column stores the FIPS code directly. These FIPS codes follow the standard hierarchical BLOCKID structure:
BLOCKID: SS CCC TTTTTT B GGGG
| | | | |
| | | | +-- Block Group (1 digit, positions 12)
| | | +---- Block suffix (not used in geostore)
| | +----------- Tract (6 digits, positions 5-10)
| +--------------- County (3 digits, positions 2-4)
+------------------ State (2 digits, positions 0-1)
FIPS prefix lengths:
| Length | Level |
|---|---|
| 2 | State |
| 5 | County |
| 11 | Census Tract |
| 12 | Block Group |
| 15 | Full Block ID (BLOCKID10) |
In practice this means you can match geos across levels by taking substrings of the FIPS code:
-- Find the county containing a given tract
SELECT county.*
FROM `model-159019.geos.geostore` county
JOIN `model-159019.geos.geostore` tract
ON county.layer_id = 3 -- CTY_2010
AND tract.layer_id = 6 -- TRCT_2010
AND CAST(county.src_id AS STRING) = SUBSTR(CAST(tract.src_id AS STRING), 1, 5)
For layers that do not carry FIPS codes (CBSA, Place, School District, ZCTA, TAZ, Country), containment relationships are resolved spatially via ST_CONTAINS or through the pre-computed contained_geos table.
Contained Geos (Hierarchy Resolution)¶
The contained_geos table stores pre-computed parent-child containment (parent_id, child_layer_id, child_id). Use it to find all tracts within a county, all block groups within a CBSA, etc.:
SELECT child_id FROM contained_geos
WHERE parent_id IN ({parent_geo_ids}) AND child_layer_id = {target_layer_id}
How Each Product Connects to Geostore¶
| Product | Join Strategy | SQL Pattern |
|---|---|---|
| Places | FIPS substr on BLOCKID columns (no spatial join needed for BGRP/TRCT/CTY/ST) | substr(BLOCKID10_origin, 0, N) = geostore.src_id |
| Places (non-FIPS layers) | Spatial intersection on place geometry | ST_Intersects(geo.geometry, st_geogpoint(lng, lat)) |
| Mobility | Two-stage spatial filter: WKT cutout then geostore intersect | Stage 1: ST_Intersects(WKT_polygon, segment.geometry) Stage 2: ST_Intersects(geostore.geometry, segment.geometry) |
| Trends (trip count) | Pre-joined tables by layer, or point spatial join | ST_Covers(geo.geometry, cutout.point) WHERE layer_id = N |
| Land Use | Centroid containment | ST_CONTAINS(geo.geometry, entity.centroid) |
Cross-Product Join Keys¶
| Join Key | Connects | Direction | Notes |
|---|---|---|---|
stable_edge_id |
Places network links <-> Mobility AADT/speeds/volumes | Bidirectional | AADT is bidirectional by default; use aadt_directional (v2+) to align with Places per-direction counts |
geostore.id / layer_id |
Any product <-> geographic aggregations | Any | Universal; every product resolves to geostore IDs |
| FIPS codes / BLOCKID | Trips <-> geostore (census layers only) | Places -> Geo | substr(BLOCKID10, 0, N) = src_id; fastest geo join, no spatial ops |
land_use_l1 / land_use_l2 |
Places trips <-> Land Use categories | Places -> Land Use | Denormalized on trip table as origin_land_use_l1, destination_land_use_l1, etc. |
| Season (table name) | Aligns temporal scope across products | All | Encoded in table names as YYYY_QN; same season = same modeled period |
geo (integer ID) |
Trends metrics <-> Trends metrics | Trends internal | Spend, trip count, VMT, WFH all share geostore integer IDs |
Dimensions (How Analysts Slice Data)¶
Trip Dimensions (Places)¶
| Dimension | Field | Transform | Valid Values |
|---|---|---|---|
| Primary Mode | primaryTravelMode |
CARPOOL -> AUTO_PASSENGER |
PRIVATE_AUTO, PUBLIC_TRANSIT, ON_DEMAND_AUTO, COMMERCIAL, WALKING, BIKING, AUTO_PASSENGER |
| Trip Purpose | travelPurpose |
WORK_FROM_HOME -> HOME |
HOME, WORK, SCHOOL, LODGING, REGION_DEPARTURE, COMMERCIAL, SHOPPING, EAT, SOCIAL, RECREATION, ERRANDS |
| Start Hour | startTime |
format_timestamp("%H", startTime, '{tz}') |
00-23 |
| Resident Type | person.residentType |
lower() |
CORE, DONUT, VISITOR (visitors have null demographics) |
| Tour Type | tours[0].type |
-- | COMMUTE, UNDIRECTED, NONE |
| Day Type | table name | thursday = weekday, saturday = weekend | weekday, weekend |
| Season | table name | Q1, Q2, Q3, Q4 | quarterly |
Person Dimensions (Places)¶
| Dimension | Field | Notes |
|---|---|---|
| Age | person.age |
Continuous; under-16 get special employment status |
| Sex | person.sex |
-- |
| Race/Ethnicity | person.race_ethnicity |
-- |
| Income | person.income (individual), person.household.income (household) |
Continuous |
| Employment | person.employment |
IF(age < 16, 'UNDER_16', employment) |
| Industry | person.industry_detailed / person.industry |
NAICS 2-digit extracted via regex |
| Household Size | person.household.personIds |
ARRAY_LENGTH(personIds) |
| Vehicles | person.household.numVehicles |
-- |
| Building Type | person.building_type |
-- |
| Tenure | person.tenure |
OWNER, RENTER |
Geography Dimensions (All Products)¶
- Layer level: block group through state/country (see Standard Layers table)
- Custom uploaded geos: arbitrary polygons with assigned
layer_id - Census vintage: 2010 (primary for FIPS joins) vs 2020
Network Dimensions (Mobility)¶
| Dimension | Field | Values |
|---|---|---|
| Road Class | highway |
motorway, trunk, primary, secondary, tertiary, unclassified, residential, *_link variants |
| Heading | heading |
0-359 degrees; convert with geos.compass_direction(heading) |
| Direction | forward (boolean), bidirectional |
Speeds have forward/backward; AADT is bidirectional by default |
| Street Name | street_name |
Free text |
| Day of Week | day |
MON-SUN (three-letter); weekday = MON-FRI |
Time Dimensions (Trends)¶
| Dimension | Encoding | Notes |
|---|---|---|
| Week | date_trunc(date, WEEK(SATURDAY)) |
Weeks run Saturday-Friday |
| Day of Week | Modeled days only | Weekday = Thursday only; Weekend = Saturday only (except Spend, which has all days) |
| Month | date_trunc(date, MONTH) |
Monthly averages divide by month_day_count lookup |
Land Use Dimensions¶
L1 Categories (10 values): residential, commercial, mixed_use, civic_institutional, industrial, transportation_utilities, agriculture, open_space, other, unknown
L2 Categories: single_family, multi_family, retail, office, non_retail_attraction, education, healthcare, civic_institutional, industrial, transportation_utilities, open_space, agriculture, other, unknown
Parcels/buildings carry up to three L2 values (landuse_l2_primary, _secondary, _tertiary).
Measures (What Analysts Count)¶
Places Measures¶
| Measure | SQL Expression |
|---|---|
| Trip count | count(id) |
| Person count | count(DISTINCT person.id) |
| Trip distance (mi) | round(distanceMeters * 0.000621371, 1) |
| Trip duration (min) | div(timestamp_diff(endTime, startTime, SECOND), 60) |
| VMT (segment) | sum(ts.distanceMeters * 0.000621371) WHERE ts.mode = 'PRIVATE_AUTO' |
| VMT (network link) | count(trip_id) * link_distance per link, prorated at geo boundaries |
| Network link volume | count(id) grouped by stableEdgeId |
| Boarding/alighting | count(DISTINCT trip.id) per transit stop |
Mobility Measures¶
| Measure | SQL Expression | Notes |
|---|---|---|
| AADT | aadt (bidirectional), aadt_directional (v2+) |
Always filter aadt IS NOT NULL |
| Hourly volume | round(avg(volume), 1) |
Pivoted to 24 columns per day group |
| Avg speed | round(avg(speed_mph), 1) |
Quarter-hourly; pivoted to 96 columns |
| Free-flow speed | free_flow_speed_mph |
From annual speeds table |
| Speed percentiles | speed_p50_mph, speed_p85_mph, speed_p95_mph |
v2+ only |
| TMC count | count |
Flat rows per intersection/movement/hour/day |
Trends Measures¶
| Measure | SQL Expression | Notes |
|---|---|---|
| Trip count | sum(trip_count) |
By mode: sum(mode_unnest.trip_count) after UNNEST |
| Trip count per capita | sum(trip_count) / sum(residents) |
LEFT JOIN geo_demographics |
| VMT | sum(vmt) |
Point-level, spatially joined to target geo |
| WFH share | sum(persons_wfh_count) / sum(employed_residents) |
INNER JOIN geo_demographics |
| Spend (home) | sum(estimated_spend_home) |
Filterable by replica_industry, spend_channel |
| Spend (merchant) | sum(estimated_spend_merchant) |
Filterable by replica_industry |
Land Use Measures¶
| Measure | SQL Expression | Notes |
|---|---|---|
| Parcel area | sum(parcel_sqft) |
Split across L2 categories for non-mixed_use |
| Building area | sum(building_sqft) |
Retail allocation logic for multi-use buildings |
| Dwelling units | sum(dwelling_units) |
Parcel table; parsed into single/multi/mixed |
| Stories | num_stories |
Building table; drives retail allocation threshold |
| Residential units | sum(res_num_units) |
Building table |
Query Recipes¶
Spatial filter by region¶
SELECT t.*
FROM `model-159019.{dataset}.{table}` t
JOIN `model-159019.geos.region` r
ON r.id = '{region_id}'
AND ST_COVERS(r.geom, t.geom)
Percent sample (for mini_kc nationwide representation)¶
Filter by segment ID using OSM table¶
WITH osm_seg_ids AS (
SELECT seg_id AS segment_col FROM `{osm_cutout}`
UNION ALL
SELECT compseg_id FROM `{osm_cutout}` WHERE compseg_id IS NOT NULL
)
SELECT t.*
FROM `{table}` t
INNER JOIN osm_seg_ids USING (segment_col)
Places Trip Volumes vs Mobility AADT on the Same Links¶
Join on stable_edge_id to compare activity-based model volumes against annual average counts:
WITH places_volumes AS (
SELECT tsnl.networkLink.id AS stable_edge_id, count(*) AS trip_volume
FROM trip_table,
UNNEST(travelSegments) AS ts,
UNNEST(ts.travelSegmentNetworkLinks) AS tsnl
WHERE {trip_filters}
GROUP BY stable_edge_id
),
mobility_aadt AS (
SELECT stable_edge_id, aadt, street_name, highway
FROM aadt_table
WHERE aadt IS NOT NULL
)
SELECT m.stable_edge_id, m.street_name, m.highway,
p.trip_volume, m.aadt
FROM mobility_aadt m
JOIN places_volumes p ON p.stable_edge_id = m.stable_edge_id
Use aadt_directional (v2+, 2024_Q4 onward) to align directionality with Places per-direction counts.
Trip Generation by Land Use Category¶
Land use L1/L2 are denormalized on the trip table. No spatial join needed:
SELECT origin_land_use_l1,
CASE WHEN origin_land_use_l1 = 'mixed_use' THEN 'mixed_use'
ELSE ifnull(origin_building_use_l2, 'unknown') END AS land_use,
count(id) AS trips
FROM trip_table
WHERE {filters}
GROUP BY 1, 2
ORDER BY trips DESC
OD Flows at a Custom Geography Level¶
Use geostore spatial joins for both origin and destination:
WITH custom_geos AS (
SELECT id, name, geometry FROM geostore WHERE layer_id = {custom_layer_id}
),
trips AS (
SELECT id, startPlace.location.longitude AS o_lng, startPlace.location.latitude AS o_lat,
endPlace.location.longitude AS d_lng, endPlace.location.latitude AS d_lat
FROM trip_table WHERE {filters}
)
SELECT o.name AS origin, d.name AS destination, count(t.id) AS trips
FROM trips t
JOIN custom_geos o ON ST_Intersects(o.geometry, ST_GEOGPOINT(t.o_lng, t.o_lat))
JOIN custom_geos d ON ST_Intersects(d.geometry, ST_GEOGPOINT(t.d_lng, t.d_lat))
GROUP BY 1, 2
ORDER BY trips DESC
For standard census layers, prefer the faster FIPS join: substr(BLOCKID10_origin, 0, N) = geostore.src_id.
Trends Time Series Paired with Places Seasonal Snapshot¶
Trends provides weekly/monthly time series; Places provides a single-season detailed snapshot. Pair them by aligning on the same geography and season:
-- Trends: weekly trip count trend for a county
SELECT date_trunc(partition_key, WEEK(SATURDAY)) AS week, sum(trip_count) AS trips
FROM mobility_geo__layer_3__v1_0_0
WHERE origin IN ({county_geo_ids})
AND partition_key BETWEEN '2024-01-01' AND '2024-12-31'
AND EXTRACT(DAYOFWEEK FROM partition_key) = 5 -- Thursday only
GROUP BY week
-- Places: detailed mode/purpose breakdown for the same county, same season
SELECT IF(primaryTravelMode='CARPOOL','AUTO_PASSENGER',primaryTravelMode) AS mode,
count(id) AS trips
FROM {region}_2024_Q2_thursday_trip_table
WHERE substr(BLOCKID10_origin, 0, 5) IN ({county_fips})
GROUP BY mode
The Trends query shows how volumes change over time; the Places query shows what those trips look like in a representative season.
Business Rules & Gotchas¶
| Rule | Detail |
|---|---|
| Visitor demographic nulling | When residentType = 'VISITOR', all person demographic columns are NULL. Fuel type is also UNKNOWN_FUEL_TYPE for visitors. |
| OTHER_RESIDENT_TYPE exclusion | Synthetic people for freight/commercial vehicles; excluded from cds-population. Their trips exist but have no demographics. |
| Fundamental trip filter | type='TRAVEL' AND optionIndex='FIRST_OPTION' is pre-applied to the denormalized trips table. Do not re-apply it. |
| CARPOOL -> AUTO_PASSENGER | Applied on primaryTravelMode and commuteMode in both filters and output. |
| WORK_FROM_HOME -> HOME | Applied on travelPurpose and previousActivityType. WFH activities are NOT in the trips table. |
| AADT bidirectional by default | The aadt column sums both directions. Directional columns only exist on v2+ (2024_Q4+). |
| Trends Saturday week truncation | date_trunc(date, WEEK(SATURDAY)) -- weeks run Saturday through Friday. |
| Speed profile 2/3 coverage threshold | Segments without sufficient GPS probe data are excluded from quarterly hourly speed results. |
| Trends models Thu/Sat only | Trip count, VMT, WFH are modeled on representative days: weekday = Thursday (DAYOFWEEK=5), weekend = Saturday (DAYOFWEEK=7). Spend has all 7 days. |
| Monthly averaging | Monthly aggregation divides by month_day_count lookup table, not simple sum. |
| UNNEST deduplication | After UNNESTing travel segments or network links, GROUP BY trip.id, person.id to avoid double-counting. |
| Employment age override | IF(person.age < 16, 'UNDER_16', person.employment) |
| VMT distance proration | When a network link crosses a geo boundary, distance is prorated by ST_Length(intersection) / ST_Length(link). |
GCS Paths¶
Published CDP artifacts are stored in Google Cloud Storage:
| Bucket | Contents |
|---|---|
gs://core_data_products/{product}/{season}/{version}/{region}.yaml |
Published DataConfigs |
gs://core_data_products/landuse/pois/{tag}/{region}.yaml |
Published POI configs |
gs://core_data_products/population/ingestion/{version}/{region}.yaml |
Population ingestion configs |
gs://core_data_products/landuse/ingestion/{version}/{region}.yaml |
Land use ingestion configs |