
Analytics Engineering: Dimensional Modeling & dbt in Practice

A practitioner's guide to analytics engineering, covering how dimensional modeling, dbt, and orchestration come together to build reliable, scalable data platforms.

Analytics engineering sits somewhere between data engineering and analytics. You’re not building dashboards, and you’re not managing infrastructure. You’re designing the semantic layer — the part that makes both of those things actually work.

This post covers patterns I rely on every day: dimensional modeling, dbt, orchestration, and how they come together in practice.

What this post covers

Why analytics engineering matters · Dimensional modeling fundamentals (star schema, facts vs. dimensions) · dbt project structure and layering · Orchestration patterns · Testing and data quality · Real-world lessons.


What Is Analytics Engineering?

dbt Labs popularized the term, but the discipline existed long before the tooling. Analytics engineering means owning the transformation layer — the logic that turns raw, messy source data into clean, tested datasets that people actually trust.

Data Engineer
Builds infrastructure, ingestion pipelines, and storage. Owns the “how data gets here.”

Analytics Engineer
Designs transformation logic, dimensional models, and data contracts. Owns “what the data means.”

Data Analyst
Explores data, builds dashboards, and communicates insights. Owns “what the data tells us.”

In practice, these roles blur together constantly. But if I had to pick where the highest-leverage work happens, it’s the transformation layer. Get the models right and everything downstream gets easier.


Dimensional Modeling: The Foundation

Dimensional modeling is a way of structuring data that’s optimized for querying and understanding, not for write efficiency. Ralph Kimball formalized it in the 1990s, and despite everything that’s changed in data tooling since, the core ideas hold up surprisingly well.

The central concept is the star schema — a fact table in the middle, dimension tables around it.

fct_orders: order_id · amount · date_key · customer_key · product_key
dim_date: date_key · month · quarter · year
dim_customer: customer_key · name · segment
dim_product: product_key · name · category
dim_channel: channel_key · name · category

Star schema: a central fact table (fct_orders) surrounded by descriptive dimension tables

Facts vs. Dimensions

The distinction is simple but critical:

Fact Tables
  • Store measurable events (orders, clicks, payments)
  • Contain foreign keys to dimension tables
  • Grow continuously, often billions of rows
  • Aggregatable: SUM, COUNT, AVG make sense
  • Named with prefix fct_
Dimension Tables
  • Store descriptive context (who, what, where, when)
  • Contain natural keys and human-readable attributes
  • Change slowly, typically thousands to millions of rows
  • Used for filtering, grouping, and labeling
  • Named with prefix dim_

What makes this pattern so effective is that analysts don’t need to know how the source systems work. They join the fact table to whatever dimensions they need, and the query almost reads like plain English. Revenue by category and channel? One join across fct_orders, dim_product, and dim_channel.

All the messy work happened upstream. No 200-line CTEs against raw tables, no string parsing, no business logic buried in WHERE clauses.
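To make that concrete, the revenue-by-category-and-channel question reduces to a query like the following sketch. Table and column names follow the star schema diagrammed above, with the assumption that fct_orders also carries a channel_key foreign key:

```sql
-- Revenue by product category and channel: two joins, no business logic.
select
    p.category,
    c.name as channel,
    sum(o.amount) as revenue
from fct_orders o
join dim_product p on o.product_key = p.product_key
join dim_channel c on o.channel_key = c.channel_key
group by p.category, c.name
order by revenue desc
```

No CTEs, no parsing, no buried business logic. The model did the work once so every query doesn’t have to.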


dbt: The Transformation Layer

dbt is what makes analytics engineering scalable. Write your transformations as SELECT statements, version them in Git, test them automatically, and keep the documentation right next to the logic.

Project Structure

A well-structured dbt project follows a clear layering pattern:

models/
├── staging/
│   ├── _stg__sources.yml
│   ├── stg_ecommerce__orders.sql
│   ├── stg_ecommerce__customers.sql
│   └── stg_stripe__payments.sql
├── intermediate/
│   ├── int_payments_pivoted_to_orders.sql
│   └── int_orders_pivoted_to_customers.sql
└── marts/
    ├── core/
    │   ├── fct_orders.sql
    │   ├── dim_customer.sql
    │   └── dim_product.sql
    └── finance/
        └── orders.sql

Each layer has a clear responsibility:

Staging: stg_[source]__[entity]

1:1 mapping with source tables. Renaming, type casting, filtering soft deletes. This is your source of truth boundary. Organized by source system, not by business domain.
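A staging model in this spirit might look like the following sketch (the source, column names, and type choices are illustrative, not from a real project):

```sql
-- stg_ecommerce__orders.sql: 1:1 with the source table, no business logic
with source as (

    select * from {{ source('ecommerce', 'orders') }}

)

select
    id                            as order_id,
    customer_id,
    cast(total as numeric(18, 2)) as amount,
    cast(created_at as timestamp) as ordered_at
from source
where deleted_at is null  -- filter soft deletes
```

Deliberately boring: rename, cast, filter, nothing else.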

Intermediate: int_[entity]s_[verb]s

Purpose-built transformation steps: pivoting, aggregating, joining. These are not exposed to end users. Think of them as verbs: pivoted, aggregated_to_user, joined.
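An intermediate model such as int_payments_pivoted_to_orders from the tree above could be sketched as a conditional aggregation (the payment methods are made up for illustration):

```sql
-- One row per order, with payment methods pivoted into columns.
select
    order_id,
    sum(case when payment_method = 'card'    then amount end) as card_amount,
    sum(case when payment_method = 'voucher' then amount end) as voucher_amount,
    sum(amount) as total_paid
from {{ ref('stg_stripe__payments') }}
group by order_id
```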

Marts: fct_ / dim_

Business-defined entities at their unique grain. Organized by business domain (core, finance, marketing). This is where the star schema lives and where analysts query directly.

How Models Flow

Each layer feeds the next. Staging models are deliberately boring: a 1:1 mapping of a source table with cleaned-up column names, correct types, and soft deletes filtered out. Their job is to absorb upstream chaos so nothing downstream breaks when a source system decides to rename a column.

Staging model (stg_)
Reads from raw source, renames columns to snake_case, casts to correct types, filters deleted records. No business logic. The goal is to be boring.

Intermediate model (int_)
Joins staging models, applies business logic like sessionization and deduplication. These are internal building blocks, never queried directly by analysts.

Fact table (fct_)
The final business-facing model. Joins intermediate data with dimension keys, adds derived measures and flags. This is what stakeholders query.

The fact table is where business logic crystallizes — dimension keys for star schema joins, calculated measures like revenue and margin, and flags like is_first_order or is_returned that analysts use constantly.
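As a sketch, a flag like is_first_order can be derived once at build time with a window function, so no analyst ever has to rewrite that logic (model and column names continue the hypothetical project above):

```sql
-- Inside fct_orders: derived measures and analyst-facing flags.
select
    o.order_id,
    o.customer_id,
    o.amount,
    p.total_paid,
    row_number() over (
        partition by o.customer_id
        order by o.ordered_at
    ) = 1 as is_first_order
from {{ ref('stg_ecommerce__orders') }} o
left join {{ ref('int_payments_pivoted_to_orders') }} p
    on o.order_id = p.order_id
```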


Testing & Data Quality

Testing is one of dbt’s most underrated features. Every model should have at minimum:

Essential Tests

unique

Primary key has no duplicates. If this fails, your joins will fan out and produce incorrect aggregations.

not_null

Critical columns are never null. Null foreign keys create orphaned rows that silently disappear from reports.

accepted_values

Enum columns only contain expected values. Catches upstream schema changes before they corrupt your models.

relationships

Foreign keys actually exist in the referenced table. Referential integrity is not enforced by most warehouses.

My rule: if a test would have caught a real incident, write it. I’ve seen dashboards report wrong revenue because a staging model silently started producing duplicates after a source system migration. A unique test on the primary key would have caught it before anyone noticed.
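In dbt, all four tests are declared in the model’s schema file. A sketch for fct_orders (column names and status values are illustrative):

```yaml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: customer_key
        tests:
          - not_null
          - relationships:
              to: ref('dim_customer')
              field: customer_key
      - name: order_status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
```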


Orchestration

dbt handles the transformation, but something still needs to schedule it, watch it, and deal with failures. That’s where orchestration comes in.

06:00  Extract Sources
06:15  Transform
06:30  Test & Validate
06:45  Notify & Refresh

Typical daily pipeline: extract from sources, transform, validate with tests, then refresh dashboards and notify the team

The key principle is simple: tests run after every transformation, every time. If something fails, the pipeline stops before refreshing any dashboards. Stakeholders never see bad data.
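The fail-fast shape can be sketched as a minimal shell wrapper. The real commands (an extractor run, dbt run, dbt test, a dashboard-refresh API call) are replaced with echo placeholders here so the control flow is visible; `run_tests` deliberately simulates a failure:

```shell
#!/usr/bin/env bash
# Minimal fail-fast pipeline sketch: placeholders stand in for real commands.
set -u

extract()   { echo "extracting sources"; }
transform() { echo "running dbt models"; }
run_tests() { echo "running dbt tests"; return 1; }  # simulate a test failure
refresh()   { echo "refreshing dashboards"; }

status=ok
extract
transform
if run_tests; then
  refresh
else
  status=halted
  echo "tests failed: skipping dashboard refresh"
fi
```

In production this is usually a `dbt build` invocation (which interleaves model runs and tests) inside an orchestrator task, but the shape is the same: tests gate the refresh.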


The Modern Data Stack in Practice

Here’s how it all fits together in a real production setup:

Sources: Databases · APIs · Event streams · Third-party platforms

Ingestion & Orchestration: schedules extraction, monitors pipeline health, handles retries and alerting

Transformation: Staging → Intermediate → Marts (dimensional models)

Cloud Warehouse: columnar storage, auto-scaling compute, separation of storage and compute

Consumption: Dashboards · BI tools · Notebooks · ML feature stores · Downstream APIs

Lessons from Production

A few patterns that have consistently paid off across the projects I’ve worked on:

Model for the business, not the source

Source systems are optimized for writing. Your dimensional models should be optimized for reading and understanding. Don’t replicate the source schema. Reshape it into something an analyst can query without a data dictionary.

Staging is a firewall

When a source system changes (and it will), only your staging layer should break. If a column rename propagates all the way to your dashboards, your architecture has a single point of failure. Staging absorbs upstream chaos.

Test what matters, not everything

100% test coverage sounds great but creates noise. Focus on the business-critical invariants: primary key uniqueness, referential integrity on important joins, and accepted values on status fields that drive downstream logic.

Documentation is part of the model

When a new team member asks “what does is_completed_order mean?”, the answer should live in the schema file, not in someone’s head. Documentation that lives with the code actually gets maintained.
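In dbt that means a description sitting next to the column it explains, in the same schema file as the tests. A sketch, with the definition itself made up for illustration:

```yaml
models:
  - name: fct_orders
    description: "One row per order, at the order grain."
    columns:
      - name: is_completed_order
        description: >
          True when the order reached a terminal 'completed' status.
          Excludes cancellations and returns.
```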

Naming conventions are architecture

Prefixes like stg_, int_, fct_, dim_ aren’t just cosmetic. They encode the DAG structure, the materialization strategy, and the access level. A new developer should be able to understand a model’s purpose from its name alone.
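dbt lets configuration hang off that folder structure, so the layering also drives materialization. A common sketch (project name is a placeholder):

```yaml
# dbt_project.yml: materialization follows the layer, not individual models.
models:
  my_project:
    staging:
      +materialized: view       # cheap to build, always fresh
    intermediate:
      +materialized: ephemeral  # compiled into downstream queries
    marts:
      +materialized: table      # fast for analysts to query
```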


Why This Matters

Analytics engineering is what makes data trustworthy at scale. Without it, you end up with a different version of revenue in every dashboard, analysts writing enormous CTEs to work around missing models, and stakeholders who’ve stopped trusting the numbers.

With a solid semantic layer:

  • Analysts self-serve without needing to understand source systems
  • Dashboards reflect one source of truth, not competing interpretations
  • Quality issues get caught before they reach stakeholders
  • New features build on clean, tested foundations instead of fragile workarounds

If you’re building a data platform and the transformation layer feels like an afterthought — that’s probably the highest-leverage place to invest next.

Dimensional Modeling · dbt · Star Schema · Data Quality · Orchestration