Live Demo #3: Trusted Data for AI — Building Reliable Pipelines with dbt
AI systems are only as reliable as the data pipelines feeding them. This session explores how dbt addresses the core challenges of data trust: testing, versioning, semantic context, and auditability. We'll walk through these capabilities practically, including a live demo of the dbt MCP Server.
Speaker
Summary
dbt Labs: Trusted Data for AI — Building Reliable Pipelines with dbt
Speaker: Hicham Babahmed, Solutions Architect, dbt Labs
Date: March 12, 2026
Event: Paris — Market Data x AI (Finteda / FactSet)
dbt Overview
dbt (data build tool) started 10 years ago as an open-source project focused on making data transformation easier and less chaotic. The core idea: bring software engineering best practices — testing, documentation, versioning — to data engineering.
Over time, dbt realized the transformation process generates rich metadata, which enables observability, cataloging, data mesh, and the semantic layer. The product has evolved from a small open-source tool to a full platform covering:
- Transformation — SQL-based, on top of any data platform/warehouse
- Orchestration — based on metadata
- Observability & Catalog — powered by transformation metadata
- Semantic Layer — originally built for BI tools, now critical for AI use cases
- Data Mesh — including data contracts and access controls
Three Pillars of AI Readiness
1. Data Quality
When companies first started building AI applications, they realized data quality was poor. dbt's emphasis on testing and documentation throughout the data engineering lifecycle ensures data quality is maintained. Tests are not just decorative — they enforce contracts that impact production code.
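In dbt, such tests are typically declared in the YAML file that accompanies a model. A minimal sketch of what that looks like — the model and column names here are hypothetical, not from the demo project:

```yaml
# models/marts/schema.yml — hypothetical model and column names
models:
  - name: fct_loans
    description: "One row per loan, used for credit risk reporting."
    columns:
      - name: loan_id
        description: "Primary key."
        tests:
          - unique
          - not_null
      - name: loan_status
        tests:
          - accepted_values:
              values: ['current', 'delinquent', 'default']
```

Running `dbt test` compiles each of these into a SQL query against the warehouse; a failing test can block the model from being promoted to production, which is what makes the tests enforceable rather than decorative.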
2. Semantic Layer (Unified KPI Definitions)
Different departments often have different definitions for the same KPIs (e.g., "revenue" from accounting vs. marketing). LLMs hallucinate when faced with ambiguous definitions. dbt's semantic layer unifies metrics, KPIs, and dimensions across the organization.
Key advantage: Defining the semantic layer near the code (not in a BI/AI tool) makes debugging much easier.
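In dbt's semantic layer, a metric such as "revenue" is defined once, in YAML, next to the transformation code. A hedged sketch of the shape of such a definition — the model, column, and metric names below are made up for illustration:

```yaml
# models/marts/sem_orders.yml — hypothetical names
semantic_models:
  - name: orders
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_amount
        agg: sum

metrics:
  - name: revenue
    label: "Revenue"
    description: "Single, org-wide definition of revenue."
    type: simple
    type_params:
      measure: order_amount
```

Because every downstream consumer — BI tool or LLM — resolves "revenue" through this one definition, the accounting-vs-marketing ambiguity described above never reaches the model that answers the question.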
3. Code Understanding (Fusion Engine + LSP)
dbt's latest engine, Fusion, includes a Language Server Protocol (LSP) implementation. This means SQL is understood as code (not just strings), providing full context — lineage, relationships, dependencies — to AI tools like Claude Code.
The dbt MCP Server
Since dbt handles the entire transformation layer and acts as a control plane across multiple data warehouses, it can serve as a single source of truth for AI agents. The dbt MCP Server (open source, available on GitHub) connects dbt directly to any LLM, providing:
- Access to all models, metrics (semantic layer), model details, parents, and lineage
- Full metadata generated through the transformation pipeline
- The dbt catalog (performance, execution frequency, recommendations, test coverage — e.g., flagging that only 85% of models are tested)
Live Demo: Credit Risk Report via Claude + dbt MCP
Setup
Hicham connected dbt to Claude via the MCP server and asked: "I'm preparing a quarterly credit risk report. What models are available, what do they contain, how are they structured? And show me a dashboard."
What the Agent Did
1. Connected to the banking project in dbt
2. Retrieved all metrics (semantic layer), models, model details, parents, and lineage
3. Generated a credit risk dashboard
4. When asked "Can I trust this data?" — produced a full Basel II audit readiness report including:
   - Data sources and project details (4 staging, 3 intermediate, 6 mart models)
   - Number of model tests and unit tests, including 1 contract-enforced model
   - Test results: how many passed, how many failed
   - Data freshness verification
   - Methodology compliance (NPL, probability of default, etc.)
   - Data lineage and traceability
   - All sourced from dbt metadata and the dbt catalog
Data Contracts
dbt supports data contracts defined via YAML configuration:
- Schema enforcement (`contract: enforced: true` in the model config)
- Type validation and tests as enforceable contracts
- Public/private model access (role-based)
- The LLM only has access to whatever is permissioned in dbt
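A contract of this kind is declared in the model's YAML. Roughly, under assumed (hypothetical) model and column names:

```yaml
# models/marts/schema.yml — hypothetical model
models:
  - name: fct_loans
    access: public            # or: private, to hide it from other groups
    config:
      contract:
        enforced: true        # build fails if the SQL output deviates
    columns:
      - name: loan_id
        data_type: varchar
        constraints:
          - type: not_null
      - name: outstanding_balance
        data_type: numeric
```

With `enforced: true`, dbt verifies at build time that the model's SQL actually produces these columns with these types, so downstream consumers (including an LLM reading through the MCP server) can rely on the declared schema.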
Key value for finance: Regulatory compliance and auditability. Data consumers don't need to go into dbt — they can access all lineage, quality, and compliance information through the AI-generated reports.
dbt in Practice
A dbt project consists of pure SQL transformations with CTEs, organized in layers (staging → intermediate → marts). dbt automatically generates:
- Data lineage graphs
- A browsable catalog with full metadata
- Test coverage reporting and recommendations
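A typical mart model in such a project is a single SQL file built from CTEs over the upstream layers. A sketch with made-up model and column names:

```sql
-- models/marts/fct_credit_risk.sql — hypothetical model names
with loans as (
    select * from {{ ref('stg_loans') }}
),

borrowers as (
    select * from {{ ref('stg_borrowers') }}
),

final as (
    select
        loans.loan_id,
        borrowers.borrower_id,
        loans.outstanding_balance,
        loans.days_past_due >= 90 as is_non_performing
    from loans
    inner join borrowers
        on loans.borrower_id = borrowers.borrower_id
)

select * from final
```

The `ref()` calls are what let dbt resolve dependencies and generate the lineage graph automatically — no separate lineage tooling is involved.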
Q&A:
Q: Can we use local/self-hosted LLMs with the dbt MCP server?

A: Yes. Since it's an MCP server, you can run it locally or remotely and connect it to whatever LLM you're using. The dbt MCP server is open source — you choose the model. dbt provides the core framework; you plug in whatever agent or model you want.
Moderator's closing note: There is growing momentum in the industry toward meta-agents — a broader trend the event did not tackle in depth given the technical focus of the session.
Full Transcript
Hicham Babahmed: Are we AI ready? I think that we all want to know that and to be that. And that's why I wanted to talk about it today, because I believe that in order to be AI ready we need some things in place — because the time is going so fast, the technology is evolving so fast, and at the same time we have so many regulations that we need to respect.
Before starting I need to present myself, of course. So I am Hicham, I'm working as a Solutions Architect at dbt Labs. And before that — who knows dbt? Two, three, four, five people.
Audience Member: [raises hand]
Hicham Babahmed: Damn, that's good. I had assumed that not many people knew dbt in the finance world, so I also prepared a small presentation.
dbt started 10 years ago. I'm not going into details. We started as an open source company, and the main idea was to make transformation easier, because transformation was done in a fairly chaotic way in data engineering. We didn't have many best practices like in software engineering, and we wanted to bring some uniformity and a consistent way of doing things. That's how we started, with this focus on doing only the transformation.
And at some point we realized that, like I said earlier, it was kind of chaotic. We wanted to bring some best practices, and the best practices of software engineering had existed for decades. So what we added were testing, documentation, versioning, and so on. And at that point we realized that the whole transformation we were running produces a lot of metadata, and that metadata can give us something very much needed in our field: observability, cataloging, and so on.
So our product evolved over the last 10 years from a small open source product to a whole platform where we do the transformation on top of any data platform or data warehouse. That gives us the capability of having orchestration done directly on it. And based on the metadata, we are also doing, in the best way possible, the observability, the catalog, the mesh — because we hear a lot about mesh — and the semantic layer.
The semantic layer is something we started working on some years ago because we saw huge value in it, especially with the BI tools. And it was actually the right timing, because now we are also using it in a lot of AI use cases.
To be AI ready, we talk a lot about many aspects, and for me I wanted just to state three facts before going into the demo — that's one of my favorite parts.
The first one is data quality. When we first started talking about AI, we realized that the data quality was kind of shit. Sorry for the word, but it's the reality. And what we had in dbt was this push toward testing and documentation throughout the whole data engineering lifecycle. That gave us a way to be sure that the data quality is really good. This is the first thing that we had in dbt.
The second thing is what we realized inside companies: we have different departments, and every department has its own definition of its KPIs. That was hard to handle with an LLM, because the LLM will check something and hallucinate, because there are different definitions. Should I use the revenue that comes from, for example, the accounting department or the marketing department? That's something we wanted to address through the semantic layer. While defining the semantic layer, we unify the definitions of the KPIs, the metrics, and the dimensions across our whole process. And many people ask: should I do that near the code, or directly in a BI or AI tool? The differentiation is that if you define it near the code, then debugging is way easier, which makes the whole flow much simpler.
And the last thing we wanted to add is not only having context or data quality, but also having an understanding of the code. Our latest engine, which is called Fusion, actually implements the LSP — the Language Server Protocol. And it gives us something really important: it understands SQL and understands what is running behind it. So we no longer see the code as just a string; we have real code understanding, which means that whatever we send to the AI to get information out — like Claude Code, if you are using Claude Code, or whatever — we can be sure we are sending all the context that comes from the code as well.
So we talk a lot about the MCP server, and that's exactly what I'm showing you today. The idea is that since we do the whole transformation layer inside dbt and have this control plane, it works whatever data platform you are using — because we also see many companies using two or three different data warehouses, and that context sometimes gets lost. When it's defined inside dbt, dbt can capture the different aspects of it and then communicate directly with the LLM. And this is what I will show you today.
Many of you asked: what does dbt actually look like? This is it. It's purely SQL transformation with simple CTEs that, at the end, generate a lineage for you directly. We can also check the catalog with all the information about it. So it's a pure data engineering process across the different layers — staging, intermediate, and then marts.
I'm not going to go into details for every model, but I tried to create a repository for finance actually. And especially for someone not coming from the finance world, it was really interesting and I learned many things.
Now what I wanted to do is — I actually connected my dbt directly to Claude through the MCP server. And here what I'm asking is: I am preparing the quarterly credit risk report. I want to know what models are available to me, what they contain, and how they are structured. And I also want to see a dashboard.
What is really happening here is that this prompt, directly through the MCP, will be communicating with the dbt platform. And in that aspect it will check everything. But I will show you a little bit more in detail how it works.
So in here I am connecting it to the banking project, as you can see, and I allowed it to do everything. That's not always the best way to do it, but I did it because I don't have any sensitive data in it. What is happening here is that it can check everything in terms of metrics — and when I say metrics, I mean the semantic layer. It can get all the models, the model details, the parents, and so on. So it's getting all the information that is stored inside dbt, which means the whole metadata that we generate through the transformation pipeline.
And I will get back to see... I thought it would have moved a little bit. Yeah. So here what is happening is that it's telling me — it's spinning, right?
Audience Member: Yeah.
Hicham Babahmed: That's the risk of it. But that's why I love live demos.
So yeah, what is happening — let's try it again. I want a dashboard for the credit risk. And just a quick insight about it: what happened is that before starting and before coming to the demo, my usage on my other Claude account hit the limit. So actually I needed to find another way to show you the demo. I know — so that's why it's kind of taking some time.
But what is happening here, just to give you an insight, is that it's checking everything that is happening in dbt. And with that it means the dbt catalog that we have, and that I can also show you — in this view, having the performance, for example, how often the models are executed, what are the different recommendations that I have. For example, only 85% of the models are tested, and so on. And it has access to everything.
And especially what I find really interesting in a field like finance is that there are a lot of regulatory requirements, which means that we need to know the auditing behind it, and we need to be sure that the dashboard shown now is actually the right one.
So now I have the dashboard, but the main question is — can I trust this actually? And I will just ask it. Because sometimes the LLMs tend to show really fancy things. But the main idea is to be sure that I can trust this and I can use it. I don't want to go into a meeting and say some numbers and everyone looks at me and they're like, hey, what does it even mean? So this is the idea behind it — to create that trust.
Yeah, so like you said, my made-up data — that's what I meant when I said that I prepared this just for it.
So since we talked also about data mesh and dbt Mesh, what we have is also the capability of defining the data contracts inside dbt. What does it mean? So — sorry for that, I love live demos. That's the beauty of it.
What I was saying about the capability of defining data contracts inside dbt: it's something you can do with whatever model you're using. You can also define whether a model is public or private, to be sure that other departments don't have access to it. And if someone's access is limited to one role, it will be just that. Which means that the LLM, too, will only have access to whatever is permissioned in dbt.
And now what is happening is that it will be checking the different integrations, it will check also the models, and I will show you at the same time how it actually looks like.
So when we talk about data contracts, it's not something complicated. We define it in a YAML file. And as you can see here, I have the config, I have the contract, and I enforced it with enforced: true, which means that all the types here, and also the tests, are defined as contracts that need to be respected. In this case we make sure those tests are not written just for the beauty of it — they have an impact on the code we put in production and the code that is consumed afterward by the different data consumers.
So that's exactly what it's showing me now — it's telling me that the data contract is enforced.
What I wanted actually to show you as well is the fact of getting a credit risk report made for me. And what I did at the end is I also asked if I can trust this data, because this time it worked. And what it created for me is this whole credit risk report with the Basel II audit readiness.
It's telling me, okay, it's coming from this project. dbt is running with the different sources — four staging, three intermediate, six mart models — with the number of model tests and unit tests as well, and one contract-enforced model. And it gives me all the information about it: the assessment, how many tests passed, how many failed — which is really important for me, because when I talk about auditability, we also need to be sure that freshness was respected, and so on.
And all of this was taken from the metadata generated in dbt and the dbt catalog. I can go through all the details, but we also have the methodology compliance — in this case, because I had some NPL defined, the probability of default, and so on. And the data lineage and traceability were included in this report as well.
And everything is coming directly from dbt. So for the data consumers, they don't need to go into dbt to check it — they can access that data through the AI-generated reports. And that's it. Thank you.
[Applause]
Q&A
Moderator: Thank you. We can take like two questions. The first one is there.
Audience Member: Can we still use a local LLM? For example, if we have a self-hosted model, is there a certain interface we can use with your platform?
Hicham Babahmed: Since we are using the MCP server, you can use the MCP server locally or remotely. And you can connect it to whatever LLM you're using. The dbt MCP server is open source — you can just Google it and you will find the repository. From our side, we provide a core framework, then you plug in whatever agent or model you want. We keep the choice for you to choose whatever model you want to use.
Moderator: Okay, thanks. One more question?
[No further questions.]
Moderator: Okay, thanks. Interesting stuff. Also from Laurent, there is an underlying point we are not going to tackle today because it's not technical, but there is a lot of momentum going into meta-agents. That's where the whole industry is going.