Episode 308 – Fabric September 2025 Part 3: The Dataflow Gen 2 Performance Overhaul
Fully recovered from last week’s bus-hit feeling, Jason joined John to finish the September 2025 marathon with Data Factory’s crown jewel—dataflows Gen 2 engine improvements promising 10x speedups (with asterisks). Between dissecting modern query evaluator benchmarks, celebrating two-tier pricing models, renaming data pipelines to just “pipelines,” and teaching Jason’s son to always comment code unlike a certain Canadian, they finally reached the finish line after three episodes covering one month’s updates.
Weekend Recovery & Dublin Workshop Preview
Jason emerged from a convalescing weekend—“literally from the time I hung up with you Friday afternoon… I climbed into bed… did not emerge until this morning pretty much”—just in time for Jewish holidays, two band competitions, and hopefully Sam’s cross country meet.
John spent the weekend building a dining room table with live edge ash for a kid moving out again—”first time we build something of that scale.”
The September announcements already influenced December’s Dublin workshop planning at European SharePoint Conference. “There’s a lot of stuff that I think is going to be really interesting, especially we’re going to spend some time talking about today,” Jason teased.
SharePoint list visualization’s deprecation would be complete by then: “Can’t wait to take those two slides out of the deck… never have to talk about that thing again.”
John’s garden kept producing despite waiting for killer frost, and Carolina Reapers filled large freezer bags ready for fermentation before Africa—Jason’s kids would be “very excited” about hot sauce delivery in December.
Faisal’s Top-Level Post & Performance Skepticism
Faisal Mohamood’s comprehensive Data Factory post led the announcements—he’s “ultimately in charge of all the data integration story for Fabric.”
Jason opened with measured skepticism: “There’s a bunch of numbers. It’s tough for me to believe. I’m always a skeptic… if his numbers are right… are they an across the board thing?”
John confirmed: “Like a 10x performance improvement for dataflows, right?”
“That’s what it’s sounding like, and it’s true in some cases, in some situations,” Jason qualified. “That’s what I’m curious about—what are those situations?”
The pricing story mattered too: “For folks who are worried about price, there’s some good story about that.”
Preview-Only Steps: The “Don’t Load” Equivalent
Accelerate dataflows Gen 2 authoring with preview-only steps (preview) introduced Power BI’s “don’t load into data model” concept for dataflow development.
John explained: “I want to add a step, a query step to my overall query, but I don’t want that to actually be evaluated when the dataflow runs in production.”
Use cases: filtering to specific data subsets during development (“keep only the top 100 rows”), zeroing in on problematic records without runtime performance penalties.
“You’re not going to have to go back and delete it and add it and delete it and add it every time you need access to it.”
The feature particularly benefits file system sources—a gear icon appears for adding preview-only steps when connecting to non-engine-based data.
Jason clarified engine terminology: “You’re talking about something that you’re issuing a query to that’s going to process that query for you, right?”
“Data flow is going to have to process the query versus an engine that can process the query,” John confirmed.
Jason’s shorthand: “Dumb data sources… don’t have a brain sitting on top of it that you’re issuing a request to.”
File-based systems, non-foldable sources—places where Power Query must evaluate locally rather than pushing computation to remote engines.
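The mechanic is easy to picture outside of Power Query. A toy Python sketch (purely illustrative—the `Step` class and the preview/production modes here are invented for this example, not Microsoft’s implementation) shows a step that participates while authoring but is skipped when the dataflow runs for real:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    name: str
    fn: Callable[[Any], Any]
    preview_only: bool = False  # skipped when the dataflow runs in production

def run(steps, data, mode="production"):
    """Apply steps in order; preview-only steps run only at design time."""
    for step in steps:
        if mode == "production" and step.preview_only:
            continue  # e.g., a "keep top 100 rows" filter used while authoring
        data = step.fn(data)
    return data

rows = list(range(1000))
steps = [
    Step("double values", lambda rs: [r * 2 for r in rs]),
    Step("keep top 100 rows", lambda rs: rs[:100], preview_only=True),
]

print(len(run(steps, rows, mode="preview")))     # 100 -- small sample while authoring
print(len(run(steps, rows, mode="production")))  # 1000 -- full data at refresh time
```

This is exactly the “don’t have to go back and delete it and add it” benefit: the filter stays in the query definition but never costs anything at refresh time.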
Modern Query Evaluator: Deconstructing the 10x Claim
Modern query evaluator for dataflows Gen 2 (preview) delivered the foundation for performance claims—requiring explicit opt-in via options tab.
Jason parsed the graph: “Out of the gates… the modern query evaluator based upon the graph that they’re showing is only about a 2.5x improvement. Then there’s an additional component that adds parallelization for execution. That’s where you get the 10x improvement.”
Supported connectors (preview):
- Azure Blob Storage
- Azure Data Lake Storage
- Fabric Lakehouse (files vs. structured tables unclear)
- Fabric Warehouse (puzzling given warehouse isn’t file-based)
- OData
- Power Platform Dataflow
- SharePoint
- Web
John’s confusion about warehouse: “I’m not 100% sure… warehouse is still Delta. It’s the storage mechanism… so it still may be a file thing.”
The distinction matters—warehouse and Lakehouse shouldn’t qualify as “dumb data sources” given their engine capabilities, yet they’re listed.
“These are all Microsoft sources… but not OData,” John noted the pattern.
More connectors coming “soon”—the file-based/non-foldable heuristic helps predict future support.
Parallelized execution for dataflows Gen 2 (preview) completed the 10x story through partition computing—same connector list (minus warehouse for parallelization, oddly).
The New York green taxi benchmark: loading 12 parquet files from 2023 dropped from 1.5 hours to 25 minutes when loading into Fabric Warehouse.
“So it is calling out warehouse,” Jason noted the example inconsistency with supported connector lists.
John connected dots: “Combine that with that new execution engine, I bet you that’s how we get to 10x.”
The performance progression graph showed:
- Dataflows Gen 1 (baseline)
- Dataflows Gen 2 with no enhancements
- Dataflows Gen 2 with modern query evaluator (~2.5x)
- Dataflows Gen 2 with modern evaluator + parallelized execution (~10x)
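The factors in the graph roughly compose, which quick arithmetic can sanity-check (illustrative only—the post doesn’t say whether the 25-minute taxi run already includes the modern evaluator, so treat the combined figure as a back-of-the-envelope estimate):

```python
# Illustrative arithmetic only -- real factors depend on workload and source.
baseline_min = 90   # NY green taxi benchmark: 1.5 hours without enhancements
parallel_min = 25   # same load with parallelized execution

parallel_speedup = baseline_min / parallel_min   # ~3.6x from partitioning
evaluator_speedup = 2.5                          # modern query evaluator, per the graph
combined = parallel_speedup * evaluator_speedup  # assuming the gains multiply

print(f"parallelization alone: {parallel_speedup:.1f}x")
print(f"combined estimate:     {combined:.1f}x")  # lands near the headline 10x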
“I’m excited to test this out. I’ve got some people queued up this week to start some testing,” Jason announced, hoping “it’s all really fully available.”
Variable Libraries, Parameters & Incremental Refresh
Variable libraries in dataflow Gen 2 (preview)—covered extensively in Episodes 306-307—finally extended beyond pipelines.
Public parameters in dataflows Gen 2 (GA) graduated from preview, with API parameter discovery (preview) enabling programmatic viewing of available parameters.
Incremental refresh to Lakehouse (GA) as a dataflow Gen 2 destination reached production readiness.
Jason nearly missed it: “How it fits into the bigger picture is bolded. This one is not.”
New Destinations & Schema Support
New dataflow Gen 2 data destinations (GA):
- Lakehouse files as CSV: Writing to files folder, not tables
- SharePoint: Similar CSV output pattern
- Snowflake: Listed in preview section despite GA heading confusion
Schema support for Lakehouse, Fabric SQL, and Data Warehouse destinations eliminated the default-schema-only limitation—toggle “navigate using full hierarchy” to access multiple schemas.
Copilot Features Reach GA
Natural language to custom columns with Copilot (GA) evolved from “query by example”—John’s assessment: “It was… it’s better with Copilot.”
Explain query steps with Copilot (GA) graduated alongside custom column generation.
Jason’s appreciation: “This feature I really do… you know why, John? Because you suck at commenting your code.”
John’s defense: “Your code shouldn’t have to comment your code.”
The teaching moment: “I’ve just been teaching my son some of this data science stuff… I was like, ‘Mr. John never comments his code. Always comment your code.’ Because one of the things he got in trouble for… was not commenting his code… he lost points.”
Jason’s verdict: “Comment your code, kids. People who want to work well with others…”
John’s retort: “I don’t know.”
Copilot in modern get data experience (preview) lets users describe entire flow intentions before building—Copilot walks through construction process.
Jason’s Power Automate-induced skepticism: “I’ve seen this similar thing over in the create flow actions in Power Automate, and it’s never done right for me… burned me a couple of times. I spent way too much time trying rather than just doing what I knew needed to be done.”
He trusted Fabric’s implementation more but urged caution: “It has nothing to do with the Power BI and Fabric team. It’s simply because I’ve gotten burned on the Power Automate side… which by the way, owns Copilot Studio.”
John’s framing: “Think of it as a really high-end version of IntelliSense.”
Two-Tier Pricing: The Long-Running Dataflow Relief
Two-tier pricing model for dataflow Gen 2 addressed cost concerns dramatically:
- First 10 minutes: 12 CU (reduced from 16 CU)
- After 10 minutes: 1.5 CU (a roughly 90% reduction)
“This affects a lot of concerns about long-running dataflows and their cost,” John explained.
Jason confirmed the old rate: “It used to be 16 CU across the board.”
John’s implications: “If you’ve got a lot of long-running dataflows, you should see a really big improvement… maybe you could drop your fabric capacity.”
Self-correction: “Nobody wants me to say that, but if your capacity has been scaled out because you’ve got a whole pile of dataflows, you might be able to drop it down or use that for something more productive like… notebooks.”
“Can’t argue with you there, John.”
The pricing logic: most runtime involves data movement rather than processing. “I doubt very much… you’re not actually consuming a lot of resources… it doesn’t actually cost us as much, so we don’t need to charge people as much.”
Jason agreed: “The bulk of the query run that’s talking to an engine, it’s probably happening at the front end, right?”
“Yeah, exactly. Or at the back end… either way.”
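Worked through with the announced rates, the savings for a long-running dataflow are substantial. A small sketch (assuming a simple CU-seconds accounting model for illustration; actual Fabric billing granularity may differ) compares the old flat 16 CU rate to the two-tier model for a one-hour refresh:

```python
def dataflow_cu_seconds(runtime_min, tier1_rate=12, tier2_rate=1.5, tier1_min=10):
    """Two-tier consumption: 12 CU for the first 10 minutes, 1.5 CU after
    (rates from the announcement; CU-seconds accounting is assumed here)."""
    tier1 = min(runtime_min, tier1_min) * 60 * tier1_rate
    tier2 = max(runtime_min - tier1_min, 0) * 60 * tier2_rate
    return tier1 + tier2

runtime = 60                        # a one-hour dataflow refresh
old = runtime * 60 * 16             # flat 16 CU across the board
new = dataflow_cu_seconds(runtime)  # 10 min at 12 CU + 50 min at 1.5 CU
print(old, new, f"{(1 - new / old):.0%} cheaper")  # 57600 11700.0 80% cheaper
```

Short-running dataflows see a smaller benefit (only the 16→12 CU cut applies), which is why the relief lands mainly on the long-running flows John called out.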
John’s assessment: “Dataflows has got some major improvements and I think there’s a lot more coming… they’ve put an infrastructure under there that’s going to help dataflows move along. That has been the knock on dataflows—how much overhead they basically use.”
Pipelines: The Great Renaming
Data pipelines renamed to pipelines—eliminating confusion with deployment pipelines (allegedly).
“To make things a little less confused so they’re not confused obviously with deployment pipelines,” John explained. “We’ve got pipelines now and deployment pipelines versus data pipelines and deployment pipelines.”
His assessment: “Which to my mind was clear, but that’s another story I had to say.”
Screenshots throughout the post showed UI language updates—”just know if you just see the word pipeline, it means the data factory pipelines.”
Email and Teams activities (GA) with preview UI refinements—legacy Office 365 activities got facelifts despite remaining “legacy.”
Jason’s observation: “How do you know they’re legacy activities? Well, because they’re still called Office 365.”
John: “So are the new ones.”
“Yeah, I understand that, John, but why are they in preview? Well, partly because they do not support CI/CD.”
“That would do it.”
New experience for parameters surfaced pipeline parameters more prominently—”just a little easier to work with parameters, bring them to the fore.”
Evaluate expression experience enabled debugging pipeline expressions without full runs.
API parameter detection for dataflow activity improved visibility into parameters when pipelines call dataflows.
Function activity with user-defined functions (GA) graduated from preview.
Up to 20 schedules per pipeline eliminated the single-schedule limitation—”a very big deal.”
On-premises and VNet gateway support extended to invoke pipeline and semantic model refresh activities.
Jason emphasized the semantic model refresh significance: “That to me is bigger, honestly.”
Copy Job Activity Confusion
Copy job activity in pipelines (preview) created taxonomic puzzles.
John’s confusion: “We’ve taken originally the copy activity from pipelines, externalized it, made it its own thing in Fabric, its own item—the copy job—added capabilities to it. And now we’re bringing that back into pipeline as a different activity.”
The distinction: copy activity (original pipeline feature) vs. copy job (standalone item) vs. copy job activity (new pipeline activity leveraging copy job capabilities).
“I think we’ve got some confusion we’re going to have to be navigating with all of that.”
Invoke pipeline activity (GA) reached milestone for orchestrating pipelines within workflows.
Workspace identity support eliminated manual credential management—”leverage the workspace identity and you’re good to go.”
John: “Service principals for all intents and purposes.”
Variable libraries in pipelines (GA) and Databricks job activity (GA) completed the pipeline updates.
Apache Airflow, Copy Job & Connectivity
Apache Airflow job appeared as a top-level section—easier DAG building, plus CI/CD support.
“We know nothing about this… we’re going to gloss over it,” Jason admitted.
Copy job (the standalone item) gained:
- Change data feed detection for Lakehouse tables: Real-time destination updates via CDC
- Variable library parameterization for connections (preview): Dev/test/prod workflows
- Merge to Snowflake: Insert/update/delete CDC from Azure SQL, SQL Server, SQL MI, Lakehouse tables
“I’m a fan of this, John. I think I’m going to be using this a decent amount,” Jason predicted, citing fast copy capabilities.
Simplified copy assistant brought copy job’s wizard experience to pipeline copy activities.
Connectivity updates spanned both Data Factory and dataflows Gen 2:
- Salesforce and Salesforce Service Cloud connectors
- Upsert into Delta tables with Lakehouse connector (preview): Separate button for upsert functionality
- Delta column mapping and deletion vector support
- VARCHAR(MAX) and table creation in Data Warehouse (preview)
- DB2 connector improvements for package collection
- Snowflake connector role specification
- PostgreSQL connector Entra ID support
Jason: “That becomes a big one.”
Mirroring Expansion
New mirroring sources:
- Google BigQuery (preview)
- Oracle (preview)
- Azure SQL Managed Instance (GA)
Firewall support via VNet or on-premises data gateways enabled mirroring behind security boundaries.
Jason encountered new acronym: “OPDG—first time I’ve seen that acronym… on-premises data gateway… I translated it pretty quick on the fly.”
Azure SQL Database mirroring workspace identity authentication continued the service principal elimination theme.
Jason’s caveat: “You will not be able to use granular permissions if you’re using workspace identity, folks. You do still need to think about that and architect for it.”
John: “Like a proxy.”
VS Code Extensions & The Power BI Tease
Visual Studio Code extensions (GA) enabled Fabric interaction directly from editors—critical for AI-assisted coding workflows.
“Using VS Code, especially if you want to start to use some of that vibe coding… have AIs evaluate what you’re seeing or generate schemas… plug your VS code directly to Fabric,” John explained.
They’d covered it previously across other sections—”just gets its own section here.”
At 33 minutes, they wrapped the three-episode marathon: “We’ve never done a three-part episode for a long post before.”
Power BI coverage awaited later that day, including Laura Graham Brown’s calendaring feature callout that Jason hadn’t tested yet.
Jason’s team message from earlier captured the release’s essence: “Not planet-shifting stuff, but instead making fixes to things that were either bad or just okay for a while and adding lots more functionality.”
The September 2025 feature summary delivered incremental excellence—dataflow engine overhauls, pricing relief, variable library ubiquity, workspace identity proliferation, and countless connector expansions. No single feature rivaled Fabric’s launch, but the cumulative effect transformed workflows for teams finally able to trust dataflows’ performance and cost models.
Links:
- Microsoft Fabric September 2025 Feature Summary
- Faisal Mohamood’s Data Factory Overview Post
- Episode 306: Fabric September 2025 Part 1
- Episode 307: Fabric September 2025 Part 2
Subscribe: SoundCloud | iTunes | Spotify | TuneIn | Amazon Music