Building MLOps for Traditional ML in Microsoft Fabric
Description
Discover how Microsoft Fabric enables modern Machine Learning Operations (MLOps) for traditional ML. Attendees will see a practical approach to building ML pipelines using Notebooks, Lakehouses, Pipelines, Environments, and Experiments. We'll also explore custom Spark pools, data manipulation tools, and machine learning libraries.
Key Takeaways
- All Tools, One Platform
- Separation of Concerns (SoC)
- Numerous transformations
- Extract External Data
- Explicit Column Mapping
- Many checks per column
- Difficult to maintain
My Notes
Action Items
- [ ]
Resources & Links
Slides
Building MLOps
for Traditional ML
in Microsoft Fabric
Nicholas Leonhard
Machine Learning Engineer
Connect on LinkedIn: https://www.linkedin.com/in/nicholas-leonhard-21a972ab
What is MLOps?
Three nested layers: ML sits inside an ML Pipeline, which sits inside an ML Operation.
What is Machine Learning?
• Model a relationship, e.g., house size vs. price
[Figure: scatter plot of Price vs. House Size, with a fitted line]
Statistical Methods + Computational Techniques = Machine Learning
Machine Learning Lifecycle Pipeline
Data Collection → Preprocessing → Feature Engineering → Model Training → Evaluation → Deployment → Monitoring
ML Operation
The outer layer wraps the ML Pipeline (and the ML inside it) with infrastructure:
• Tracking Servers
• Compute Clusters
• Virtual Environments
Before Fabric
Limitations
• Not a server
• Technical overhead
• Zero redundancy
Microsoft Fabric
• All Tools, One Platform
• Reduced Complexity
• Scalability
• Microsoft Ecosystem
MLOps on Fabric
Basic Framework
x1 Lakehouse
x1 Pipeline
x6 Notebooks
x1 Environment
x2 Experiments
Lakehouse
• Separation of Concerns (SoC): separate EDA and Development lakehouses sit between the external databases and the ML pipeline
• Reference Tables: e.g., a CIP Crosswalk mapping 11.0101 to Computer Science
• Store Predictions: current_predictions and prediction_history tables
• Unstructured Data: configuration files, CSV files
Data for ML
• Subset of clean data
• Numerous transformations:
  • Feature selection
  • Type casting
  • Encoding
  • Feature scaling
  • Imputation
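A minimal stdlib sketch of three of these transformations (type casting, mean imputation, min-max scaling) on a toy column. In the real pipeline these would be pandas or Spark operations; the column name and values are invented for illustration:

```python
# Toy feature column with string-typed numbers and a missing value.
raw_act_math = ["21", "28", None, "17"]

# Type casting: strings -> ints, keeping None for missing entries.
cast = [int(v) if v is not None else None for v in raw_act_math]

# Imputation: replace missing values with the column mean.
observed = [v for v in cast if v is not None]
mean = sum(observed) / len(observed)
imputed = [v if v is not None else mean for v in cast]

# Feature scaling: min-max scale into [0, 1].
lo, hi = min(imputed), max(imputed)
scaled = [(v - lo) / (hi - lo) for v in imputed]

print(scaled)
```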
Simple transformations run upstream; complex transformations run downstream.
External Databases → Copy data activity → Lakehouse → ML Pipeline
• Extract External Data
• Custom SQL Query
• String Interpolation
Copy data activity
• Extract External Data
• Custom SQL Query
• String Interpolation
• Explicit Column Mapping
Inferred Schema
Target Schema
STUDENT_ID
String
STUDENT_ID
String
ACT_MATH
String
ACT_MATH
Int32
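The "custom SQL query with string interpolation" pattern, sketched in Python. The table and column names are invented; in a Fabric pipeline the interpolation would typically happen in the Copy data activity's dynamic-content expressions rather than in Python:

```python
# Parameters that would come from pipeline variables at run time.
params = {"school_year": "2025", "source_table": "dbo.ENROLLMENT"}

# Build the extraction query by interpolating the parameters.
query = (
    f"SELECT STUDENT_ID, ACT_MATH "
    f"FROM {params['source_table']} "
    f"WHERE SCHOOL_YEAR = '{params['school_year']}'"
)

print(query)
```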
Data Validation
• Many checks per column
• Difficult to maintain by hand
Column Registry
• A YAML file of column metadata
• Drives both Structural Validation and Distributional Validation
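A sketch of a column registry driving structural checks. The registry is shown as a Python dict mirroring what the YAML file would hold; the field names (dtype, nullable, min/max) are my assumptions, not the speaker's actual schema:

```python
# Column metadata, as it might be loaded from the YAML registry.
registry = {
    "STUDENT_ID": {"dtype": str, "nullable": False},
    "ACT_MATH": {"dtype": int, "nullable": True, "min": 1, "max": 36},
}

def validate_row(row: dict) -> list[str]:
    """Run the structural checks from the registry; return error messages."""
    errors = []
    for col, meta in registry.items():
        value = row.get(col)
        if value is None:
            if not meta["nullable"]:
                errors.append(f"{col}: null not allowed")
            continue
        if not isinstance(value, meta["dtype"]):
            errors.append(f"{col}: expected {meta['dtype'].__name__}")
        elif "min" in meta and not (meta["min"] <= value <= meta["max"]):
            errors.append(f"{col}: {value} out of range")
    return errors

print(validate_row({"STUDENT_ID": "A123", "ACT_MATH": 40}))
```

Keeping the checks in data (YAML) instead of code is what makes "many checks per column" maintainable: adding a column means adding a registry entry, not another validation function.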
ML Pipeline
• Treat it as software: a Python package
ml_pipeline/
  main.py
  preprocessing.py
  modeling.py
  evaluation.py
  validation.py
Three ways to run the package:

| Module | Spark Job Definition | Notebook | Hybrid |
|---|---|---|---|
| main | .py file | notebook | notebook |
| preprocessing | .py file | notebook | .py file |
| modeling | .py file | notebook | .py file |
| evaluation | .py file | notebook | .py file |
| validation | .py file | notebook | .py file |
Environment (Default)
Runtime: OS Mariner 2.0, Apache Spark 3.5, Delta Lake 3.2, Python 3.11
Libraries: scikit-learn 1.2.2, xgboost 2.0.3, shap 0.42.1
Compute: node size medium, 1-10 nodes, autoscale enabled
Machine Learning Libraries
• Single-node libraries: Scikit-Learn, XGBoost
• Multi-node libraries: SynapseML, MLlib
Environment (Custom)
Runtime: OS Mariner 2.0, Apache Spark 3.5, Delta Lake 3.2, Python 3.11
Libraries: scikit-learn 1.6.0, xgboost 3.0.0, shap 0.48.0
Compute: node size large, 1 node, autoscale disabled
Spark Job Definition
Runs the whole package (main.py, preprocessing.py, modeling.py, evaluation.py, validation.py) against the default Environment.
Note: this will create a .crc file in your lakehouse.
Notebook Approach
• Notebooks in production
• “Primary coding item”
• Convenient
• Use %run to import modules into main
Hybrid Approach
• Best of both worlds
• Leverage agentic coding tools (Claude Code, GitHub Copilot), which perform better on .py files
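A sketch of the hybrid pattern's import side: the logic lives in a .py file and the main notebook simply imports it. Here the module is written to a temp directory first so the snippet is self-contained; in Fabric the .py files would ship alongside the notebook, and the module name and function are invented:

```python
import importlib.util
import tempfile
from pathlib import Path

# Stand-in for preprocessing.py living next to the main notebook.
module_path = Path(tempfile.mkdtemp()) / "preprocessing.py"
module_path.write_text(
    "def clean(rows):\n"
    "    return [r for r in rows if r is not None]\n"
)

# Load the module from the file, as the main notebook would import it.
spec = importlib.util.spec_from_file_location("preprocessing", module_path)
preprocessing = importlib.util.module_from_spec(spec)
spec.loader.exec_module(preprocessing)

print(preprocessing.clean([1, None, 2]))
```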
Experiment Tracking
• Train/test multiple different models
• Analyze each “run”
• Compare different runs
Experiment
Each run is logged as a row; successive runs accumulate in the experiment:

| Run name | Start time | Duration | Accuracy |
|---|---|---|---|
| RF Run | 3/20/26 | 3:11 | 0.79 |
| XGB Run | 3/20/26 | 4:25 | 0.81 |
| RF Run | 3/19/26 | 3:13 | 0.76 |
| XGB Run | 3/18/26 | 4:01 | 0.78 |

The chosen run is registered as a Model, with versions (Version 1, Version 2).
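Fabric experiments are MLflow-based under the hood; conceptually each run is a record of parameters and metrics, and comparing runs means sorting those records. A stdlib sketch using the runs from the table above:

```python
# Each run as logged to the experiment (values from the slide).
runs = [
    {"name": "RF Run",  "start": "3/20/26", "accuracy": 0.79},
    {"name": "XGB Run", "start": "3/20/26", "accuracy": 0.81},
    {"name": "RF Run",  "start": "3/19/26", "accuracy": 0.76},
    {"name": "XGB Run", "start": "3/18/26", "accuracy": 0.78},
]

# Compare runs and pick the one to register as the next model version.
best = max(runs, key=lambda r: r["accuracy"])
print(best["name"], best["accuracy"])
```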
Basic Framework (recap)
x1 Lakehouse
x1 Pipeline
x6 Notebooks
x1 Environment
x2 Experiments

Putting it all together, the pipeline flow:
1. The Copy data activity extracts from the External Database into the Lakehouse's enrollment table.
2. The Preprocess Data notebook reads enrollment and writes preprocessed_data.
3. The Main notebook, running on the Custom Environment with the YAML registry and the preprocessing, modeling, evaluation, and validation .py modules, reads preprocessed_data, logs runs to the Training and Production Experiments, and writes current_predictions and prediction_history.