Use this repository for hands-on training of Azure Data Factory Mapping Data Flows capabilities.
Azure Data Factory is Azure's cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management.
❕ Another Azure product, Azure Synapse Pipelines, is largely synonymous with Azure Data Factory. While this repository uses Azure Data Factory for demonstration purposes, the lessons and concepts apply to Azure Synapse Pipelines as well.
This repository focuses on the mapping data flows feature within Azure Data Factory. Mapping data flows allow data engineers to develop data transformation logic without writing code (visual ETL). The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters. Data flow activities can be operationalized using existing Azure Data Factory scheduling, control flow, and monitoring capabilities.
Mapping data flows provide an entirely visual experience with no coding required. Your data flows run on ADF-managed execution clusters for scaled-out data processing. Azure Data Factory handles all the code translation, path optimization, and execution of your data flow jobs.
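For orientation, the sketch below shows roughly how a finished data flow is wrapped in a pipeline activity using the azure-mgmt-datafactory Python SDK. All names here (resource group, factory, the "TransformSales" data flow) are hypothetical, exact model signatures can vary across SDK versions, and the workshop itself does all of this through the ADF Studio UI rather than code:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DataFlowReference,
    ExecuteDataFlowActivity,
    PipelineResource,
)

# Hypothetical identifiers -- substitute your own.
subscription_id = "<subscription-id>"
resource_group = "rg-adf-workshop"
factory_name = "adf-workshop-factory"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Wrap an existing mapping data flow (here called "TransformSales") in a
# Data Flow activity so it can be scheduled and monitored like any other
# pipeline activity.
run_flow = ExecuteDataFlowActivity(
    name="RunTransformSales",
    data_flow=DataFlowReference(
        type="DataFlowReference", reference_name="TransformSales"
    ),
)

client.pipelines.create_or_update(
    resource_group,
    factory_name,
    "pl_transform_sales",
    PipelineResource(activities=[run_flow]),
)
```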
- An Azure account with an active subscription. Note: If you don't have access to an Azure subscription, you may be able to start with a free account.
- You must have the necessary privileges within your Azure subscription to create resources, perform role assignments, register resource providers (if required), etc.
- Create Integration Runtime
- Create Linked Services (both modules are sketched in code after this list)
- Two Ways to Do a Basic Copy
- Joins
- Slowly Changing Dimensions
- Change Data Capture Storage to SQL (module planned)
- Medallion Architecture: Bronze Layer
- Medallion Architecture: Silver Layer
- Medallion Architecture: Gold Layer
- Medallion Architecture: Consumption Layer
- Troubleshooting
- Best Practices
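The first two modules create an integration runtime (the compute that runs data flows) and linked services (connection definitions for data stores). As a rough point of reference, here is a minimal sketch using the azure-mgmt-datafactory Python SDK; the names, sizing values, and connection string are hypothetical placeholders, and the modules themselves build these artifacts in the ADF Studio UI:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    IntegrationRuntimeComputeProperties,
    IntegrationRuntimeDataFlowProperties,
    IntegrationRuntimeResource,
    LinkedServiceResource,
    ManagedIntegrationRuntime,
)

# Hypothetical identifiers -- substitute your own.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "rg-adf-workshop", "adf-workshop-factory"

# An Azure (managed) integration runtime provides the Spark compute
# that executes mapping data flows.
client.integration_runtimes.create_or_update(
    rg, factory, "ir-dataflows",
    IntegrationRuntimeResource(
        properties=ManagedIntegrationRuntime(
            compute_properties=IntegrationRuntimeComputeProperties(
                data_flow_properties=IntegrationRuntimeDataFlowProperties(
                    compute_type="General", core_count=8, time_to_live=10,
                )
            )
        )
    ),
)

# A linked service stores the connection information for a data store.
client.linked_services.create_or_update(
    rg, factory, "ls_blob_storage",
    LinkedServiceResource(
        properties=AzureBlobStorageLinkedService(
            connection_string="<storage-connection-string>",
        )
    ),
)
```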
In a medallion architecture, data is organized into layers:
- Bronze Layer: Holds raw data.
- Silver Layer: Contains cleansed data.
- Gold Layer: Stores aggregated data that's useful for business analytics.
- Consumption Layer: Applications and data integrations read from the gold layer and may optionally create versions of the data that are purpose-built for their use case. This layer may reside in a transactional database used by the application or in another analytical storage repository, or it may be exposed as an API or through another technology.
In this model, data is democratized: all or most services that work with a dataset connect to a single underlying data source, which ensures consistency. Integrated row-level security is typically built in to allow for maximum re-use of data assets.
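To make the layering concrete, here is a minimal sketch of how the layers might map onto paths in an Azure Data Lake Storage Gen2 account. The account, container, and dataset names are illustrative, not part of the lab:

```python
# Illustrative ADLS Gen2 layout for a medallion architecture.
# The account name ("mydatalake") and container ("medallion") are hypothetical.
BASE = "abfss://medallion@mydatalake.dfs.core.windows.net"

LAYER_PATHS = {
    "bronze": f"{BASE}/bronze/sales/",        # raw data, as ingested
    "silver": f"{BASE}/silver/sales/",        # cleansed, conformed data
    "gold": f"{BASE}/gold/sales_summary/",    # aggregated, analytics-ready data
    "consumption": f"{BASE}/consumption/",    # purpose-built copies, if any
}
```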
This lab covers the following concepts, organized by layer (a PySpark sketch of the equivalent logic follows the list):
- Bronze Layer
- Data Ingestion
- Data Retention Policy
- Silver Layer
- De-duplication
- Data quality assertions
- Cast data types
- Joins
- Reroute errors
- Schema drift
- Gold Layer
- Calculated value(s)
- Given that the source includes both general and confidential attributes, the data is written to two sinks: once for consumption of general data and once for consumption of confidential data.
- Sink for general sensitivity: only the attributes confirmed to be available for general use are included, using explicit column mapping.
- Sink for confidential sensitivity: all attributes are passed through using schema drift and auto-mapping.
- Consumption Layer
- Read the gold layer and sink an aggregated dataset with a new calculated column
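Since mapping data flows execute on Apache Spark, the layer-by-layer logic above can be approximated in PySpark. The sketch below is purely illustrative: every path, column name, and threshold is hypothetical, and the lab builds the equivalent steps code-free in the data flow designer (using transformations such as Assert, Derived Column, Select, Aggregate, and the sink):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# --- Silver layer: cleanse the raw (bronze) data --------------------------
raw = spark.read.json("/bronze/sales/")            # hypothetical bronze path
customers = spark.read.parquet("/silver/customers/")

silver = (
    raw.dropDuplicates(["order_id"])                                  # de-duplication
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))  # cast data types
       .join(customers, on="customer_id", how="left")                # join
)

# Data quality assertion: reroute bad rows instead of failing the run,
# similar to an Assert transformation with error-row handling.
good = silver.filter(F.col("amount") >= 0)
errors = silver.filter((F.col("amount") < 0) | F.col("amount").isNull())
errors.write.mode("append").parquet("/silver/sales_errors/")
good.write.mode("overwrite").parquet("/silver/sales/")

# --- Gold layer: calculated value plus sensitivity-specific sinks ---------
gold = good.withColumn("amount_with_tax", F.round(F.col("amount") * 1.07, 2))

# General sink: only columns confirmed safe for broad consumption.
gold.select("order_id", "order_date", "amount", "amount_with_tax") \
    .write.mode("overwrite").parquet("/gold/sales_general/")

# Confidential sink: every column passes through (schema-drift style).
gold.write.mode("overwrite").parquet("/gold/sales_confidential/")

# --- Consumption layer: aggregate with a new calculated column ------------
consumption = (
    gold.groupBy("order_date")
        .agg(F.sum("amount").alias("daily_total"))
        .withColumn("is_high_volume", F.col("daily_total") > 10000)
)
consumption.write.mode("overwrite").parquet("/consumption/daily_sales/")
```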
- Microsoft Learn: Mapping data flows in Azure Data Factory
- Microsoft Learn: Mapping data flows performance and tuning guide
- Microsoft Learn: Mapping data flow transformation overview
- Microsoft Learn: Data transformation expressions in mapping data flows
- Microsoft Learn: Mapping data flow video tutorials
- Microsoft Learn: Integration Runtime Custom shuffle partition
https://github.com/adhazel/Azure-Data-Factory-Mapping-Data-Flow-Workshop