The difference between pipeline and data flow in Azure Data Factory
- Sean Liu
- Oct 20, 2024
When I first started building ETL jobs in Azure Data Factory (ADF), I confused pipelines with data flows. Both are made up of several connected steps, and they look very similar in the editor. In this article, I describe the difference between a pipeline and a data flow.
What are pipelines?
A pipeline is a logical grouping of activities that together perform a task. Pipelines can include a variety of activities, such as data movement, data transformation, and control flow activities.
For example, the following picture shows a pipeline with two activities: a Get Metadata activity and a ForEach activity.
Engineers use pipelines for orchestration and scheduling. Pipelines allow engineers to manage and coordinate the execution of various activities, such as copying data, running stored procedures, or executing data flows.
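To make the orchestration role more concrete, here is a rough sketch of how a pipeline with a Get Metadata activity feeding a ForEach activity might look in its JSON definition, built here as a Python dictionary. The names `ExamplePipeline`, `ListFiles`, `ForEachFile`, `SourceFolder`, and `PlaceholderWait` are hypothetical and not taken from the picture, and the inner Wait activity only stands in for whatever work the loop would really do.

```python
import json

# Sketch of a pipeline definition; all names below are hypothetical placeholders.
pipeline = {
    "name": "ExamplePipeline",
    "properties": {
        "activities": [
            {
                # Lists the items in a folder dataset.
                "name": "ListFiles",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "SourceFolder",  # assumed dataset name
                        "type": "DatasetReference",
                    },
                    "fieldList": ["childItems"],
                },
            },
            {
                # Runs only after ListFiles succeeds and loops over its output.
                "name": "ForEachFile",
                "type": "ForEach",
                "dependsOn": [
                    {"activity": "ListFiles", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('ListFiles').output.childItems",
                        "type": "Expression",
                    },
                    # Placeholder inner activity; a real loop would copy or transform data.
                    "activities": [
                        {
                            "name": "PlaceholderWait",
                            "type": "Wait",
                            "typeProperties": {"waitTimeInSeconds": 1},
                        }
                    ],
                },
            },
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

Notice that nothing in this definition transforms data. The pipeline only states which activities exist and in which order they run, which is the orchestration part.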
What are data flows?
Data flows are built from transformations such as joins, aggregations, and filters. They run on scaled-out Apache Spark clusters managed by ADF.
Engineers use data flows for data transformation. Data flows enable engineers to perform complex data transformations without writing code, using a visual interface.
For example, the following picture shows a data flow that contains five steps. The data flow reads a CSV file, applies some transformations, and saves the result to the destination.
There are many transformations available in data flows, such as Join, Union, Derived Column, Select, Stringify, Filter, and Sort. ADF categorizes transformations into 6 types: multiple inputs/outputs, schema modifier, formatters, row modifier, flowlets, and destination.
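If you think better in code, the following plain Python sketch mimics what a five-step data flow of this kind does conceptually: source, Filter, Derived Column, Select, and sink. It is only an analogy. ADF builds these steps visually and runs them on Spark, and the column names, sample rows, and the `run_flow` function are made up for illustration.

```python
import csv
import io


def run_flow(raw_csv: str) -> str:
    """Conceptual stand-in for a five-step data flow (not ADF code)."""
    # 1. Source: read the CSV rows.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))

    # 2. Filter: keep only rows with a positive amount.
    rows = [r for r in rows if float(r["amount"]) > 0]

    # 3. Derived Column: add a new column computed from an existing one.
    for r in rows:
        r["amount_with_tax"] = f"{float(r['amount']) * 1.1:.2f}"

    # 4. Select: keep only the columns the destination needs.
    rows = [{"id": r["id"], "amount_with_tax": r["amount_with_tax"]} for r in rows]

    # 5. Sink: write the result as CSV text.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "amount_with_tax"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = "id,name,amount\n1,alice,10\n2,bob,-5\n3,carol,30\n"
print(run_flow(sample))
```

Each numbered comment corresponds to one box you would place on the data flow canvas. The difference is that in ADF you configure the boxes instead of writing the code.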
How to use data flows in pipelines
In a pipeline, you can drag and drop the Data Flow activity into the editor and select the data flow you have created. Please note that data flows run on Spark, so the compute size you choose affects both the computing power and the cost.
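Behind the drag and drop, the pipeline stores a Data Flow activity that points at the data flow and carries the compute settings. Below is a minimal sketch of that activity as a Python dictionary. The names `RunTransformCsv` and `TransformCsv` are hypothetical, the compute values are only an example, and the exact property names should be checked against the official Data Flow activity documentation.

```python
import json

# Sketch of a Data Flow activity inside a pipeline; names are hypothetical.
data_flow_activity = {
    "name": "RunTransformCsv",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        # Reference to a data flow that already exists in the factory.
        "dataflow": {
            "referenceName": "TransformCsv",  # assumed data flow name
            "type": "DataFlowReference",
        },
        # Spark compute size: more cores mean more power and higher cost.
        "compute": {"coreCount": 8, "computeType": "General"},
    },
}

print(json.dumps(data_flow_activity, indent=2))
```

The pipeline only references the data flow by name. The transformation logic itself stays in the data flow definition.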
Summary
Pipelines are designed for orchestrating and managing workflows, while data flows are designed for transforming data in workflows.