Use parameters in Azure Data Factory pipelines
- Sean Liu
- Oct 19, 2024
- 2 min read
When data engineers handle multiple files or sources, it is common to use parameters in data pipelines. Parameters let data engineers avoid hard-coding file names or source names, so a single pipeline can implement the procedure once and be reused across multiple files or sources. In this article, I will demonstrate how to use parameters in Azure Data Factory.
The purpose of the data pipeline
The task is to copy multiple JSON files from one folder to another and convert them all to CSV.
Source
JSON files in an Azure Blob Storage container.
Destination
CSV files in another folder in the same container.
Step by step
Create a dataset that represents the source folder. I already created a linked service in Data Factory; it connects to the Azure Blob Storage container shown above. In the file path setting, I left the file name empty so the data pipeline can read all files in the raw/Verde folder.
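As a rough sketch, the dataset JSON might look like the following; the linked service name and container name are placeholders, and the fileName property is simply omitted so the dataset points at the whole raw/Verde folder (a filename parameter is added in a later step):
{
    "name": "abs_json_raw_Verde",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_azure_blob_storage",
            "type": "LinkedServiceReference"
        },
        "type": "Json",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": "raw/Verde",
                "container": "mycontainer"
            }
        }
    }
}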
Get File List
Use the Get Metadata activity to get the file list from the abs_json_raw_Verde dataset.
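A sketch of the activity definition, assuming the activity is named GetFileList (the name the ForEach expression below refers to); requesting childItems tells Get Metadata to return the list of files in the folder:
{
    "name": "GetFileList",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "abs_json_raw_Verde",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ]
    }
}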
Loop over JSON files
In the ForEach activity, I set the items to the output of the previous step. The ForEach activity then iterates over those items.
The items expression is:
@activity('GetFileList').output.childItems
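For reference, the childItems field in the Get Metadata output is an array of name/type pairs (the file names below are made up), which is why item().name works inside the loop:
{
    "childItems": [
        { "name": "file_a.json", "type": "File" },
        { "name": "file_b.json", "type": "File" }
    ]
}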
Create a dataset to represent each JSON file.
In the file path setting, I use an expression to read the file name dynamically from the current item.
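One way to wire this up is to give the dataset a filename parameter and reference it in the file name with @dataset().filename. A sketch, assuming the parameter lives on the abs_json_raw_Verde dataset, as the copy activity source setting below suggests (linked service reference omitted for brevity):
{
    "name": "abs_json_raw_Verde",
    "properties": {
        "parameters": {
            "filename": { "type": "string" }
        },
        "type": "Json",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().filename",
                    "type": "Expression"
                },
                "folderPath": "raw/Verde",
                "container": "mycontainer"
            }
        }
    }
}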
Create a dataset to represent each CSV file
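The sink dataset follows the same pattern with a destfilename parameter. A sketch, assuming a cleansed/Verde folder in the same container and comma-delimited output with a header row (linked service reference again omitted):
{
    "name": "abs_csv_cleansed_Verde_param",
    "properties": {
        "parameters": {
            "destfilename": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().destfilename",
                    "type": "Expression"
                },
                "folderPath": "cleansed/Verde",
                "container": "mycontainer"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}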
Copy JSON files to the destination and format as CSV
Inside the ForEach activity, a Copy Data activity copies each JSON file to a CSV file.
In the source setting, the dataset has a filename parameter, which passes the item name from the ForEach activity to the abs_json_raw_Verde dataset.
In the sink setting, the dataset has a destfilename parameter, which passes an expression from the ForEach activity to the abs_csv_cleansed_Verde_param dataset.
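Putting it together, the copy activity's dataset references might look like the sketch below (the activity name is a placeholder, and the source and sink typeProperties are omitted). The parameter values are ADF expressions, evaluated once per ForEach item:
{
    "name": "CopyJsonToCsv",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "abs_json_raw_Verde",
            "type": "DatasetReference",
            "parameters": {
                "filename": {
                    "value": "@item().name",
                    "type": "Expression"
                }
            }
        }
    ],
    "outputs": [
        {
            "referenceName": "abs_csv_cleansed_Verde_param",
            "type": "DatasetReference",
            "parameters": {
                "destfilename": {
                    "value": "@concat(substring(item().name, 0, sub(length(item().name), 5)), '.csv')",
                    "type": "Expression"
                }
            }
        }
    ]
}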
The expression strips the ".json" extension from the file name and appends ".csv":
@concat(
substring(
item().name,
0,
sub(length(item().name), 5)
),
'.csv'
)
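For example, with item().name equal to "data01.json" (11 characters), sub(length(item().name), 5) evaluates to 6, substring returns "data01", and concat produces "data01.csv".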
All Done!
Review
With parameters in a data pipeline, we can iterate through multiple files in a single pipeline run, without manually running the pipeline for each file.
This is especially helpful when we modify the pipeline and want to rerun it for all the files.