Use AI to speed up data transformation in Power Query
- Sean Liu
- Oct 15, 2024
- 4 min read
Data transformation can be a time-consuming process, especially when dealing with unfamiliar data or encountering multiple issues within a table. However, with the help of AI, this task can be significantly expedited. In this article, we will explore how AI can serve as your analytical assistant to streamline the data transformation procedure.
Import the data into Power Query
The first thing is to get data from the data source. I imported an order table which includes some columns like Id, unit price, shipping cost, customer Id...etc.
Copy some data
You may have plenty of data in the table you are going to transform. In this step, I would suggest that you only copy the top 10 rows. The reason is that we are going to paste this data into ChatGPT. The more data you copy, the more expenses you incur.
The steps to copy top 10 rows
Click the "Keep Top Rows"
Enter the number you like and click OK.
Copy the entire table Please make sure you click "OK" in the previous step; otherwise, you may copy plenty of data.
Ask AI for suggestions on data transformation
I use ChatGPT as the AI assistant. You may have your preferred one like Perplexity or Claude, which are also good AI chatbot.
In the chat, you can ask AI to optimize the data and give you some suggestions on Power Query, then paste the data you just copy. Please note that the data will look unstructured. Don't worry. AI is still able to understand the structure.
Useful suggestions given by AI
You will get AI-generated suggestions like the following response after clicking run.
I would say AI does give me some good suggestions, like:
Check columns like `Column26` and `Column27` for missing values. You can remove these columns if they contain unnecessary data or fill missing data if critical.
Make sure the date formats are consistent across `Order Date` and `Ship Date`.
Add new columns that help with analysis, like `Delivery Time` (difference between `Ship Date` and `Order Date`), or `Profit Margin` if it's derived from available data.
Besides, AI also provides some performance tips like query folding and incremental load, which are advanced features in Power Query.
The whole response generated by AI:
Optimizing your data in Power Query involves a few key steps to ensure data consistency, improve processing time, and enhance model performance. Below are some tips and a step-by-step guide on how you can optimize your data using Power Query:
### Data Cleaning and Transformation Tips:
1. **Remove Duplicates:**
- Ensure that you don't have any duplicate rows unless they represent legitimate transactions.
2. **Handle Missing Values:**
- Check columns like `Column26` and `Column27` for missing values. You can remove these columns if they contain unnecessary data or fill missing data if critical.
3. **Data Types:**
- Ensure all your columns have the correct data types. For example, dates should be in a date format, numeric fields should be in number or decimal format, and categorical data should be set as text.
4. **Consistent Formatting:**
- Standardize text data by trimming spaces, fixing casing inconsistencies, and removing any non-printable characters.
5. **Date Standardization:**
- Make sure the date formats are consistent across `Order Date` and `Ship Date`.
6. **Calculated Columns:**
- If relevant, add new columns that help with analysis, like `Delivery Time` (difference between `Ship Date` and `Order Date`), or `Profit Margin` if it's derived from available data.
7. **Filter Data:**
- You may want to filter your data to include only relevant records (e.g., based on a specific time period or region).
8. **Remove Unnecessary Columns:**
- Drop columns that don't contribute to your model or analysis goals. Columns like `Postal Code` or `Order ID` may be relevant depending on analysis needs, but ensure they contribute meaningfully.
### Power Query Steps:
1. **Load Data:**
- Open Power Query Editor and load your dataset.
2. **Remove Duplicates:**
- Select all columns, right-click on the selection, and choose "Remove Duplicates."
3. **Fix Data Types:**
- Go to the "Home" tab and choose "Detect Data Type" or manually set types for each column.
4. **Standardize Text:**
- Use "Transform" tab options such as "Trim", "Clean", and "Format" (uppercase, lowercase, etc.) to clean text data.
5. **Fill/Replace Missing Values:**
- Use the "Transform" tab to replace missing values in critical fields using "Replace Values" or "Fill" to carry forward last known values.
6. **Column Removal:**
- Right-click on columns like `Column26` and `Column27` and select "Remove Columns" if they have no meaningful data.
7. **Add New Columns:**
- Create new columns using custom calculations, such as `Delivery Time = [Ship Date] - [Order Date]`.
8. **Summarize/Group Data:**
- If needed, use the "Group By" option to aggregate data such as summarizing profits or sales by region or product category.
9. **Data Filtering:**
- In the Power Query Editor, filter records based on specific conditions (e.g., a range of order dates or specific product categories).
10. **Close & Load:**
- Once all transformations are applied, click on "Close & Load" to apply changes and load the data back into your model.
### Performance Tips:
- **Query Folding:** Ensure as much of the data processing as possible is done at the source if applicable (e.g., server-side operations for SQL databases).
- **Incremental Load:** If dealing with large datasets, consider incremental load strategies to process only new or modified records.
- **Use Parameters:** For dynamically filtering data, use parameters instead of hard-coded values, especially for reusability and flexibility.
Implement these steps to efficiently prepare your data for analysis or machine learning models, ensuring that your dataset is clean and optimized for processing.
Conclusion
As an experienced Power BI developer, I can do all the steps without any helps from AI but it does take me much more time to check the data and think about the steps.
Honestly, I didn't even have a look at those columns I have in the table before asking AI for suggestions. With help from AI, I can have some quick actions and insights about the data.
I would say AI does help me speed up the data transformation process.
Comments