
Commit 750b4b2

Merge pull request MicrosoftDocs#2176 from sama-zaki/live
Updating Ingest exported Microsoft Dataverse data with Azure Data Factory Documentation
2 parents a48aba6 + 4757a8d commit 750b4b2

10 files changed (+46, -87 lines)

powerapps-docs/maker/data-platform/export-to-data-lake-data-adf.md

Lines changed: 46 additions & 87 deletions
@@ -2,7 +2,7 @@
title: "Ingest Microsoft Dataverse data with Azure Data Factory | MicrosoftDocs"
description: Learn how to use Azure Data Factory to create dataflows, transform, and run analysis on Dataverse data
ms.custom: ""
- ms.date: 07/29/2020
+ ms.date: 03/22/2021
ms.reviewer: "matp"
author: sabinn-msft
ms.service: powerapps
@@ -30,9 +30,9 @@ After exporting data from Dataverse to Azure Data Lake Storage Gen2 with the Exp
This article shows you how to perform the following tasks:

- 1. Generate a manifest.json from the existing model.json in the Data Lake Storage Gen2 storage account that holds the exported data.
+ 1. Set the Data Lake Storage Gen2 storage account with the Dataverse data as a *source* in a Data Factory dataflow.

- 2. Set the Data Lake Storage Gen2 storage account with the Dataverse data as a *source* in a Data Factory dataflow.
+ 2. Transform the Dataverse data in Data Factory with a dataflow.

3. Set the Data Lake Storage Gen2 storage account with the Dataverse data as a *sink* in a Data Factory dataflow.

@@ -41,78 +41,27 @@ This article shows you how to perform the following tasks:
## Prerequisites

This section describes the prerequisites necessary to ingest exported Dataverse data with Data Factory.

### Azure roles

The user account that's used to sign in to Azure must be a member of the *contributor* or *owner* role, or an *administrator* of the Azure subscription. To view the permissions that you have in the subscription, go to the [Azure portal](https://portal.azure.com/), select your username in the upper-right corner, select **...**, and then select **My permissions**. If you have access to multiple subscriptions, select the appropriate one. To create and manage child resources for Data Factory in the Azure portal—including datasets, linked services, pipelines, triggers, and integration runtimes—you must belong to the *Data Factory Contributor* role at the resource group level or above.

### Export to data lake

- This article assumes that you've already exported Dataverse data by using the [Export to Data Lake service](export-to-data-lake.md).
+ This guide assumes that you've already exported Dataverse data by using the [Export to Data Lake service](export-to-data-lake.md).

In this example, account table data is exported to the data lake.

- ## Generate the manifest.json from the model.json
-
- 1. Go to [this GitHub repository](https://github.com/t-sazaki/ConvertModelJsonToManifestOriginal) and download it to your computer.
-
- 2. Go to ConvertModelJsonToManifest-master/ConvertModelJsonToManifest-master/ConvertModelJsonToManifest.sln.
-
- 3. Right-click to select the file, and then open it in Visual Studio. If you don't have Visual Studio, you can follow this article to install it: [Install Visual Studio](/visualstudio/install/install-visual-studio?view=vs-2019&preserve-view=true).
-
- 4. Go to **Project** > **Manage NuGet Packages**, and ensure that the following NuGet packages are installed:
-
-    - Microsoft.CommonDataModel.ObjectModel
-    - Microsoft.CommonDataModel.ObjectModel.Adapter.Adls
-    - Microsoft.IdtableModel.Clients.ActiveDirectory
-    - Newtonsoft.Json
-    - NLog
-
- 5. If you're missing the Common Data Model packages or they're unavailable, you can add them by following these steps:
-
-    1. Select the gear icon to access package settings.
-       ![Package settings gear icon](media/package-settings-gear.png "Package settings gear icon")
-    2. Select **+** in the pop-up window to add a new package source.
-       ![Add a new package](media/add-new-package.png "Add a new package")
-    3. Configure the new package source, and then select **OK**:
-       1. For **Name**, enter **CDM**.
-       2. For **Source**, enter **https[]()://commondatamodel.pkgs.visualstudio.com/_packaging/CDM/nuget/v3/index.json**.
-    4. Make sure that the package source is set to **All**.
-
- 8. In Program.cs, fill in the storage container information on line 26, as indicated here:
-    1. Replace **your-storage-account** by substituting the name of your storage account.
-       ![Your storage account substitution](media/your-storage-account.png "Your storage account substitution")
-    1. Replace **your-folder-name** with the folder containing the model.json file. Go to your storage account **Overview** > **Storage Explorer** > **Containers**, and then select the correct folder name.
-       ![Replace your folder name](media/replace-your-folder-name.png "Replace your folder name")
-    1. Replace the access key with the access key for this storage account. Go to your storage account, and on the left panel under **Settings**, select **Access Keys**. Select **Copy** to copy the access key and replace it in the code.
-
- 9. Optionally, you can change the name of the manifest file as indicated in the code comments.
-
- 10. Run the code, and refresh your storage container to find the new manifest, table, resolved table, and config files.
-
- > [!NOTE]
- > If there are changes made to the metadata of the table, you must delete the generated files from the Data Lake and regenerate an updated manifest file by running the code again. It is recommended that you maintain the same name of the manifest file, so there is no need to update any Azure Data Factory dataflows or pipelines.
+ ### Azure Data Factory
+
+ This guide assumes that you've already created a data factory under the same subscription and resource group as the storage account containing the exported Dataverse data.
## Set the Data Lake Storage Gen2 storage account as a source

- 1. Open [Azure Data Factory](https://ms-adf.azure.com/home?factory=%2Fsubscriptions%2Fd410b7d3-02af-45c8-895e-dc27c5b35342%2FresourceGroups%2Fsama%2Fproviders%2FMicrosoft.DataFactory%2Ffactories%2Fadfathena), and then select **Create data flow**.
+ 1. Open [Azure Data Factory](https://ms-adf.azure.com/en-us/datafactories) and select the data factory that is in the same subscription and resource group as the storage account containing your exported Dataverse data. Then select **Create data flow** from the home page.

- 2. Turn on **Data flow debug** mode. This might take up to 10 minutes, but you can proceed with the following steps.
+ 2. Turn on **Data flow debug** mode and select your preferred time to live. This might take up to 10 minutes, but you can proceed with the following steps.

   ![Dataflow debug mode](media/data-flow-debug.png "Dataflow debug mode")
@@ -121,7 +70,7 @@ In this example, account table data is exported to the data lake.
   ![Add source](media/add-source.png "Add source")

- 4. Under **Source settings**, do the following<!--Suggested. It's "configure the following options" here and "select the following options" in the next procedure, but these are a combination of entering and selecting.-->:
+ 4. Under **Source settings**, do the following:

   - **Output stream name**: Enter the name you want.
   - **Source type**: Select **Common Data Model**.
@@ -130,50 +79,60 @@ In this example, account table data is exported to the data lake.
5. Under **Source options**, do the following:

- - **Metadata format**: Select **Manifest**.
- - **Root ___location**: In the first box (**Container**), enter the container name. In the second box (**Folder path**), enter **/**.
- - **Manifest file**: Leave the first box (**table path**) blank, and in the second box (**Manifest name (default)**), enter the first part of the manifest file name, such as *test.manifest.cdm.json* **/** *test*).
+ - **Metadata format**: Select **Model.json**.
+ - **Root ___location**: Enter the container name in the first box (**Container**), or **Browse** for the container name and select **OK**.
+ - **Entity**: Enter the table name, or **Browse** for the table.
+
+ ![Source options](media/source-options.png "Source options")
+
+ 6. Check the **Projection** tab to ensure that your schema has been imported successfully. If you don't see any columns, select **Schema options** and check the **Infer drifted column types** option. Configure the formatting options to match your data set, and then select **Apply**.

- ![Source options, part one](media/source-options.png "Source options, part one")
+ 7. You can view your data on the **Data preview** tab to ensure that the source creation was complete and accurate.
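The **Model.json** source reads each table's schema from the model.json metadata file that the Export to Data Lake service writes alongside the data. As a rough, hypothetical illustration only (a real model.json carries many more fields, such as partitions and annotations), a few lines of Python show the kind of entity and attribute metadata that the **Projection** tab surfaces:

```python
import json

# Hypothetical, heavily simplified model.json; the real file written by the
# Export to Data Lake service contains additional metadata (partitions,
# annotations, referenceModels, and so on).
model = json.loads("""
{
  "name": "cdm",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "account",
      "attributes": [
        {"name": "accountid", "dataType": "guid"},
        {"name": "name", "dataType": "string"},
        {"name": "revenue", "dataType": "decimal"}
      ]
    }
  ]
}
""")

# List each entity and its columns, roughly what the source's
# Projection tab displays after the schema is imported.
for entity in model["entities"]:
    print(entity["name"])
    for attr in entity["attributes"]:
        print(f"  {attr['name']}: {attr['dataType']}")
```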
- - **Schema linked service**: Select the same storage container as the source settings.
- - **Container**: Enter the container name.
- - **Corpus folder**: Leave blank.
- - **table**: Enter text in the format **/*table*Res.cdm.json/*table***, replacing *table* with the table name you want, such as account.
-
- ![Source options, part two](media/source-options-two.png "Source options, part two")
+ ## Transform your Dataverse data
+
+ After setting the exported Dataverse data in the Data Lake Storage Gen2 storage account as a source in the Data Factory dataflow, there are many possibilities for transforming your data. More information: [Azure Data Factory](/azure/data-factory/introduction)
+
+ Follow these instructions to create a rank for each row by the *revenue* of the account.

- ## Set the Data Lake Storage Gen2 storage account
+ 1. Select **+** in the lower-right corner of the previous transformation, and then search for and select **Rank**.

- After setting the exported Dataverse data in the Data Lake Storage Gen2 storage account as a source in the Data Factory dataflow, there are many possibilities for transforming your data. More information: [Azure Data Factory](/azure/data-factory/introduction)
+ 2. On the **Rank settings** tab, do the following:
+    - **Output stream name**: Enter the name you want, such as *Rank1*.
+    - **Incoming stream**: Select the source name you want. In this case, the source name from the previous step.
+    - **Options**: Leave the options unchecked.
+    - **Rank column**: Enter a name for the generated rank column.
+    - **Sort conditions**: Select the *revenue* column and sort in *Descending* order.

- Ultimately, you must set a sink for your dataflow. Follow these instructions to set the Data Lake Storage Gen2 storage account with the data exported by the Export to Data Lake service as your sink.
+    ![Configure the Rank settings tab](media/configure-rank.png "Configure the Rank settings tab")

- 1. Select **+** in the lower-right corner, and then search for and select **Sink**.
+ 3. You can view your data on the **Data preview** tab, where you'll find the new *revenueRank* column at the right-most position.
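Conceptually, the Rank transformation sorts the incoming rows on the chosen column and assigns each row a 1-based position. A stdlib-only sketch of the same computation, using hypothetical account rows (this is plain Python for illustration, not Data Factory dataflow script):

```python
# Hypothetical account rows; in the dataflow these come from the
# Model.json source configured earlier.
accounts = [
    {"name": "Contoso", "revenue": 500000},
    {"name": "Fabrikam", "revenue": 1200000},
    {"name": "Adventure Works", "revenue": 800000},
]

# Sort by revenue descending, then attach a 1-based rank column,
# mirroring the Rank transformation's Descending sort condition.
ranked = sorted(accounts, key=lambda row: row["revenue"], reverse=True)
for position, row in enumerate(ranked, start=1):
    row["revenueRank"] = position

for row in ranked:
    print(row["name"], row["revenueRank"])
```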
+ ## Set the Data Lake Storage Gen2 storage account as a sink
+
+ Ultimately, you must set a sink for your dataflow. Follow these instructions to place your transformed data as a delimited text file in the data lake.
+
+ 1. Select **+** in the lower-right corner of the previous transformation, and then search for and select **Sink**.

2. On the **Sink** tab, do the following:
   - **Output stream name**: Enter the name you want, such as *Sink1*.
- - **Incoming stream**: Select the source name you want.
- - **Sink type**: Select **Common Data Model**.
+ - **Incoming stream**: Select the source name you want. In this case, the source name from the previous step.
+ - **Sink type**: Select **DelimitedText**.
   - **Linked service**: Select your Data Lake Storage Gen2 storage container that has the data you exported by using the Export to Data Lake service.

   ![Configure the Sink tab](media/configure-sink.png "Configure the Sink tab")

3. On the **Settings** tab, do the following:
- - **Schema linked service**: Select the final destination storage container.
- - **Container**: Enter the container name.
- - **Corpus folder**: Enter **/**
- - **table**: Enter text in the format **/*table*Res.cdm.json/*table***, replacing *table* with the table name you want, such as account.
-
- ![Configure the sink Settings tab, part one](media/configure-settings.png "Configure the sink Settings tab, part one")
+ - **Folder path**: Enter the container name in the first box (**File system**), or **Browse** for the container name and select **OK**.
+ - **File name option**: Select **Output to single file**.
+ - **Output to single file**: Enter a file name, such as *ADFOutput*.
+ - Leave all other default settings.

- - **Root Location**: In the first box (**Container**), enter the container name. In the second box (**Folder path**), enter **/**.
- - **Manifest file**: Leave the first box (**table path**) blank, and in the second box (**Manifest name (default)**), enter the first part of the manifest file name, such as *test.manifest.cdm.json / test*.
- - **Format type**: Select your file format preference.
+ ![Configure the sink Settings tab](media/configure-settings.png "Configure the sink Settings tab")

+ 4. On the **Optimize** tab, set the **Partition option** to **Single partition**.

- ![Configure the sink Settings tab, part two](media/configure-settings-two.png "Configure the sink Settings tab, part two")
+ 5. You can view your data on the **Data preview** tab.
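The sink configured above writes every row into one delimited text file. A minimal sketch of the equivalent output with Python's csv module, assuming hypothetical transformed rows and the example file name *ADFOutput*:

```python
import csv

# Hypothetical transformed rows, including the rank column added earlier.
rows = [
    {"name": "Fabrikam", "revenue": 1200000, "revenueRank": 1},
    {"name": "Adventure Works", "revenue": 800000, "revenueRank": 2},
    {"name": "Contoso", "revenue": 500000, "revenueRank": 3},
]

# Write a single delimited text file, analogous to the sink's
# "Output to single file" option with a single partition.
with open("ADFOutput.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "revenue", "revenueRank"])
    writer.writeheader()
    writer.writerows(rows)
```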

## Run your dataflow

@@ -197,4 +156,4 @@ Ultimately, you must set a sink for your dataflow. Follow these instructions to
[Analyze Dataverse data in Azure Data Lake Storage Gen2 with Power BI](export-to-data-lake-data-powerbi.md)

- [!INCLUDE[footer-include](../../includes/footer-banner.md)]
+ [!INCLUDE[footer-include](../../includes/footer-banner.md)]
