---
title: "Ingest Common Data Service data with Azure Data Factory | MicrosoftDocs"
ms.custom: ""
ms.date: 07/29/2020
ms.reviewer: "matp"
author: sabinn-msft
ms.service: powerapps
ms.suite: ""
ms.tgt_pltfrm: ""
ms.topic: "article"
applies_to:
  - "powerapps"
ms.assetid:
ms.author: "matp"
manager: "kvivek"
search.audienceType:
  - maker
search.app:
  - PowerApps
  - D365CE
---

# Ingest exported Common Data Service data with Azure Data Factory

After exporting data from Common Data Service to Azure Data Lake Storage Gen2 with the Export to Data Lake service, you can use Azure Data Factory to create dataflows, transform your data, and run analysis.

This article shows you how to perform the following tasks:

1. Generate a manifest.json file from the existing model.json file in the Data Lake Storage Gen2 storage account that holds the exported data.

2. Set the Data Lake Storage Gen2 storage account with the Common Data Service data as a *source* in a Data Factory dataflow.

3. Set the Data Lake Storage Gen2 storage account with the Common Data Service data as a *sink* in a Data Factory dataflow.

4. Run your dataflow by creating a pipeline.

## Prerequisites

This section describes the prerequisites necessary to ingest exported Common Data Service data with Data Factory.

### Azure roles

The user account that's used to sign in to Azure must be a member of the *contributor* or *owner* role, or an *administrator* of the Azure subscription. To view the permissions that you have in the subscription, go to the [Azure portal](https://portal.azure.com/), select your username in the upper-right corner, select **...**, and then select **My permissions**. If you have access to multiple subscriptions, select the appropriate one.

To create and manage child resources for Data Factory in the Azure portal (including datasets, linked services, pipelines, triggers, and integration runtimes), you must belong to the *Data Factory Contributor* role at the resource group level or above.

### Export to data lake

This article assumes that you've already exported Common Data Service data by using the [Export to Data Lake service](export-to-data-lake.md).

In this example, account entity data is exported to the data lake.

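The Export to Data Lake service writes data in Common Data Model folder format. The exact file and folder names can vary, but a container populated by the service typically looks something like the following sketch (*account* is just the example entity used in this article):

```
your-container-name/
├── model.json        <- metadata describing the exported entities
└── account/          <- one folder per exported entity
    └── *.csv         <- exported data files (partition naming varies)
```

The next section reads this model.json file and generates the manifest files next to it.
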
## Generate the manifest.json from the model.json

1. Go to [this GitHub repository](https://github.com/t-sazaki/ConvertModelJsonToManifestOriginal) and download it to your computer.

2. In the downloaded folder, go to ConvertModelJsonToManifest-master/ConvertModelJsonToManifest-master/ConvertModelJsonToManifest.sln.

3. Open the file in Visual Studio. If you don't have Visual Studio, you can install it by following this article: [Install Visual Studio](/visualstudio/install/install-visual-studio?view=vs-2019).

4. Go to **Project** > **Manage NuGet Packages**, and ensure that the following NuGet packages are installed:

   - Microsoft.CommonDataModel.ObjectModel

   - Microsoft.CommonDataModel.ObjectModel.Adapter.Adls

   - Microsoft.IdentityModel.Clients.ActiveDirectory

   - Newtonsoft.Json

   - NLog

5. If the Common Data Model packages are missing or unavailable, you can add them by following these steps:

   1. Select the gear icon to open the package settings.

      

   2. Select **+** in the pop-up window to add a new package source.

      

6. Configure the new package source, and then select **OK**:

   1. For **Name**, enter **CDM**.

   2. For **Source**, enter **https[]()://commondatamodel.pkgs.visualstudio.com/_packaging/CDM/nuget/v3/index.json**.

7. Make sure that the package source is set to **All**.

8. In Program.cs, fill in the storage container information on line 26, as indicated here:

   1. Replace **your-storage-account.dfs.core.windows.net** with the name of your storage account.

      

   2. Replace **your-folder-name** with the name of the folder that contains the model.json file. To find it, go to your storage account, select **Overview** > **Storage Explorer** > **Containers**, and then note the correct folder name.

      

   3. Replace the access key with the access key for this storage account. Go to your storage account, and on the left panel under **Settings**, select **Access Keys**. Select **Copy** to copy the access key, and then paste it into the code.

9. Optionally, you can change the name of the manifest file as indicated in the code comments.

10. Run the code, and then refresh your storage container to find the new manifest, entity, resolved entity, and config files. A simplified sketch of the conversion logic appears after these steps.

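If you'd like a sense of what the conversion program does, the following is a minimal sketch that uses the Common Data Model object model directly. It isn't the repository's exact code, and the storage account endpoint, folder name, access key, and manifest name are placeholders that you replace just as in step 8.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CommonDataModel.ObjectModel.Cdm;
using Microsoft.CommonDataModel.ObjectModel.Storage;

class ConvertModelJsonSketch
{
    static async Task Main()
    {
        // Mount the Data Lake Storage Gen2 folder that holds model.json.
        // The host name, folder, and access key are placeholders (see step 8).
        var corpus = new CdmCorpusDefinition();
        corpus.Storage.Mount("adls", new ADLSAdapter(
            "your-storage-account.dfs.core.windows.net", // storage account endpoint
            "/your-folder-name",                         // folder that contains model.json
            "your-access-key"));                         // storage account access key
        corpus.Storage.DefaultNamespace = "adls";

        // Loading model.json through the object model returns it as a manifest.
        var manifest = await corpus.FetchObjectAsync<CdmManifestDefinition>("adls:/model.json");

        // Resolve the entities; "{n}" is replaced with each entity name, which is
        // where resolved entity files such as accountRes.cdm.json come from.
        var resolved = await manifest.CreateResolvedManifestAsync("test", "{n}Res.cdm.json");

        // Save the manifest and its referenced entity documents back to the folder.
        await resolved.SaveAsAsync("test.manifest.cdm.json", true);
        Console.WriteLine("Manifest files written.");
    }
}
```

The manifest name used in this sketch (*test*) corresponds to the value you enter later for **Manifest name (default)** when you configure the dataflow source and sink.
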
## Set the Data Lake Storage Gen2 storage account as a source

1. Open [Azure Data Factory](https://adf.azure.com/), and then select **Create data flow**.

2. Turn on **Data flow debug** mode. This might take up to 10 minutes, but you can proceed with the following steps.

   

3. Select **Add Source**.

   

4. Under **Source settings**, do the following:

   - **Output stream name**: Enter the name you want.
   - **Source type**: Select **Common Data Model**.
   - **Linked service**: Select your storage account from the drop-down menu, and then create a new linked service by providing your subscription details and leaving all other settings at their defaults.
   - **Sampling**: If you want to use all your data, select **Disable**.

5. Under **Source options**, do the following:

   - **Metadata format**: Select **Manifest**.
   - **Root ___location**: In the first box (**Container**), enter the container name. In the second box (**Folder path**), enter **/**.
   - **Manifest file**: Leave the first box (**Entity path**) blank. In the second box (**Manifest name (default)**), enter the first part of the manifest file name; for example, for *test.manifest.cdm.json*, enter *test*.

   

   - **Schema linked service**: Select the same storage container as in the source settings.
   - **Container**: Enter the container name.
   - **Corpus folder**: Leave blank.
   - **Entity**: Enter text in the format **/*entity*Res.cdm.json/*entity***, replacing *entity* with the name of the entity you want, such as account.

   

## Set the Data Lake Storage Gen2 storage account as a sink

After setting the exported Common Data Service data in the Data Lake Storage Gen2 storage account as a source in the Data Factory dataflow, there are many possibilities for transforming your data. More information: [Azure Data Factory](/azure/data-factory/introduction)

Ultimately, you must set a sink for your dataflow. Follow these instructions to set the Data Lake Storage Gen2 storage account with the data exported by the Export to Data Lake service as your sink.

1. Select **+** in the lower-right corner, and then search for and select **Sink**.

2. On the **Sink** tab, do the following:

   - **Output stream name**: Enter the name you want, such as *Sink1*.
   - **Incoming stream**: Select the source name you want.
   - **Sink type**: Select **Common Data Model**.
   - **Linked service**: Select your Data Lake Storage Gen2 storage container that has the data you exported by using the Export to Data Lake service.

   

3. On the **Settings** tab, do the following:

   - **Schema linked service**: Select the final destination storage container.
   - **Container**: Enter the container name.
   - **Corpus folder**: Enter **/**.
   - **Entity**: Enter text in the format **/*entity*Res.cdm.json/*entity***, replacing *entity* with the name of the entity you want, such as account.

   

   - **Root ___location**: In the first box (**Container**), enter the container name. In the second box (**Folder path**), enter **/**.
   - **Manifest file**: Leave the first box (**Entity path**) blank. In the second box (**Manifest name (default)**), enter the first part of the manifest file name; for example, for *test.manifest.cdm.json*, enter *test*.
   - **Format type**: Select your file format preference.

   

## Run your dataflow

1. In the left pane under **Factory Resources**, select **+**, and then select **Pipeline**.

   

2. Under **Activities**, select **Move & Transform**, and then drag **Data flow** to the workspace.

3. Select **Use existing data flow**, and then select the dataflow that you created in the previous steps.

4. Select **Debug** from the command bar.

5. Let the dataflow run until the bottom view shows that it has completed. This might take a few minutes.

6. Go to the final destination storage container, and find the transformed entity data file. An optional verification sketch follows these steps.

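If you'd rather verify the output programmatically than browse the container in the portal, a short sketch like the following lists everything the dataflow wrote. This isn't part of the walkthrough itself; it assumes the Azure.Storage.Files.DataLake NuGet package, and the account name, access key, and container name are placeholders.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage;
using Azure.Storage.Files.DataLake;

class ListDataflowOutput
{
    static async Task Main()
    {
        // Placeholder account, key, and container -- replace with your own values.
        var service = new DataLakeServiceClient(
            new Uri("https://your-storage-account.dfs.core.windows.net"),
            new StorageSharedKeyCredential("your-storage-account", "your-access-key"));

        var fileSystem = service.GetFileSystemClient("your-container-name");

        // Recursively list the manifest documents and transformed entity data files.
        await foreach (var path in fileSystem.GetPathsAsync(recursive: true))
        {
            Console.WriteLine(path.Name);
        }
    }
}
```
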
### See also

[Analyze Common Data Service data in Azure Data Lake Storage Gen2 with Power BI](export-to-data-lake-data-powerbi.md)