Commit eb27c8a

Merge pull request #8701 from MicrosoftDocs/JasonHQX-patch-4
Update azure-synapse-link-delta-lake.md
2 parents d320684 + e56b69e commit eb27c8a

1 file changed: +15 -9 lines changed

powerapps-docs/maker/data-platform/azure-synapse-link-delta-lake.md

Lines changed: 15 additions & 9 deletions
@@ -6,7 +6,7 @@ ms.author: jasonhuang
 ms.reviewer: matp
 ms.service: powerapps
 ms.topic: how-to
-ms.date: 06/26/2023
+ms.date: 09/27/2023
 ms.custom: template-how-to
 ---
 # Export Dataverse data in Delta Lake format
@@ -25,6 +25,12 @@ provides the following information and shows you how to perform the following ta
 > - For the Dataverse configuration, append-only is enabled by default to export CSV data in `appendonly` mode. But the delta lake table will have an in-place update structure because the delta lake conversion comes with a periodic merge process.
 > - There are no costs incurred with the creation of Spark pools. Charges are only incurred once a Spark job is executed on the target Spark pool and the Spark instance is instantiated on demand. These costs are related to the usage of Azure Synapse workspace Spark and are billed monthly. The cost of conducting Spark computing mainly depends on the time interval for incremental update and the data volumes. More information: [Azure Synapse Analytics pricing](https://azure.microsoft.com/pricing/details/synapse-analytics/)
 > - It's important to take these additional costs into consideration when deciding to use this feature as they are not optional and must be paid in order to continue using this feature.
+>
+> [!NOTE]
+> The Azure Synapse Link status in Power Apps (make.powerapps.com) reflects the delta lake conversion state:
+> - `Count` shows the number of records in the delta lake table.
+> - `Last synchronized on` Datetime represents the last successful conversion timestamp.
+> - `Sync status` is shown as **active** once the delta lake conversion completes.
 
 ## What is Delta Lake?
 
@@ -38,15 +44,15 @@ Apache Parquet is the baseline format for Delta Lake, enabling you to leverage t
 - **Reliability**: Delta Lake provides ACID transactions, ensuring data consistency and reliability even in the face of failures or concurrent access.
 - **Performance**: Delta Lake leverages the columnar storage format of Parquet, providing better compression and encoding techniques, which can lead to improved query performance compared to query CSV files.
 - **Cost-effective**: The Delta Lake file format is a highly compressed data storage technology that offers significant potential storage savings for businesses. This format is specifically designed to optimize data processing and potentially reduce the total amount of data processed or running time required for on-demand computing.
-- **Data protection compliance**: Delta Lake with Synapse Link provides tools and features including soft-delete and hard-delete to comply various data privacy regulations, including General Data Protection Regulation (GDPR).
+- **Data protection compliance**: Delta Lake with Azure Synapse Link provides tools and features including soft-delete and hard-delete to comply various data privacy regulations, including General Data Protection Regulation (GDPR).
 
-## How Delta Lake works with Synapse Link for Dataverse?
+## How Delta Lake works with Azure Synapse Link for Dataverse?
 
-When setting up an Azure Synapse Link for Dataverse, you can enable the **export to Delta Lake** feature and connect with a Synapse workspace and Spark pool. Synapse Link exports the selected Dataverse tables in CSV format at designated time intervals, processing them through a Delta Lake conversion Spark job. Upon the completion of this conversion process, CSV data is cleaned up for storage saving. Additionally, a series of maintenance jobs are scheduled to run on a daily basis, automatically performing compaction and vacuuming processes to merge and clean up data files to further optimize storage and improve query performance.
+When setting up an Azure Synapse Link for Dataverse, you can enable the **export to Delta Lake** feature and connect with a Synapse workspace and Spark pool. Azure Synapse Link exports the selected Dataverse tables in CSV format at designated time intervals, processing them through a Delta Lake conversion Spark job. Upon the completion of this conversion process, CSV data is cleaned up for storage saving. Additionally, a series of maintenance jobs are scheduled to run on a daily basis, automatically performing compaction and vacuuming processes to merge and clean up data files to further optimize storage and improve query performance.
 
 ## Prerequisites
 
-- Dataverse: You must have the Dataverse **system administrator** security role. Additionally, tables you want to export via Synapse Link must have the **Track changes** property enabled. More information: [Advanced options](create-edit-entities-portal.md#advanced-options)
+- Dataverse: You must have the Dataverse **system administrator** security role. Additionally, tables you want to export via Azure Synapse Link must have the **Track changes** property enabled. More information: [Advanced options](create-edit-entities-portal.md#advanced-options)
 - Azure Data Lake Storage Gen2: You must have an Azure Data Lake Storage Gen2 account and **Owner** and **Storage Blob Data Contributor** role access. Your storage account must enable **Hierarchical namespace** and **public network access** for both initial setup and delta sync. **Allow storage account key access** is required only for the initial setup.
 - Synapse workspace: You must have a Synapse workspace and **Owner** role in access control(IAM) and the **Synapse Administrator** role access within the Synapse Studio. The Synapse workspace must be in the same region as your Azure Data Lake Storage Gen2 account. The storage account must be added as a linked service within the Synapse Studio. To create a Synapse workspace, go to [Creating a Synapse workspace](/azure/synapse-analytics/get-started-create-workspace).
 - A Spark Pool in the connected Azure Synapse workspace with **Apache Spark Version 3.1** using this [recommended Spark Pool configuration](#recommended-spark-pool-configuration). For information about how to create a Spark Pool, go to [Create new Apache Spark pool](/azure/synapse-analytics/quickstart-create-apache-spark-pool-portal#create-new-apache-spark-pool).
@@ -67,9 +73,9 @@ This configuration can be considered a bootstrap step for average use cases.
 
 1. Sign into [Power Apps](https://make.powerapps.com/?utm_source=padocs&utm_medium=linkinadoc&utm_campaign=referralsfromdoc) and select the environment you want.
 1. On the left navigation pane, select **Azure Synapse Link**. [!INCLUDE [left-navigation-pane](../../includes/left-navigation-pane.md)]
-1. On the command bar select **+ New link**
+1. On the command bar, select **+ New link**
 1. Select **Connect to your Azure Synapse Analytics workspace**, and then select the **Subscription**, **Resource group**, and **Workspace name**.
-1. Select **Use Spark pool for processing**, and then select the pre-created **Spark pool** and **Storage account**.
+1. Select **Use Spark pool for processing**, and then select the precreated **Spark pool** and **Storage account**.
 :::image type="content" source="media/synapse-link-usesparkpool.png" alt-text="Azure Synapse Link for Dataverse configuration that includes spark pool.":::
 
 1. Select **Next**.
@@ -84,14 +90,14 @@ This configuration can be considered a bootstrap step for average use cases.
 
 ## View your data from Synapse workspace
 
-1. Select the Azure Synapse link you want, and then select **Go to Azure Synapse Analytics workspace** on the command bar.
+1. Select the Azure Synapse Link you want, and then select **Go to Azure Synapse Analytics workspace** on the command bar.
 1. Expand **Lake Databases** on the left pane, select **dataverse-***environmentNameorganizationUniqueName*,
 and then expand **Tables**. All Parquet tables are listed and available for analysis with the naming convention
 *DataverseTableName.* **(Non_partitioned Table)**.
 
 ## View your data from Azure Data Lake Storage Gen2
 
-1. Select the Azure Synapse link you want, and then select **Go to Azure data lake** on the command
+1. Select the Azure Synapse Link you want, and then select **Go to Azure data lake** on the command
 bar.
 1. Select the **Containers** under **Data Storage**.
 1. Select **dataverse-* **environmentName-organizationUniqueName*. All parquet files are stored in the
