Commit 9ee1a82

JoanneHendrickson and VesaJuvonen authored and committed
Create export-amr-api.md (SharePoint#4459)
NEW content New Export API (Asynchronous Metadata Read API).
1 parent 89f5578 commit 9ee1a82

1 file changed

+252
-0
lines changed

docs/apis/export-amr-api.md

Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
---
title: "SharePoint Migration Export (Asynchronous Metadata Read) API"
ms.reviewer:
ms.author: jhendr
author: JoanneHendrickson
manager: pamgreen
audience: ITPro
ms.topic: article
ms.prod: sharepoint-server-itpro
localization_priority: Priority
ROBOTS: NOINDEX, NOFOLLOW
ms.collection:
- Strat_SP_gtc
- SPMigration
description: "SharePoint Migration Export (Asynchronous Metadata Read) API"
---

# SharePoint Migration Export (Asynchronous Metadata Read) API

## Overview
The goal of the new Migration Asynchronous Metadata Read API is to reduce the number of CSOM calls, reduce throttling, and improve overall migration performance. Instead of issuing thousands of CSOM calls to query information from SharePoint Online (SPO), the new API can return the same amount of data in a single read.

When the SharePoint Migration Export (Asynchronous Metadata Read) API performs a read operation on a provided URL, the Microsoft backend software aggregates all the information into a designated manifest. The ISV can then read the manifest and parse the metadata without sending thousands of individual CSOM calls.

This document targets ISVs and any third-party vendors or developers who are developing and maintaining a migration tool.

### Background
Currently, the SharePoint Online Migration API, [CreateMigrationJob](https://docs.microsoft.com/en-us/sharepoint/dev/apis/migration-api-overview), lets your migration tool efficiently migrate large amounts of data to SharePoint Online. However, the lack of an official API to read content from SharePoint Online means that these tools must rely on CSOM function calls to perform individual metadata read operations.

Large numbers of CSOM calls increase the likelihood of throttling, which impacts migration performance and customer experience. Inefficient CSOM usage results in many SQL round trips per function call, which can potentially bring down the database and impact its reliability.

A migration performance study identified four areas where CSOM calls are heavily used:
- **Incremental migration** relies on CSOM calls to retrieve the SharePoint Online (SPO) content. The tool compares that content with the source location to determine whether anything has changed and whether to proceed with migration.
- **Structure creation** uses CSOM calls for site, web part, and navigation creation.
- **After migration verification** is done when migration is completed, to ensure that the source and destination file metadata match.
- **Permission settings** rely on CSOM function calls to get user permission information.

## SharePoint Migration Export (Asynchronous Metadata Read) API

The SharePoint Migration Export (Asynchronous Metadata Read) API aims to reduce CSOM calls in three areas: incremental migration, after migration verification, and permission settings.

>[!Note]
>The first version of the SharePoint Migration Export (Asynchronous Metadata Read) API supports files, folders, lists, list items, and the document library. Permissions are expected to be covered in a subsequent version.

Key supported features:

- Ability to read up to 1 million items with a single API call. For more information, see [Limitations](#limitations).
- Incremental migration support: only the items changed since the last query are returned, by using the change token feature.
- Ability to include a rich set of metadata per item.
- Ability to return only the top-level structure without subfolders or children.

The features and the API are described in more detail in the sections below.

The new Migration Asynchronous Read API is:

```csharp

public SPAsyncReadJobInfo CreateSPAsyncReadJob(
    Uri rootObjectUri,
    SPAsyncReadOptions readOptions,
    EncryptionOption encryptionOption,
    string azureContainerManifestUri,
    string azureQueueReportUri)

```

The API takes five input parameters and returns one output structure.

## Input Parameters

### URL

The full path URL lets your migration tool specify the root URL path of the SharePoint list, document library, folder, or files to be read. By default, the server-side code reads and returns all the metadata of files, folders, and root objects, including subfolders and their children.

*Example:*
For the document library URL `https://www.contoso.com/Shared%20Documents`, metadata is read back for any files or folders that live under the root URL.

For the folder URL `https://www.contoso.com/Shared%20Documents/FolderA/`, metadata is read back for the children of FolderA.

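
As a small illustration of how a migration tool might produce a percent-encoded root URL like the ones in the example, the sketch below uses only the Python standard library. The helper name `build_root_url` is hypothetical and is not part of the API; Python is used purely to keep the sketch self-contained.

```python
from urllib.parse import quote


def build_root_url(site_url: str, relative_path: str) -> str:
    """Join a site URL and a path, percent-encoding each path segment.

    Hypothetical helper for illustration only; not part of CSOM.
    """
    segments = [quote(segment) for segment in relative_path.strip("/").split("/")]
    return site_url.rstrip("/") + "/" + "/".join(segments)


# A document library and a folder inside it, as in the example above.
library_url = build_root_url("https://www.contoso.com", "Shared Documents")
folder_url = build_root_url("https://www.contoso.com", "Shared Documents/FolderA")
print(library_url)
print(folder_url)
```

Encoding each segment separately keeps the `/` separators intact while spaces become `%20`.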

#### readOptions Flag
The asynchronous read function takes an SPAsyncReadOptions structure, which carries optional flags that let the user specify version and security settings at the site level. The flags are described below.

IncludeVersions { get; set; }

If set, this flag indicates that the full version history of files and list items is included in the export operation. If absent, only the most recent version is provided.

IncludeSecurity { get; set; }

This flag indicates whether to include all user and group information from a site. By default, the flag is not set, so no user or group information is provided.

public bool IncludeDirectDescendantsOnly { get; set; }

If specified, only the top-level metadata items are read back. Example: The root URL contains file A and folder B. If this flag is specified, the manifest returns only the metadata of file A and folder B. It does not return metadata for anything inside folder B.

The use case for this flag: the ISV can issue an initial read with this flag set to retrieve the top-level items, and then issue multiple *CreateSPAsyncReadJob* calls to read back the content of all subfolders in parallel to improve throughput.
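
The fan-out described above can be sketched as follows. This is a minimal illustration in Python, assuming the tool has already parsed the top-level items out of the first manifest; the `TopLevelItem` shape and the `plan_subfolder_reads` helper are hypothetical and not part of the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopLevelItem:
    """One entry from a direct-descendants-only read (hypothetical shape)."""
    name: str
    is_folder: bool


def plan_subfolder_reads(root_url: str, items: List[TopLevelItem]) -> List[str]:
    """Return one root URL per top-level folder, for follow-up read jobs."""
    base = root_url.rstrip("/")
    return [f"{base}/{item.name}" for item in items if item.is_folder]


# The example from the text: the root URL contains file A and folder B.
top_level = [TopLevelItem("A", is_folder=False), TopLevelItem("B", is_folder=True)]
follow_up_urls = plan_subfolder_reads(
    "https://www.contoso.com/Shared%20Documents", top_level
)
print(follow_up_urls)
```

Each URL in `follow_up_urls` would then be passed to its own read job, so the subfolders can be processed in parallel.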

public bool IncludeExtendedMetadata { get; set; }

This flag indicates whether to return the extended set of metadata for the queried objects. By default, this option is off and only basic metadata is provided (for example, names, URL, author, modifier, dates). Turning this flag on provides all the metadata; however, it also affects performance because the query takes longer.

The recommendation is to keep the default for file share migration, but to consider turning this flag on for SharePoint on-premises or other more complex migrations.

public string StartChangeToken { get; set; }

This option applies only when the input URL is a list or document library.

Incremental migration is one of the key contributors to CSOM calls. The change token is introduced to reduce unnecessary CSOM calls. If StartChangeToken is not specified, CreateSPAsyncReadJob queries and reads back all the items under the specified URL. If StartChangeToken is specified, only the items changed since the last query are returned.

During incremental migration, instead of querying everything again, populate StartChangeToken with the change token received from the CurrentChangeToken output in the returned job info. CreateSPAsyncReadJob then returns only the items that changed since the specified StartChangeToken, reducing the overall number of CSOM calls.

The following diagram shows how *StartChangeToken* might work. The initial call uses the optional feature settings, and the incremental passes set the StartChangeToken parameter.

![AMR flow](media/async-read-api-flow.png)
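
To make the token hand-off concrete, here is a toy model in Python. `FakeReadService` is an in-memory stand-in written for this sketch, not the real service, and its integer-based tokens are an assumption: real change tokens should be treated as opaque strings.

```python
from typing import List, Optional, Tuple


class FakeReadService:
    """In-memory stand-in for the server side; NOT the real API."""

    def __init__(self) -> None:
        self._change_log: List[str] = []  # ordered log of changed item names

    def touch(self, item: str) -> None:
        """Record that an item was created or modified."""
        self._change_log.append(item)

    def read(self, start_change_token: Optional[str] = None) -> Tuple[List[str], str]:
        """Return (items, current_change_token).

        Without a token, every logged item is returned. With a token, only
        items logged after that token was issued are returned.
        """
        start = int(start_change_token) if start_change_token is not None else 0
        changed = sorted(set(self._change_log[start:]))
        return changed, str(len(self._change_log))


service = FakeReadService()
service.touch("a.docx")
service.touch("b.docx")
all_items, token = service.read()             # initial pass: no StartChangeToken
service.touch("b.docx")                       # b.docx changes after the first read
changed_items, next_token = service.read(start_change_token=token)
print(all_items, changed_items)
```

The tool stores the token from each pass and feeds it into the next one, so each incremental pass sees only what changed since the previous pass.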

#### Invalid Value

If an invalid value other than NULL is detected, an error is generated and the operation is terminated.

#### encryptionOption

This is an optional parameter. If it is specified, the AES256CBCKey is used to encrypt output files and queue messages. Otherwise, there is no encryption.

For more information, see [EncryptionOption Class](https://docs.microsoft.com/en-us/dotnet/api/microsoft.sharepoint.client.encryptionoption).

#### azureContainerManifestUri

The valid URL, including the SAS token, for accessing the Azure Blob Storage container that contains the block blobs for the manifest and other package-describing XML files. This location is also used for the log output. The SAS token must have been created with only Read, List, and Write permissions, or the asynchronous metadata read job will fail. The SAS token lifetime should start no later than when the job is submitted and should last until a reasonable time for the job to have concluded.

#### azureQueueReportUri
The valid URL, including the SAS token, for accessing the user-provided Azure Queue used for returning notifications of asynchronous metadata read job progress. If this value is not null and the SAS token in this URI grants proper access, the queue is used for real-time status updates. The SAS token must have been created with Add permissions, or the job will be unable to add events to the queue.

Once the job is accepted, the job ID is written to the notification queue if one was provided and access is valid. The notification queue can be used for multiple jobs at the same time, because each job identifies itself in the values sent back to the notification queue.
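
A tool could sanity-check the SAS permissions described above before submitting a job. The sketch below, in Python with only the standard library, reads the `sp` (signed permissions) query parameter of a SAS URL; the helper names and the example URLs are hypothetical, and the checks the service itself performs may differ.

```python
from typing import Set
from urllib.parse import parse_qs, urlparse


def sas_permissions(sas_url: str) -> Set[str]:
    """Return the permission letters from the SAS `sp` query parameter."""
    query = parse_qs(urlparse(sas_url).query)
    return set(query.get("sp", [""])[0])


def container_sas_ok(sas_url: str) -> bool:
    # The text requires exactly Read, List, and Write: letters r, l, w.
    return sas_permissions(sas_url) == {"r", "l", "w"}


def queue_sas_ok(sas_url: str) -> bool:
    # The text requires Add permission: letter a.
    return "a" in sas_permissions(sas_url)


# Placeholder URLs; the signature values are not real.
container_url = "https://example.blob.core.windows.net/manifest?sv=2019-02-02&sr=c&sp=rwl&sig=PLACEHOLDER"
queue_url = "https://example.queue.core.windows.net/report?sv=2019-02-02&sp=a&sig=PLACEHOLDER"
print(container_sas_ok(container_url), queue_sas_ok(queue_url))
```

A check like this only catches obvious mistakes early; it does not validate the token lifetime or the signature.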

## Output Parameters

### CurrentChangeToken

public string CurrentChangeToken { get; set; }

This property returns the change token associated with this query. When this change token is specified as the StartChangeToken input of a subsequent read, the API returns only the items changed since this query.

#### Manifest Output

After the asynchronous read job finishes execution, the final manifest is placed in the specified container, with the naming convention `<jobid>/<filename>`. The export package structure is similar to the *CreateMigrationJob* import package structure. The general output structure is summarized in the table below.

|**XML file**|**Schema File**|**Description**|
|:-----|:-----|:-----|
|ExportSettings.XML|DeploymentExportSettings Schema|ExportSettings.XML does the following:<br><br>- Contains the export settings specified by using the SPExportSettings class and other classes that are part of the content migration object model.<br><br>- Ensures that the subsequent import process (at the migration target site) enforces the directives specified in the export settings.<br><br>- Maintains a catalog of all objects exported to the migration package.|
|LookupListMap.XML|DeploymentLookupListMap Schema|Provides validation for the LookupListMap.XML file exported into the content migration package. LookupListMap.XML maintains a simple lookup list that records SharePoint list item (list item to list item) references.|
|Manifest.XML|DeploymentManifest Schema|Provides validation for the Manifest.xml file that is exported into the content migration package. Provides a comprehensive manifest containing listings of both the contents and the structure of the destination site (for example, SPO).|
|Requirements.XML|DeploymentRequirements Schema|Provides validation for the Requirements.xml file exported into the content migration package. Requirements.xml maintains a list of deployment requirements in the form of installation requirements on the migration target, such as feature definitions, template versions, Web Part assemblies, and language packs.|
|RootObjectMap.XML|DeploymentRootObjectMap Schema|Provides validation for the RootObjectMap.xml file exported into the content migration package. RootObjectMap.xml maintains a list of mappings of secondary (dependent) objects, which allows the import phase of the migration operation to correctly place the dependent objects relative to the locations of the root object mappings.|
|SystemData.XML|DeploymentSystemData Schema|Provides validation for the SystemData.xml file exported into the content migration package. SystemData.xml does the following:<br><br>- Collects a variety of low-level system data.<br><br>- Records the number and names of Manifest.xml files (in cases where the migration uses multiple manifests).|
|UserGroupMap.XML|DeploymentUserGroupMap Schema|Provides validation for the UserGroup.xml file exported into the content migration package. UserGroup.xml maintains a list of users and user security groups with respect to access security and permissions.|
|ViewFormsList.XML|DeploymentViewFormsList Schema|Provides validation for the ViewFormsList.xml file exported into the content migration package. ViewFormsList.xml maintains a list of Web Parts and tracks whether each is a view or form.|
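
Because one container can hold the output of several jobs, a tool may want to group blob names by job ID by using the `<jobid>/<filename>` convention described above. The following Python sketch assumes the tool already has a list of blob names; the job IDs shown are made up, and the helper `group_blobs_by_job` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_blobs_by_job(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group blob names of the form '<jobid>/<filename>' by job ID."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        job_id, _, file_name = name.partition("/")
        if file_name:  # skip blobs that do not follow the convention
            grouped[job_id].append(file_name)
    return dict(grouped)


# Made-up job IDs for illustration.
listing = [
    "job-1/Manifest.XML",
    "job-1/ExportSettings.XML",
    "job-2/Manifest.XML",
    "readme.txt",
]
by_job = group_blobs_by_job(listing)
print(by_job)
```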

#### JobQueueUri

public Uri JobQueueUri { get; set; }

The reporting features are the same as for CreateMigrationJob. Logging is provided to track the status of the asynchronous read. In addition, after scanning the database, the log provides an estimate of the number of items to be read per URL, which your tools can use as a rough estimate.

In terms of blob and queue permissions and settings, all access is by default the same as when the ISV calls ProvisionMigrationContainer for CreateMigrationJob.

#### EncryptionKey
public byte[] EncryptionKey { get; set; }

It returns the AES256CBC encryption key used to decrypt the messages in the azureManifest container and the azureReport queue.

|**Output parameter**|**Description**|
|:-----|:-----|
|JobID/GUID|The unique job ID associated with this asynchronous read|
|AzureContainerManifest|The URL for accessing the asynchronous read manifest|
|JobQueueUri|The URL for accessing the Azure queue used for returning notifications of job progress|
|EncryptionKey|The AES256CBC encryption key used to decrypt messages from the manifest container and the job queue|

## Set up Guidelines
The following provides high-level guidelines for implementing the asynchronous metadata read function. This documentation does not go into detail on how to interact with the SharePoint RESTful service. It is assumed that the ISV has prior knowledge and can access the target website with proper permissions.

For more information on how to access the SharePoint website, refer to [Get to Know the SharePoint Rest Service](https://docs.microsoft.com/en-us/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service).

1. Install or update to the latest Microsoft.SharePointOnline.CSOM version. The minimum required version is V16.1.8600.
2. Determine the folders, document libraries, or files of interest to be queried, and issue the CreateSPAsyncReadJob function for them.
3. Once the job is successfully created, query the job status by using the *jobQueueUri*. It provides the job progress status and any error logging. After the job completes, parse the manifest to retrieve the metadata.

### SharePoint Migration Export (Asynchronous Metadata Read) API Example

#### Scenario: Large FileShare with nested files/folders

Suggestion:

1. Issue CreateSPAsyncReadJob:<br>
a. URL = root URL (e.g. www.contoso.com/my-resource-document)<br>
b. Optional flag: IncludeDirectDescendantsOnly(true)

2. For each subfolder, issue CreateSPAsyncReadJob. For example, if there are subfolders A and B:<br>
a. Issue CreateSPAsyncReadJob with URL = subfolder URL (e.g. www.contoso.com/my-resource-document/a)<br>
b. Issue CreateSPAsyncReadJob with URL = subfolder URL (e.g. www.contoso.com/my-resource-document/b)

#### Scenario: Tenant to tenant or large SharePoint Migration

1. Issue CreateSPAsyncReadJob:<br>
a. URL = root URL (e.g. www.contoso.com/my-resource-item)<br>
b. Optional flags: IncludeDirectDescendantsOnly(true), IncludeExtendedMetadata(true)

#### Scenario: Incremental Migration of FileShare for a sub folder

1. Issue CreateSPAsyncReadJob:<br>
a. URL = root URL (e.g. www.contoso.com/my-resource-document/a)<br>
b. Store the returned CurrentChangeToken

2. After some time, when the software needs to perform an incremental migration, issue CreateSPAsyncReadJob with the following settings:<br>
a. URL = root URL (e.g. www.contoso.com/my-resource-document/a)<br>
b. Optional flag: StartChangeToken(CurrentChangeToken)

## Limitations
<a name="limitations"> </a>

By default, each URL supports up to 1 million items. At the start of the job, the asynchronous read function checks the item count, and if more than 1 million items are detected, an error is thrown. Multiple versions of a single file count as one item. This limit may be changed in the future.

**SharePoint Migration Export (Asynchronous Metadata Read) API Limitations**

|**Type**|**SharePoint Online Limit**|**Recommended limit<br>for async read**|**Description**|
|:-----|:-----|:-----|:-----|
|Lists|30 million items|1 million|Per list URL, the migration read will process up to 1 million reads|
|Document Library|30 million files/folders|1 million|Per list URL, the migration job will process up to 1 million reads|
|Users|2 million per site collection|1 million|Per site collection. This will be supported only in a future version that covers permission settings.|
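
A tool that already knows roughly what sits under a URL could apply the same rule before submitting a job. The Python sketch below counts items so that multiple versions of one file count once, as stated above; the `(path, version)` input shape and the helper names are assumptions made for this illustration, and the server's own check remains authoritative.

```python
from typing import Iterable, Tuple

ITEM_LIMIT = 1_000_000  # default per-URL limit stated in this section


def count_items(entries: Iterable[Tuple[str, str]]) -> int:
    """Count distinct paths; several versions of one path count as one item."""
    return len({path for path, _version in entries})


def within_limit(entries: Iterable[Tuple[str, str]], limit: int = ITEM_LIMIT) -> bool:
    """Return True when the distinct item count does not exceed the limit."""
    return count_items(entries) <= limit


# Two versions of a.docx and one version of b.docx count as two items.
entries = [("a.docx", "1.0"), ("a.docx", "2.0"), ("b.docx", "1.0")]
print(count_items(entries), within_limit(entries))
```

If a URL is over the limit, splitting the read across subfolders is one way to stay within it.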

## Performance Expectation
Preliminary performance tests give a rough estimate of more than 200 items per second of throughput. This does not account for any potential network throttling. If the asynchronous read function fails to reach the server because of throttling, performance is affected. At the start of an asynchronous read, the server counts the objects to confirm that the total is within the 1 million object limit, which adds overhead.

For a single read query or a small read (for example, hundreds of items), it is faster to use the Graph API or a RESTful/CSOM query, because the asynchronous metadata read carries that overhead.

However, one of the key performance benefits of the asynchronous metadata read is its ability to balance the server-side load. The backend query is also much more efficient than individual CSOM calls, which reduces your chance of throttling.
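
As a back-of-the-envelope aid, the figures above can be turned into a rough planning calculation. This Python sketch uses the 200 items per second estimate from the preliminary test; the `small_read_threshold` value is an assumption (the text only says "hundreds of items"), and real throughput depends on throttling and the initial counting overhead.

```python
ITEMS_PER_SECOND = 200  # rough figure from the preliminary test; actual throughput varies


def estimated_seconds(item_count: int, items_per_second: float = ITEMS_PER_SECOND) -> float:
    """Rough read time, ignoring throttling and the initial counting overhead."""
    return item_count / items_per_second


def suggest_read_path(item_count: int, small_read_threshold: int = 1_000) -> str:
    """Pick a read path; the threshold is an assumption, not a documented value."""
    return "individual" if item_count < small_read_threshold else "async-read"


seconds = estimated_seconds(1_000_000)
print(seconds, suggest_read_path(300), suggest_read_path(50_000))
```

At the stated rate, a read at the 1 million item limit would take on the order of 5,000 seconds, or roughly 83 minutes.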
