Commit cb73691: Merge pull request SharePoint#9575 from andrewconnell/spe-refresh
Fix formatting, word choice, & add SPE tutorials to TOC

---
title: Document Processing with Azure Cognitive Services
description: Enabling document processing with Azure Cognitive Services.
ms.date: 02/26/2024
ms.localizationpriority: high
---
To set up automatic AI processing with your current SharePoint application upon a change in your container, you need to follow [Using Webhooks](./using-webhooks.md) and then:

1. Get the delta changes of the container. You're currently able to get a notification whenever there's any change in your container, and will now get the files that were added or updated.
1. Call Azure Cognitive Services' Document Intelligence service API. You'll need to create an Azure AI resource to use the API to extract the fields from an image and get the extracted files. You might store them as shown in this tutorial, or process them as you like.

![document processing schema](../images/Document-Processing.png)

> [!TIP]
> To learn more about the Microsoft Graph APIs used in this tutorial, see [Track changes for a Drive](/graph/api/driveitem-delta), [Get a DriveItem resource](/graph/api/driveitem-get), and [Upload or replace the contents of a DriveItem](/graph/api/driveitem-put-content).

## Get the delta changes of a container
Open **GraphProvider.ts** and implement the method `getDriveChanges` to get the list of changed items.

```typescript
public static async getDriveChanges(driveId: string): Promise<any[]> {
  let changedItems: any[] = [];
  const driveDeltaBasePath: string = `/drives/${driveId}/items/root/delta`;
  let driveDeltaTokenParams: string = "";
  let hasMoreChanges: boolean = true;
  try {
    do {
      // Resume from the last delta token stored for this drive, if any
      if (this.changeTokens.has(driveId)) {
        driveDeltaTokenParams = `?token=${this.changeTokens.get(driveId)}`;
      }
      const response = await this.graphClient.api(driveDeltaBasePath + driveDeltaTokenParams).get();
      changedItems.push(...response.value);
      if (response['@odata.nextLink']) {
        // More pages remain; keep the token from the nextLink and loop again
        const token = new URL(response['@odata.nextLink']).searchParams.get('token');
        this.changeTokens.set(driveId, token);
      } else {
        // No more pages; keep the token from the deltaLink for the next notification
        hasMoreChanges = false;
        const token = new URL(response['@odata.deltaLink']).searchParams.get('token');
        this.changeTokens.set(driveId, token);
      }
      console.log(this.changeTokens.get(driveId));
    } while (hasMoreChanges);
  } catch (err) {
    console.log(err);
  }
  return changedItems;
}
```
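
The delta token cache referenced here, `this.changeTokens`, needs to exist on the `GraphProvider` class. If your copy of the sample doesn't already declare it, a minimal sketch (an assumption, not part of the original walkthrough) is:

```typescript
// Per-drive delta token cache, keyed by drive ID (kept in memory only)
private static changeTokens = new Map<string, string | null>();
```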
Implement the method `getDriveItem` to fetch a file from a container.

```typescript
public static async getDriveItem(driveId: string, itemId: string): Promise<any> {
  return await this.graphClient.api(`/drives/${driveId}/items/${itemId}`).get();
}
```
Create a new file **ReceiptProcessor.ts** and implement a method `processDrive`.

```typescript
export abstract class ReceiptProcessor {

  public static async processDrive(driveId: string): Promise<void> {
    const changedItems = await GraphProvider.getDriveChanges(driveId);
    for (const changedItem of changedItems) {
      try {
        const item = await GraphProvider.getDriveItem(driveId, changedItem.id);
        const extension = this.getFileExtension(item.name);
        if (this.SUPPORTED_FILE_EXTENSIONS.includes(extension.toLowerCase())) {
          console.log(item.name);
          // Download the file, extract its fields, and write them back to the
          // container as a companion JSON file
          const url = item["@microsoft.graph.downloadUrl"];
          const receipt = await this.analyzeReceiptStream(await this.getDriveItemStream(url));
          const receiptString = JSON.stringify(receipt, null, 2);
          const fileName = this.getFileDisplayName(item.name) + "-extracted-fields.json";
          const parentId = item.parentReference.id;
          await GraphProvider.addDriveItem(driveId, parentId, fileName, receiptString);
        }
      } catch (error) {
        console.log(error);
      }
    }
  }
}
```
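
The `processDrive` method relies on a few small helpers (`SUPPORTED_FILE_EXTENSIONS`, `getFileExtension`, and `getFileDisplayName`) that aren't shown in this walkthrough. A minimal sketch of how they might look on the `ReceiptProcessor` class, based on the supported formats listed at the end of this article, is shown below; treat it as an assumption and adjust it to your own sample.

```typescript
// Hypothetical helpers assumed by processDrive; not part of the original walkthrough.
private static readonly SUPPORTED_FILE_EXTENSIONS: string[] = [
  "jpeg", "jpg", "png", "bmp", "tiff", "pdf"
];

// Returns the extension without the dot, for example "receipt.pdf" -> "pdf"
private static getFileExtension(fileName: string): string {
  const dotIndex = fileName.lastIndexOf(".");
  return dotIndex >= 0 ? fileName.substring(dotIndex + 1) : "";
}

// Returns the file name without its extension, for example "receipt.pdf" -> "receipt"
private static getFileDisplayName(fileName: string): string {
  const dotIndex = fileName.lastIndexOf(".");
  return dotIndex >= 0 ? fileName.substring(0, dotIndex) : fileName;
}
```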

At this point, if you restart the app with tunneling and the subscription in place, you should see the recently added or updated files listed in the console.
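
Where exactly `processDrive` gets called depends on how you wired up the webhook in [Using Webhooks](./using-webhooks.md). The sketch below is a hypothetical Express-style handler, shown only to illustrate where the call fits; the route, query parameters, and framework in your demo app may differ.

```typescript
import express from "express";
import { ReceiptProcessor } from "./ReceiptProcessor";

const app = express();

app.post("/api/webhook", async (req, res) => {
  // Microsoft Graph validates a new subscription by expecting the token echoed back
  if (req.query.validationToken) {
    res.send(String(req.query.validationToken));
    return;
  }

  // Acknowledge the change notification quickly, then process the container's delta changes
  res.sendStatus(202);
  const driveId = String(req.query.driveId);
  await ReceiptProcessor.processDrive(driveId);
});
```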
## Call Azure Cognitive Services' Document Intelligence service API

To use the Azure Cognitive Services Document Intelligence APIs, you need to create a Multi-Service or Document Intelligence resource for Azure AI services. Refer to the following tutorials to create the resource:
- [Quickstart: Create a multi-service resource for Azure AI services](/azure/ai-services/multi-service-resource?tabs=windows&pivots=azportal)
- [Get started with Document Intelligence](/azure/ai-services/document-intelligence/quickstarts/get-started-sdks-rest-api?view=doc-intel-3.1.0&viewFallbackFrom=form-recog-3.0.0&preserve-view=true&pivots=programming-language-javascript)

After this step, you should have an endpoint and a key ready to use.
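
The code that follows reads these credentials from the `DAC_RESOURCE_ENDPOINT` and `DAC_RESOURCE_KEY` environment variables, so store your endpoint and key where the app can read them. A minimal sketch, assuming the demo app loads its configuration from a **.env** file with the `dotenv` package:

```typescript
import * as dotenv from "dotenv";

// Loads DAC_RESOURCE_ENDPOINT and DAC_RESOURCE_KEY from a local .env file
dotenv.config();

// Fail fast if the Document Intelligence credentials aren't configured
for (const name of ["DAC_RESOURCE_ENDPOINT", "DAC_RESOURCE_KEY"]) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}
```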
Now open **ReceiptProcessor.ts** and create a `DocumentAnalysisClient` named `dac` that holds the Azure Cognitive Services credentials.

```typescript
private static dac = new DocumentAnalysisClient(
  `${process.env["DAC_RESOURCE_ENDPOINT"]}`,
  new AzureKeyCredential(`${process.env["DAC_RESOURCE_KEY"]}`)
);
```
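
`DocumentAnalysisClient` and `AzureKeyCredential` come from the Document Intelligence (formerly Form Recognizer) SDK for JavaScript. If they aren't already imported at the top of the file, add the import, assuming the `@azure/ai-form-recognizer` package is installed:

```typescript
import { AzureKeyCredential, DocumentAnalysisClient } from "@azure/ai-form-recognizer";
```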
Create the method `getDriveItemStream` to download a file's content as a stream.

```typescript
private static async getDriveItemStream(url: string): Promise<Readable> {
  const token = GraphProvider.graphAccessToken;
  const config: AxiosRequestConfig = {
    method: "get",
    url: url,
    headers: {
      "Authorization": `Bearer ${token}`
    },
    responseType: 'stream'
  };
  const response = await axios.get<Readable>(url, config);
  return response.data;
}
```
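
This method uses `axios` for the download and Node's `Readable` stream type. If they aren't already imported in **ReceiptProcessor.ts**, add:

```typescript
import axios, { AxiosRequestConfig } from "axios";
import { Readable } from "stream";
```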

Create the method `analyzeReceiptStream` to extract the OCR fields through Azure Cognitive Services processing. Here we're using the `prebuilt-invoice` model, but you can choose other models.

```typescript
private static async analyzeReceiptStream(stream: Readable): Promise<any> {
  const poller = await this.dac.beginAnalyzeDocument("prebuilt-invoice", stream, {
    onProgress: ({ status }) => {
      console.log(`status: ${status}`);
    },
  });

  const {
    documents: [result] = [],
  } = await poller.pollUntilDone();

  const fields = result?.fields;
  this.removeUnwantedFields(fields);
  return fields;
}
```
Create the method `removeUnwantedFields` to strip the unwanted fields from the Azure Cognitive Services response.

```typescript
private static removeUnwantedFields(fields: any) {
  for (const prop in fields) {
    // Drop layout metadata that isn't needed in the extracted output
    if (prop === 'boundingRegions' || prop === 'content' || prop === 'spans') {
      delete fields[prop];
    }
    // Recurse into nested objects and arrays
    if (typeof fields[prop] === 'object') {
      this.removeUnwantedFields(fields[prop]);
    }
  }
}
```
Finally, open **GraphProvider.ts** to add the `addDriveItem` method at the end of the `GraphProvider` class.

```typescript
public static async addDriveItem(driveId: string, parentId: any, fileName: string, receiptString: string) {
  await this.graphClient.api(`/drives/${driveId}/items/${parentId}:/${fileName}:/content`).put(receiptString);
}
```

Now, restart the demo app and set up the tunneling with ngrok and the delta change subscription on the container again.

If you add or update any file (supported formats: JPEG, JPG, PNG, BMP, TIFF, PDF) in this container, you should see a new JSON file created that contains the fields extracted from the file.
