Commit eeb3f42

committed
new KB - Summarizing the Text Content of PDF Documents using Text Analytics with Azure AI services
1 parent 5fd44ac commit eeb3f42

File tree

5 files changed

+153
-5
lines changed

introduction.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -30,11 +30,11 @@ Telerik Document Processing features the following libraries:
3030

3131
|Library|Description|
3232
|----|----|
33-
| [RadPdfProcessing]({%slug radpdfprocessing-overview%}) ![Pdf](images/dpl-pdf.png)|A processing library that allows you to create, import, and export PDF documents from your code. You can use it in any web or desktop .NET application without relying on third-party software like Adobe Acrobat.|
34-
|[RadSpreadProcessing]({%slug radspreadprocessing-overview%}) ![Spread](images/dpl-spread.png)|A powerful library that enables you to create applications with native support for spreadsheet documents. With RadSpreadProcessing, you can create spreadsheets from scratch, modify existing documents or convert between the most common spreadsheet formats. You can save the generated workbook to a local file, stream, or stream it to the client browser.|
35-
|[RadSpreadStreamProcessing]({%slug radspreadstreamprocessing-overview%}) ![SpreadStream](images/dpl-spread.png)|Spread streaming is a document processing paradigm that allows you to create or read big spreadsheet documents with great performance and minimal memory footprint. The key for the memory efficiency is that the spread streaming library writes the spreadsheet content directly to a stream without creating and preserving the spreadsheet document model in memory.|
36-
|[RadWordsProcessing]({%slug radwordsprocessing-overview%}) ![Words](images/dpl-words.png)|A processing library that allows you to create, modify and export documents to a variety of formats. Through the API, you can access each element in the document and modify, remove it or add a new one. The generated content you can save as a stream, as a file, or sent it to the client browser.|
37-
|[RadZipLibrary]({%slug radziplibrary-overview%}) ![Zip](images/dpl-zip.png)| It allows you to compress and combine files in ZIP archives, browse and extract files from existing ZIP archives and compress streams for easy file shipping and reduced storage space.|
33+
|![Pdf](images/dpl-pdf.png) [RadPdfProcessing]({%slug radpdfprocessing-overview%})|A processing library that allows you to create, import, and export PDF documents from your code. You can use it in any web or desktop .NET application without relying on third-party software like Adobe Acrobat.|
34+
|![Spread](images/dpl-spread.png) [RadSpreadProcessing]({%slug radspreadprocessing-overview%})|A powerful library that enables you to create applications with native support for spreadsheet documents. With RadSpreadProcessing, you can create spreadsheets from scratch, modify existing documents or convert between the most common spreadsheet formats. You can save the generated workbook to a local file, stream, or stream it to the client browser.|
35+
|![SpreadStream](images/dpl-spread.png) [RadSpreadStreamProcessing]({%slug radspreadstreamprocessing-overview%})|Spread streaming is a document processing paradigm that allows you to create or read big spreadsheet documents with great performance and minimal memory footprint. The key for the memory efficiency is that the spread streaming library writes the spreadsheet content directly to a stream without creating and preserving the spreadsheet document model in memory.|
36+
|![Words](images/dpl-words.png) [RadWordsProcessing]({%slug radwordsprocessing-overview%})|A processing library that allows you to create, modify and export documents to a variety of formats. Through the API, you can access each element in the document and modify it, remove it, or add a new one. You can save the generated content as a stream or as a file, or send it to the client browser.|
37+
|![Zip](images/dpl-zip.png) [RadZipLibrary]({%slug radziplibrary-overview%})| It allows you to compress and combine files in ZIP archives, browse and extract files from existing ZIP archives and compress streams for easy file shipping and reduced storage space.|
3838

3939
## Key Features
4040

knowledge-base/extract-text-from-pdf.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,4 +47,5 @@ Follow the steps:
4747
- [RadPdfProcessing]({%slug radpdfprocessing-overview%})
4848
- [OcrFormatProvider]({%slug radpdfprocessing-formats-and-conversion-ocr-ocrformatprovider%})
4949
- [TextFormatProvider]({%slug radpdfprocessing-formats-and-conversion-plain-text-textformatprovider%})
50+
- [Summarizing the Text Content of PDF Documents using Text Analytics with Azure AI services]({%slug summarize-pdf-content%})
5051

79.6 KB
Lines changed: 146 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,146 @@
1+
---
2+
title: Summarizing the Text Content of PDF Documents using Text Analytics with Azure AI services
3+
description: Learn how to summarize the text content from a PDF document using RadPdfProcessing and Text Analytics with Azure AI services.
4+
type: how-to
5+
page_title: How to Summarize the Text Content of PDF documents using Text Analytics with Azure AI services
6+
slug: summarize-pdf-content
7+
tags: pdf, document, processing, text, summarize, summary, content, azure
8+
res_type: kb
9+
ticketid: 1657503
10+
---
11+
12+
## Environment
13+
14+
| Version | Product | Author |
15+
| ---- | ---- | ---- |
16+
| 2025.1.128| RadPdfProcessing |[Desislava Yordanova](https://www.telerik.com/blogs/author/desislava-yordanova)|
17+
18+
## Description
19+
20+
Learn how to summarize the text content of a PDF document using [Text Analytics with Azure AI services](https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark).
21+
22+
## Solution
23+
24+
Follow the steps:
25+
26+
1\. Add the following **required** assemblies/NuGet packages to your project:
27+
28+
* [Azure.AI.TextAnalytics](https://www.nuget.org/packages/Azure.AI.TextAnalytics)
29+
* Telerik.Documents.Fixed
30+
* Telerik.Documents.Core
31+
* Telerik.Zip
32+
33+
2\. Generate your Azure AI key and endpoint: [Get your credentials from your Azure AI services resource](https://learn.microsoft.com/en-us/azure/ai-services/use-key-vault?tabs=azure-cli&pivots=programming-language-csharp)
34+
35+
![Azure AI key](images/azure-ai-key.png)
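
One way to keep the key and endpoint from step 2 out of source code is to read them from environment variables. The sketch below is a minimal example under stated assumptions: the variable names `AZURE_LANGUAGE_KEY` and `AZURE_LANGUAGE_ENDPOINT` and the helper `AzureSettings.ReadRequired` are hypothetical and are not part of the article or of the Azure SDK.

```csharp
using System;

public static class AzureSettings
{
    // Hypothetical variable names; use whatever names your environment defines.
    public const string KeyVariable = "AZURE_LANGUAGE_KEY";
    public const string EndpointVariable = "AZURE_LANGUAGE_ENDPOINT";

    // Returns the trimmed value of the environment variable,
    // or throws if the variable is missing or blank.
    public static string ReadRequired(string variableName)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{variableName}' is not set.");
        }

        return value.Trim();
    }
}
```

The two values can then be passed to the `AzureTextSummarizationProvider` constructor in place of hard-coded strings, for example `AzureSettings.ReadRequired(AzureSettings.KeyVariable)` for the key.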
36+
37+
3\. Use the following custom implementation to extract the text content of the PDF document and summarize it:
38+
39+
```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Azure;
using Azure.AI.TextAnalytics;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.FormatProviders.Text;
using Telerik.Windows.Documents.Fixed.Model;

internal class Program
{
    // Placeholders: replace with the key and endpoint generated in step 2.
    private const string azure_key = "<your-azure-ai-key>";
    private const string azure_endpoint = "<your-azure-ai-endpoint>";

    static async Task Main(string[] args)
    {
        // Import the PDF document and export its content as plain text.
        PdfFormatProvider pdf_provider = new PdfFormatProvider();
        TextFormatProvider text_provider = new TextFormatProvider();
        RadFixedDocument document = pdf_provider.Import(File.ReadAllBytes("PdfDocument.pdf"), TimeSpan.FromSeconds(10));
        string documentTextContent = text_provider.Export(document);

        // Await the summary instead of blocking on Task.Result.
        AzureTextSummarizationProvider summarizationProvider = new AzureTextSummarizationProvider(azure_key, azure_endpoint);
        string summary = await summarizationProvider.SummarizeText(documentTextContent);

        Console.WriteLine(summary);
    }
}

public class AzureTextSummarizationProvider
{
    private readonly string languageKey;
    private readonly string languageEndpoint;

    public AzureTextSummarizationProvider(string azure_key, string azure_endpoint)
    {
        this.languageKey = azure_key;
        this.languageEndpoint = azure_endpoint;
    }

    public async Task<string> SummarizeText(string text)
    {
        AzureKeyCredential credentials = new AzureKeyCredential(languageKey);
        Uri endpoint = new Uri(languageEndpoint);

        TextAnalyticsClient client = new TextAnalyticsClient(endpoint, credentials);

        // Prepare analyze operation input. You can add multiple documents to this list and perform the same
        // operation to all of them.
        List<string> batchInput = new List<string>
        {
            text
        };

        TextAnalyticsActions actions = new TextAnalyticsActions()
        {
            ExtractiveSummarizeActions = new List<ExtractiveSummarizeAction> { new ExtractiveSummarizeAction() }
        };

        // Start analysis process.
        AnalyzeActionsOperation operation = await client.StartAnalyzeActionsAsync(batchInput, actions);
        await operation.WaitForCompletionAsync();

        StringBuilder stringBuilder = new StringBuilder();

        // View operation status.
        stringBuilder.AppendLine("AnalyzeActions operation has completed");
        stringBuilder.AppendLine();

        stringBuilder.AppendLine($"Created On : {operation.CreatedOn}");
        stringBuilder.AppendLine($"Expires On : {operation.ExpiresOn}");
        stringBuilder.AppendLine($"Id : {operation.Id}");
        stringBuilder.AppendLine($"Status : {operation.Status}");

        stringBuilder.AppendLine();

        // View operation results.
        await foreach (AnalyzeActionsResult documentsInPage in operation.Value)
        {
            IReadOnlyCollection<ExtractiveSummarizeActionResult> summaryResults = documentsInPage.ExtractiveSummarizeResults;

            foreach (ExtractiveSummarizeActionResult summaryActionResults in summaryResults)
            {
                // The whole action failed: report the error and move on.
                if (summaryActionResults.HasError)
                {
                    stringBuilder.AppendLine(" Error!");
                    stringBuilder.AppendLine($" Action error code: {summaryActionResults.Error.ErrorCode}.");
                    stringBuilder.AppendLine($" Message: {summaryActionResults.Error.Message}");
                    continue;
                }

                foreach (ExtractiveSummarizeResult documentResults in summaryActionResults.DocumentsResults)
                {
                    // A single document failed: report the error and move on.
                    if (documentResults.HasError)
                    {
                        stringBuilder.AppendLine(" Error!");
                        stringBuilder.AppendLine($" Document error code: {documentResults.Error.ErrorCode}.");
                        stringBuilder.AppendLine($" Message: {documentResults.Error.Message}");
                        continue;
                    }

                    stringBuilder.AppendLine($" Extracted the following {documentResults.Sentences.Count} sentence(s):");
                    stringBuilder.AppendLine();

                    // Concatenate the extracted sentences into the summary text.
                    foreach (ExtractiveSummarySentence sentence in documentResults.Sentences)
                    {
                        stringBuilder.Append($"{sentence.Text} ");
                    }
                }
            }
        }

        return stringBuilder.ToString();
    }
}
```
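
Azure AI Language enforces per-document size limits, so the text of a long PDF may need to be split before it is added to `batchInput`. The sketch below shows one possible approach and is not part of the original solution: `TextChunker.SplitIntoChunks` is a hypothetical helper, and the value passed as `maxChars` is an assumption that should be taken from the current service limits.

```csharp
using System;
using System.Collections.Generic;

public static class TextChunker
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Splits text into trimmed chunks of at most maxChars characters,
    // breaking at the last whitespace inside each window when one exists.
    public static List<string> SplitIntoChunks(string text, int maxChars)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var chunks = new List<string>();
        int start = 0;

        while (start < text.Length)
        {
            int length = Math.Min(maxChars, text.Length - start);
            int end = start + length;

            // Not at the end of the text: prefer to cut right after a whitespace character.
            if (end < text.Length)
            {
                int lastSpace = text.LastIndexOfAny(Whitespace, end - 1, length);
                if (lastSpace > start)
                {
                    end = lastSpace + 1;
                }
            }

            string chunk = text.Substring(start, end - start).Trim();
            if (chunk.Length > 0)
            {
                chunks.Add(chunk);
            }

            start = end;
        }

        return chunks;
    }
}
```

Each returned chunk can be added to `batchInput` as a separate document. The service also limits how many documents one request may contain, so very long texts may require several requests.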
140+
141+
## See Also
142+
143+
- [Extracting Text from PDF Documents]({%slug extract-text-from-pdf%})
144+
- [OcrFormatProvider]({%slug radpdfprocessing-formats-and-conversion-ocr-ocrformatprovider%})
145+
- [TextFormatProvider]({%slug radpdfprocessing-formats-and-conversion-plain-text-textformatprovider%})
146+

libraries/radpdfprocessing/formats-and-conversion/plain-text/textformatprovider.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,3 +42,4 @@ __Example 1__ shows how to use __TextFormatProvider__ to export __RadFixedDocume
4242
* [TextFormatProvider Settings]({%slug radpdfprocessing-formats-and-conversion-plain-text-settings%})
4343
* [Timeout Mechanism]({%slug timeout-mechanism-in-dpl%})
4444
* [Extracting Text from PDF Documents]({%slug extract-text-from-pdf%})
45+
* [Summarizing the Text Content of PDF Documents using Text Analytics with Azure AI services]({%slug summarize-pdf-content%})
