TextAbsorber
Overview
TextAbsorber is a class in Aspose.PDF FOSS for .NET that extracts text from PDF pages by parsing their content streams.
It exposes 8 callable members: four constructors, three Visit overloads, and Reset.
All public members are accessible to any .NET application after installing the Aspose.PDF FOSS for .NET package.
Properties: Errors, ExtractionOptions, HasErrors, Text, TextSearchOptions.
Properties
| Name | Type | Access | Description |
|---|---|---|---|
| Text | string | Read | The extracted text after calling Visit(). |
| ExtractionOptions | TextExtractionOptions | Read/Write | Gets or sets the text extraction options. |
| TextSearchOptions | TextSearchOptions | Read/Write | Gets or sets the text search options used during extraction. |
| Errors | List<TextExtractionError> | Read | Errors recorded during extraction. |
| HasErrors | bool | Read | Whether any extraction error was recorded. |
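
The properties above can be combined into a simple extract-and-check routine. The following is a minimal sketch, not an official sample: the `Aspose.Pdf` and `Aspose.Pdf.Text` namespaces are assumptions (this page does not state where the types live), and the helper class and method names are hypothetical. Only members listed on this page are called on the absorber.

```csharp
using System;
using Aspose.Pdf;        // assumed namespace for Document
using Aspose.Pdf.Text;   // assumed namespace for TextAbsorber

static class TextAbsorberPropertiesExample  // hypothetical helper class
{
    // Extracts text from an already-loaded document and reports
    // whether any extraction errors were recorded.
    public static string ExtractWithErrorCheck(Document pdf)
    {
        var absorber = new TextAbsorber();
        absorber.Visit(pdf);

        if (absorber.HasErrors)
        {
            // Errors is a List<TextExtractionError>. The members of
            // TextExtractionError are not documented on this page,
            // so only the count is used here.
            Console.Error.WriteLine(
                $"{absorber.Errors.Count} extraction error(s) recorded.");
        }

        return absorber.Text;
    }
}
```

Checking HasErrors before trusting Text is a design choice in this sketch; the page does not say whether Text is partially populated when errors occur.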
Methods
| Signature | Description |
|---|---|
| TextAbsorber() | Initializes a new TextAbsorber with default settings. |
| TextAbsorber(extractionOptions: TextExtractionOptions) | Initializes a new TextAbsorber with the specified extraction options. |
| TextAbsorber(textSearchOptions: TextSearchOptions) | Initializes a new TextAbsorber with the specified text search options. |
| TextAbsorber(extractionOptions: TextExtractionOptions, textSearchOptions: TextSearchOptions) | Initializes a new TextAbsorber with both extraction and search options. |
| Visit(page: Page) | Extracts text from a single page. |
| Visit(form: XForm) | Extracts text from a form XObject (XForm). |
| Visit(pdf: Document) | Extracts text from all pages of a document. |
| Reset() | Clears the extracted text and resets the absorber state so it can be reused. |
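
As an illustration of how Visit and Reset might work together, the sketch below reuses one absorber across several pages. It is written under stated assumptions: the `Aspose.Pdf` and `Aspose.Pdf.Text` namespaces are assumed, the helper names are hypothetical, and the pages are passed in as a plain `IEnumerable<Page>` because this page does not document how a Document exposes its pages.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;        // assumed namespace for Page
using Aspose.Pdf.Text;   // assumed namespace for TextAbsorber

static class TextAbsorberReuseExample  // hypothetical helper class
{
    // Returns one string of extracted text per page,
    // reusing a single TextAbsorber instance.
    public static List<string> ExtractPerPage(IEnumerable<Page> pages)
    {
        var absorber = new TextAbsorber();
        var results = new List<string>();

        foreach (Page page in pages)
        {
            absorber.Visit(page);        // extract text from this page
            results.Add(absorber.Text);  // read the extracted text
            absorber.Reset();            // clear state before the next page
        }

        return results;
    }
}
```

Calling Reset after each page keeps the per-page results separate regardless of whether Text accumulates across successive Visit calls, which this page does not specify.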