TextAbsorber

Overview

TextAbsorber is a class in Aspose.PDF FOSS for .NET.

It extracts text from PDF pages by parsing their content streams.

The Methods table lists eight public members: four constructor overloads, three `Visit` overloads (for a `Page`, an `XForm`, and a `Document`), and `Reset`. The class also exposes five properties: `Errors`, `ExtractionOptions`, `HasErrors`, `Text`, and `TextSearchOptions`. All public members are available to any .NET application once the Aspose.PDF FOSS for .NET package is installed.

Properties

Name	Type	Access	Description
`Text`	`string`	Read	The extracted text after calling Visit().
`ExtractionOptions`	`TextExtractionOptions`	Read/Write	Gets or sets the text extraction options.
`TextSearchOptions`	`TextSearchOptions`	Read/Write	Gets or sets the text search options used during extraction.
`Errors`	`List<TextExtractionError>`	Read	Errors recorded during extraction.
`HasErrors`	`bool`	Read	Whether any extraction error was recorded.

Methods

Signature	Description
`TextAbsorber()`	Initializes a new TextAbsorber with default settings.
`TextAbsorber(extractionOptions: TextExtractionOptions)`	Initializes a new TextAbsorber with the specified extraction options.
`TextAbsorber(textSearchOptions: TextSearchOptions)`	Initializes with text-search options.
`TextAbsorber(extractionOptions: TextExtractionOptions, textSearchOptions: TextSearchOptions)`	Initializes with both extraction and search options.
`Visit(page: Page)`	Extracts text from a single page.
`Visit(form: XForm)`	Extracts text from the content of a form XObject.
`Visit(pdf: Document)`	Extracts text from all pages of a document.
`Reset()`	Clears the extracted text and resets the absorber state so it can be reused.
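
The members listed above can be combined into a short extraction routine. The following is a minimal sketch, not a definitive implementation: it uses only `TextAbsorber()`, `Visit(Document)`, `HasErrors`, `Errors`, and `Text` as documented on this page. The `using` directives are assumptions, since this page does not state the namespaces. The helper name `ExtractAllText` is hypothetical, and the sketch assumes that `Visit(Document)` processes every page of the document.

```csharp
using System;
using Aspose.Pdf;        // assumption: namespace that contains Document
using Aspose.Pdf.Text;   // assumption: namespace that contains TextAbsorber

public static class TextAbsorberExample
{
    // Hypothetical helper: returns the text collected from the given document.
    public static string ExtractAllText(Document pdf)
    {
        // Default constructor: default extraction and search options.
        var absorber = new TextAbsorber();

        // Assumed to walk every page of the document.
        absorber.Visit(pdf);

        // Report any errors recorded during extraction before using the result.
        if (absorber.HasErrors)
        {
            foreach (var error in absorber.Errors)
            {
                Console.Error.WriteLine($"Extraction error: {error}");
            }
        }

        // Text holds the extracted text after Visit() has been called.
        return absorber.Text;
    }
}
```

To reuse one absorber for several documents, call `Reset()` between `Visit` calls so that text from an earlier document is cleared first.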
