TextAbsorber

Overview

TextAbsorber is a class in Aspose.PDF FOSS for .NET.

Extracts text from PDF pages by parsing content streams.
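As a rough illustration of what "parsing content streams" involves, the Python sketch below scans an already-decompressed page content stream and collects the strings shown by the Tj and TJ text operators. Each BT...ET text object ends with a newline. This is a hypothetical sketch, not how Aspose.PDF FOSS implements TextAbsorber. The names `tokenize` and `extract_text` are invented for this example. It ignores fonts, character encodings, hex strings, octal escapes, and text positioning, all of which a real extractor must handle.

```python
import re

WORD = re.compile(r"[^\s()\[\]]+")               # operators, numbers, names
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)$")
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "(": "(", ")": ")", "\\": "\\"}       # octal \ddd escapes not handled


def _read_literal(s, i):
    """Read a PDF literal string starting at s[i] == '('; return (text, next_index)."""
    depth, out = 0, []
    i += 1  # skip the opening '('
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):
            out.append(ESCAPES.get(s[i + 1], s[i + 1]))
            i += 2
            continue
        if c == "(":
            depth += 1                  # balanced inner parentheses are literal text
        elif c == ")":
            if depth == 0:
                return "".join(out), i + 1
            depth -= 1
        out.append(c)
        i += 1
    raise ValueError("unterminated literal string")


def tokenize(s):
    """Split a content stream into ('str' | '[' | ']' | 'word', value) tokens."""
    tokens, i = [], 0
    while i < len(s):
        c = s[i]
        if c.isspace():
            i += 1
        elif c == "(":
            text, i = _read_literal(s, i)
            tokens.append(("str", text))
        elif c in "[]":
            tokens.append((c, c))
            i += 1
        elif c == ")":
            raise ValueError("unbalanced ')' in content stream")
        else:
            m = WORD.match(s, i)
            tokens.append(("word", m.group(0)))
            i = m.end()
    return tokens


def extract_text(content):
    """Concatenate strings shown by Tj/TJ; end each BT...ET text object with a newline."""
    out, operands, array = [], [], None
    for kind, value in tokenize(content):
        if kind == "[":
            array = []                  # start of a TJ array
        elif kind == "]":
            operands.append(array)
            array = None
        elif array is not None:
            if kind == "str":           # numbers inside TJ arrays are kerning; skip them
                array.append(value)
        elif kind == "str" or value.startswith("/") or NUMBER.match(value):
            operands.append(value)      # operand: string, name, or number
        else:                           # any other bare word is an operator
            if value == "Tj" and operands:
                out.append(operands[-1])
            elif value == "TJ" and operands:
                out.append("".join(operands[-1]))
            elif value == "ET":
                out.append("\n")
            operands = []               # operators consume their operands
    return "".join(out)
```

For example, `extract_text("BT /F1 12 Tf 72 712 Td (Hello, ) Tj [(W) 120 (orld)] TJ ET")` returns `"Hello, World\n"`.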

This class provides eight members for working with TextAbsorber objects in .NET programs: four constructor overloads (TextAbsorber), three Visit overloads, and Reset. All public members are available to any .NET application that installs the Aspose.PDF FOSS for .NET package. Properties: Errors, ExtractionOptions, HasErrors, Text, TextSearchOptions.

Properties

Name | Type | Access | Description
Text | string | Read | The extracted text after calling Visit().
ExtractionOptions | TextExtractionOptions | Read/Write | Gets or sets the text extraction options.
TextSearchOptions | TextSearchOptions | Read/Write | Gets or sets the text search options used during extraction.
Errors | List<TextExtractionError> | Read | Errors recorded during extraction.
HasErrors | bool | Read | Whether any extraction error was recorded.

Methods

Signature | Description
TextAbsorber() | Initializes a new TextAbsorber with default settings.
TextAbsorber(extractionOptions: TextExtractionOptions) | Initializes a new TextAbsorber with the specified extraction options.
TextAbsorber(textSearchOptions: TextSearchOptions) | Initializes a new TextAbsorber with the specified text search options.
TextAbsorber(extractionOptions: TextExtractionOptions, textSearchOptions: TextSearchOptions) | Initializes a new TextAbsorber with the specified extraction and text search options.
Visit(page: Page) | Extracts text from a single page.
Visit(form: XForm) | Extracts text from the specified XForm (form XObject).
Visit(pdf: Document) | Extracts text from all pages of the document.
Reset() | Clears the extracted text and resets the absorber state so it can be reused.
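The property and method tables imply a usage lifecycle: visit a page or document, read Text, check HasErrors and Errors, then call Reset before reusing the absorber. The Python model below is a hypothetical stand-in for that contract, not the Aspose API. It assumes, without confirmation from this page, that successive Visit calls append to Text and that a page which fails is recorded in Errors while extraction continues with the next page. The names TextAbsorberModel, FakePage, and ExtractionError are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionError:
    """Hypothetical stand-in for TextExtractionError."""
    page_number: int
    message: str


@dataclass
class FakePage:
    """Hypothetical page: its text, or an error message simulating a bad content stream."""
    number: int
    text: str = ""
    error: Optional[str] = None


class TextAbsorberModel:
    """Models the documented Text / Errors / HasErrors / Visit / Reset contract."""

    def __init__(self):
        self._parts: List[str] = []
        self.errors: List[ExtractionError] = []

    @property
    def text(self) -> str:
        # Assumption: text from every visited page accumulates until reset().
        return "".join(self._parts)

    @property
    def has_errors(self) -> bool:
        return len(self.errors) > 0

    def visit_page(self, page: FakePage) -> None:
        if page.error is not None:
            # Assumption: the failure is recorded instead of raised.
            self.errors.append(ExtractionError(page.number, page.error))
        else:
            self._parts.append(page.text)

    def visit_document(self, pages: List[FakePage]) -> None:
        for page in pages:              # mirrors Visit(Document): every page in order
            self.visit_page(page)

    def reset(self) -> None:
        # Clears extracted text and errors so the instance can be reused.
        self._parts.clear()
        self.errors.clear()
```

Usage: after `visit_document([FakePage(1, "Hello "), FakePage(2, error="bad stream"), FakePage(3, "world")])`, `text` is `"Hello world"`, `has_errors` is true, and `reset()` returns the model to its initial empty state.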

See Also