PdfExtractor

Overview

PdfExtractor is a class in Aspose.PDF FOSS for .NET. It implements IDisposable.

Facade for extracting text and images from a PDF document.

The class exposes 29 public constructor and method overloads, all listed in the Methods table: the PdfExtractor constructors, BindPdf, Close, Dispose, ExtractAttachment, ExtractImage, ExtractText, GetAttachNames, GetAttachment, GetAttachmentInfo, GetNextImage, GetNextPageText, GetText, HasNextImage, and HasNextPageText. It has eight properties: StartPage, EndPage, ExtractTextMode, ExtractImageMode, Resolution, IsBidi, Password, and TextSearchOptions. All public members are accessible to any .NET application after installing the Aspose.PDF FOSS for .NET package.
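
As a rough illustration of how these members fit together, the sketch below binds a file, extracts all text, and decodes the bytes that GetText writes. It relies on the UTF-16 LE default stated in the Methods table. The `Aspose.Pdf.Facades` namespace and the `input.pdf` path are assumptions, not taken from this page; adjust them to match the installed package.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Pdf.Facades; // assumed namespace; verify against the installed package

class ExtractAllText
{
    static void Main()
    {
        // PdfExtractor implements IDisposable, so a using declaration releases it.
        using var extractor = new PdfExtractor();
        extractor.BindPdf("input.pdf"); // hypothetical input path

        // Run extraction over the bound document.
        extractor.ExtractText();

        // GetText writes the extracted text as bytes into the stream.
        using var buffer = new MemoryStream();
        extractor.GetText(buffer);

        // The bytes default to UTF-16 LE, which .NET exposes as Encoding.Unicode.
        // TrimStart removes a byte-order mark if one happens to be present.
        string text = Encoding.Unicode.GetString(buffer.ToArray()).TrimStart('\uFEFF');
        Console.WriteLine(text);
    }
}
```

Collecting the bytes in a MemoryStream keeps the decoding step explicit. GetText(outputFile) is the shorter route when the text only needs to land on disk.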

Properties

Name	Type	Access	Description
`StartPage`	`int`	Read/Write	1-based start page for extraction.
`EndPage`	`int`	Read/Write	1-based end page for extraction.
`ExtractTextMode`	`int`	Read/Write	Text extraction mode.
`ExtractImageMode`	`ExtractImageMode`	Read/Write	Image extraction strategy.
`Resolution`	`int`	Read/Write	Rendering resolution for image extraction (DPI).
`IsBidi`	`bool`	Read	True when the most recently extracted text contains a right-to-left script run (Hebrew, Arabic, Syriac, Thaana, etc.).
`Password`	`string?`	Read/Write	Password used when binding encrypted PDFs.
`TextSearchOptions`	`Aspose.Pdf.Text.TextSearchOptions`	Read/Write	Text-search options applied during ExtractText().
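
The properties above are typically set before extraction runs. The following is a minimal sketch, assuming the `Aspose.Pdf.Facades` namespace and hypothetical file names and password. It sets Password before BindPdf because the table describes it as the password used when binding. It also assumes GetText(outputFile) honours the encoding passed to ExtractText(Encoding), which this page states only for the stream overload.

```csharp
using System;
using System.Text;
using Aspose.Pdf.Facades; // assumed namespace; verify against the installed package

class ExtractPageRange
{
    static void Main()
    {
        using var extractor = new PdfExtractor
        {
            Password = "secret" // hypothetical password, set before binding
        };
        extractor.BindPdf("encrypted.pdf"); // hypothetical input path

        // Page numbers are 1-based; this selects pages 2 through 5.
        extractor.StartPage = 2;
        extractor.EndPage = 5;

        // Extract with an explicit encoding, then write the result to a file.
        extractor.ExtractText(Encoding.UTF8);
        extractor.GetText("pages-2-5.txt"); // hypothetical output path

        // IsBidi reflects the most recently extracted text.
        Console.WriteLine(extractor.IsBidi
            ? "Extracted text contains right-to-left runs."
            : "Extracted text has no right-to-left runs.");
    }
}
```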

Methods

Signature	Description
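`PdfExtractor()`	Creates a PdfExtractor with no document bound; call BindPdf before extracting.
`PdfExtractor(document: Document)`	Creates a PdfExtractor bound to the given Document.
`BindPdf(inputFile: string)`	Binds the PDF file at the given path for extraction.
`BindPdf(document: Document)`	Binds an already loaded Document for extraction.
`BindPdf(inputStream: Stream)`	Binds a PDF read from the given stream for extraction.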
`ExtractText()`	Walks every page in the bound document with TextAbsorber and concatenates the extracted text.
`ExtractText(outputFile: string)`	Extracts text from the bound PDF and saves it to the given file (UTF-16 LE bytes).
`ExtractText(encoding: Encoding)`	Extracts text from the bound PDF using the given encoding.
`GetText(outputStream: Stream)`	Writes the extracted text as bytes into outputStream, using the encoding set by the most recent ExtractText(Encoding) call (defaults to UTF-16 LE).
`GetText(outputStream: Stream, filterNotAscii: bool)`	Writes the extracted text as bytes, optionally dropping non-ASCII codepoints first.
`GetText(outputFile: string)`	Writes the extracted text to the given file path.
`HasNextPageText()`	True if there is another page’s text available via GetNextPageText.
`GetNextPageText(outputFile: string)`	Writes the next page’s text to the given file path (UTF-16 LE bytes).
`GetNextPageText(outputStream: Stream)`	Writes the next page’s text to the given stream (UTF-16 LE bytes).
`ExtractImage()`	Prepares the images in the bound PDF for retrieval through HasNextImage and GetNextImage.
`ExtractImage(outputDirectory: string)`	Extracts every image to the given directory, naming files image_N.jpg for JPEG sources and image_N.png otherwise.
`HasNextImage()`	True while GetNextImage(string) has a remaining image to write.
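`GetNextImage(outputStream: Stream)`	Writes the next image to the given stream.
`GetNextImage(outputFile: string)`	Writes the next image to the given file path.
`GetNextImage(outputStream: Stream, format: System.Drawing.Imaging.ImageFormat)`	Writes the next image to the given stream in the specified image format.
`GetNextImage(outputFile: string, format: System.Drawing.Imaging.ImageFormat)`	Writes the next image to the given file path in the specified image format.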
`ExtractAttachment()`	Selects every embedded file in the bound document; a subsequent GetAttachment(string) writes all of them.
`ExtractAttachment(attachmentFileName: string)`	Selects a single embedded file by name for extraction.
`GetAttachNames()`	Names of every embedded file in the bound document.
`GetAttachmentInfo()`	FileSpecification entries for every embedded file in the bound document.
`GetAttachment()`	Returns the selected attachments’ content as MemoryStreams (all attachments when none were explicitly selected).
`GetAttachment(outputPath: string)`	Writes the selected attachments (all when none were explicitly selected) into the outputPath directory, one file per attachment named after its file name.
`Close()`	Closes the extractor and releases the bound document.
`Dispose()`	Releases the resources held by this instance (IDisposable implementation).
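
The HasNext/GetNext pairs in the table suggest an iterator-style workflow. The sketch below shows one plausible way to use them for per-page text, images, and attachments. The `Aspose.Pdf.Facades` namespace, all file and directory names, and the `.png` extension are assumptions. So is the call order: ExtractText() before the page loop and ExtractImage() before the image loop follow the usual pattern for this facade but are not confirmed by this page.

```csharp
using System;
using System.IO;
using Aspose.Pdf.Facades; // assumed namespace; verify against the installed package

class ExtractPiecewise
{
    static void Main()
    {
        using var extractor = new PdfExtractor();
        extractor.BindPdf("input.pdf"); // hypothetical input path

        // Per-page text: extract once, then pull one page at a time.
        extractor.ExtractText();
        int page = 0;
        while (extractor.HasNextPageText())
        {
            page++;
            // Each file receives UTF-16 LE bytes, per the Methods table.
            extractor.GetNextPageText($"page_{page}.txt");
        }

        // Images: prepare them, then write each one to its own file.
        extractor.ExtractImage();
        int image = 0;
        while (extractor.HasNextImage())
        {
            image++;
            // The .png extension is an assumption; the format written by
            // this overload is not documented on this page.
            extractor.GetNextImage($"image_{image}.png");
        }

        // Attachments: select all, then write them into one directory.
        Directory.CreateDirectory("attachments"); // hypothetical output directory
        extractor.ExtractAttachment();
        extractor.GetAttachment("attachments");

        Console.WriteLine($"Wrote {page} text files and {image} images.");
    }
}
```

When the output naming does not matter, ExtractImage(outputDirectory) does the image step in a single call and chooses the .jpg or .png extension itself.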
