PdfExtractor
Overview
PdfExtractor is a class in Aspose.Pdf FOSS for Java.
Implements: Closeable.
Facade for extracting text and images from PDF documents.
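Because the class is `Closeable`, it can be managed with try-with-resources so that `close()` runs even when extraction fails. The following is a minimal sketch of that pattern only. `FakeExtractor` is a hypothetical stand-in written for this example, not the library class, and its return value is made up; with the real library, a `PdfExtractor` instance would presumably take its place.

```java
import java.io.Closeable;

final class CloseSketch {
    /** Hypothetical stand-in for an extractor; not part of the library. */
    static final class FakeExtractor implements Closeable {
        private boolean closed;

        String getTextAsString() {
            return "sample text"; // placeholder value for the sketch
        }

        @Override
        public void close() {
            closed = true; // a real extractor would release its bound document here
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Reads text and lets try-with-resources call close() on exit. */
    static String readText(FakeExtractor extractor) {
        try (FakeExtractor e = extractor) {
            return e.getTextAsString();
        }
    }

    public static void main(String[] args) {
        FakeExtractor extractor = new FakeExtractor();
        System.out.println(readText(extractor));
        System.out.println("closed: " + extractor.isClosed());
    }
}
```

The point of the pattern is that the caller never has to remember an explicit `close()` call on every exit path.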
Properties
| Name | Type | Access | Description |
|---|---|---|---|
| `password` | `String` | Read/Write | Returns the password applied when opening encrypted PDFs in subsequent `bindPdf(String)` or `bindPdf(InputStream)` calls. |
| `startPage` | `int` | Read/Write | Returns the start page. |
| `endPage` | `int` | Read/Write | Returns the end page. |
| `extractTextMode` | `int` | Read/Write | Returns the current text extraction mode. |
| `resolution` | `Resolution` | Read/Write | Returns the configured extraction resolution. |
| `extractImageMode` | `ExtractImageMode` | Read/Write | Returns the image extraction mode. |
| `textAsString` | `String` | Read | Returns the extracted text as a string. |
| `textSearchOptions` | `TextSearchOptions` | Read/Write | Returns text search options associated with the extractor. |
| `bidi` | `boolean` | Read | Returns whether the current extraction contains bidirectional (bidi) text. |
| `attachNames` | `List<String>` | Read | Returns the currently prepared attachment names. |
| `imageCount` | `int` | Read | Returns the number of extracted images. |
Methods
| Signature | Description |
|---|---|
| `PdfExtractor()` | Creates a new PdfExtractor instance. |
| `PdfExtractor(document: Document)` | Creates a new PdfExtractor bound to an existing document. |
| `PdfExtractor(stream: InputStream)` | Creates a new PdfExtractor bound to a PDF stream. |
| `bindPdf(inputFile: String)` | Binds a PDF file to this extractor. |
| `bindPdf(stream: InputStream)` | Binds a PDF from an input stream. |
| `getPassword() → String` | Returns the password applied when opening encrypted PDFs in subsequent `bindPdf(String)` or `bindPdf(InputStream)` calls. |
| `setPassword(password: String)` | Sets the password used by subsequent `bindPdf` calls to open encrypted PDFs. |
| `bindPdf(document: Document)` | Binds an existing Document to this extractor. |
| `setStartPage(page: int)` | Sets the start page for extraction (1-based). |
| `getStartPage() → int` | Returns the start page. |
| `setEndPage(page: int)` | Sets the end page for extraction (1-based). |
| `getEndPage() → int` | Returns the end page. |
| `setExtractTextMode(mode: int)` | Sets the text extraction mode. |
| `getExtractTextMode() → int` | Returns the current text extraction mode. |
| `setResolution(resolution: Resolution)` | Sets image extraction resolution for API parity. |
| `getResolution() → Resolution` | Returns the configured extraction resolution. |
| `setExtractImageMode(extractImageMode: ExtractImageMode)` | Sets the image extraction mode. |
| `getExtractImageMode() → ExtractImageMode` | Returns the image extraction mode. |
| `extractText()` | Extracts text from the page range. |
| `extractText(encoding: Charset)` | Extracts text from the page range using the requested output encoding. |
| `getText(outputPath: String)` | Writes extracted text to a file. |
| `getText(stream: OutputStream)` | Writes extracted text to an output stream. |
| `getTextAsString() → String` | Returns the extracted text as a string. |
| `getTextSearchOptions() → TextSearchOptions` | Returns text search options associated with the extractor. |
| `setTextSearchOptions(textSearchOptions: TextSearchOptions)` | Sets text search options used by subsequent extraction calls. |
| `hasNextPageText() → boolean` | Returns whether page-by-page extracted text remains available. |
| `getNextPageText(outputPath: String)` | Writes the next page text to a file. |
| `getNextPageText(stream: OutputStream)` | Writes the next page text to a stream. |
| `isBidi() → boolean` | Returns whether the current extraction contains bidi text. |
| `extractAttachment()` | Prepares attachment extraction for all embedded files. |
| `extractAttachment(name: String)` | Prepares extraction for a specific attachment key or file name. |
| `getAttachNames() → List<String>` | Returns the currently prepared attachment names. |
| `getAttachment(outputPath: String)` | Writes the prepared attachment(s) to the given file or directory. |
| `extractImage()` | Extracts images from the page range. |
| `hasNextImage() → boolean` | Returns whether there are more extracted images to retrieve. |
| `getNextImage(outputPath: String)` | Saves the next extracted image to a file. |
| `getNextImage(outputPath: String, format: ImageFormat)` | Saves the next extracted image to a file using the requested output format. |
| `getNextImage(stream: OutputStream)` | Saves the next extracted image to an output stream. |
| `getNextImage(stream: OutputStream, format: ImageFormat)` | Saves the next extracted image to a stream using the requested output format. |
| `getImageCount() → int` | Returns the number of extracted images. |
| `close()` | Closes this extractor and releases the bound document. |
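The `hasNextPageText()` / `getNextPageText(stream)` pair listed above suggests a drain loop: check for remaining text, write one page to a stream, repeat. The sketch below illustrates that loop under stated assumptions. `PageTextSource` is a hypothetical interface that mirrors only those two signatures, and `FakePageTextSource` is an in-memory stand-in so the example runs without a PDF. Neither type is part of the library, and whether the real methods throw checked exceptions is not specified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class PageTextSketch {
    /** Drains the source and returns one string per page. */
    static List<String> readAllPages(PageTextSource source) {
        List<String> pages = new ArrayList<>();
        while (source.hasNextPageText()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.getNextPageText(buffer); // each call consumes exactly one page
            pages.add(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        }
        return pages;
    }

    public static void main(String[] args) {
        PageTextSource source = new FakePageTextSource(List.of("page one", "page two"));
        System.out.println(readAllPages(source));
    }
}

/** Hypothetical interface mirroring the two paging signatures in the table. */
interface PageTextSource {
    boolean hasNextPageText();

    void getNextPageText(OutputStream stream);
}

/** In-memory stand-in that serves prepared strings as pages; not part of the library. */
final class FakePageTextSource implements PageTextSource {
    private final Iterator<String> pages;

    FakePageTextSource(List<String> pages) {
        this.pages = pages.iterator();
    }

    @Override
    public boolean hasNextPageText() {
        return pages.hasNext();
    }

    @Override
    public void getNextPageText(OutputStream stream) {
        try {
            stream.write(pages.next().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The `hasNextImage()` / `getNextImage(stream)` pair has the same shape, so the same loop structure would likely apply to image extraction.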