Document
Overview
Document is the main entry point of Aspose.Words FOSS for Python. It provides the public API for loading Word documents (DOC, DOCX, RTF, TXT, Markdown), saving to multiple output formats (PDF, Markdown, TXT), and extracting plain text.
Defined in: aspose/words_foss/document.py
Description
Document is a class in the Aspose.Words FOSS library for Python that exposes 2 methods and 13 properties for programmatic use. It extends BaseModel, inheriting shared functionality from its parent type.
The constructor accepts three alternative sources: a file path (Optional[Union[str, Path]]), a binary stream (Optional[BinaryIO]), or raw bytes (Optional[bytes]). Together with save() and get_text(), this lets developers load, convert, and read documents directly from Python applications.
The class also provides the light_document_model property, which gives access to the underlying light document model for advanced inspection of document structure (paragraphs, tables, styles, sections).
Constructor
| Signature | Description |
|---|---|
| Document(filepath=None, *, stream=None, data=None) | Load a document from a file path, binary stream, or raw bytes. At least one source must be provided. |
Parameters:
| Name | Type | Description |
|---|---|---|
| filepath | Optional[Union[str, Path]] | Path to the document file to load. |
| stream | Optional[BinaryIO] | Binary stream containing document data (DOCX format only). |
| data | Optional[bytes] | Raw bytes of the document content (DOCX format only). |
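The three sources map onto three constructor arguments, so a small dispatcher can pick the right one from the type of the object it is given. The following is a minimal sketch, not part of the library: source_kwargs and open_document are hypothetical helper names, and the document class is passed in as a parameter so the helper does not depend on the package being importable.

```python
from pathlib import Path


def source_kwargs(source):
    """Return the constructor keyword argument that matches the source type."""
    if isinstance(source, (str, Path)):
        return {"filepath": source}
    if isinstance(source, (bytes, bytearray)):
        return {"data": bytes(source)}
    if hasattr(source, "read"):
        # File-like object opened in binary mode, e.g. open(path, "rb").
        return {"stream": source}
    raise TypeError(f"unsupported document source: {type(source).__name__}")


def open_document(document_cls, source):
    """Instantiate document_cls (for example aw.Document) from one source."""
    # Hypothetical helper: only forwards the documented keyword arguments.
    return document_cls(**source_kwargs(source))
```

With the library installed, open_document(aw.Document, "input.docx") would be equivalent to aw.Document("input.docx"). Per the parameter table, stream and data are limited to DOCX content.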
Methods
| Signature | Description |
|---|---|
| save(output_path, save_format_or_options=None) → None | Save the document to PDF (SaveFormat.PDF), Markdown (SaveFormat.MARKDOWN), or plain text (SaveFormat.TEXT). Pass a SaveFormat constant for default settings, or a save-options object (PdfSaveOptions, MarkdownSaveOptions) for fine-grained control over output. |
| get_text() → str | Extract plain text from the loaded document. |
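Because save() takes the output path and the format as separate arguments, exporting one document to several formats is a simple loop. The sketch below assumes only the documented save(output_path, save_format_or_options) signature; export_all is a hypothetical helper name, and the output directory is assumed to exist already.

```python
from pathlib import Path


def export_all(doc, out_dir, stem, targets):
    """Save doc once per (suffix, save_format) pair and return the paths."""
    written = []
    for suffix, save_format in targets:
        path = Path(out_dir) / f"{stem}{suffix}"
        # The path is passed as a string, as in the Usage example; whether
        # save() also accepts Path objects is not stated in this reference.
        doc.save(str(path), save_format)
        written.append(path)
    return written
```

With the library installed, targets could be [(".pdf", aw.SaveFormat.PDF), (".md", aw.SaveFormat.MARKDOWN), (".txt", aw.SaveFormat.TEXT)]. A save-options object can stand in for any of the constants.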
Properties
| Name | Type | Access | Description |
|---|---|---|---|
| light_document_model | ldm.Document | Read | Access the underlying light document model for advanced inspection of document structure (paragraphs, tables, styles, sections). |
Usage
```python
import aspose.words_foss as aw

# Load a DOCX file
doc = aw.Document("input.docx")

# Extract plain text
text = doc.get_text()

# Save as PDF
doc.save("output.pdf", aw.SaveFormat.PDF)

# Save as Markdown
doc.save("output.md", aw.SaveFormat.MARKDOWN)
```
Supported Formats
Input Formats
| Format | LoadFormat Constant | Description |
|---|---|---|
| DOCX | LoadFormat.DOCX | Office Open XML Word document |
| DOC | LoadFormat.DOC | Legacy Microsoft Word binary format |
| RTF | LoadFormat.RTF | Rich Text Format |
| TXT | LoadFormat.TEXT | Plain text |
| Markdown | LoadFormat.MARKDOWN | CommonMark Markdown |
| Auto | LoadFormat.AUTO | Detect format from file extension (default) |
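The table implies a mapping from file extension to LoadFormat constant. The sketch below illustrates that mapping only; it is not the library's detection code. The extension list (notably .md for Markdown) and the helper name load_format_name are assumptions. How an explicit LoadFormat is passed when loading is not shown in this section, so the helper stops at resolving the constant's name.

```python
from pathlib import Path

# Extension-to-member-name table. The extensions are assumptions made for
# this sketch; the member names come from the Input Formats table.
LOAD_FORMAT_BY_SUFFIX = {
    ".docx": "DOCX",
    ".doc": "DOC",
    ".rtf": "RTF",
    ".txt": "TEXT",
    ".md": "MARKDOWN",
}


def load_format_name(path):
    """Return the LoadFormat member name for a path, or "AUTO" if unknown."""
    suffix = Path(path).suffix.lower()
    return LOAD_FORMAT_BY_SUFFIX.get(suffix, "AUTO")
```

With the library installed, getattr(aw.LoadFormat, load_format_name("report.rtf")) would resolve to LoadFormat.RTF.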
Output Formats
| Format | SaveFormat Constant | Save Options |
|---|---|---|
| PDF | SaveFormat.PDF | PdfSaveOptions: configure compliance, image compression, font embedding |
| Markdown | SaveFormat.MARKDOWN | MarkdownSaveOptions: configure table alignment, image export, list mode |
| Plain Text | SaveFormat.TEXT | (none) |
| DOCX | SaveFormat.DOCX | Read-only constant. Document.save() does not support DOCX and raises ValueError. |
| DOC | SaveFormat.DOC | Read-only constant. Document.save() does not support DOC and raises ValueError. |
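Since save() raises ValueError for DOCX and DOC, a caller may prefer to reject those targets before any work is done. The sketch below picks a format from the output extension and fails early. The helper names and the extension mapping are assumptions; the SaveFormat class is passed in as a parameter rather than imported.

```python
from pathlib import Path

# Extensions mapped to SaveFormat member names that save() supports.
SAVEABLE = {".pdf": "PDF", ".md": "MARKDOWN", ".txt": "TEXT"}
# Constants exist for these, but save() is documented to raise ValueError.
UNSUPPORTED = {".docx", ".doc"}


def save_format_name(output_path):
    """Return the SaveFormat member name for output_path or raise ValueError."""
    suffix = Path(output_path).suffix.lower()
    if suffix in UNSUPPORTED:
        raise ValueError(f"saving to {suffix} is not supported")
    if suffix not in SAVEABLE:
        raise ValueError(f"no known save format for {suffix!r}")
    return SAVEABLE[suffix]


def save_by_extension(doc, output_path, save_format_cls):
    """Save doc using the format implied by the output file extension."""
    save_format = getattr(save_format_cls, save_format_name(output_path))
    doc.save(output_path, save_format)
    return save_format
```

With the library installed, save_by_extension(doc, "output.pdf", aw.SaveFormat) would call doc.save("output.pdf", aw.SaveFormat.PDF).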
Light Document Model
The light_document_model property provides access to the internal document structure (ldm.Document, defined in light_document_model.py). This Pydantic BaseModel exposes parsed paragraphs, tables, styles, sections, headers, and footers, along with structural queries such as find_style() and headings(). Most users do not need to access the LDM directly; the public Document methods above cover standard workflows.
See Also
- SaveFormat: output format constants used with Document.save()
- LoadFormat: input format constants for explicit format specification
- PdfSaveOptions: fine-grained PDF export control (compliance level, compression, font embedding)
- MarkdownSaveOptions: Markdown export options (table alignment, image handling, list mode)
- Aspose.Words for Python: Enterprise API Reference