Overview
Document is the main entry point of Aspose.Words FOSS for Python. It provides the public API for loading Word documents (DOC, DOCX, RTF, TXT, Markdown), saving to multiple output formats (PDF, Markdown, TXT), and extracting plain text.
Defined in: aspose/words_foss/document.py
Constructor
| Signature | Description |
|---|---|
| Document(filepath=None, *, stream=None, data=None) | Load a document from a file path, binary stream, or raw bytes. At least one source must be provided. |
Parameters:
| Name | Type | Description |
|---|---|---|
| filepath | Optional[Union[str, Path]] | Path to the document file to load. |
| stream | Optional[BinaryIO] | Binary stream containing document data (DOCX format only). |
| data | Optional[bytes] | Raw bytes of the document content (DOCX format only). |
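The three sources map onto the constructor's arguments as in the sketch below. It is an illustration under assumptions, not part of the library: the import path `aspose.words_foss` is inferred from the "Defined in" line, and `constructor_kwargs` and `load_document` are hypothetical helper names.

```python
import io
from pathlib import Path


def constructor_kwargs(source):
    """Pick the Document argument that matches the type of `source` (hypothetical helper)."""
    if isinstance(source, (str, Path)):
        return {"filepath": source}      # any supported input format
    if isinstance(source, (bytes, bytearray)):
        return {"data": bytes(source)}   # keyword-only, DOCX content only
    if hasattr(source, "read"):
        return {"stream": source}        # keyword-only, DOCX content only
    raise TypeError(f"unsupported document source: {type(source).__name__}")


def load_document(source):
    """Build a Document from a path, raw bytes, or a binary stream."""
    # Assumed import path; imported lazily so the helper above works on its own.
    from aspose.words_foss import Document

    return Document(**constructor_kwargs(source))
```

Because `stream` and `data` follow the bare `*` in the signature, they can only be passed by keyword; `filepath` may be passed positionally or by keyword.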
Methods
| Signature | Description |
|---|---|
| save(output_path, save_format_or_options=None) → None | Save the document to PDF (SaveFormat.PDF), Markdown (SaveFormat.MARKDOWN), or plain text (SaveFormat.TEXT). Pass a SaveFormat constant for default settings, or a save-options object (PdfSaveOptions, MarkdownSaveOptions) for fine-grained control over output. |
| get_text() → str | Extract plain text from the loaded document. |
Properties
| Name | Type | Access | Description |
|---|---|---|---|
| light_document_model | ldm.Document | Read | Access the underlying light document model for advanced inspection of document structure (paragraphs, tables, styles, sections). |
Usage
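A minimal end-to-end sketch: load a file, save it as PDF with default settings, and return the extracted text. The import path `aspose.words_foss` is an assumption inferred from the "Defined in" line, and `default_pdf_path` and `convert_to_pdf` are hypothetical names used only for illustration.

```python
from pathlib import Path


def default_pdf_path(input_path):
    """Return the input path with its extension replaced by .pdf (hypothetical helper)."""
    return Path(input_path).with_suffix(".pdf")


def convert_to_pdf(input_path, output_path=None):
    """Load a document, save it as PDF, and return its plain text."""
    # Assumed import path; imported lazily so default_pdf_path works on its own.
    from aspose.words_foss import Document, SaveFormat

    target = Path(output_path) if output_path is not None else default_pdf_path(input_path)
    doc = Document(str(input_path))          # filepath is the first positional argument
    doc.save(str(target), SaveFormat.PDF)    # a SaveFormat constant selects default settings
    return doc.get_text()
```

For finer control over the PDF output, the second argument to `save()` can be a save-options object such as PdfSaveOptions instead of the constant.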
Supported Formats
Input Formats
| Format | LoadFormat Constant | Description |
|---|---|---|
| DOCX | LoadFormat.DOCX | Office Open XML Word document |
| DOC | LoadFormat.DOC | Legacy Microsoft Word binary format |
| RTF | LoadFormat.RTF | Rich Text Format |
| TXT | LoadFormat.TEXT | Plain text |
| Markdown | LoadFormat.MARKDOWN | CommonMark Markdown |
| Auto | LoadFormat.AUTO | Detect format from file extension (default) |
Output Formats
| Format | SaveFormat Constant | Save Options |
|---|---|---|
| PDF | SaveFormat.PDF | PdfSaveOptions — configure compliance, image compression, font embedding |
| Markdown | SaveFormat.MARKDOWN | MarkdownSaveOptions — configure table alignment, image export, list mode |
| Plain Text | SaveFormat.TEXT | — |
| DOCX | SaveFormat.DOCX | Read-only constant. Document.save() does not support DOCX — raises ValueError. |
| DOC | SaveFormat.DOC | Read-only constant. Document.save() does not support DOC — raises ValueError. |
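Since `Document.save()` raises ValueError for DOCX and DOC, a caller may prefer to reject unsupported targets before saving. The sketch below chooses a SaveFormat constant from the output file extension. The suffix mapping is a convention assumed here, not something the library defines, the import path `aspose.words_foss` is inferred from the "Defined in" line, and both function names are hypothetical.

```python
from pathlib import Path

# Assumed suffix convention -> name of a SaveFormat constant that save() supports.
SUPPORTED_SUFFIXES = {".pdf": "PDF", ".md": "MARKDOWN", ".txt": "TEXT"}


def save_format_name(output_path):
    """Return the SaveFormat constant name for a path, or raise ValueError."""
    suffix = Path(output_path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        label = suffix or "extensionless"
        raise ValueError(f"Document.save() cannot write {label} files")
    return SUPPORTED_SUFFIXES[suffix]


def save_by_extension(doc, output_path):
    """Save `doc` in the format implied by the output file extension."""
    # Assumed import path; imported lazily so save_format_name works on its own.
    from aspose.words_foss import SaveFormat

    doc.save(str(output_path), getattr(SaveFormat, save_format_name(output_path)))
```

Raising the same exception type as `save()` lets existing `except ValueError` handlers cover both the pre-check and the library call.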
Light Document Model
The light_document_model property provides access to the internal document structure (ldm.Document, defined in light_document_model.py). This pydantic BaseModel exposes parsed paragraphs, tables, styles, sections, headers, footers, and structural queries like find_style() and headings(). Most users do not need to access the LDM directly — the public Document methods above cover standard workflows.
See Also
- SaveFormat: output format constants used with Document.save()
- LoadFormat: input format constants for explicit format specification
- PdfSaveOptions: fine-grained PDF export control (compliance level, compression, font embedding)
- MarkdownSaveOptions: Markdown export options (table alignment, image handling, list mode)