Document

Overview

Document is the main entry point of Aspose.Words FOSS for Python. It provides the public API for loading documents (DOC, DOCX, RTF, TXT, Markdown), saving them to PDF, Markdown, or TXT, and extracting plain text.

Defined in: aspose/words_foss/document.py

Description

Document is a class in the Aspose.Words FOSS library for Python that exposes 2 methods and 13 properties for programmatic use. It extends BaseModel, inheriting shared functionality from its parent type.

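The constructor accepts one of three source types: a file path (Optional[Union[str, Path]]), a binary stream (Optional[BinaryIO]), or raw bytes (Optional[bytes]). Together with save() and get_text(), this lets developers integrate document loading, conversion, and text extraction directly into Python applications.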

The class also provides the light_document_model property, which gives access to the underlying light document model for advanced inspection of document structure (paragraphs, tables, styles, sections).

Constructor

Signature	Description
`Document(filepath=None, *, stream=None, data=None)`	Load a document from a file path, binary stream, or raw bytes. At least one source must be provided.

Parameters:

Name	Type	Description
`filepath`	`Optional[Union[str, Path]]`	Path to the document file to load.
`stream`	`Optional[BinaryIO]`	Binary stream containing document data (DOCX format only).
`data`	`Optional[bytes]`	Raw bytes of the document content (DOCX format only).
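
Because the constructor takes its source through three different arguments, calling code that receives "some document source" has to decide which argument to use. The following is a minimal sketch of that dispatch. The helper name build_document_kwargs is hypothetical and not part of the library; only the filepath, stream, and data argument names come from the signature above.

```python
from pathlib import Path
from typing import Any, BinaryIO, Dict, Union


def build_document_kwargs(
    source: Union[str, Path, bytes, bytearray, BinaryIO],
) -> Dict[str, Any]:
    """Map a source object onto the constructor's filepath/stream/data arguments.

    Hypothetical helper for illustration; not part of Aspose.Words FOSS.
    """
    if isinstance(source, (str, Path)):
        # A path-like value goes to the positional-or-keyword `filepath` argument.
        return {"filepath": source}
    if isinstance(source, (bytes, bytearray)):
        # Raw bytes go to the keyword-only `data` argument (DOCX only, per the table).
        return {"data": bytes(source)}
    if hasattr(source, "read"):
        # Anything file-like goes to the keyword-only `stream` argument (DOCX only).
        return {"stream": source}
    raise TypeError(f"unsupported document source: {type(source).__name__}")
```

With the library installed, the result would be passed on as aw.Document(**build_document_kwargs(source)).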

Methods

Signature	Description
`save(output_path, save_format_or_options=None)` → `None`	Save the document to PDF (`SaveFormat.PDF`), Markdown (`SaveFormat.MARKDOWN`), or plain text (`SaveFormat.TEXT`). Pass a `SaveFormat` constant for default settings, or a save-options object (`PdfSaveOptions`, `MarkdownSaveOptions`) for fine-grained control over output.
`get_text()` → `str`	Extract plain text from the loaded document.
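
As an illustration of how save() pairs an output path with a SaveFormat constant, the sketch below writes one document to all three supported output formats. The function export_all and its parameters are hypothetical. The document object and the SaveFormat namespace are passed in as arguments so that the sketch does not depend on the package being importable; it assumes only save(output_path, save_format) and the PDF, MARKDOWN, and TEXT constants described above.

```python
from pathlib import Path
from typing import Any, List


def export_all(doc: Any, save_format: Any, out_dir: str, stem: str) -> List[Path]:
    """Save `doc` as PDF, Markdown, and plain text under `out_dir`.

    Hypothetical helper: `doc` is expected to behave like aw.Document and
    `save_format` like aw.SaveFormat.
    """
    targets = [
        (".pdf", save_format.PDF),
        (".md", save_format.MARKDOWN),
        (".txt", save_format.TEXT),
    ]
    written: List[Path] = []
    for suffix, fmt in targets:
        path = Path(out_dir) / f"{stem}{suffix}"
        # Pass the path as a string, as in the Usage section.
        doc.save(str(path), fmt)
        written.append(path)
    return written
```

A call would look like export_all(aw.Document("input.docx"), aw.SaveFormat, ".", "output").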

Properties

Name	Type	Access	Description
`light_document_model`	`ldm.Document`	Read	Access the underlying light document model for advanced inspection of document structure (paragraphs, tables, styles, sections).

Usage

import aspose.words_foss as aw

# Load a DOCX file
doc = aw.Document("input.docx")

# Extract plain text
text = doc.get_text()

# Save as PDF
doc.save("output.pdf", aw.SaveFormat.PDF)

# Save as Markdown
doc.save("output.md", aw.SaveFormat.MARKDOWN)

Supported Formats

Input Formats

Format	`LoadFormat` Constant	Description
DOCX	`LoadFormat.DOCX`	Office Open XML Word document
DOC	`LoadFormat.DOC`	Legacy Microsoft Word binary format
RTF	`LoadFormat.RTF`	Rich Text Format
TXT	`LoadFormat.TEXT`	Plain text
Markdown	`LoadFormat.MARKDOWN`	CommonMark Markdown
Auto	`LoadFormat.AUTO`	Detect format from file extension (default)
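
The stream and data constructor arguments are documented as DOCX only, while the non-DOCX input formats are detected from the file extension. One possible workaround for in-memory RTF, TXT, or Markdown content is to write it to a temporary file with the matching extension and load it by path. The sketch below shows that idea under two assumptions: the helper load_via_tempfile is hypothetical, and the document is assumed to be fully read during construction, so the temporary file can be deleted afterwards.

```python
import os
import tempfile
from typing import Any, Callable


def load_via_tempfile(
    data: bytes, suffix: str, document_factory: Callable[[str], Any]
) -> Any:
    """Write `data` to a temp file ending in `suffix` and load it by path.

    Hypothetical helper. `document_factory` is expected to be aw.Document,
    or any callable that takes a file path. Assumes the file is fully read
    before the factory returns.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The extension on tmp_path is what extension-based detection would see.
        return document_factory(tmp_path)
    finally:
        # Always clean up, even if loading fails.
        os.remove(tmp_path)
```

For example, load_via_tempfile(rtf_bytes, ".rtf", aw.Document) would hand the library a real path ending in .rtf.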

Output Formats

Format	`SaveFormat` Constant	Save Options
PDF	`SaveFormat.PDF`	`PdfSaveOptions` — configure compliance, image compression, font embedding
Markdown	`SaveFormat.MARKDOWN`	`MarkdownSaveOptions` — configure table alignment, image export, list mode
Plain Text	`SaveFormat.TEXT`	—
DOCX	`SaveFormat.DOCX`	Read-only constant. `Document.save()` does not support DOCX — raises `ValueError`.
DOC	`SaveFormat.DOC`	Read-only constant. `Document.save()` does not support DOC — raises `ValueError`.
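
Since save() rejects DOCX and DOC, it can be convenient to validate the target before calling it. The following sketch maps an output file extension to the name of a supported SaveFormat constant and raises ValueError for anything else. The helper save_format_name and the extension-to-constant mapping are assumptions of this example, not library behavior.

```python
from pathlib import Path
from typing import Union

# Extension -> SaveFormat constant name. This mapping is an assumption of the
# sketch; the library documents the constants, not the extensions.
_SAVEABLE = {".pdf": "PDF", ".md": "MARKDOWN", ".txt": "TEXT"}


def save_format_name(output_path: Union[str, Path]) -> str:
    """Return the SaveFormat constant name for `output_path`, or raise ValueError."""
    suffix = Path(output_path).suffix.lower()
    if suffix not in _SAVEABLE:
        raise ValueError(
            f"unsupported output extension {suffix!r}; "
            f"expected one of {sorted(_SAVEABLE)}"
        )
    return _SAVEABLE[suffix]
```

The returned name can then be resolved against the real enum, for example getattr(aw.SaveFormat, save_format_name("output.pdf")).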

Light Document Model

The light_document_model property provides access to the internal document structure (ldm.Document, defined in light_document_model.py). This pydantic BaseModel exposes parsed paragraphs, tables, styles, sections, headers, and footers, along with structural queries such as find_style() and headings(). Most users do not need to access the light document model (LDM) directly; the public Document methods above cover standard workflows.
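
For readers who do want to reach into the model, the sketch below shows the general access pattern. This page names headings() but does not give its signature, so the example assumes it takes no arguments and returns an iterable; check light_document_model.py before relying on that. The helper collect_headings is hypothetical.

```python
from typing import Any, List


def collect_headings(doc: Any) -> List[Any]:
    """Return the document's headings as a list.

    Hypothetical helper. Assumes `doc.light_document_model.headings()` takes
    no arguments and returns an iterable; the element type is not documented here.
    """
    model = doc.light_document_model
    return list(model.headings())
```

With the library installed, this would be called as collect_headings(aw.Document("input.docx")).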