Generating DOCX files
Overview
- mutool has experimental support for generating DOCX and ODT files suitable for loading and editing with Microsoft Word, LibreOffice or similar, by examining the input document for glyphs that make up words, lines and paragraphs.
- We can also output the extracted text in HTML and text format.
- If built with the Extract library, gs supports DOCX output.
- For DOCX/ODT, rotated text is placed inside a text box in the generated output file.
- Note that text boxes do not seem to be supported by WordPad.
- (mutool only) Images are extracted and placed inline with nearby text in the generated output file.
- We also have limited support for detecting tables.
Usage
- With mutool, use the convert command with a .docx or .odt output suffix, for example:
mutool convert -o foo.docx foo.pdf
- Putting rotated text into text boxes can be disabled with -O rotation=no.
- For HTML or text output, specify the docx device explicitly and use the html or text options, for example:
mutool convert -F docx -O html -o foo.html foo.pdf
- With gs, use the docxwrite device, for example:
gs -sDEVICE=docxwrite -o foo.docx foo.pdf
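The mutool command above can be wrapped in a small script to convert several files in one go. The following is only a minimal sketch: it assumes mutool is on the PATH and that every input name ends in .pdf, and the helper names out_name and convert_all are hypothetical, not part of mutool.

```shell
#!/bin/sh

# Hypothetical helper: derive the output path by replacing a trailing
# ".pdf" with the requested extension (docx or odt).
out_name() {
    printf '%s.%s\n' "${1%.pdf}" "$2"
}

# Hypothetical helper: convert each PDF passed as an argument to DOCX,
# using the same "mutool convert -o" form shown in the usage notes.
# Stops at the first failed conversion.
convert_all() {
    for pdf in "$@"; do
        mutool convert -o "$(out_name "$pdf" docx)" "$pdf" || return 1
    done
}
```

After sourcing the script, calling convert_all *.pdf in a directory of PDF files should write one .docx file next to each input. Options such as -O rotation=no could be added to the mutool line if text boxes are not wanted.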
Artifex Licensing
Artifex offers a dual licensing model for MuPDF: you can use it under either a commercial license or the GNU Affero General Public License (AGPL).
While Open Source software may be free to use, that does not mean it is free of obligation. To determine whether your intended use of MuPDF is suitable for the AGPL, please read the full text of the AGPL license agreement on the FSF web site.