C++ and Python APIs for MuPDF


API Stability

The C++ and Python MuPDF APIs are currently an alpha release and liable to change.

The C++ MuPDF API

  • Provides C++ functions that wrap most fz_ and pdf_ functions.
  • Provides C++ classes that wrap most fz_ and pdf_ structs.
  • Class methods provide access to most of the underlying C API functions (except for functions that don't take struct args such as fz_strlcpy()).
  • fz_ exceptions are converted into C++ exceptions.
  • Functions and methods do not take fz_context arguments. (Automatically-generated per-thread contexts are used internally.)
  • Provides a small number of extensions beyond the basic C API:
    • Some generated classes have extra support for iteration.
    • Some custom class methods and constructors.

The Python MuPDF API

  • Generated from the C++ MuPDF API's header files.
  • Allows implementation of mutool in Python - see mupdf:scripts/mutool.py and mupdf:scripts/mutool_draw.py.

Building the C++ and Python MuPDF APIs

Requirements:

  • Linux or OpenBSD.
  • clang-python version 6 or 7. [For example Debian python-clang, OpenBSD py3-llvm.]
  • python3-dev version 3.6 or later.
  • SWIG version 3 or 4.

Build MuPDF shared library, C++ and Python MuPDF APIs, and run basic tests:

    git clone --recursive git://git.ghostscript.com/mupdf.git
    cd mupdf
    ./scripts/mupdfwrap.py -b all -t

As above but do a debug build:

    ./scripts/mupdfwrap.py -d build/shared-debug -b all -t

For more information:

  • Run ./scripts/mupdfwrap.py -h.
  • Read the doc-string at the beginning of scripts/mupdfwrap.py.
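If you drive the build from another Python script, a thin wrapper around the commands above might look like this. It is a sketch that uses only the flags shown on this page (`-d build/shared-debug`, `-b all`, `-t`); the helper names `mupdfwrap_command` and `run_build` are hypothetical.

```python
import shlex
import subprocess


def mupdfwrap_command(debug=False, run_tests=True):
    """Build the mupdfwrap.py argument list, using only flags shown on this page."""
    cmd = ["./scripts/mupdfwrap.py"]
    if debug:
        # Same build directory as the debug example.
        cmd += ["-d", "build/shared-debug"]
    cmd += ["-b", "all"]
    if run_tests:
        cmd.append("-t")
    return cmd


def run_build(mupdf_dir=".", debug=False, run_tests=True):
    """Run mupdfwrap.py from the top of a mupdf checkout; raise if it fails."""
    cmd = mupdfwrap_command(debug, run_tests)
    print("Running:", " ".join(shlex.quote(arg) for arg in cmd))
    subprocess.run(cmd, cwd=mupdf_dir, check=True)
```

For example, `run_build("mupdf", debug=True)` would run the debug build and tests inside a checkout cloned into `mupdf/`.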

Building auto-generated documentation

Build HTML documentation for the C, C++ and Python APIs (using Doxygen and pydoc):

    ./scripts/mupdfwrap.py --doc all

This will generate these documentation roots:

  • include/html/index.html [C API]
  • platform/c++/include/html/index.html [C++ API]
  • build/shared-release/mupdf.html [Python API]

Note that the content is ultimately all generated from the MuPDF C header file comments.

Using the Python API

Run Python code with these environment variables set (for example as a prefix to the python3 command line):

    PYTHONPATH=build/shared-release LD_LIBRARY_PATH=build/shared-release

(This enables Python to find the mupdf module, and enables the system dynamic linker to find the shared libraries that implement the underlying C, C++ and Python MuPDF APIs.)
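When launching scripts from Python rather than from a shell, the same settings can be passed to a child process. A child process is used because on Linux the dynamic linker reads LD_LIBRARY_PATH when a process starts, so changing it inside an already-running interpreter generally does not help. This is a sketch assuming the default build/shared-release directory; `mupdf_env` and `run_script` are hypothetical helper names.

```python
import os
import subprocess
import sys


def mupdf_env(build_dir="build/shared-release", base=None):
    """Return a copy of the environment with PYTHONPATH and LD_LIBRARY_PATH
    pointing at build_dir, placed before any existing values."""
    env = dict(os.environ if base is None else base)
    build_dir = os.path.abspath(build_dir)
    for var in ("PYTHONPATH", "LD_LIBRARY_PATH"):
        old = env.get(var)
        env[var] = build_dir if not old else build_dir + os.pathsep + old
    return env


def run_script(script, build_dir="build/shared-release"):
    """Run a Python script in a child process that can import mupdf."""
    subprocess.run([sys.executable, script], env=mupdf_env(build_dir), check=True)
```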

Minimal Python code that uses the mupdf module:

    import mupdf
    document = mupdf.Document('foo.pdf')

A simple example Python test script (run by scripts/mupdfwrap.py -t) is:

  • scripts/mupdfwrap_test.py

More detailed usage of the Python API can be found in:

  • scripts/mutool.py
  • scripts/mutool_draw.py

Here is some example code that shows all available information about a document's Stext blocks, lines and characters:

    #!/usr/bin/env python3

    import mupdf

    def show_stext(document):
        '''
        Shows all available information about Stext blocks, lines and characters.
        '''
        for p in range(document.count_pages()):
            page = document.load_page(p)
            # Extract structured text from the page using default options.
            stextpage = mupdf.StextPage(page, mupdf.StextOptions())
            for block in stextpage:
                # m_internal gives access to the underlying C struct (here fz_stext_block).
                block_ = block.m_internal
                print(f'block: type={block_.type} bbox=({block_.bbox.x0:6.2f} {block_.bbox.y0:6.2f} {block_.bbox.x1:6.2f} {block_.bbox.y1:6.2f})')
                for line in block:
                    line_ = line.m_internal
                    print(f'    line: wmode={line_.wmode}'
                            + f' dir=({line_.dir.x} {line_.dir.y})'
                            + f' bbox=({line_.bbox.x0:6.2f} {line_.bbox.y0:6.2f} {line_.bbox.x1:6.2f} {line_.bbox.y1:6.2f})'
                            )
                    for char in line:
                        char_ = char.m_internal
                        print(f'        char: {chr(char_.c)!r} c={char_.c:4} color={char_.color}'
                                + f' origin=({char_.origin.x:6.2f} {char_.origin.y:6.2f})'
                                + f' quad=('
                                    + f'ul=({char_.quad.ul.x:6.2f} {char_.quad.ul.y:6.2f})'
                                    + f' ur=({char_.quad.ur.x:6.2f} {char_.quad.ur.y:6.2f})'
                                    + f' ll=({char_.quad.ll.x:6.2f} {char_.quad.ll.y:6.2f})'
                                    + f' lr=({char_.quad.lr.x:6.2f} {char_.quad.lr.y:6.2f})'
                                    + f')'
                                + f' size={char_.size:6.2f}'
                                + f' font=('
                                    + f'is_mono={char_.font.flags.is_mono}'
                                    + f' is_bold={char_.font.flags.is_bold}'
                                    + f' is_italic={char_.font.flags.is_italic}'
                                    + f' ft_substitute={char_.font.flags.ft_substitute}'
                                    + f' ft_stretch={char_.font.flags.ft_stretch}'
                                    + f' fake_bold={char_.font.flags.fake_bold}'
                                    + f' fake_italic={char_.font.flags.fake_italic}'
                                    + f' has_opentype={char_.font.flags.has_opentype}'
                                    + f' invalid_bbox={char_.font.flags.invalid_bbox}'
                                    + f' name={char_.font.name}'
                                    + f')'
                                )

    document = mupdf.Document('foo.pdf')
    show_stext(document)
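The example repeats the same formatting expressions for points, rectangles and quads many times. A few small helpers can factor that out. These are hypothetical (not part of the mupdf module) and rely only on the attribute names already used in the example (`x0`..`y1`, `x`/`y`, `ul`/`ur`/`ll`/`lr`), so they can be tried with stand-in objects before plugging them into `show_stext()`.

```python
from types import SimpleNamespace


def fmt_point(p):
    """Format any object with .x and .y attributes, such as an fz_point."""
    return f"({p.x:6.2f} {p.y:6.2f})"


def fmt_rect(r):
    """Format any object with .x0 .y0 .x1 .y1 attributes, such as an fz_rect."""
    return f"({r.x0:6.2f} {r.y0:6.2f} {r.x1:6.2f} {r.y1:6.2f})"


def fmt_quad(q):
    """Format any object with .ul .ur .ll .lr point attributes, such as an fz_quad."""
    corners = " ".join(f"{name}={fmt_point(getattr(q, name))}" for name in ("ul", "ur", "ll", "lr"))
    return "(" + corners + ")"


# Stand-in object; with mupdf this would be e.g. block.m_internal.bbox.
bbox = SimpleNamespace(x0=0, y0=0, x1=612, y1=792)
print("bbox=" + fmt_rect(bbox))  # prints: bbox=(  0.00   0.00 612.00 792.00)
```

Inside `show_stext()`, a line such as `print(f'block: type={block_.type} bbox={fmt_rect(block_.bbox)}')` would then produce the same text as the longer original expression.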

How the build works

Building of MuPDF shared library:

  • Runs make internally.

Generation of the C++ MuPDF API:

  • Uses clang-python to parse MuPDF's C API.
  • Generates C++ code that wraps the basic C interface.
  • Generates C++ classes for each fz_ struct, and uses various heuristics to define constructors, methods and static methods that call fz_ functions.
  • C header file comments are copied into the generated C++ header files.
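The real heuristics in scripts/mupdfwrap.py are more involved and work on clang's parse tree. The toy version below only illustrates the naming pattern visible on this page: fz_stext_page becomes `StextPage`, and fz_load_page, whose first non-context argument is an fz_document, becomes the method `Document.load_page`. It handles fz_ names only, and the helper names `class_name` and `method_for` are hypothetical.

```python
import re


def class_name(struct_name):
    """fz_stext_page -> StextPage: strip the fz_ prefix, then CamelCase the rest."""
    if not struct_name.startswith("fz_"):
        raise ValueError(f"not an fz_ struct name: {struct_name!r}")
    return "".join(part.capitalize() for part in struct_name[3:].split("_"))


def method_for(func_name, arg_types):
    """Guess (class, method) for an fz_ function from its C argument types.

    A leading 'fz_context *' argument is skipped because the wrappers supply
    it internally. If the next argument is not a pointer to an fz_ struct
    (as with fz_strlcpy()), no method is generated and None is returned.
    """
    args = [t.strip() for t in arg_types]
    if args and args[0] == "fz_context *":
        args = args[1:]
    if not args:
        return None
    m = re.fullmatch(r"(?:const\s+)?(fz_\w+)\s*\*", args[0])
    if not m or m.group(1) == "fz_context":
        return None
    return class_name(m.group(1)), func_name[len("fz_"):]
```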

Generation of the Python MuPDF API:

  • Based on the C++ MuPDF API.
  • Uses SWIG to parse the C++ headers and generate C++ and Python code.
  • Defines some custom-written Python functions and methods.
  • If SWIG is version 4+, C++ comments are converted into Python doc-comments.

Generated files

    mupdf/
        build/
            shared-release/    [Files needed at runtime]
                libmupdf.so    [implements C MuPDF API]
                libmupdfcpp.so [implements C++ MuPDF API]
                mupdf.py       [implements Python MuPDF API]
                _mupdf.so      [implements Python MuPDF API internals]
            shared-debug/
                [as shared-release but debug build]
        platform/
            c++/
                include/
                    mupdf/ [C++ MuPDF API header files]
                implementation/
                    *.cpp [MuPDF C++ implementation files]
            python/
                [SWIG build files]
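A quick way to confirm that a build produced the runtime files listed above is to check for them directly. This sketch assumes the Linux file names shown for shared-release; `missing_runtime_files` is a hypothetical helper.

```python
import os

# Runtime files listed for build/shared-release (Linux names).
RUNTIME_FILES = ("libmupdf.so", "libmupdfcpp.so", "mupdf.py", "_mupdf.so")


def missing_runtime_files(build_dir="build/shared-release"):
    """Return the names from RUNTIME_FILES that are not present in build_dir."""
    return [name for name in RUNTIME_FILES
            if not os.path.isfile(os.path.join(build_dir, name))]


if __name__ == "__main__":
    missing = missing_runtime_files()
    print("all runtime files present" if not missing else f"missing: {missing}")
```

Running it from the top of the checkout after `./scripts/mupdfwrap.py -b all` should list nothing if the release build succeeded; pass `"build/shared-debug"` to check a debug build instead.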

Artifex Licensing

Artifex offers a dual licensing model for MuPDF. This means we offer both commercial licenses and the GNU Affero General Public License (AGPL).

While Open Source software may be free to use, that does not mean it is free of obligation. To determine whether your intended use of MuPDF is suitable for the AGPL, please read the full text of the AGPL license agreement on the FSF web site.

With a commercial license from Artifex, you maintain full ownership and control over your products and can distribute them to customers as you wish. You are not obligated to share your proprietary source code, which saves you from having to conform to the requirements and restrictions of the AGPL. For more information, please see our licensing page, or contact our sales team.

Please send any questions, comments or suggestions about this page to: julian.smith@artifex.com

Topic revision: r6 - 2021-03-01 - JulianSmith
Copyright 2014 Artifex Software Inc