Auto-generated C++ and Python APIs for mupdf.


As of 2021-2-4:

Customer page:


  • We generate C++ wrapper functions for most fz_ and pdf_ functions. These wrappers convert fz_ exceptions into C++ exceptions, and use auto-generated per-thread fz_context's.
  • We generate C++ class wrappers for most fz_ and pdf_ structs.
  • We auto-detect fz_*() and pdf_*() fns suitable for wrapping as constructors, methods or static methods.
  • Some generated classes have auto-generated support for iteration.
  • We add various custom methods/constructors.
  • Wrapper class constructors and methods provide access to 1270 fz_*() and pdf_*() fns, out of a total of 1513 wrapped fz_*() and pdf_*() functions. Most of the omitted functions don't take struct args, e.g. fz_strlcpy().
  • The C++ API is built by mupdf:scripts/ It requires clang-6 or clang-7, and python-clang.
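
As a rough illustration of how fz_*() and pdf_*() fns might be sorted into constructors, methods and static methods, the sketch below classifies a C function from its return type and argument types. The classify() helper and its rules are assumptions made for illustration only; they are not the logic used by the actual generator.

```python
def classify(struct_name, return_type, arg_types):
    """
    Hypothetical heuristic, not the generator's real rules.

    struct_name: e.g. 'fz_document'.
    return_type: C return type as a string, e.g. 'fz_document *'.
    arg_types: list of C argument types, excluding any leading fz_context *.
    """
    struct_ptr = f'{struct_name} *'
    # A function whose first argument is a pointer to the struct looks like a method.
    if arg_types and arg_types[0].replace('const ', '') == struct_ptr:
        return 'method'
    # A function that returns a pointer to the struct looks like a constructor.
    if return_type == struct_ptr:
        return 'constructor'
    # A function that merely mentions the struct could become a static method.
    if struct_name in return_type or any(struct_name in a for a in arg_types):
        return 'static method'
    return 'unrelated'

# fz_open_document(ctx, filename) returns fz_document *.
print(classify('fz_document', 'fz_document *', ['const char *']))
# fz_count_pages(ctx, doc) takes fz_document * as its first non-context arg.
print(classify('fz_document', 'int', ['fz_document *']))
# fz_strlcpy() takes no struct args, so it stays outside the class-based API.
print(classify('fz_document', 'size_t', ['char *', 'const char *', 'size_t']))
```

A real generator would work from parsed declarations (for example via clang) rather than from type strings, so this only shows the shape of the decision.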


  • Python API is generated by running SWIG on the C++ API's header files.
  • Python API is enough to allow implementation of mutool in Python - see mupdf:scripts/ and mupdf:scripts/
  • Building the Python API requires swig-3 or swig-4.


  • We work on nuc1 and peeved and jules-laptop.
  • We require:
    • python-clang (version 6 or 7)
    • python3-dev (version 3.6 or later)
    • swig (version 3 or 4)
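
The version requirements above could be checked before starting a build. This is a minimal sketch, assuming the text printed by `swig -version` contains a line of the form "SWIG Version X.Y.Z"; the helper names swig_major() and check_requirements() are hypothetical and are not part of the build script.

```python
import re
import sys

def swig_major(version_text):
    """Return the major version found in `swig -version` output, or None."""
    m = re.search(r'SWIG Version (\d+)\.', version_text)
    return int(m.group(1)) if m else None

def check_requirements(swig_version_text, python_version=sys.version_info[:2]):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # The page states that swig-3 or swig-4 is required.
    if swig_major(swig_version_text) not in (3, 4):
        problems.append('swig 3 or 4 is required')
    # The page states that python3-dev 3.6 or later is required.
    if tuple(python_version) < (3, 6):
        problems.append('python 3.6 or later is required')
    return problems

print(check_requirements('SWIG Version 4.0.2', (3, 8)))
print(check_requirements('SWIG Version 2.0.12', (3, 5)))
```

In practice the version text could come from subprocess.run(['swig', '-version'], capture_output=True, text=True).stdout.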


  • We use clang to extract doxygen-style comments, and propagate them into generated header files.
  • If swig is version 4+, we tell it to propagate comments into the generated

Here are Doxygen html representations of the mupdf C API and the generated mupdf C++ API:

And pydoc html representation of the generated API:

mupdf:scripts/mutool*.py are an incomplete Python re-implementation of the mutool application.


Auto-generated C++ headers and implementation files, plus test outputs (.html files have syntax-colouring):

Information about fz_*() and pdf_*() fns that are not in the class-based API:

These were generated by the programme, which also runs g++ and SWIG to generate a Python module that gives a Python API:

The generated Python module is tested by the (rather hacky) test_mupdfcpp_swig() function in For convenience, this function and its output can be viewed in

Integration with mupdf git.

       [generated file]
       [generated file, implements C++ API]
       [generated file, implements Python API]
       [generated file, implements Python API internals]
       [implements C++ API]
       [implements Python API]
       [implements Python API internals]
                    *.cpp [generated files]
                        *.h [generated files]
                mupdfcpp_swig.cpp [generated by SWIG]
                mupdf_swig.i [generated by mupdfwrap.py, input to SWIG]


To build:

    cd mupdf/
    ./scripts/ -b all -t

Comparison with PyMuPDF

  • Am writing equivalent code to some example programmes in
  • Method names are usually different, because PyMuPDF uses its own names instead of basing names on the underlying MuPDF API.
  • Have made various additions/fixes to (for details see:;a=summary)
    • Added Document::lookup_metadata() method overload that returns std::string.
    • Added global const std::vector<std::string> metadata_keys.
    • Changed Outline iteration to include depth information.
    • Fixed ref-counting in Page::load_links().
    • Fixed Page::search_page() to return std::vector.
    • Added Python wrapper for PdfDocument::page_write() out-params.
    • Using an improved scheme for wrapping functions/methods with out-params: instead of trying to use SWIG's typemaps, which are very clumsy in the context of and seemingly more designed for custom-written .i files, we now use simple auto-generated C functions to package up out-params into a struct, then extract them into a tuple in auto-generated Python.
    • Provide two wrappers for mupdf.Buffer.buffer_extract(): one returns raw C (size, data) values, the other returns a Python bytes. The former can be passed to a mupdf.Stream constructor (it doesn't seem possible to convert a Python bytes back into (size, data)). [This allows us to mimic PyMuPDF-Utilities/demo/]
  • PyMuPDF has more information about links - fitz.LINK_GOTO, LINK_GOTOR, fitz.LINK_LAUNCH, fitz.LINK_URI.
  • PyMuPDF has an abstraction for writing image files, which calls fz_save_pixmap_as_png() or fz_save_pixmap_as_pnm() etc, depending on the filename.
  • PyMuPDF can copy a TOC into a PdfDocument.
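
The out-params scheme mentioned in the list above can be sketched in pure Python. Here ctypes stands in for the generated struct, and ll_get_size_outparams() stands in for an auto-generated C function; both names, and the SizeOutparams struct, are hypothetical and do not come from the generated code.

```python
import ctypes

class SizeOutparams(ctypes.Structure):
    """Hypothetical struct holding the out-params of one wrapped function."""
    _fields_ = [
        ('width', ctypes.c_int),
        ('height', ctypes.c_int),
    ]

def ll_get_size_outparams(scale, outparams):
    """
    Stand-in for an auto-generated C function. The underlying function would
    take `int *width, int *height`; this version writes into the struct instead.
    """
    outparams.width = 100 * scale
    outparams.height = 50 * scale
    return 0

def get_size(scale):
    """Auto-generated-style Python wrapper: unpack the struct into a tuple."""
    outparams = SizeOutparams()
    ret = ll_get_size_outparams(scale, outparams)
    return ret, outparams.width, outparams.height

print(get_size(2))
```

The caller sees an ordinary Python tuple of (return value, out-params...), and no SWIG typemap is needed for the pointer arguments.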


Equivalent code using mupdfwrap:

#! /usr/bin/env python3

import mupdf

import os
import sys

assert len(sys.argv) == 7
filename, page_num, zoom, rotate, output, needle = sys.argv[1:]
page_num = int(page_num)
zoom = int(zoom)
rotate = int(rotate)

document = mupdf.Document(filename)

print(f'Document {filename} has {document.count_pages()} pages.')
print(f'Metadata Information:')
for key in mupdf.metadata_keys:
    value = document.lookup_metadata(key)
    print(f'    {key}: {value!r}')

outline = mupdf.Outline(document)
for o in outline:
    print(f'    {" "*4*o.m_depth}{o.m_depth}: {o.m_outline.title()}')

# load_page() takes a zero-based page number, so it must be less than the page count.
if page_num >= document.count_pages():
    raise SystemExit(f'page_num={page_num} is out of range - {filename} has {document.count_pages()} pages')

page = document.load_page(page_num)
links = page.load_links()
if links:
    print(f'Links on page {page_num}:')
    for link in links:
        if link.m_internal:
            print(f'    extern={mupdf.is_external_link(link.uri())}: {link.uri()}')
else:
    print(f'No links on page {page_num}')

trans = mupdf.Matrix.scale(zoom / 100.0, zoom / 100.0).pre_rotate(rotate)

pixmap = page.new_pixmap_from_page(trans, mupdf.Colorspace(mupdf.Colorspace.Fixed_RGB), alpha=False)

def save_pixmap(path):
    suffix = os.path.splitext(path)[1]
    if 0: pass
    elif suffix == '.pam':   pixmap.save_pixmap_as_pam(path)
    elif suffix == '.pbm':   pixmap.save_pixmap_as_pbm(path)
    elif suffix == '.pcl':   pixmap.save_pixmap_as_pcl(path, append=0, options=mupdf.PclOptions())
    elif suffix == '.pclm':  pixmap.save_pixmap_as_pclm(path, append=0, options=mupdf.PclmOptions())
    elif suffix == '.pdfocr':pixmap.save_pixmap_as_pdfocr(path, append=0, options=mupdf.PdfocrOptions())
    elif suffix == '.pkm':   pixmap.save_pixmap_as_pkm(path)
    elif suffix == '.png':   pixmap.save_pixmap_as_png(path)
    elif suffix == '.pnm':   pixmap.save_pixmap_as_pnm(path)
    elif suffix == '.ppm':   pixmap.save_pixmap_as_ppm(path)
    elif suffix == '.ps':    pixmap.save_pixmap_as_ps(path, append=0)
    elif suffix == '.psd':   pixmap.save_pixmap_as_psd(path)
    elif suffix == '.pwg':   pixmap.save_pixmap_as_pwg(path, append=0, pwg=mupdf.PwgOptions())
    else:
        raise Exception(f'Unrecognised output format: {path}')

# Write the rendered page to the requested output file.
save_pixmap(output)

hit_quads = page.search_page(needle, max=16)
print(f'search text {needle!r} found {len(hit_quads)} on the page')
for hit_quad in hit_quads:
    print(f'    {hit_quad}')

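The suffix-based save_pixmap() above could also be written as a lookup instead of an if/elif chain. This is a minimal sketch covering only the formats whose save methods take just a path; the save_method_name() helper and the FakePixmap class are hypothetical, and the lower-casing of the suffix is an addition not present in the script.

```python
import os

# Suffixes whose save methods take only a path, as in the script above.
SIMPLE_SUFFIXES = ('.pam', '.pbm', '.pkm', '.png', '.pnm', '.ppm', '.psd')

def save_method_name(path):
    """Map an output path to the name of the matching save_pixmap_as_*() method."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix not in SIMPLE_SUFFIXES:
        raise ValueError(f'Unrecognised output format: {path}')
    return f'save_pixmap_as_{suffix[1:]}'

def save_pixmap(pixmap, path):
    """Look up the save method by name and call it with the path."""
    getattr(pixmap, save_method_name(path))(path)

class FakePixmap:
    """Stand-in for mupdf.Pixmap so the sketch runs without mupdf installed."""
    def __init__(self):
        self.calls = []
    def save_pixmap_as_png(self, path):
        self.calls.append(('png', path))

fake = FakePixmap()
save_pixmap(fake, 'out.png')
print(fake.calls)
```

Formats that need extra arguments (.pcl, .pclm, .pdfocr, .ps, .pwg) would still need their own branches.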

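The outline loop in the script relies on iteration that yields depth information alongside each entry. A minimal sketch of such a traversal, assuming each entry has next and down links as in the C fz_outline struct; the Node class and iter_with_depth() are hypothetical stand-ins, not the generated classes.

```python
class Node:
    """Hypothetical stand-in for an outline entry with next/down links."""
    def __init__(self, title, next=None, down=None):
        self.title = title
        self.next = next
        self.down = down

def iter_with_depth(node, depth=0):
    """Yield (depth, node) for each entry, visiting children before siblings."""
    while node is not None:
        yield depth, node
        if node.down is not None:
            yield from iter_with_depth(node.down, depth + 1)
        node = node.next

# Chapter 1 has two sections; Chapter 2 follows at the top level.
sec2 = Node('1.2')
sec1 = Node('1.1', next=sec2)
ch2 = Node('Chapter 2')
ch1 = Node('Chapter 1', next=ch2, down=sec1)

for depth, node in iter_with_depth(ch1):
    print(f'{" " * 4 * depth}{depth}: {node.title}')
```
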
-- Julian Smith - 2020-03-04


Topic revision: r23 - 2021-02-05 - JulianSmith
Copyright 2014 Artifex Software Inc