%TOC%

---+ Auto-generated C++ and Python APIs for mupdf.

---++ Status

As of 2021-2-4:
   * Comparison with !PyMuPDF.
      * No surprises so far.
      * For more details see: [[#Comparison_with_PyMuPDF]]

Customer page:
   * MuPDFWrap

---+++ C++

   * We generate C++ wrapper functions for most fz_ and pdf_ functions. These wrappers convert fz_ exceptions into C++ exceptions, and use auto-generated per-thread fz_context's.
   * We generate C++ class wrappers for most fz_ and pdf_ structs.
      * We auto-detect fz_*() and pdf_*() fns suitable for wrapping as constructors, methods or static methods.
      * Some generated classes have auto-generated support for iteration.
      * We add various custom methods/constructors.
      * Wrapper class constructors and methods provide access to 1270 fz_*() and pdf_*() fns, out of a total of 1513 wrapped fz_*() and pdf_*() functions. Most of the omitted functions don't take struct args, e.g. fz_strlcpy().
   * The C++ API is built by mupdf:scripts/mupdfwrap.py. It requires clang-6 or clang-7, and python-clang.

---+++ Python

   * The Python API is generated by running SWIG on the C++ API's header files.
   * The Python API is enough to allow implementation of mutool in Python - see mupdf:scripts/mutool.py and mupdf:scripts/mutool_draw.py.
   * Building the Python API requires swig-3 or swig-4.

---+++ General

   * We work on nuc1, peeved and jules-laptop.
   * We require:
      * python-clang (version 6 or 7)
      * python3-dev (version 3.6 or later)
      * swig (version 3 or 4)

---+++ Comments

   * We use clang to extract doxygen-style comments, and propagate them into generated header files.
   * If swig is version 4+, we tell it to propagate comments into the generated mupdf.py.
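
One way to check that comments really did reach the generated Python module is to list the first docstring line of each public name. The following is a minimal sketch, not part of mupdfwrap.py: the helper names =doc_summary()= and =public_doc_summaries()= are made up for illustration, and it assumes the generated module is importable as =mupdf= (for example with build/shared-release on PYTHONPATH). If the module cannot be imported it simply prints nothing.

```python
import importlib
import inspect


def doc_summary(obj):
    """Return the first line of obj's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else ''


def public_doc_summaries(module_name):
    """Map each public name in a module to the first line of its docstring.

    Returns an empty dict if the module cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {}
    return {
        name: doc_summary(getattr(module, name))
        for name in dir(module)
        if not name.startswith('_')
    }


if __name__ == '__main__':
    # Assumption: the generated module is called 'mupdf' and is on PYTHONPATH.
    for name, summary in sorted(public_doc_summaries('mupdf').items()):
        print(f'{name}: {summary}')
```

With swig 3 one would expect most summaries to be empty or SWIG boilerplate; with swig 4+ they should reflect the propagated comments.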
Here are Doxygen html representations of the mupdf C API and the generated mupdf C++ API:
   * https://ghostscript.com/~julian/mupdf/include/html/index.html
   * https://ghostscript.com/~julian/mupdf/platform/c++/include/html/index.html

And a pydoc html representation of the generated mupdf.py API:
   * https://ghostscript.com/~julian/mupdf/build/shared-release/mupdf.html

---++ mutool.py

mupdf:scripts/mutool*.py are an incomplete Python re-implementation of the mutool application.

---++ Files

Auto-generated C++ headers and implementation files, plus test outputs (.html files have syntax-colouring):
   * https://ghostscript.com/~julian/mupdf/platform/c++/
   * https://ghostscript.com/~julian/mupdf/platform/python

Information about fz_*() and pdf_*() fns that are not in the class-based API:
   * https://ghostscript.com/~julian/mupdf/platform/c++/fn_usage.txt

These were generated by the mupdfwrap.py programme, which also runs g++ and SWIG to generate a Python module that gives a Python API:
   * https://git.ghostscript.com/?p=user/julian/mupdf.git;a=blob;f=scripts/mupdfwrap.py;hb=HEAD

The generated Python module is tested by the (rather hacky) test_mupdfcpp_swig() function in mupdfwrap.py. For convenience, this function and its output can be viewed in https://ghostscript.com/~julian/mupdf/platform/python.

---++ Integration with mupdf git.
<literal>
<pre>
mupdf/
    build/
        shared-release/
            libmupdf.so         [generated file]
            libmupdfcpp.so      [generated file, implements C++ API]
            mupdf.py            [generated file, implements Python API]
            _mupdf.so           [generated file, implements Python API internals]
        shared-debug/
            libmupdf.so
            libmupdfcpp.so      [implements C++ API]
            mupdf.py            [implements Python API]
            _mupdf.so           [implements Python API internals]
    platform/
        c++/
            implementation/
                *.cpp           [generated files]
            include/
                mupdf/
                    *.h         [generated files]
        python/
            mupdfcpp_swig.cpp   [generated by SWIG]
            mupdf_swig.i        [generated by mupdfwrap.py, input to SWIG]
    scripts/
        mupdfwrap.py
        jlib.py
        mutool.py
        mutool_draw.py
</pre>
</literal>

See:
   * https://git.ghostscript.com/?p=user/julian/mupdf.git;a=summary

To build:

<literal>
<pre>
cd mupdf/
./scripts/mupdfwrap.py -b all -t
</pre>
</literal>

---++ Comparison with !PyMuPDF

   * Am writing equivalent code to some example programmes in https://github.com/pymupdf/PyMuPDF-Utilities.
   * Method names are usually different, because !PyMuPDF uses its own names instead of basing names on the underlying !MuPDF API.
   * Have made various additions/fixes to mupdfwrap.py (for details see: https://git.ghostscript.com/?p=user/julian/mupdf.git;a=summary)
      * Added Document::lookup_metadata() method overload that returns std::string.
      * Added global const std::vector<std::string> metadata_keys.
      * Changed Outline iteration to include depth information.
      * Fixed ref-counting in Page::load_links().
      * Fixed Page::search_page() to return std::vector<Quad>.
      * Added Python wrapper for PdfDocument::page_write() out-params.
      * Using improved scheme for wrapping functions/methods with out-params - instead of trying to use SWIG's typemaps, which are very clumsy in the context of mupdfwrap.py and seemingly more designed for custom-written .i files, we now use simple auto-generated C functions to package up out-params into a struct, then extract into a tuple in auto-generated Python.
      * Provide two wrappers for mupdf.Buffer.buffer_extract() - return raw C (size, data) values or return a Python bytes. The former can be used to construct a mupdf.Stream (it doesn't seem possible to convert a Python bytes back into (size, data)). [This allows us to mimic PyMuPDF-Utilities/demo/pdf-converter.py.]
   * !PyMuPDF has more information about links - fitz.LINK_GOTO, fitz.LINK_GOTOR, fitz.LINK_LAUNCH, fitz.LINK_URI.
   * !PyMuPDF has an abstraction for writing image files which calls fz_save_pixmap_as_png() or fz_save_pixmap_as_pnm() etc, depending on the filename.
   * !PyMuPDF can copy a TOC into a PdfDocument.
   * Looks like !PyMuPDF has fairly elaborate support for redactions. Makes use of pdf_redact_page() and then writes on top of the redaction?

!PyMuPDF: https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/demo/demo.py

Equivalent code using mupdfwrap:

<verbatim>
#! /usr/bin/env python3

import mupdf

import os
import sys

assert len(sys.argv) == 7
filename, page_num, zoom, rotate, output, needle = sys.argv[1:]
page_num = int(page_num)
zoom = int(zoom)
rotate = int(rotate)

document = mupdf.Document(filename)

print('')
print(f'Document {filename} has {document.count_pages()} pages.')
print('')
print(f'Metadata Information:')
print(f'mupdf.metadata_keys={mupdf.metadata_keys}')
for key in mupdf.metadata_keys:
    value = document.lookup_metadata(key)
    print(f' {key}: {value!r}')
print('')

outline = mupdf.Outline(document)
for o in outline:
    print(f' {" "*4*o.m_depth}{o.m_depth}: {o.m_outline.title()}')

# load_page() takes a zero-based page number, so page_num must be
# strictly less than the page count.
if page_num >= document.count_pages():
    raise SystemExit(f'page_num={page_num} is out of range - {filename} has {document.count_pages()} pages')

page = document.load_page(page_num)

links = page.load_links()
if links:
    print(f'Links on page {page_num}:')
    for link in links:
        if link.m_internal:
            print(f' extern={mupdf.is_external_link(link.uri())}: {link.uri()}')
else:
    print(f'No links on page {page_num}')

trans = mupdf.Matrix.scale(zoom / 100.0, zoom / 100.0).pre_rotate(rotate)
pixmap = page.new_pixmap_from_page(trans, mupdf.Colorspace(mupdf.Colorspace.Fixed_RGB), alpha=False)

def save_pixmap(path):
    # Choose the save method from the output filename's suffix.
    suffix = os.path.splitext(path)[1]
    if 0: pass
    elif suffix == '.pam':    pixmap.save_pixmap_as_pam(path)
    elif suffix == '.pbm':    pixmap.save_pixmap_as_pbm(path)
    elif suffix == '.pcl':    pixmap.save_pixmap_as_pcl(path, append=0, options=mupdf.PclOptions())
    elif suffix == '.pclm':   pixmap.save_pixmap_as_pclm(path, append=0, options=mupdf.PclmOptions())
    elif suffix == '.pdfocr': pixmap.save_pixmap_as_pdfocr(path, append=0, options=mupdf.PdfocrOptions())
    elif suffix == '.pkm':    pixmap.save_pixmap_as_pkm(path)
    elif suffix == '.png':    pixmap.save_pixmap_as_png(path)
    elif suffix == '.pnm':    pixmap.save_pixmap_as_pnm(path)
    elif suffix == '.ppm':    pixmap.save_pixmap_as_ppm(path)
    elif suffix == '.ps':     pixmap.save_pixmap_as_ps(path, append=0)
    elif suffix == '.psd':    pixmap.save_pixmap_as_psd(path)
    elif suffix == '.pwg':    pixmap.save_pixmap_as_pwg(path, append=0, pwg=mupdf.PwgOptions())
    else:
        raise Exception(f'Unrecognised output format: {path}')

save_pixmap(output)

# Highlight search hits by inverting their bounding rectangles.
hit_quads = page.search_page(needle, max=16)
print(f'search text {needle!r} found {len(hit_quads)} on the page')
for hit_quad in hit_quads:
    pixmap.invert_pixmap_rect(hit_quad.rect_from_quad().irect_from_rect())

save_pixmap(f'dl-{output}')

print('finished')
</verbatim>

----

-- %USERSIG{JulianSmith - 2020-03-04}%
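
The out-param scheme mentioned in the list of mupdfwrap.py additions above (package out-params into a struct on the C side, then unpack the struct into a tuple in Python) can be illustrated with a small pure-Python simulation. This is only a sketch of the idea, not the generated code: =DivmodOutparams=, =c_divmod()=, =c_divmod_outparams_fn()= and =divmod_with_outparams()= are hypothetical names, and the C functions are simulated in Python using ctypes value objects in place of real pointers.

```python
import ctypes


class DivmodOutparams(ctypes.Structure):
    """Hypothetical struct that holds the out-params of c_divmod()."""
    _fields_ = [
        ('quotient', ctypes.c_int),
        ('remainder', ctypes.c_int),
    ]


def c_divmod(a, b, quotient, remainder):
    """Stand-in for a C function: int c_divmod(int a, int b, int* q, int* r).

    quotient and remainder are ctypes.c_int objects playing the role of
    int* out-params. Returns 0 on success, -1 on division by zero.
    """
    if b == 0:
        return -1
    quotient.value = a // b
    remainder.value = a % b
    return 0


def c_divmod_outparams_fn(a, b, outparams):
    """Stand-in for the auto-generated C helper.

    Calls the underlying function and stores its out-params in a struct,
    so that only a single struct needs to cross the language boundary.
    """
    quotient = ctypes.c_int(0)
    remainder = ctypes.c_int(0)
    ret = c_divmod(a, b, quotient, remainder)
    outparams.quotient = quotient.value
    outparams.remainder = remainder.value
    return ret


def divmod_with_outparams(a, b):
    """Stand-in for the auto-generated Python wrapper.

    Returns (return_value, quotient, remainder) as a plain tuple.
    """
    outparams = DivmodOutparams()
    ret = c_divmod_outparams_fn(a, b, outparams)
    return ret, outparams.quotient, outparams.remainder


if __name__ == '__main__':
    print(divmod_with_outparams(17, 5))
```

The appeal of this shape is that the Python caller never handles pointers: it gets the return value and the out-params together as a tuple, and no per-function SWIG typemap is needed.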
Topic revision: r24 - 2021-02-09 - JulianSmith
Copyright © 2014 Artifex Software Inc