Difference: MupdfCppWrappers (20 vs. 21)

Revision 21 - 2021-02-02 - JulianSmith

Auto-generated C++ and Python APIs for mupdf.


As of 2021-2-2:
  Customer page:

Comparison with PyMuPDF

  • Am writing equivalent code to some example programmes in https://github.com/pymupdf/PyMuPDF-Utilities.
  • Method names are usually different, because PyMuPDF uses its own names instead of basing names on the underlying MuPDF API.
  • Have made various additions/fixes to mupdfwrap.py (for details see: https://git.ghostscript.com/?p=user/julian/mupdf.git;a=summary)
    • added Document::lookup_metadata() method overload that returns std::string.
    • added const std::vector<std::string> metadata_keys
    • changed Outline iteration to include depth information
    • fixed ref-counting in Page::load_links().
    • fixed Page::search_page() to return std::vector.
  • PyMuPDF has more information about links: fitz.LINK_GOTO, fitz.LINK_GOTOR, fitz.LINK_LAUNCH, fitz.LINK_URI.
  • PyMuPDF has abstraction for writing image files which calls fz_save_pixmap_as_png() or fz_save_pixmap_as_pnm() etc, depending on the filename.
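
The Outline iteration change in the list above can be pictured as a depth-first walk that yields each entry together with its nesting depth. The following is a minimal sketch, not the wrapper's actual implementation: the Node class is a hypothetical stand-in whose next and down fields mirror the sibling and child pointers of MuPDF's fz_outline, and iter_with_depth is an illustrative name.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an outline entry (not a mupdf class)."""
    title: str
    next: Optional["Node"] = None   # next sibling at the same depth
    down: Optional["Node"] = None   # first child, one level deeper


def iter_with_depth(node: Optional[Node], depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs depth-first, parents before children."""
    while node is not None:
        yield depth, node
        # Visit this entry's children before moving on to its next sibling.
        yield from iter_with_depth(node.down, depth + 1)
        node = node.next


# Small hand-built outline: two chapters, the first with one section.
section = Node("1.1 Details")
chapter2 = Node("2 Usage")
chapter1 = Node("1 Intro", next=chapter2, down=section)

for depth, entry in iter_with_depth(chapter1):
    print(f'{" " * 4 * depth}{depth}: {entry.title}')
```

The print line uses the same indentation scheme as the outline loop in the mupdfwrap script on this page, so the depth value plays the role of o.m_depth there.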

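The filename-based dispatch described in the last bullet can also be written as a lookup table. This is a sketch under assumptions: SAVERS_BY_SUFFIX, saver_name_for and this two-argument save_pixmap are hypothetical names, the method names are the ones used in the mupdfwrap script on this page, and formats that need extra arguments (append, options objects) are deliberately left out.

```python
import os

# Suffix -> name of a pixmap save method that takes only a path.
# Formats needing extra arguments (.pcl, .pclm, .pdfocr, .ps, .pwg) are omitted.
SAVERS_BY_SUFFIX = {
    '.pam': 'save_pixmap_as_pam',
    '.pbm': 'save_pixmap_as_pbm',
    '.pkm': 'save_pixmap_as_pkm',
    '.png': 'save_pixmap_as_png',
    '.pnm': 'save_pixmap_as_pnm',
    '.ppm': 'save_pixmap_as_ppm',
    '.psd': 'save_pixmap_as_psd',
}


def saver_name_for(path):
    """Return the save method name for path, chosen by its suffix."""
    suffix = os.path.splitext(path)[1].lower()
    try:
        return SAVERS_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f'Unrecognised output format: {path}') from None


def save_pixmap(pixmap, path):
    """Call the matching save method on pixmap (any object providing it)."""
    getattr(pixmap, saver_name_for(path))(path)
```

Keeping the mapping in a dictionary makes the list of supported suffixes easy to inspect, at the cost of needing separate handling for the formats that take options.
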
PyMuPDF: https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/demo/demo.py

Equivalent code using mupdfwrap:

#! /usr/bin/env python3

import mupdf

import os
import sys

assert len(sys.argv) == 7
filename, page_num, zoom, rotate, output, needle = sys.argv[1:]
page_num = int(page_num)
zoom = int(zoom)
rotate = int(rotate)

document = mupdf.Document(filename)

print(f'Document {filename} has {document.count_pages()} pages.')
print(f'Metadata Information:')
for key in mupdf.metadata_keys:
    value = document.lookup_metadata(key)
    print(f'    {key}: {value!r}')

outline = mupdf.Outline(document)
for o in outline:
    print(f'    {" "*4*o.m_depth}{o.m_depth}: {o.m_outline.title()}')

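# load_page() takes a zero-based page number, so page_num == count_pages() is already out of range.
if page_num >= document.count_pages():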
    raise SystemExit(f'page_num={page_num} is out of range - {filename} has {document.count_pages()} pages')

page = document.load_page(page_num)
links = page.load_links()
if links:
    print(f'Links on page {page_num}:')
    for link in links:
        if link.m_internal:
            print(f'    extern={mupdf.is_external_link(link.uri())}: {link.uri()}')
else:
    print(f'No links on page {page_num}')

trans = mupdf.Matrix.scale(zoom / 100.0, zoom / 100.0).pre_rotate(rotate)

pixmap = page.new_pixmap_from_page(trans, mupdf.Colorspace(mupdf.Colorspace.Fixed_RGB), alpha=False)

def save_pixmap(path):
    suffix = os.path.splitext(path)[1]
    if 0: pass
    elif suffix == '.pam':   pixmap.save_pixmap_as_pam(path)
    elif suffix == '.pbm':   pixmap.save_pixmap_as_pbm(path)
    elif suffix == '.pcl':   pixmap.save_pixmap_as_pcl(path, append=0, options=mupdf.PclOptions())
    elif suffix == '.pclm':  pixmap.save_pixmap_as_pclm(path, append=0, options=mupdf.PclmOptions())
    elif suffix == '.pdfocr':pixmap.save_pixmap_as_pdfocr(path, append=0, options=mupdf.PdfocrOptions())
    elif suffix == '.pkm':   pixmap.save_pixmap_as_pkm(path)
    elif suffix == '.png':   pixmap.save_pixmap_as_png(path)
    elif suffix == '.pnm':   pixmap.save_pixmap_as_pnm(path)
    elif suffix == '.ppm':   pixmap.save_pixmap_as_ppm(path)
    elif suffix == '.ps':    pixmap.save_pixmap_as_ps(path, append=0)
    elif suffix == '.psd':   pixmap.save_pixmap_as_psd(path)
    elif suffix == '.pwg':   pixmap.save_pixmap_as_pwg(path, append=0, pwg=mupdf.PwgOptions())
    else:
        raise Exception(f'Unrecognised output format: {path}')
hit_quads = page.search_page(needle, max=16)
print(f'search text {needle!r} found {len(hit_quads)} on the page')
for hit_quad in hit_quads: