Pixel Patch Device Method Proposal

At various points in Ghostscript, we are forced to convert from an array of pixel values to a series of fill_rectangle calls. While this is, surprisingly, not the massive bottleneck we might expect, avoiding this would save us time, and admit of some other optimisations.

I'm pondering some new device methods that should help this. This page is to record my train of thought, and to provide a place for comments from other interested parties.

First attempt

We could add:

int begin_pixel_patch(gx_device *dev, fixed x, fixed y, fixed w, fixed h, int src_x, int src_y, int landscape, gx_pixel_patch_enum_t **ppenum);

together with:

int pixel_patch_data(gx_pixel_patch_enum_t *penum, const byte **data, int data_x);

int pixel_patch_end(gx_pixel_patch_enum_t *penum);

The idea would be that when something has a block of pixel data to pass across to a device, it can call this function rather than break it down into rectangles.

The classic example of where we need this is inside functions like image_render_mono or image_render_color_icc_portrait, which convert scanlines' worth of data into the target colorspace and then have to break it down into individual fill_rectangle calls.

The default implementation of this routine can thus be lifted from image_render_mono and friends, and we can call it with no significant decrease in speed in the worst case.
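To make the fallback concrete, here is a minimal sketch of what "drop to rectangles" means for one unscaled portrait scanline: walk the row, merge runs of equal colour, and emit one fill per run. The names fill_rect_fn, count_fill and row_to_rectangles are hypothetical stand-ins, not Ghostscript APIs, and the real default would be lifted from image_render_mono and friends rather than written like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's fill_rectangle method. */
typedef int (*fill_rect_fn)(void *dev, int x, int y, int w, int h,
                            uint32_t color);

/* Toy "device": dev points at an int that accumulates the filled area. */
static int count_fill(void *dev, int x, int y, int w, int h, uint32_t color)
{
    (void)x; (void)y; (void)color;
    *(int *)dev += w * h;
    return 0;
}

/* Emit one scanline of 1:1 pixels as rectangles, merging runs of equal
 * colour. Returns the number of rectangles emitted, or the first negative
 * error code returned by fill. */
static int row_to_rectangles(void *dev, fill_rect_fn fill,
                             const uint32_t *pixels, int x0, int y, int width)
{
    int rects = 0;
    int start = 0;

    assert(pixels != NULL || width <= 0);
    while (start < width) {
        int end = start + 1;
        int code;

        /* Extend the run while the colour stays the same. */
        while (end < width && pixels[end] == pixels[start])
            end++;
        code = fill(dev, x0 + start, y, end - start, 1, pixels[start]);
        if (code < 0)
            return code;
        rects++;
        start = end;
    }
    return rects;
}
```

On noisy image data almost every pixel starts a new run, so the number of calls approaches the number of pixels, which is the overhead the new method is meant to avoid.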

Some salient points:

  • The pixel data needs to be in the correct format for the output device (that is to say, it is packed gx_color_index values, each of size round_up_to_power_of_2_bits(dev->color_info.depth)).

  • The x/y/w/h being fixed rather than ints means that we can use dda to get exactly matching results with the existing code.

  • If we allow for w/h being negative, then we can get the 4 portrait orthogonal flips.

  • If we allow for the boolean landscape flag, that gets us to all 8 orthogonal flips.
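As a rough illustration of the packing rule in the first point, the sketch below gives one plausible reading of round_up_to_power_of_2_bits and the resulting row size. The function body and the packed_row_bytes helper are my assumptions; only the name round_up_to_power_of_2_bits appears above, and it is used there as shorthand rather than as an existing function.

```c
#include <assert.h>
#include <stddef.h>

/* Round a bit depth up to the next power of two (1, 2, 4, 8, 16, 32, 64).
 * Assumed meaning of the shorthand used in the text. */
static unsigned round_up_to_power_of_2_bits(unsigned depth)
{
    unsigned bits = 1;

    assert(depth >= 1 && depth <= 64);
    while (bits < depth)
        bits <<= 1;
    return bits;
}

/* Hypothetical helper: bytes needed for one row of `width` packed pixels
 * when each pixel occupies the rounded-up number of bits. */
static size_t packed_row_bytes(unsigned depth, size_t width)
{
    size_t bits = (size_t)round_up_to_power_of_2_bits(depth) * width;

    return (bits + 7) / 8;
}
```

Under this reading a 24-bit RGB device would be fed 32-bit pixels, so a 10-pixel row takes 40 bytes rather than 30.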

Problems with this:

  • This doesn't account for the skewed case, and it would be nice to capture that too so that the caller doesn't ever need to drop to rectangles.

  • This copes badly with skipped lines: even when the caller knows that a line will be skipped, it still has to take special care to call pixel_patch_data for it.

Second version

We could add:

   int begin_pixel_patch(gx_device *dev,
                                      const gx_dda_fixed_point *row,
                                      const gx_dda_fixed_point *column,
                                      gx_pixel_patch_enum_t **ppenum);

together with:

   int pixel_patch_data(gx_pixel_patch_enum_t *penum,
                                    const byte **data, int data_x);

   int pixel_patch_end(gx_pixel_patch_enum_t *penum);

gx_pixel_patch_enum_t would contain:

  • Function pointers for pixel_patch_data and pixel_patch_end
  • Copies of the DDAs passed to begin_pixel_patch.

This even more closely follows the internals of image_render_mono etc. This copes with portrait/landscape/skew. Once again implementers can spot the 1:1 and 1:2 etc cases easily.
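A sketch of how an implementer might spot the easy cases from the two DDAs, assuming each one boils down to a constant device-space step per source pixel (row) or per source line (column). The toy_fixed type, its 8 fractional bits, toy_step and classify_patch are all hypothetical; the real gx_dda_fixed_point carries more state than this.

```c
#include <assert.h>

typedef long toy_fixed;                  /* hypothetical fixed point, 8 fractional bits */
#define TOY_FIXED_1 ((toy_fixed)1 << 8)  /* 1.0 in that format */

/* Hypothetical: device-space step taken per source pixel (row DDA)
 * or per source line (column DDA). */
typedef struct {
    toy_fixed dx, dy;
} toy_step;

typedef enum {
    PATCH_GENERAL = 0,   /* flips, landscape, skew, arbitrary scale */
    PATCH_1_TO_1 = 1,    /* one source pixel per device pixel */
    PATCH_1_TO_2 = 2     /* each source pixel covers a 2x2 block */
} patch_kind;

static patch_kind classify_patch(toy_step row, toy_step column)
{
    /* Unflipped portrait with no skew: rows move only in x, columns only in y. */
    if (row.dy != 0 || column.dx != 0)
        return PATCH_GENERAL;
    if (row.dx == TOY_FIXED_1 && column.dy == TOY_FIXED_1)
        return PATCH_1_TO_1;
    if (row.dx == 2 * TOY_FIXED_1 && column.dy == 2 * TOY_FIXED_1)
        return PATCH_1_TO_2;
    return PATCH_GENERAL;
}
```

A real check would presumably also want the patch origin to sit on a pixel boundary before taking a fast path; that is left out here.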

We now pass gx_dda_fixed_point across the device interface where we have never done so before (though they are part of the image enumerator, so they are effectively there anyway).

This still has problems with skipped lines.

Third version

We could add:

   int begin_pixel_patch(gx_device *dev,
                                      const gx_dda_fixed_point *row,
                                      const gx_dda_fixed_point *column,
                                      gx_pixel_patch_enum_t **ppenum);

together with:

   int pixel_patch_data_required(gx_pixel_patch_enum_t *penum);

   int pixel_patch_data(gx_pixel_patch_enum_t *penum,
                                    const byte **data, int data_x);

   int pixel_patch_end(gx_pixel_patch_enum_t *penum);

gx_pixel_patch_enum_t would contain:

  • Function pointers for pixel_patch_data_required, pixel_patch_data and pixel_patch_end
  • (Internal) Copies of the DDAs passed to begin_pixel_patch.

This even more closely follows the internals of image_render_mono etc. This copes with portrait/landscape/skew. Once again implementers can spot the 1:1 and 1:2 etc cases easily.

We now pass gx_dda_fixed_point across the device interface where we have never done so before (though they are part of the image enumerator, so they are effectively there anyway).

Before preparing the data for each line, we call pixel_patch_data_required. If this returns 0, we do not need to send the data and can skip the time spent preparing it. If it returns 1, then we prepare the data as usual and pass it in.
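The calling pattern might look something like the sketch below. Everything in it is hypothetical: toy_patch_enum is a cut-down stand-in for gx_pixel_patch_enum_t, the toy device only wants even-numbered lines, and the memset stands in for the expensive colour conversion. It also assumes that a 0 return from the "required" call consumes that line, which the proposal does not actually specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char byte;

/* Hypothetical, cut-down enumerator: three callbacks plus a little state. */
typedef struct toy_patch_enum_s toy_patch_enum;
struct toy_patch_enum_s {
    int (*data_required)(toy_patch_enum *penum);
    int (*data)(toy_patch_enum *penum, const byte *row, int data_x);
    int (*end)(toy_patch_enum *penum);
    int line;       /* next source line the device expects */
    int accepted;   /* lines actually delivered */
};

/* Toy device: odd lines are skipped. Returning 0 advances past the line
 * (an assumption of this sketch). */
static int even_required(toy_patch_enum *penum)
{
    if (penum->line % 2 != 0) {
        penum->line++;
        return 0;
    }
    return 1;
}

static int even_data(toy_patch_enum *penum, const byte *row, int data_x)
{
    (void)row; (void)data_x;
    penum->accepted++;
    penum->line++;
    return 0;
}

static int even_end(toy_patch_enum *penum)
{
    (void)penum;
    return 0;
}

/* Caller side: ask first, and only prepare rows the device wants. */
static int send_patch(toy_patch_enum *penum, int height,
                      byte *row, size_t row_bytes, int *prepared)
{
    int y, code;

    assert(penum != NULL && row != NULL && prepared != NULL);
    for (y = 0; y < height; y++) {
        code = penum->data_required(penum);
        if (code < 0)
            return code;
        if (code == 0)
            continue;                      /* skipped: no preparation cost */
        memset(row, y & 0xff, row_bytes);  /* stand-in for colour conversion */
        (*prepared)++;
        code = penum->data(penum, row, 0);
        if (code < 0)
            return code;
    }
    return penum->end(penum);
}
```

With five source lines this prepares and delivers only lines 0, 2 and 4.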

Fourth version

Ken has commented that enumerators are a pain for things like subclassing devices. Accordingly, he'd rather see it be done directly via device methods. He also objects to the use of multiple device methods, and would prefer one method with a reason code.

This would give us an implementation like this:

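   /* Reason codes: which phase of the patch this call represents. */
   typedef enum {
       pixel_patch_begin = 0,
       pixel_patch_data_needed = 1,
       pixel_patch_data = 2,
       pixel_patch_end = 3
   } pixel_patch_reason;

   /* Per-call arguments. Named pixel_patch_args here because a typedef
    * called pixel_patch_data would clash with the enumerator above. */
   typedef struct {
       union {
           struct {   /* used with pixel_patch_begin */
               const gx_dda_fixed_point *row;
               const gx_dda_fixed_point *column;
           } init;
           struct {   /* used with pixel_patch_data */
               const unsigned char **buffer[GX_DEVICE_COLOR_MAX_COMPONENTS];
               int data_x;
           } data;
       } u;
   } pixel_patch_args;

   int pixel_patch(gx_device *dev,
                   pixel_patch_reason reason,
                   const pixel_patch_args *args);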
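For a feel of what a single reason-coded entry point looks like on the device side, here is a self-contained toy dispatcher. It deliberately uses its own hypothetical names (toy_patch_reason, toy_device, toy_pixel_patch) rather than the declarations above, and it assumes the per-patch state lives in the device now that there is no enumerator to hold it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TOY_PATCH_BEGIN = 0,
    TOY_PATCH_DATA_NEEDED = 1,
    TOY_PATCH_DATA = 2,
    TOY_PATCH_END = 3
} toy_patch_reason;

/* Hypothetical device: per-patch state sits here rather than in an enumerator. */
typedef struct {
    int in_patch;   /* non-zero between BEGIN and END */
    int lines;      /* lines received in the current patch */
} toy_device;

/* One entry point for every phase. `row` only matters for TOY_PATCH_DATA.
 * Returns 0 (or 1 for "data needed") on success, -1 on misuse. */
static int toy_pixel_patch(toy_device *dev, toy_patch_reason reason,
                           const unsigned char *row)
{
    switch (reason) {
    case TOY_PATCH_BEGIN:
        if (dev->in_patch)
            return -1;
        dev->in_patch = 1;
        dev->lines = 0;
        return 0;
    case TOY_PATCH_DATA_NEEDED:
        return dev->in_patch ? 1 : -1;
    case TOY_PATCH_DATA:
        if (!dev->in_patch || row == NULL)
            return -1;
        dev->lines++;
        return 0;
    case TOY_PATCH_END:
        if (!dev->in_patch)
            return -1;
        dev->in_patch = 0;
        return 0;
    }
    return -1;
}
```

A subclassing device that does not care about patches would only need to forward this one call to its child, which I take to be the attraction of this shape.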

Implementations of this

The ultimate point of this, of course, is for devices to give implementations of this that actually confer a benefit.

For devices based on memory devices at least a byte wide, we can have an implementation that spots the 1:1 and 1:2 cases and directly memcpys/doubles into the output buffers.
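The two fast paths could be as simple as the following sketch, assuming whole-byte pixels and a destination laid out as rows of dst_stride bytes. The function names and parameters are hypothetical and not taken from the memory device code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1:1: one source row lands on one destination row unchanged. */
static void patch_row_1to1(unsigned char *dst, const unsigned char *src,
                           size_t width, size_t bytes_per_pixel)
{
    memcpy(dst, src, width * bytes_per_pixel);
}

/* 1:2: each source pixel becomes a 2x2 block spanning two destination rows. */
static void patch_row_1to2(unsigned char *dst, size_t dst_stride,
                           const unsigned char *src, size_t width,
                           size_t bytes_per_pixel)
{
    size_t x;

    assert(dst_stride >= 2 * width * bytes_per_pixel);
    for (x = 0; x < width; x++) {
        const unsigned char *s = src + x * bytes_per_pixel;
        unsigned char *d = dst + 2 * x * bytes_per_pixel;

        /* Write the pixel twice, side by side. */
        memcpy(d, s, bytes_per_pixel);
        memcpy(d + bytes_per_pixel, s, bytes_per_pixel);
    }
    /* The second destination row is a copy of the first. */
    memcpy(dst + dst_stride, dst, 2 * width * bytes_per_pixel);
}
```

Neither path makes a per-pixel device call, which is where the saving over fill_rectangle would come from.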

Similarly, for such devices, we can Bresenham colour values directly into the output buffers, avoiding the fill_rectangle overhead.
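A minimal sketch of the Bresenham idea for one row, assuming 8 bits per pixel and a plain integer error accumulator. scale_row_bresenham is a hypothetical name, and this is nearest-neighbour stepping only; it makes no attempt to match the exact pixel placement of Ghostscript's DDA code.

```c
#include <assert.h>
#include <stddef.h>

/* Stretch or shrink src_w source pixels across dst_w destination pixels.
 * The error term advances the source index without any per-pixel divide;
 * destination pixel dx reads source pixel floor(dx * src_w / dst_w). */
static void scale_row_bresenham(unsigned char *dst, int dst_w,
                                const unsigned char *src, int src_w)
{
    int err = 0;
    int sx = 0;
    int dx;

    assert(dst != NULL && src != NULL);
    assert(dst_w > 0 && src_w > 0);
    for (dx = 0; dx < dst_w; dx++) {
        dst[dx] = src[sx];
        err += src_w;
        /* Step the source once for each whole destination width accumulated. */
        while (err >= dst_w) {
            err -= dst_w;
            sx++;
        }
    }
}
```

The same loop handles both enlargement (the inner while runs at most once every few pixels) and reduction (it skips source pixels).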

Questions:

Why do we get the device function to do the scaling?

Why not get the caller to do any scaling, and just pass 1:1 data?

Because that would force a second copy of the data, and we're trying to avoid that.

-- Robin Watts - 2018-04-18

Comments

Topic revision: r3 - 2018-04-18 - RobinWatts
 
Copyright 2014 Artifex Software Inc