How to do stuff with the cluster

How to remove jobs from the queue

The current queue of jobs is kept on casper in /home/regression/cluster/queue.lst

It's a dead simple 1 line per job format.

To remove a job from the queue, open the file in an editor, delete the job's line, and save it back. Do this fast, because the queue file can change underneath you while it is open in the editor (a race condition).

Removing a job from the queue will NOT stop the current run.
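
The window for the race can be narrowed by filtering the file into a temporary copy and renaming it over the original, rather than holding it open in an editor. The following is a minimal sketch under assumptions: remove_job is a hypothetical helper, not part of the cluster scripts, and it identifies the job by the exact text of its line. The queue can still change between the read and the rename, so this reduces the race without removing it.

```shell
# remove_job QUEUE_FILE JOB_LINE
# Hypothetical helper: drops every line that is exactly equal to JOB_LINE.
remove_job() {
    queue=$1
    job=$2
    tmp="$queue.tmp.$$"
    # -F: fixed string, -x: whole-line match, -v: keep the non-matching lines.
    # grep exits 1 when it prints nothing (empty result), which is fine here;
    # any other non-zero status is a real error, so clean up and bail out.
    grep -v -x -F -- "$job" "$queue" > "$tmp" || [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }
    # A rename within one directory is atomic, so readers never see a partial file.
    mv -- "$tmp" "$queue"
}

# On casper this might be called as:
#   remove_job /home/regression/cluster/queue.lst '<exact text of the job line>'
```

Matching the whole line rather than a substring avoids deleting a second job whose line merely contains the same text.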

How to stop the current run

Well, the nasty way is to kill the clustermaster.pl process. A new one will start up ~20 seconds later, spot that the old one is in trouble, and commence a 5 minute timeout and restart process. Sometimes you may have to resort to this.

A slightly nicer way (that avoids the 5 minutes) is to: kill `cat clustermaster.pid` && rm clustermaster.pid

The nicest way is to 'touch abort.job' in /home/regression/cluster, but this relies on the clustermaster not actually being jammed, or in an infinite loop etc.

Note that none of these methods removes the current job from the queue, so the new clustermaster will start up and restart the old job. If you want to kill a job and restart without it, edit queue.lst first, then stop the current run.
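
The two gentler options can be wrapped in one small function so that the choice is explicit. This is only a sketch: stop_run is a hypothetical name, and it assumes clustermaster.pid lives in the cluster directory alongside abort.job, which the text above implies but does not state.

```shell
# stop_run CLUSTER_DIR MODE
#   MODE=abort : touch abort.job (needs a clustermaster that is still responsive)
#   MODE=kill  : kill the process named in clustermaster.pid, then remove the pid file
stop_run() {
    dir=$1
    mode=$2
    case $mode in
        abort)
            touch "$dir/abort.job"
            ;;
        kill)
            pidfile="$dir/clustermaster.pid"
            [ -f "$pidfile" ] || { echo "no pid file in $dir" >&2; return 1; }
            # Only remove the pid file if the kill was actually delivered.
            kill "$(cat "$pidfile")" && rm -f "$pidfile"
            ;;
        *)
            echo "usage: stop_run CLUSTER_DIR abort|kill" >&2
            return 2
            ;;
    esac
}

# Typical order when a job should not come back:
#   1. edit queue.lst to remove the job
#   2. stop_run /home/regression/cluster abort
#   3. if nothing happens, stop_run /home/regression/cluster kill
```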

How to reset the cluster after a force push

Suppose you have commits A, then B, then C. C fails disastrously when cluster tested, so we want to force push the golden repo back to B. Do so.

This leaves the cluster in a confused state, for 2 reasons.

Firstly the cluster bases its 'difference to previous job' reports on the most recent entry in its database of tabs. We can solve that simply by doing:

rm -rf {,mupdf-,mujstest-}archive/<sha-for-C>*
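
The brace expression expands to three directory names: archive, mupdf-archive and mujstest-archive. Written out as a loop, with a guard against an empty SHA (which would otherwise match everything in those directories), it looks roughly like this. remove_archived_results is a hypothetical helper, and the base directory is passed in explicitly because the text above does not say which directory the command is run from.

```shell
# remove_archived_results BASE_DIR SHA
# Roughly equivalent to: cd BASE_DIR && rm -rf {,mupdf-,mujstest-}archive/SHA*
remove_archived_results() {
    base=$1
    sha=$2
    # With an empty SHA the glob would become "archive/*" and wipe everything.
    [ -n "$sha" ] || { echo "refusing to run with an empty SHA" >&2; return 1; }
    for prefix in "" mupdf- mujstest-; do
        # "$sha" is quoted; the trailing * stays unquoted so that it still globs.
        rm -rf -- "$base/${prefix}archive/$sha"*
    done
}
```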

The second problem is that the cluster checks every 20 seconds or so for master having changed. This check goes wrong if master has not moved 'forwards'. To fix this:

For ghostpdl:

cd /home/regression/cluster/ghostpdl && git checkout master && git reset --hard <sha-for-A>

For mupdf:

cd /home/regression/cluster/mupdf

git checkout incoming_master && git reset --hard <sha-for-A>

git checkout master && git reset --hard <sha-for-A>

(Note A, not B! We want to make the cluster rerun B, so we tell it that the last job it knows about is the one before B.)
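
The per-repository steps above can be collected into one function. This is a sketch, not an existing cluster script: reset_checkout is a hypothetical name, and the SHA passed in must be the SHA of commit A in the repository that was force pushed, since SHAs are specific to each repository.

```shell
# reset_checkout CLUSTER_DIR REPO SHA_A
# Moves the cluster's local branches for REPO back to commit A.
reset_checkout() {
    cluster=$1
    repo=$2
    sha_a=$3
    case $repo in
        ghostpdl) branches="master" ;;
        mupdf)    branches="incoming_master master" ;;
        *) echo "unknown repo: $repo" >&2; return 2 ;;
    esac
    for branch in $branches; do
        # git -C runs the command as if started in the given directory.
        git -C "$cluster/$repo" checkout "$branch" || return 1
        git -C "$cluster/$repo" reset --hard "$sha_a" || return 1
    done
}

# Example (placeholder SHA left as in the text above):
#   reset_checkout /home/regression/cluster mupdf '<sha-for-A>'
```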

How to set up a cluster release test

Log into casper as regression.

Change into /home/regression/cluster.

Look into auto/release for an appropriate directory. Either copy an existing one, or make a new one. In this example, we'll update the gs release test, so we'll reuse the existing 'gs' directory.

Inside that directory there should be a jobdef.txt file that says what to test. Lines beginning with # are comments. All other lines describe a job to run.

Typically we run 3 jobs. The first job generates the reference. For example:

product <gs> ref <ghostpdl-9.20-regression-test> options <extended>

Test 'gs' as a reference run on tag (or SHA) ghostpdl-9.20-regression-test, using the 'extended' set of tests.

The second job runs the target revision, and compares back to the reference we just generated.

product <gs> ref <ghostpdl-9.20-regression-test> rev <83b54c5> options <extended>

Test 'gs' on tag (or SHA) 83b54c5, against the given reference (ghostpdl-9.20-regression-test) using the 'extended' set of tests.

Finally we generate the bmpcmp for those results:

product <bmpcmp> ref <ghostpdl-9.20-regression-test> rev <83b54c5> options <extended cull -w 3 -t 32>

Run a 'bmpcmp' on the results between those 2 commits. "cull" some of the results to avoid generating too many (i.e. if the ppmraw shows a difference, don't bother generating the pgm or the pbm, as they will just show the same difference). Use the "extended" tests. Allow for a slight window and threshold for bmpcmp differences.
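
Put together, the three job lines above make a complete jobdef.txt. The sketch below writes such a file, copying the job lines verbatim from the examples and adding comment lines; write_jobdef is a hypothetical helper used only to show the finished file, and the target path is up to you.

```shell
# write_jobdef PATH
# Writes a three-job release test definition to PATH.
write_jobdef() {
    cat > "$1" <<'EOF'
# Job 1: generate the reference.
product <gs> ref <ghostpdl-9.20-regression-test> options <extended>
# Job 2: run the target revision and compare against the reference.
product <gs> ref <ghostpdl-9.20-regression-test> rev <83b54c5> options <extended>
# Job 3: generate the bmpcmp between the two.
product <bmpcmp> ref <ghostpdl-9.20-regression-test> rev <83b54c5> options <extended cull -w 3 -t 32>
EOF
}

# Quick sanity check: count the lines that are not comments.
#   write_jobdef auto/release/gs/jobdef.txt
#   grep -c -v '^#' auto/release/gs/jobdef.txt
```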

Once you have edited the file so you're happy:

./enqueueAuto.pl auto/release/gs

The results will then be mailed out, and can be viewed at:

https://ghostscript.com/regression/cgi-bin/clustermonitor.cgi?report=auto/release/gs

and the bmpcmp at:

https://ghostscript.com/~regression/release__gs/

Note, the / in "release/gs" has been replaced by a double underscore in the above bmpcmp link.
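
For other job directories, both links can be derived from the path given to enqueueAuto.pl. The sketch below generalises from the single example above, so treat the rule as an assumption: it drops the leading "auto/" and replaces every remaining "/" with "__". report_urls is a hypothetical helper.

```shell
# report_urls JOB_DIR      e.g. report_urls auto/release/gs
# Prints the report URL, then the bmpcmp URL.
report_urls() {
    job=$1
    # Assumed rule: strip the leading "auto/", then turn each "/" into "__".
    bmp=$(printf '%s\n' "${job#auto/}" | sed 's|/|__|g')
    echo "https://ghostscript.com/regression/cgi-bin/clustermonitor.cgi?report=$job"
    echo "https://ghostscript.com/~regression/$bmp/"
}
```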

As a trick to reduce the number of needless differences between release X and X+1, it is worth checking out release X, cherry-picking the commit that changes the release number, and committing that to golden with a tag of X-regression-test. It may also be worth pulling other commits into this branch as required. Note that currently regressions and references can only be tags or SHAs, not branch names.
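
In git terms, that trick might look like the sketch below. Everything named here is an assumption: tag_release_baseline is a hypothetical helper, the release tag and commit SHA are supplied by you, and the remote name depends on how golden is configured in your checkout.

```shell
# tag_release_baseline RELEASE_TAG BUMP_COMMIT REMOTE
#   RELEASE_TAG : tag of release X (for example, a hypothetical "ghostpdl-9.20")
#   BUMP_COMMIT : SHA of the commit that changes the release number
#   REMOTE      : name of the git remote that points at golden (assumed)
tag_release_baseline() {
    release=$1
    bump=$2
    remote=$3
    tag="$release-regression-test"
    # Checking out a tag leaves a detached HEAD, which is fine for a tagged one-off.
    git checkout "$release" || return 1
    git cherry-pick "$bump" || return 1
    git tag "$tag" || return 1
    # Push only the new tag, since jobs can refer to tags or SHAs but not branches.
    git push "$remote" "$tag"
}
```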

-- Robin Watts - 2017-02-27

See also:

ClusterNodes, ClusterStructure, ClusterWork

Topic revision: r3 - 2017-08-16 - ChrisLiddell
 
Copyright 2014 Artifex Software Inc