Difference: ClusterStructure (2 vs. 3)

Revision 3 (2019-05-10) - RobinWatts

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

The Structure of the Artifex Cluster

Line: 14 to 14
 

The cluster master.

Changed:
<
Casper, an AWS node, is our clustermaster. This keeps the queue of runs and, for each run in turn, farms jobs out around the available resources, gathering the results in at the end.
>
Casper, an AWS node, is our clustermaster. It is responsible for keeping the queued jobs and farming them out around the nodes.
 
Changed:
<
Something (I have yet to figure out what) runs runClustermaster.sh. This checks to see if another instance of runClustermaster.sh is already running - if one is, it just exits. If not, it goes into an infinite loop, kicking off clustermaster.pl every 10 seconds.
>
We actually run 3 parallel clusters, at priorities 0, 1 and 2. This enables us to run user jobs (level 0, highest priority), git jobs (ones caused by commits, level 1, middle priority) and auto jobs (nightly/weekly jobs and release tests, level 2, low priority).

At the moment priorities 0 and 1 are still running together at priority 0 while the code is tested. I will change this in a few days, if no problems are found. The description below assumes I have changed this.

Something (I have yet to figure out what) runs runClustermaster.sh. This checks to see if another instance of runClustermaster.sh is already running - if one is, it just exits. If not, it goes into an infinite loop, kicking off clustermaster.pl, clustermaster1.pl and clustermaster2.pl every 10 seconds.

clustermaster1.pl and clustermaster2.pl are just symbolic links to clustermaster.pl, so all 3 processes are identical. The first thing each does is look at the name it was invoked under and use that as its priority. For (most of) the rest of this description (except where I explicitly say otherwise), I'll just describe the level 0 clustermaster process. The others are pretty much identical, but use suffixed filenames; so whereas the job list for level 0 (clustermaster.pl) goes into jobs, the job list for level 1 (clustermaster1.pl) goes into jobs.1.
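To make the shape of this concrete, here's a rough sketch in Python (the real scripts are shell and Perl); the lock file name and the priority-from-name parsing are my illustrative guesses, not the actual script contents:

   # Hedged sketch of the runClustermaster.sh watchdog, not the real script.
   import os, subprocess, sys, time

   LOCK = "runClustermaster.lock"   # hypothetical guard file

   def already_running():
       """Treat a live PID in the lock file as 'another instance exists'."""
       try:
           pid = int(open(LOCK).read())
           os.kill(pid, 0)          # raises OSError if the process is gone
           return True
       except (IOError, ValueError, OSError):
           return False

   if already_running():
       sys.exit(0)                  # another instance has the job; just exit

   open(LOCK, "w").write(str(os.getpid()))
   while True:                      # infinite loop: one kick per priority
       for script in ("clustermaster.pl", "clustermaster1.pl",
                      "clustermaster2.pl"):
           subprocess.Popen(["perl", script])
       time.sleep(10)

   # Inside clustermaster.pl, the priority comes from the invoked name, e.g.:
   #   name = os.path.basename(sys.argv[0])             # "clustermaster1.pl"
   #   priority = int(name[len("clustermaster"):-3] or 0)  # "" -> 0, "1" -> 1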

clustermaster.pl is the core of the server. Its operation is split into 3 phases.

Phase 1 runs every time it is invoked, and consists of spotting new jobs that need to be added into the queue of jobs.

Changed:
<
Phase 2 has clustermaster.pl checking to see if there is another instance of itself running. If there is, and it is too old, it kills the old version and exits. If there is, and it's not too old, it will exit. If there isn't, it proceeds to phase 3.
>
Phase 2 has clustermaster.pl checking to see if there is another instance of itself running. If there is, and it is too old, it kills the old version and exits. If there is, and it's not too old, it will exit. If there isn't, it proceeds to phase 3. This dance simply ensures that a) a clustermaster stuck in an infinite loop cannot block the system, and b) we have just one clustermaster running at a time.
Phase 3 involves actually running the job, collating the results, and mailing them out.
Line: 32 to 38
The cluster system contains various different copies of the source. /home/regression/cluster/{ghostpdl,mupdf} are git clones of the sources. In addition /home/regression/cluster/user/XXXX/{ghostpdl,mupdf} contain per-user copies (not git clones).
Changed:
<
First, the clustermaster checks for new commits having been made to the ghostpdl/mupdf git repos. If found, it queues one job per new commit.
>
At priority 1, the clustermaster checks for new commits having been made to the ghostpdl/mupdf git repos. If found, it queues one job per new commit (by adding a line to queue.1.lst).
 
Changed:
<
Next the clustermaster checks for user jobs, by looking for various droppings that clusterpush.sh leaves in the users' source directories. These include gs.run, ghostscript.run and cluster_command.run. If found, the cluster removes these and creates jobs in the job queue.
>
At priority 0, the clustermaster checks for user jobs, by looking for various droppings that clusterpush.sh leaves in the users' source directories. These include gs.run, ghostscript.run and cluster_command.run. If found, the cluster removes these and creates jobs in queue.lst.

No checking is done for priority 2 auto jobs, because these are placed directly into the queue.2.lst file by enqueueAuto.pl (typically called from a crontab).
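As a hedged sketch of this phase 1 scanning (assuming the queue file naming above; the git bookkeeping ref 'last_seen' and the directory globs are illustrative, not the real script's):

   # Sketch of phase 1: scan for new work, append to per-priority queues.
   import glob, os, subprocess

   def queue_file(priority):
       return "queue.lst" if priority == 0 else "queue.%d.lst" % priority

   def enqueue(priority, job):
       with open(queue_file(priority), "a") as q:
           q.write(job + "\n")

   # Priority 1: one job per new commit in each repo.
   for repo in ("ghostpdl", "mupdf"):
       revs = subprocess.check_output(
           ["git", "-C", repo, "rev-list", "last_seen..origin/master"])
       for sha in revs.decode().split():
           enqueue(1, "%s %s" % (repo, sha))

   # Priority 0: user jobs, signalled by trigger files from clusterpush.sh.
   for trigger in ("gs.run", "ghostscript.run", "cluster_command.run"):
       for path in glob.glob("user/*/*/" + trigger):
           os.remove(path)          # consume the dropping
           enqueue(0, path)

   # Priority 2 needs no scan: enqueueAuto.pl appends to queue.2.lst itself.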

 

Phase 2 in detail

Changed:
<
Any clustermaster that gets past phase 2 puts it's process id into clustermaster.pid. This means that when the clustermaster starts up, it can check to see if there is a clustermaster process supposedly running by checking that file. If it exists, it checks the time. No run should take more than 2 hours, so if the time stamp of the pid is older than this, we decide that it's probably dead. To be sure, we remove the clustermaster.pid file, and wait for 5 minutes. This prevents any other clustermasters from being spawned, and gives any old clustermaster time to shut down neatly.
>
Any clustermaster that gets past phase 2 puts its process id into clustermaster.pid. This means that when the clustermaster starts up, it can check to see if there is a clustermaster process supposedly running by checking that file. If it exists, it checks the time. No run should take more than 2 hours, so if the time stamp of the pid is older than this, we decide that it's probably dead. To be sure, we remove the clustermaster.pid file, and wait for 5 minutes. This prevents any other clustermasters from being spawned, and gives any old clustermaster time to shut down neatly.
If the pid no longer refers to a clustermaster process, we do likewise.
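A minimal sketch of this dance, assuming the 2 hour limit and 5 minute grace period above (the /proc probe is an illustrative way to ask "is this pid a live clustermaster", not the real code):

   # Hedged sketch of the phase 2 pid-file check.
   import os, time

   PIDFILE = "clustermaster.pid"
   MAX_AGE = 2 * 60 * 60            # no run should take more than 2 hours

   def pid_is_clustermaster(pid):
       try:
           with open("/proc/%d/cmdline" % pid, "rb") as f:
               return b"clustermaster" in f.read()
       except IOError:
           return False             # process is gone

   def phase2():
       if os.path.exists(PIDFILE):
           stale = time.time() - os.path.getmtime(PIDFILE) > MAX_AGE
           pid = int(open(PIDFILE).read())
           if stale or not pid_is_clustermaster(pid):
               os.remove(PIDFILE)   # blocks new spawns...
               time.sleep(5 * 60)   # ...and lets the old one shut down neatly
           return False             # exit either way; a later kick retries
       open(PIDFILE, "w").write(str(os.getpid()))
       return True                  # we own the pid file: on to phase 3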
Line: 46 to 54
 

Phase 3 in detail

Deleted:
<
First, we ensure that we have a monitorPort.pl task running. If one isn't running, we start it. This allows the nodes to show their 'heartbeat' to the clustermonitor. See below.
 During this entire phase, we periodically check to see if the PID in clustermaster.pid is ours. If it isn't (or the file has been deleted), we take this as a sign that we have been killed off, and exit.
Changed:
<
Next, we check to see if we have any queued runs. If we do, we take the first one, and execute it.
>
First, we check to see if we have any queued runs. If we do, we take the first one, and execute it.

We create a job.start file (or job.1.start or job.2.start according to priority) which contains details of the job to run.
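For instance (only the priority suffix rule comes from the text; the details format is my assumption):

   # Hedged sketch: write the start file nodes will read the job id from.
   def write_start_file(priority, details):
       name = "job.start" if priority == 0 else "job.%d.start" % priority
       with open(name, "w") as f:
           f.write(details)         # illustrative: id, repo, sha, user, etc.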

Then we call build.pl to get the job list; this goes into jobs. This is a list of lines, one per test to run. The lines are in the format:

   <test> <logfile> true ; <command string>

<test> is a string that describes the test to be used; a combination of test file name (with / replaced by double _), and the devices/resolution/banding settings. e.g. tests_private__pdf__PDF_1.7_FTS__fts_17_1705.pdf.pdf.pkmraw.300.0

<logfile> is a string that gives a name to be used for the logfile of the test. e.g. __temp__/tests_private__pdf__PDF_1.7_FTS__fts_17_1705.pdf.pdf.pkmraw.300.0..gs_pdf

<command string> is a ';' separated list of commands to be executed to run the test. These commands and their outputs are captured to the logfile.
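For concreteness, a minimal (assumed, not the cluster's actual) parse of one such line:

   # Sketch of splitting a job line into its documented fields.
   def parse_job_line(line):
       head, _, commands = line.partition(";")   # first ';' ends the header
       test, logfile, flag = head.split()        # flag is the literal "true"
       return {
           "test": test,        # e.g. tests_private__pdf__...pkmraw.300.0
           "logfile": logfile,  # e.g. __temp__/...pkmraw.300.0..gs_pdf
           "commands": [c.strip() for c in commands.split(";")],
       }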

Typically the command string will involve at least one call to ghostscript (or another executable) to produce output.

Simple ghostscript tests will call ghostscript, and pipe its output through md5sum. This md5sum is then echoed to the output for processing when the job results are returned to the clustermaster.

Tests using pdfwrite/xpswrite/ps2write etc will call ghostscript once to output an intermediate file, then again to render that intermediate file. md5sums of both intermediate and final results are sent back in the logs.

Tests using bmpcmp will call the version of ghostscript under test, and then the reference version of ghostscript to generate 2 different bitmaps. These will then be fed into bmpcmp for comparison. In the case of pdfwrite etc, there will be 4 invocations. rsync is used as part of the command string to upload the output of bmpcmp.
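To illustrate the shape of these command strings (the gs flags, paths and output names here are my assumptions, not the cluster's real invocations):

   # Hedged sketch: compose command strings of the kinds described above.
   def simple_gs_command(testfile, device="pkmraw", res=300):
       # Render and pipe through md5sum; the checksum lands in the logfile.
       return "./bin/gs -sDEVICE=%s -r%d -o - %s | md5sum" % (
           device, res, testfile)

   def pdfwrite_command(testfile, device="pkmraw", res=300):
       # Two invocations: write the intermediate, then render it; md5sums
       # of both intermediate and final output go back in the logs.
       return " ; ".join([
           "./bin/gs -sDEVICE=pdfwrite -o out.pdf %s" % testfile,
           "md5sum out.pdf",
           "./bin/gs -sDEVICE=%s -r%d -o - out.pdf | md5sum" % (device, res),
       ])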

Each of these lines will be handed out to nodes during the course of the job.

Then we actually run the job.

The clustermaster starts a service up listening on a port (9000+priority), and goes into a loop.

When a node connects to the service, it sends a bunch of headers, and then either "OK" or "ABORT".
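A hedged sketch of the listening side, assuming a simple "key value" line framing (the real wire protocol isn't given here); the handler it dispatches to is sketched after the list below:

   # Sketch of the per-priority service loop; framing is an assumption.
   import socket

   def read_headers(conn):
       headers = {}
       for line in conn.makefile("r"):
           line = line.strip()
           if not line or line in ("OK", "ABORT"):
               break                    # headers end with OK or ABORT
           key, _, value = line.partition(" ")
           headers[key] = value
       return headers

   def serve(priority, handle):
       """'handle' maps a header dict to a reply string (see below)."""
       srv = socket.create_server(("", 9000 + priority))
       while True:
           conn, _ = srv.accept()
           with conn:
               reply = handle(read_headers(conn))
               conn.sendall((reply + "\n").encode())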

  • One of the headers is "id", a value the node parrots back to the server after reading it from job.start. If this doesn't match the id we expect, the node is still trying to run an old job. We tell it to abort.
 
Changed:
<
We detect the available machines. We cull this list according to job type (not all machines can run all jobs - windows machines can't run linux jobs, for example).
>
  • One of the headers is "node", the node name. This must always be present, or we'll abort the node.
 
Changed:
<
Then we call build.pl to get the job list. This goes into job.txt and is a list of commands to be fed out to the client nodes, 1 line per job.
>
  • One of the headers is "caps", a string that describes the capabilities of the node. The first time a node connects, we check that the node is suitable for the job; if it is unsuitable, we add it to the 'cull' list and abort the node. On subsequent connections we check to see if the node is in the cull list; if it is, we abort it again.
 
Changed:
<
Then we actually run the job. To allow for network outages etc, we try this set of steps up to 5 times until we get success or a non-network related failure.
>
  • One of the headers is "failure". This will be set if the node has encountered a failure during a task we have given it (such as building the source, or uploading the results). If it fails during building, we abort the job. If it fails during upload, we tell it to try again.
 
Changed:
<
  • The status files for the nodes are cleared.
>
  • One of the headers is "busy". This will be set if the node is still busy doing a task we gave it (such as building the source, or uploading the results). It connects in anyway so we know that it is still working on the problem, and hasn't died (so we won't time it out).
 
Changed:
<
  • A 'start' file is created for each node in use to trigger it.
>
  • The first thing a new node is told is to "BUILD". This involves updating the source and test repositories and then building. This can take a while, so the node continues to connect in periodically to tell us it is "busy" working on it.
 
Changed:
<
  • The clustermaster starts a service up listening on a port (9000).
>
  • One of the headers is "jobs", followed by 2 numbers. The first is the number of jobs pending (the number of jobs that have been given to the node but have not yet been run). The second is the number of jobs running at the moment. When a node connects with no pending jobs, and we have jobs still to hand out, we give it more.
 
Changed:
<
  • Each node connects to this service in turn to request jobs. They are either given jobs to do, or told to 'standby'.
>
  • Each node connects to this service in turn. They are either given jobs to do, or told to 'standby'.
 
  • If we don't hear a heartbeat from a node within 3 minutes, we assume that that node has died. We therefore put the jobs that had been sent to that node back into the joblist, and carry on without it. (A sketch of this whole dispatch follows below.)
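Putting these headers together, here's a hedged sketch of the dispatch (a 'handle' suitable for the serve() sketch earlier); the reply strings, state layout and helper names are my assumptions, only the decisions mirror the description above:

   # Sketch of the master's per-connection handling of node headers.
   import time

   class JobAborted(Exception):
       """A node failed to build: the whole job is abandoned."""

   class ClusterState:
       def __init__(self, job_id, joblist, caps_needed):
           self.job_id = job_id
           self.joblist = list(joblist)      # remaining lines from 'jobs'
           self.caps_needed = caps_needed    # e.g. "linux"
           self.seen = set()                 # nodes we have already vetted
           self.cull = set()                 # nodes unsuitable for this job
           self.given = {}                   # node -> lines handed out
           self.last_heartbeat = {}

   def handle_connection(headers, state):
       node = headers.get("node")
       if node is None or headers.get("id") != state.job_id or node in state.cull:
           return "ABORT"
       state.last_heartbeat[node] = time.time()  # resets the 3 minute timeout
       if node not in state.seen:
           state.seen.add(node)
           if state.caps_needed not in headers.get("caps", ""):
               state.cull.add(node)          # unsuitable for this job type
               return "ABORT"
           return "BUILD"                    # new node: update source, build
       if "busy" in headers:
           return "STANDBY"                  # still working; noted as alive
       if headers.get("failure") == "build":
           raise JobAborted(node)            # a build failure sinks the job
       if headers.get("failure") == "upload":
           return "RETRY"                    # upload failed: tell it to retry
       pending, running = (int(x) for x in headers.get("jobs", "0 0").split())
       if pending == 0 and state.joblist:
           job = state.joblist.pop(0)
           state.given.setdefault(node, []).append(job)
           return "JOBS " + job
       return "STANDBY"

   def reap_dead_nodes(state, timeout=180):
       """Requeue work from nodes silent for over 3 minutes."""
       cutoff = time.time() - timeout
       for node in [n for n, t in state.last_heartbeat.items() if t < cutoff]:
           del state.last_heartbeat[node]
           state.joblist.extend(state.given.pop(node, []))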
 