Take a moment to read these if this is a new topic for you.
The code is on GitHub.
I'll refer to this completed Job as I go through the work package setup.


A work package defines the stages, their sequencing, and how to split up the work in each stage. Each instance of a work package gets a unique ID, which is used to associate the data from each stage with its work package. Normally I set up the package in a namespace named "Job". This also contains the parameters needed to reuse the work package with different arguments, which together make up multiple jobs. Let's look at an example stage definition.

Here's what the properties mean:

| Property | Example value | Purpose |
| --- | --- | --- |
| stage | "sqlScripts" | Used to refer to this stage from other stages |
| stageTitle | "Make sql scripts" | Description that appears in the dashboard |
| maxThreads | 24 | The maximum number of threads to run in parallel. If you don't want to split up the work, use 1 |
| minChunkSize | 15 | The minimum number of items to include in a chunk. maxThreads, minChunkSize, and the number of items in will together control how the job is split and how many threads are ultimately created |
| instances | | Describes what to run server side for this stage |
| .namespace | "Orchestration" | The namespace that contains the server-side function. If the function is in the global space (I recommend putting all functions in a namespace), use "" |
| .method | "makeSqlScripts" | The function or namespace method to execute server side |
| .arg | fusion | Any arguments specific to this job. Passing different values allows you to use the same work package for multiple purposes; these arguments are passed to your function |
| skipReduce | false | Default is false. If true, the "real" reduce that combines the results of all the chunks in a stage is skipped, and future stages access the data in its uncombined form |
| logReduce | false | It can be useful for debugging to examine the results from a stage. True logs them to the browser console |
| dataCount | "getData" | Identifies the stage from which to take the count of input data to allocate across threads. Normally it's the previous stage and can be omitted |
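Putting the properties above together, a single stage definition might look like the sketch below. The object shape and the chunkCount helper are my own illustration of how the three splitting parameters could interact, assembled from the property descriptions, not the completed code on GitHub.

```javascript
// A stage definition built from the example values in the table above.
var stage = {
  stage: "sqlScripts",            // referred to by other stages
  stageTitle: "Make sql scripts", // shown in the dashboard
  maxThreads: 24,                 // upper bound on parallel threads
  minChunkSize: 15,               // smallest number of items per chunk
  instances: {
    namespace: "Orchestration",   // namespace holding the server-side function
    method: "makeSqlScripts",     // function or method to execute server side
    arg: "fusion"                 // job-specific argument passed to the function
  },
  skipReduce: false,              // combine chunk results when the stage ends
  logReduce: false,               // set true to log reduce results to the console
  dataCount: "getData"            // stage whose output count drives chunking
};

// A guess at the allocation arithmetic, for intuition only: never exceed
// maxThreads, never make chunks smaller than minChunkSize, always run at
// least one thread.
function chunkCount(nItems, maxThreads, minChunkSize) {
  return Math.max(1, Math.min(maxThreads, Math.floor(nItems / minChunkSize)));
}
```

With these settings, 100 input items would be split into 6 chunks of at least 15 items each, well under the 24-thread ceiling.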


In this example application, I'm using the same work package to process 3 different Fusion tables. Use this pattern to create jobs that vary the work package by passing an argument with the changed parameters. Here's the entire Job namespace for the example application, and the pattern you should follow when creating Job and work packages.
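The reuse pattern can be sketched like this: one factory produces the shared package definition, and each job differs only in the argument it passes through. The function name makeWorkPackage and the table names here are hypothetical placeholders, not the example application's actual identifiers.

```javascript
// One work package definition, parameterized by the Fusion table to process.
function makeWorkPackage(tableName) {
  return {
    stages: [{
      stage: "sqlScripts",
      stageTitle: "Make sql scripts for " + tableName,
      maxThreads: 24,
      minChunkSize: 15,
      instances: {
        namespace: "Orchestration",
        method: "makeSqlScripts",
        arg: tableName          // the only thing that varies between jobs
      }
    }]
  };
}

// One job per Fusion table, all sharing the same package definition.
var jobs = ["customers", "orders", "products"].map(makeWorkPackage);
```

Because every job shares the same stage logic, a fix to the package definition automatically applies to all three tables.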



Why not join our forum, follow the blog, or follow me on Twitter to make sure you get updates when they are available?