paexec(1)							     paexec(1)

       paexec -	parallel executor, distribute tasks over network or CPUs

       paexec [options]

       paexec -C [options] command [args...]

       Suppose you have a long list of tasks that need to be done, for
       example, you want to convert thousands of .wav audio files to .ogg
       format. Also suppose that multiple CPUs are available, e.g. a
       multi-CPU system or a cluster of individual computers connected by a
       network or the internet. paexec distributes the tasks efficiently
       among the processors (or computers), receives the results of
       processing from them and sends these results to stdout.

       There are several notions that should be	defined: task, command,
       transport, node.

       Tasks are read by paexec	from stdin and are represented as one line of
       text, i.e. one input line - one task.

       Node identifier - remote computer or CPU identifier, for example a CPU
       ordinal number or a computer's DNS name.

       Command - user's program that reads a one-line task from stdin and
       sends a multiline result to stdout, ending with an end-of-task (EOT)
       marker. The EOT marker means "Task is done. I am ready for the next
       one". After sending EOT to stdout, stdout MUST be flushed. Remember
       that the EOT marker MUST NOT appear in the regular result lines.
       Otherwise, paexec may hang due to deadlock. The default EOT is an empty
       line. Command may use the environment variable PAEXEC_EOT for the
       end-of-task marker.
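
       A minimal sketch of a command that follows this protocol, written as a
       POSIX shell function. The name run_tasks and the upper-casing "work"
       are assumptions made for illustration only; the EOT handling follows
       the description above (PAEXEC_EOT if set, otherwise an empty line).

```sh
# Hypothetical command body: upper-case every task and finish each task
# with the EOT marker.
run_tasks() {
    while IFS= read -r task; do
        # The result lines for this task (here: a single line).
        printf '%s\n' "$task" | tr '[:lower:]' '[:upper:]'
        # EOT marker: value of PAEXEC_EOT, or an empty line when it is unset.
        # The printf builtin writes its output immediately, so no separate
        # flush step is needed in this sketch.
        printf '%s\n' "${PAEXEC_EOT-}"
    done
}
```

       A script intended for use with -c would end with a call to run_tasks,
       so that it reads tasks from its own stdin.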

       Transport - special program that helps to run command on a node. It
       takes the node identifier as its first argument and the command with
       its arguments as the rest. Good examples of transports are ssh and rsh.
       Both transport and command may be specified with their arguments, e.g.
       '/usr/bin/ssh -x' is allowed as a transport program.
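
       To make the calling convention concrete, here is a sketch of a trivial
       "transport" that runs the command on the local machine. The name
       local_transport is hypothetical; the only thing taken from the
       description above is the argument layout (node identifier first, then
       the command and its arguments).

```sh
# Hypothetical local "transport": same calling convention as ssh
# (node identifier first, then the command and its arguments).
local_transport() {
    node=$1
    shift
    echo "starting command on node $node" >&2   # diagnostics go to stderr
    # Like ssh, join the remaining arguments with spaces and let a shell
    # run the resulting command line.
    sh -c "$*"
}
```

       Since a transport is a program, the function body would have to be
       saved as an executable script before it could be named with -t.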

       How paexec works. Commands are run on each node with the help of the
       transport program. Then, tasks are read from stdin one by one and sent
       to a free node (exactly one task per node at a time). At the same time
       result lines are read from command's stdout and are output to paexec's
       stdout. When the EOT marker is obtained from a node, the node is marked
       as free and becomes ready for the next task. These steps repeat until
       the end of stdin is reached and all nodes finish their jobs.

       More formally (to better	understand how paexec works):

          while not(end_of_stdin) or not(all_nodes_are_free)
             while there_is_free_node/i and not(end_of_stdin)
                task = read_task_from_stdin
                send_task_to_node(task, i)
                mark_node_as_busy(i)
             while result_line_from_node_is_available/i
                result = read_result_line_from_node(i)
                if is_EOT(result)
                   mark_node_as_free(i)
                else
                   send_line_to_stdout(result)
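
       The same cycle can be illustrated for a single node with a small shell
       sketch. This is not how paexec is implemented (paexec serves many
       nodes at once); drive_one_node is a hypothetical helper that uses two
       FIFOs, assumes the default empty-line EOT marker and assumes that
       mktemp(1) and mkfifo(1) are available.

```sh
# Hypothetical single-node driver: send one task, then copy result lines
# to stdout until the (empty-line) EOT marker arrives.
drive_one_node() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/in" "$dir/out" || return 1
    # Start the command once; it stays alive for all tasks.
    sh -c "$1" <"$dir/in" >"$dir/out" &
    exec 3>"$dir/in" 4<"$dir/out"
    while IFS= read -r task; do
        printf '%s\n' "$task" >&3            # node becomes busy
        while IFS= read -r line <&4 && [ -n "$line" ]; do
            printf '%s\n' "$line"            # ordinary result line
        done                                 # EOT seen: node is free again
    done
    exec 3>&- 4<&-                           # end of stdin: close the node
    wait
    rm -rf "$dir"
}
```

       The command given as the first argument must print an empty line after
       each task and must not buffer its output, exactly as required of a
       paexec command.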

       Note that each command is run once per node; it is not restarted for
       every task.

       Also note that the output contains result lines (obtained from
       different nodes) in mixed order. That is, the first line of the output
       may contain a result line obtained from the first node, the second line
       of output may come from the second node, but the third output line may
       contain a result line from the first node again. It is also not
       guaranteed that the first line of output will be from the first node or
       from the first task. All result lines are output as soon as they are
       read by paexec, i.e. as soon as they are ready. paexec works this way
       for efficiency reasons. You can play with the -l, -r and -p options to
       see what happens. To reorder output lines you can use the
       paexec_reorder utility.
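
       As an illustration of what reordering means, the sketch below groups
       the lines of a finished run by task number. It assumes the output was
       produced with -l only, so that every line starts with the task number
       followed by a space. It relies on the -s (stable) flag of sort(1),
       which GNU and BSD sort provide, and it is not a replacement for
       paexec_reorder. The name group_by_task is hypothetical.

```sh
# Hypothetical helper: group "-l" output by task number while keeping the
# original order of the lines that belong to the same task.
group_by_task() {
    sort -s -n -k1,1
}
```

       Because the sort is stable, result lines of one task stay in the order
       in which the command produced them.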

       -h    Display help information.

       -V    Display version information.

       -c command
	     Command with its arguments.

       -C    Command and its arguments are specified by	free arguments.

       -t transport
	     Transport program.

       -n +number
	     A number of commands to run in parallel.

       -n nodes
	     List of nodes separated by	space character. The first character
	     must be alphanumeric, `_' or `/'. All other characters are
	     reserved for future extensions.

       -n :filename
	     Filename containing list of nodes,	one per	line.

       -x    Run the command specified by -c for each task. Its stdout is sent
             to paexec.  If both "-x" and "-g" are specified, a task is
             considered failed if the command's exit status is non-zero.

       -X    Implies -x and ignores command's stdout.

       -r    Include the node identifier or node number (0-based) in the
             output, i.e. the id/number of the node that produced this
             particular output line. This identifier or number appears before
             the line number if -l is also applied. A space character is used
             as a separator.

       -l    Include a 0-based task number (input line number) in the output,
             i.e. the number of the input line from which this particular
             output line was produced.  It appears before the pid if -p is
             also applied. A space character is used as a separator.

       -p    Include the pid of paexec's subprocess that communicates with
             node+command in the output. The pid is prepended to the actual
             result line. A space character is used as a separator.

       -e    When the end-of-task marker is obtained from a node, the EOT
             marker is output to stdout. This option may be useful together
             with -l and/or -r.

       -E    Implies -e and flushes stdout.

       -d    Turn on a debugging mode (for debugging purposes only).

       -i    Copy input lines (i.e. tasks) to stdout.

       -I    Implies -i and flushes stdout.

       -s|-g Directed graph of tasks (partially ordered set) is read from
             stdin.

             Instead of autonomous tasks, a graph of tasks is read from
             stdin.  In this mode every task can either FAIL or SUCCEED.  As
             always, an empty line output by command means end of task.  The
             line before it shows the EXIT STATUS of the task.  The word
             "failure" means failure, "success" means success and "fatal"
             means that the current task is reassigned to another node (and
             restarted, of course) (see option -z).  See examples/1_div_x/cmd
             for a sample.  An input line (paexec's stdin) should contain
             either a single task without spaces inside or two tasks separated
             by a single space character, e.g. task1<SPC>task2.  The line
             task1<SPC>task2 means that task1 must be done before task2, and
             this is mandatory: if task1 fails, all dependent tasks (including
             task2) also fail recursively.  Tasks having dependencies are
             started only after all their dependencies have succeeded.  When a
             task succeeds, paexec outputs the word "success" just before the
             end-of-task marker (see -e or -E); otherwise the word "failure"
             is output, followed by a list of tasks that failed because of it.


		tasks (examples/make_package/tasks file)

		  devel/autoconf wip/libmaa
		  devel/gmake wip/libmaa
		  wip/libmaa wip/dict-server
		  wip/libmaa wip/dict-client
		  devel/m4 wip/dict-server
		  devel/byacc wip/dict-server
		  devel/byacc wip/dict-client
		  devel/flex wip/dict-server
		  devel/flex wip/dict-client

		command	(examples/make_package/cmd__flex)

                  #!/usr/bin/awk -f
                  {
                     print $0
                     if ($0 == "devel/flex")
                        print "failure"
                     else
                        print "success"

                     print ""       # end of task marker
                     fflush()       # stdout must be flushed after EOT
                  }

		output of "paexec -s -l	-c cmd__flex -n	+10 \
			   < tasks"

		  3 devel/autoconf
		  3 success
		  4 devel/gmake
		  4 success
		  7 devel/m4
		  7 success
		  8 devel/byacc
		  8 success
		  9 devel/flex
		  9 failure
		  9 devel/flex wip/dict-server wip/dict-client
		  10 devel/glib2
		  10 success
		  11 devel/libjudy
		  11 success
		  1 textproc/dictem
		  1 success
		  2 wip/libmaa
		  2 success
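
             The recursive failure rule can be sketched with a few lines of
             awk wrapped in a shell function. This is only an illustration of
             the rule described above, not paexec's own code; the name
             failed_closure is hypothetical. It reads "task1 task2" lines
             from stdin and prints the failed task followed by every task
             that fails because of it.

```sh
# Hypothetical helper: $1 is the task that failed, stdin holds the
# dependency lines ("prerequisite dependent").
failed_closure() {
    awk -v start="$1" '
        NF == 2 { pre[NR] = $1; dep[NR] = $2 }
        END {
            failed[start] = 1
            out = start
            # Repeat until no new dependent task gets marked as failed.
            do {
                changed = 0
                for (i = 1; i <= NR; i++)
                    if ((i in pre) && (pre[i] in failed) && !(dep[i] in failed)) {
                        failed[dep[i]] = 1
                        out = out " " dep[i]
                        changed = 1
                    }
            } while (changed)
            print out
        }'
}
```

             Fed with the dependency lines of the tasks file above and the
             argument devel/flex, the sketch prints devel/flex followed by
             wip/dict-server and wip/dict-client.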

       -z    If applied, failures of read/write(2) operations from/to nodes
             become non-critical. If paexec loses the connection to a node, it
             reassigns the failed task to another node and, if -s is applied,
             outputs the "fatal" string to stdout (so the possible statuses
             are "success", "failure" and "fatal").  This makes paexec
             resistant to I/O errors; as a result you can create paexec
             clusters even over a network consisting of unreliable hosts
             (Internet?). Failed hosts are marked as such and will not be used
             during the current run of paexec.

       -Z timeout
             When -z is applied and a command fails, the corresponding node is
             marked as broken and is excluded from subsequent task
             distribution. But if -Z is applied, an attempt to rerun the
             command on a failed node is made every timeout seconds. -Z
             implies -z. This option makes it possible to organize clusters
             over unreliable networks/hardware.

       -w    If the -Z option is applied, paexec exits with an error if ALL
             nodes have failed. With -w it does not exit and waits for nodes
             to be restored.

       -m s=success
	     Set an alternative	string for 'success' message.

       -m f=failure
	     Set an alternative	string for 'failure' message.

       -m F=fatal
	     Set an alternative	string for 'fatal' message.  An	empty string
	     for 'fatal' means it will not be output to	stdout in case of
	     fatal error.

       -m t=eot
	     Set an alternative	string for EOT message.

       -m d=delimiter
	     Set an alternative	string for tasks delimiter (for	-g mode).
	     Delimiter should be at most one character.	No delimiter means an
	     entire input line is treated as a task.

       -m w=weight
	     Set an alternative	string for "weight:" message.

       -W num
             When multiple machines or CPUs are used for task processing, it
             makes sense to start "heavier" tasks as soon as possible in order
             to minimize the total calculation time.  If -W is specified, a
             special weight is assigned to each task and is used for
             reordering tasks.  If num is 0, the weights themselves are used
             for reordering tasks.  The bigger the weight, the higher the
             priority of the task.  If num is 1, the total weight of a task is
             the sum of its own weight (specified on input) and the weights of
             all tasks depending on it directly or indirectly.  If num is 2,
             the total weight of a task is the maximum of the task's own
             weight and the weights of all tasks depending on it directly or
             indirectly.  Weights are specified with the help of the "weight:"
             keyword (unless -m w= was specified).  If a weight is not
             specified, it defaults to 1.  The following is an example of an
             input graph of tasks with weights.

	       weight: gtk2 30
	       weight: glib2 20
	       gtk2 firefox
	       weight: firefox 200
	       glib2 gtk2
	       weight: qt4 200
	       weight: kcachegrind 2
	       qt4 kcachegrind
	       qt4 djview4
	       tiff djview4
	       png djview4
	       weight: twm 1
	       weight: gqview 4
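
             The rule for num = 1 can be checked by hand with the following
             sketch, which computes the total weight of every task as its own
             weight plus the weights of all tasks that depend on it directly
             or indirectly. It is an illustration of the description above
             under the default "weight:" keyword, not paexec's own code; the
             name total_weights is hypothetical.

```sh
# Hypothetical helper: reads a task graph with "weight:" lines on stdin and
# prints "task total_weight" pairs, sorted by task name.
total_weights() {
    awk '
        $1 == "weight:" { w[$2] = $3; seen[$2] = 1; next }
        NF == 2 { n++; pre[n] = $1; dep[n] = $2; seen[$1] = 1; seen[$2] = 1 }
        NF == 1 { seen[$1] = 1 }
        END {
            for (t in seen) {
                split("", reach)          # clear the set of reached tasks
                reach[t] = 1
                # Collect every task that depends on t, directly or not.
                do {
                    changed = 0
                    for (i = 1; i <= n; i++)
                        if ((pre[i] in reach) && !(dep[i] in reach)) {
                            reach[dep[i]] = 1
                            changed = 1
                        }
                } while (changed)
                sum = 0
                for (r in reach)
                    sum += (r in w) ? w[r] : 1   # unspecified weight is 1
                print t, sum
            }
        }' | sort
}
```

             For the sample graph above the sketch reports 250 for glib2
             (20 + 30 + 200) and 203 for qt4 (200 + 2 + 1).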

       -y    If applied, a magic string is used as the end-of-task marker
             instead of an empty line.  It is unlikely that this string
             appears in command's output.  This option has higher priority
             than the PAEXEC_EOT environment variable.

       -0    Change paexec to expect NUL character as a	line separator instead
	     of	newline.  This is expected to be used in concert with the
	     -print0 function in find(1).

       -J replstr
             Execute command for each task, replacing one or more occurrences
             of replstr with the entire task. Only a 2-character replstr is
             allowed. Note that such replacement works only if shell variable
             expansion is allowed in the appropriate part of command (if -c is
             applied) or if a free argument is exactly replstr (if -C is
             applied).  This option implies -x.

	      paexec -t	'/usr/bin/ssh -x' -n 'host1 host2 host3' \
		     -le -g -c calculate-me < tasks.txt	|
	      paexec_reorder -Mf -Sl

	      ls -1 *.wav | paexec -x -n +4 -c 'oggenc -Q'

	      ls -1 *.wav | paexec -xCil -n+4 flac -f --silent

	      {	uname -s; uname	-r; uname -m; }	|
	      paexec -x	-lp -n+2 -c banner |
	      paexec_reorder -l

	       find . -name '*.dat' -print0 |
	       paexec -0 -n+10 -C -J// scp // remoteserver:/remote/path

	       ls -1 *.txt | paexec -n+10 -J%% -c 'awk "BEGIN {print toupper(\"%%\")}"'

       For more examples see paexec.pdf and the examples/ subdirectory in the
       paexec distribution.

       select(2) system	call and non-blocking read(2) are used to read result
       lines from nodes.

       At the moment blocking write(2) is used to send a task to the node.
       This may slow down the entire processing if tasks are too big. So, it
       is recommended to use shorter tasks, for example, a filename or URI
       (several tens of bytes in size) instead of multi-megabyte content.
       This may be fixed in the future.

       The original paexec tarball contains a number of usage samples in the
       presentation/paexec.pdf file. After installation you can find this
       file under share/doc/paexec/paexec.pdf or nearby.

       PAEXEC_BUFSIZE
             Overrides the compile-time initial size of the internal buffers
             used to store tasks and result lines. Versions of paexec prior to
             0.9.0 used this value as the maximum buffer size.  Now internal
             buffers are resized automatically.  If unsure, do not set the
             PAEXEC_BUFSIZE variable.  See the default value in Makefile.

	     A list of variables passed	to command.

       PAEXEC_EOT
             This variable sets the end-of-task marker, which is an empty line
             by default.  Also, the end-of-task marker is passed to all
             commands through this variable.

             Unless option -n was applied, this variable specifies the nodes.

	     This variable sets	the shell interpreter used inside paexec.  By
	     default it	is /bin/sh.

             Unless option -t was applied, this variable specifies the
             transport.

       Please send any comments, questions, bug reports etc. to me by e-mail
       or (even better) register them at the sourceforge project home.
       Feature requests are also welcome.


SEE ALSO
       ssh(1), rsh(1), select(2), read(2), write(2)

				  2020-06-01			     paexec(1)

