Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
paexec(1)							     paexec(1)

NAME
       paexec -	parallel executor, distribute tasks over network or CPUs

SYNOPSIS
       paexec [options]

       paexec -C [options] command [args...]

DESCRIPTION
       Suppose you have	a long list of tasks that need to be done, for
       example,	you want to convert thousands of .wav audio files to .ogg
       format.	Also suppose that multiple CPUs	are available, e.g. multi-CPU
       system or a cluster consisting of individual computers connected	to the
       network or internet. paexec can efficiently do this work, that is,
       paexec efficiently distributes different	tasks to different processors
       (or computers), receives	the results of processing from them and	sends
       these results to	stdout.

       There are several notions that should be	defined: task, command,
       transport, node.

       Tasks are read by paexec	from stdin and are represented as one line of
       text, i.e. one input line - one task.

       node identifier - remote	computer or CPU	identifier, for	example	CPU
       ordinal number or computer's DNS	name like node12.cluster.company.com.

       Command - user's	program	that reads one-line task from stdin and	sends
       multiline result	to stdout ending with an end of	task (EOT) marker.
       EOT marker means	"Task is done. I am ready for the next one". After
       sending EOT to stdout, stdout MUST be flushed. Remember that EOT	marker
       MUST NOT	appears	in general result. Otherwise, paexec may hang due to
       deadlock. The default EOT is an empty line. Command may use environment
       variable	PAEXEC_EOT for the end-of-task marker.

       Transport - special program that	helps to run command on	node. It takes
       the node	identifier as its first	argument and command with its
       arguments as the	rest.  Good examples for transport are ssh and rsh.
       Both transport and command may be specified with	their arguments, e.g,
       '/usr/bin/ssh -x' is allowed as a transport program.

       How paexec works.  Commands are run on each node	with a help of
       transport program. Then,	tasks are read from stdin one-by-one and sent
       to free node (exactly one task per node at a time). At the same time
       result lines are	read from command's stdout and are output to paexec's
       stdout. When EOT	marker is obtained from	the node, it is	marked as free
       and becomes ready for the next task. These steps	repeat until the end
       of stdin	is reached and all nodes finish	their job.

       More formally (to better	understand how paexec works):

	  run_command_on_each_node
	  mark_all_nodes_as_free
	  while	not(end_of_stdin) or not(all_nodes_are_free)
	     while there_is_free_node/i	and not(end_of_stdin)
		task = read_task_from_stdin
		send_task_to_node(task,	i)
		mark_node_as_busy(i)
	     end
	     while result_line_from_node_is_available/i
		result = read_result_line_from_node(i)
		send_line_to_stdout(result)
		if is_EOT(result)
		   mark_node_as_free(i)
		end
	     end
	  end
	  close_command_on_each_node

       Note that commands are run once per node, it is not restarted for every
       task.

       Also note that output contains result lines (obtained from different
       nodes) in the mixed order. That is, the first line of the output	may
       contain a result	line obtain from the first node, the second line of
       output -	from the second	node, but the third output line	may contain
       result line from	the first node again. It is also not guaranteed	that
       the first line of output	will be	from the first node or from the	first
       task. All result	lines are output as soon as they are read by paexec,
       i.e as soon as they are ready. paexec works this	way for	the efficiency
       reasons.	 You can play with -l, -r and -p options to see	what happens.
       For reordering output line you can use paexec_reorder utility.

OPTIONS
       -h    Display help information.

       -V    Display version information.

       -c command
	     Command with its arguments.

       -C    Command and its arguments are specified by	free arguments.

       -t transport
	     Transport program.

       -n +number
	     A number of commands to run in parallel.

       -n nodes
	     List  of  nodes separated by space	character. The first character
	     must be alphanumeric,  `_'	 or  `/'.  All	other  characters  are
	     reserved for future extensions.

       -n :filename
	     Filename containing list of nodes,	one per	line.

       -x    Run command specificed by -c for each task. Its stdout is sent to
	     paexec.   If both "-x" and	"-g" are specified, task is considered
	     failed if command's exit status is	non-zero.

       -X    Implies -x	and ignore command's stdout.

       -r    Include node identifier or	node number (0-based) to  the  output,
	     i.e. id/number of node that produces this particular output line.
	     This  identifier  or  number  appears before line number if -l is
	     also applied. Space character is used as a	separator.

       -l    Include a 0-based task number (input line number) to the  output,
	     i.e.  line	 number	 from  which  this  particular output line was
	     produced.	It appears before pid if -p  is	 also  applied.	 Space
	     character is used as a separator.

       -p    Include   pid  of	paexec's  subprocess  that  communicates  with
	     node+command to the output. Pid prepends the actual result	 line.
	     Space character is	used as	a separator.

       -e    When  end-of-task	marker	is  obtained  from node, EOT marker is
	     output to stdout. This option may	be  useful  together  with  -l
	     and/or -r.

       -E    Imply -e and flushes stdout.

       -d    Turn on a debugging mode (for debugging purposes only).

       -i    Copy input	lines (i.e. tasks) to stdout.

       -I    Imply -i and flushes stdout.

       -s|-g Orgraph of	tasks (partially ordered set) is read from stdin.

	     Instead  of  autonomous  tasks,  graph  of	the tasks is read from
	     stdin.  In	this mode every	task can either	FAIL or	 SUCCEED.   As
	     always  an	 empty	line output by command means end of task.  The
	     line before it shows an  EXIT  STATUS  of	the  task.   The  word
	     "failure"	means  failure,	 "success" - success and "fatal" means
	     that  the	current	 task  is  reassigned  to  another  node  (and
	     restarted,	 of course) (see option	-z).  See examples/1_div_x/cmd
	     for the sample.  An input line (paexec's  stdin)  should  contain
	     either  single  task without spaces inside	or two tasks separated
	     by	single space character,	e.g. task1<SPC>task2.  task1<SPC>task2
	     line  means  that	task1  must  be	 done  before  task2 and it is
	     mandatory,	that is	if task1 fail all dependent  tasks  (including
	     task2)  are  also	failed recursively.  Tasks having dependencies
	     are started only after all	dependencies  are  succeeded.  When  a
	     task   succeeds   paexec	outputs	 "success"  word  just	before
	     end_of_task marker	(see -e	or -E),	otherwise  "failure"  word  is
	     output followed by	a list of tasks	failed because of it.

	      Samples:

		tasks (examples/make_package/tasks file)

		  textproc/dictem
		  devel/autoconf wip/libmaa
		  devel/gmake wip/libmaa
		  wip/libmaa wip/dict-server
		  wip/libmaa wip/dict-client
		  devel/m4 wip/dict-server
		  devel/byacc wip/dict-server
		  devel/byacc wip/dict-client
		  devel/flex wip/dict-server
		  devel/flex wip/dict-client
		  devel/glib2
		  devel/libjudy

		command	(examples/make_package/cmd__flex)

		  #!/usr/bin/awk -f
		  {
		     print $0
		     if	($0 == "devel/flex")
			print "failure"
		     else
			print "success"

		     print ""	    # end of task marker
		     fflush()
		  }

		output of "paexec -s -l	-c cmd__flex -n	+10 \
			   < tasks"

		  3 devel/autoconf
		  3 success
		  4 devel/gmake
		  4 success
		  7 devel/m4
		  7 success
		  8 devel/byacc
		  8 success
		  9 devel/flex
		  9 failure
		  9 devel/flex wip/dict-server wip/dict-client
		  10 devel/glib2
		  10 success
		  11 devel/libjudy
		  11 success
		  1 textproc/dictem
		  1 success
		  2 wip/libmaa
		  2 success

       -z    If	 applied,  read/write(2)  operations from/to nodes becomes not
	     critical. In case paexec has lost connection to the node, it will
	     reassign failed task to another node and,	if  -s	applied,  will
	     output   "fatal"  string  to  stdout  ("success"  +  "failure"  +
	     "fatal").	This makes paexec resistant to the I/O	errors,	 as  a
	     result   you   can	 create	 paexec	 clusters  even	 over  network
	     consisting	of unreliable  hosts  (Internet?).  Failed  hosts  are
	     marked  as	 such  and  will not be	used during the	current	run of
	     paexec.

       -Z timeout
	     When -z applied, if a command fails, appropriate node  is	marked
	     as	 broken	 and is	excluded from the following task distribution.
	     But if -Z applied,	every timeout seconds an attempt  to  rerun  a
	     comand on a failed	node is	made. -Z implies -z. This option makes
	     possible to organize clusters over	unreliable networks/hardware.

       -w    If	 -Z  option were applied, paexec exits with error if ALL nodes
	     failed. With -w it	will not exit  and  will  wait	for  restoring
	     nodes.

       -m s=success
	     Set an alternative	string for 'success' message.

       -m f=failure
	     Set an alternative	string for 'failure' message.

       -m F=fatal
	     Set  an  alternative string for 'fatal' message.  An empty	string
	     for 'fatal' means it will not be output  to  stdout  in  case  of
	     fatal error.

       -m t=eot
	     Set an alternative	string for EOT message.

       -m d=delimiter
	     Set  an  alternative  string  for	tasks delimiter	(for -g	mode).
	     Delimiter should be at most one character.	No delimiter means  an
	     entire input line is treated as a task.

       -m w=weight
	     Set an alternative	string for "weight:" message.

       -W num
	     When  multiple machines or	CPUs are used for tasks	processing, it
	     makes sense to start "heavier" tasks as soon as possible in order
	     to	minimize total calculatiion time.  If -W is specified, special
	     weight is assigned	to each	tasks which  is	 used  for  reordering
	     tasks.   If  num is 0, weights themselves are used	for reordering
	     tasks.  The bigger	weight is, the more priority of	the  task  is.
	     If	 num is	1, the total weight of task is a sum of	its own	weight
	     (specified	on input) and weights of all  tasks  depending	on  it
	     directly or indirectly.  If num is	2, the total weight of task is
	     a	maximum	 value	of  task's own weight and weights of all tasks
	     depending on it directly or indirectly.   Weights	are  specified
	     with  a  help  of "weight:" keyword (unless -m w= was specified).
	     If	weight is not specified, it defaults to	1.  The	 following  is
	     the example for input graph of tasks with weights.

	       weight: gtk2 30
	       weight: glib2 20
	       gtk2 firefox
	       weight: firefox 200
	       glib2 gtk2
	       weight: qt4 200
	       weight: kcachegrind 2
	       qt4 kcachegrind
	       qt4 djview4
	       tiff djview4
	       png djview4
	       weight: twm 1
	       weight: gqview 4

       -y    If	 applied,  the	magic  string is used as an end-of-task	marker
	     instead of	empty line.  It	is unlikely that this line appears  on
	     command's	 output.    This   option  has	higher	priority  than
	     PAEXEC_EOT	environment variable.

       -0    Change paexec to expect NUL character as a	line separator instead
	     of	newline.  This is expected to be  used	in  concert  with  the
	     -print0 function in find(1).

       -J replstr
	     Execute  command for each task, replacing one or more occurrences
	     of	replstr	with the entire	 task.	Only  2-character  replstr  is
	     allowed.  Tale  a	note that such replacement works only if shell
	     variable expansion	is allowed in appropriate part of command  (if
	     -c	is applied) or if free argument	is exactly a replstr (if -C is
	     applied).	This option implies -x.

EXAMPLES
       1.
	      paexec -t	'/usr/bin/ssh -x' -n 'host1 host2 host3' \
		     -le -g -c calculate-me < tasks.txt	|
	      paexec_reorder -Mf -Sl

       2.
	      ls -1 *.wav | paexec -x -n +4 -c 'oggenc -Q'

       3.
	      ls -1 *.wav | paexec -xCil -n+4 flac -f --silent

       4.
	      {	uname -s; uname	-r; uname -m; }	|
	      paexec -x	-lp -n+2 -c banner |
	      paexec_reorder -l

       5.
	       find . -name '*.dat' -print0 |
	       paexec -0 -n+10 -C -J// scp // remoteserver:/remote/path

       6.
	       ls -1 *.txt | paexec -n+10 -J%% -c 'awk "BEGIN {print toupper(\"%%\")}"'

       For  more  examples  see	 paexec.pdf  and examples/ subdirectory	in the
       distribution.

NOTES
       select(2) system	call and non-blocking read(2) are used to read	result
       lines from nodes.

       At  the moment blocking write(2)	is used	to send	task to	the node. This
       may slow	down an	entire processing if tasks are	too  big.  So,	it  is
       recommended to use shorter tasks, for example, filename or URI (several
       tens  of	 bytes in size)	instead	of multi-megabyte content. Though this
       may be fixed in the future.

       Original	 paexec	 tarball  contains  a  number  of  sample  of  use  in
       presentation/paexec.pdf file. After installation	you can	find this file
       under share/doc/paexec/paexec.pdf or nearby.

ENVIRONMENT
       PAEXEC_BUFSIZE
	     Overrides the compile time	initial	size for internal buffers used
	     to	 store tasks and the result lines. Versions of paexec prior to
	     0.9.0 used	this value as a	maximum	 buffer	 size.	 Now  internal
	     buffers  are  resized  automatically.   If	 unsure,  do  not  set
	     PAEXEC_BUFSIZE variable.  See the default value in	Makefile.

       PAEXEC_ENV
	     A list of variables passed	to command.

       PAEXEC_EOT
	     This variable sets	the end-of-task	marker which is	an empty  line
	     by	default.  Also,	through	this variable an end-of-task marker is
	     passed to all commands.

       PAEXEC_NODES
	     Unless option -n was applied, this	variables specifies the	nodes.

       PAEXEC_SH
	     This  variable sets the shell interpreter used inside paexec.  By
	     default it	is /bin/sh.

       PAEXEC_TRANSPORT
	     Unless option  -t	was  applied,  this  variables	specifies  the
	     transport.

BUGS/FEEDBACK
       Please  send  any comments, questions, bug reports etc. to me by	e-mail
       or (even	better)	register them at sourceforge  project  home.   Feature
       requests	are also welcomed.

HOME
       <http://sourceforge.net/projects/paexec>

SEE ALSO ssh(1)	rsh(1) select(2) read(2) write(2)
				  2020-06-01			     paexec(1)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=paexec&sektion=1&manpath=FreeBSD+Ports+14.3.quarterly>

home | help