MOK(1)		      User Contributed Perl Documentation		MOK(1)

NAME
       mok - an	awk for	molecules

SYNOPSIS
	   mok [OPTION]...  'CODE' FILE...

DESCRIPTION
       mok reads every molecule found in the files given on the command line
       and executes the given CODE once for each molecule. The CODE is written
       in Perl and has all of the methods of the PerlMol toolkit at its
       disposal.

       This mini-language is intended to provide a powerful environment	for
       writing "molecular one-liners" for extracting and munging chemical
       information.  It	was inspired by	the AWK	programming language by	Aho,
       Kernighan, and Weinberger, the SMARTS molecular pattern description
       language	by Daylight, Inc., and the Perl	programming language by	Larry
       Wall.

       Mok takes its name from Ookla the Mok, an unforgettable character from
       the animated TV series "Thundarr	the Barbarian",	and from shortening
       "molecular awk".	 For more details about	the Mok	mini-language, see
       LANGUAGE	SPECIFICATION below.

       Mok is part of the PerlMol project, <http://www.perlmol.org>.

OPTIONS
       -3  Generate 3D coordinates using Chemistry::3DBuilder.

       -a  "Aromatize" each molecule as	it is read. This is needed for example
	   for	 matching   SMARTS  patterns  that  use	 aromaticity  or  ring
	   primitives.

       -b  Find bonds. Use it when reading files that have 3D coordinates but
           no bond information, so that the bonds can be detected if needed
           (for example, if you want to match a pattern that includes bonds).
           If the file has explicit bonds, mok will not try to find the bonds,
           but it will reassign the bond orders from scratch.

       -c CLASS
	   Use CLASS instead of	Chemistry::Mol to read molecules

       -d  Delete dummy	atoms after reading each molecule.  A  dummy  atom  is
	   defined  as an atom with an unknown symbol (i.e., it	doesn't	appear
	   on the periodic table), or an atomic	number of zero.

       -D  Print debugging information, such as the way the input program was
           tokenized and parsed into blocks and subs. This may be useful for
           diagnosing syntax errors when the default error message is not
           informative enough.

       -f FILE
           Run the code from FILE instead of the command line.

       -h  Print usage information and exit.

       -p TYPE
	   Parse  patterns  using the specified	TYPE. Default: 'smarts'. Other
	   options are 'smiles'	and 'midas'.

       -t TYPE
	   Assume that every file has  the  specified  TYPE.  Available	 types
	   depend   on	 which	Chemistry::File	 modules  are  installed,  but
	   currently available types include mdl, sdf, smiles, formula,	mopac,
	   pdb.

LANGUAGE SPECIFICATION
       A Mok script consists of	a sequence of  pattern-action  statements  and
       optional	 subroutine  definitions,  in a	manner very similar to the AWK
       language.

	   pattern_type:/pattern/options { action statements }
	   { action statements }
	   sub name { statements }
	   BEGIN { statements }
	   END { statements }
	   # comment

       When the	whole program consists of one unconditional action block,  the
       braces may be omitted.

       Program execution is as follows:

       1)  The	BEGIN  block  is executed as soon as it's compiled, before any
       other actions are taken.

       2) For each molecule in the files  given	 in  the  command  line,  each
       pattern	is  applied in turn; if	the pattern matches, the corresponding
       statement block is executed. The	pattern	is optional; statement	blocks
       without	a  pattern  are	executed unconditionally. Subroutines are only
       executed	when called explicitly.

       3) Finally, the END block is executed.
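
       The execution order above can be sketched as a small script file run
       with the -f option. This is only an illustrative sketch: the file name
       count-esters.mok and the two counter variables are hypothetical, while
       the ester pattern, println, and $MOL->name are taken from elsewhere in
       this page.

```perl
# count-esters.mok (hypothetical name); run as: mok -f count-esters.mok *.mol

# 1) Runs once, before any molecule is read.
BEGIN { $total = 0; $esters = 0 }

# 2) Unconditional block: runs for every molecule.
{ $total++ }

# 2) Pattern block: runs only for molecules that match the SMARTS pattern.
/C(=O)OC/ { $esters++; println $MOL->name }

# 3) Runs once, after the last molecule has been processed.
END { println "$esters of $total molecules matched" }
```

       Because the pattern has no g option, its block runs at most once per
       molecule, so $esters counts molecules rather than individual matches.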

       The   statements	  are	evaluated   as	 Perl	statements   in	   the
       Chemistry::Mok::UserCode::Default   package.  The  following  chemistry
       modules are conveniently	loaded by default:

	   Chemistry::Mol;
	   Chemistry::Atom ':all';
	   Chemistry::Bond;
	   Chemistry::Pattern;
	   Chemistry::Pattern::Atom;
	   Chemistry::Pattern::Bond;
	   Chemistry::File;
	   Chemistry::File::*;
	   Math::VectorReal ':all';

       Besides these, one more function is available for convenience:
       "println", which is defined as "sub println { print "@_", "\n" }"; it
       prints its arguments separated by spaces, followed by a newline.
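
       The behaviour of println can be checked in plain Perl, outside mok.
       This is a minimal sketch, assuming the intended definition interpolates
       the argument list ("@_"); the sample arguments are made up for
       illustration.

```perl
use strict;
use warnings;

# Assumed to be equivalent to the helper that mok provides to user code.
sub println { print "@_", "\n" }

# "@_" interpolates the argument list joined by single spaces (the default $").
println "glucose", "C6H12O6";   # prints "glucose C6H12O6" and a newline
println 24;                     # prints "24" and a newline
```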

   Pattern Specification
       The    pattern	 must	be   a	 SMARTS	  string   readable   by   the
       Chemistry::File::SMARTS module, unless a	different type is specified by
       means of	the -p option or a pattern_type	is given explicitly before the
       pattern	itself.	 The  pattern  is  given  within  slashes,  in	a  way
       reminiscent  of	AWK and	Perl regular expressions.  As in Perl, certain
       one-letter options may be included after the closing slash.  An option
       is turned on by giving the corresponding lowercase letter and turned off
       by giving the corresponding uppercase letter.

       g/G Match   globally   (default:	  off).	 When  not  present,  the  Mok
	   interpreter only matches a molecule once; when  present,  it	 tries
	   matching  again  in	other  parts of	the molecule. For example, /C/
	   matches butane only once  (at  an  unspecified  atom),  while  /C/g
	   matches four	times (once at each atom).

       o/O Overlap (default: on). When set and matching globally, matches may
           overlap. For example, the pattern /CC/go could match twice on
           propane, but /CC/gO would match only once.

       p/P Permute  (default:  off).  Sometimes	 there is more than one	way of
	   matching the	same set of pattern atoms on the same set of  molecule
	   atoms.  If  true,  return  these "redundant"	matches.  For example,
	   /CC/gp could	match ethane with two different	permutations (forwards
	   and backwards).
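
       The effect of these options can be explored by counting how often a
       block runs. The following is a sketch of a script file for -f; the
       script name, the input file propane.mol, and the counter names are
       hypothetical. The counts printed depend on the input, so none are
       shown here.

```perl
# count-matches.mok (hypothetical name); run as: mok -f count-matches.mok propane.mol

BEGIN { $overlapping = 0; $separate = 0; $permuted = 0 }

# Global matching; overlap is on by default, so matches may overlap.
/CC/g  { $overlapping++ }

# Global matching with overlap turned off.
/CC/gO { $separate++ }

# Global matching that also returns "redundant" permutations.
/CC/gp { $permuted++ }

END {
    println "overlapping: $overlapping";
    println "non-overlapping: $separate";
    println "with permutations: $permuted";
}
```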

   Special Variables
       When blocks with	action statements are  executed,  some	variables  are
       defined	automatically. The variables are local,	so you can do whatever
       you  want  with	them  with  no	side  effects.	However,  the  objects
       themselves may be altered by using their	methods.

       NOTE:  Mok  0.10	 defined  $file, $mol, $match, and $patt in lowercase.
       While they still	work, the lowercase variables are deprecated  and  may
       be removed in the future.

       $FILE
	   The current filename.

       $MOL
	   A reference to the current molecule as a Chemistry::Mol object.

       $MATCH
	   A reference to the current match as a Chemistry::Pattern object.

       $PATT
	   The current pattern as a string.

       $FH The	current	 input	filehandle.  This provides low-level access in
	   case	you want to rewind or seek into	the  file,  tell  the  current
	   position,  etc.  Playing  with  $FH may break things	if you are not
	   careful. Use	at your	own risk!

       @A  The atoms that were matched.	It is defined as @A = $MATCH->atom_map
	   if a	pattern	was used, or @A	= $MOL->atoms within an	 unconditional
	   block.   Remember  that  this is a Perl array, so it	is zero-based,
	   unlike the one-based	numbering used by most	file  types  and  some
	   PerlMol methods.

       @B  The bonds that were matched. It is defined as @B = $MATCH->bond_map
           if a pattern was used, or @B = $MOL->bonds within an unconditional
           block. Remember that this is a Perl array, so it is zero-based,
           unlike the one-based numbering used by most file types and some
           PerlMol methods.
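
       Because @A and @B are ordinary zero-based Perl arrays, a loop over the
       matched atoms has to add one to obtain a one-based position. The
       following sketch assumes that atom objects provide the symbol method of
       Chemistry::Atom; the script name is hypothetical.

```perl
# list-ester-atoms.mok (hypothetical name); run as: mok -f list-ester-atoms.mok *.mol

/C(=O)OC/ {
    println $MOL->name;
    # $A[0] is the first matched atom, so it is reported as position 1.
    println($_ + 1, $A[$_]->symbol) for 0 .. $#A;
}
```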

   Special Blocks
       Within action blocks, the following block names can be used as labels
       with Perl functions such as "next" and "last":

       MATCH
       BLOCK
       MOL
       FILE
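
       As a sketch of how these names might be used, the script below prints
       the names of molecules that do not contain an ester group. It assumes
       that "next MOL" abandons the remaining blocks for the current molecule
       and moves on to the next one, by analogy with Perl loop labels; the
       script name is hypothetical.

```perl
# no-esters.mok (hypothetical name); run as: mok -f no-esters.mok *.mol

# If the molecule contains an ester group, skip straight to the next molecule.
/C(=O)OC/ { next MOL }

# Reached only by molecules that did not match the pattern above.
{ println $MOL->name }
```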

EXAMPLES
       Print the names of all the molecules found in all the .sdf files	in the
       current directory:

	   mok 'println	$MOL->name' *.sdf

       Find esters  among  *.mol;  print  the  filename,  molecule  name,  and
       formula:

	   mok '/C(=O)OC/{ printf "$FILE: %s (%s)\n",
	       $MOL->name, $MOL->formula }' *.mol

       Find out	the total number of atoms:

	   mok '{ $n +=	$MOL->atoms } END { print "Total: $n atoms\n" }' *.mol

       Find out	the average C-S	bond length:

	   mok '/CS/g{ $n++; $len += $B[0]->length }
	       END { printf "Average C-S bond length: %.3f\n", $len/$n;	}' *.mol

       Convert PDB files to MDL	molfiles:

           mok '{ $FILE =~ s/\.pdb$/.mol/; $MOL->write($FILE, format => "mdlmol") }' *.pdb

       Find molecules with a given formula by overriding the pattern type
       globally (this example requires Chemistry::FormulaPattern):

	   mok -p formula_pattern '/C6H12O6/{ println $MOL->name }' *.sdf

       Find molecules with a given formula by overriding the pattern type
       for just one specific pattern. This can be used when more than one
       pattern type is needed in one script:

	   mok 'formula_pattern:/C6H12O6/{ println $MOL->name }' *.sdf

SEE ALSO
       awk(1), perl(1), Chemistry::Mok, Chemistry::Mol, Chemistry::Pattern,
       <http://dmoz.org/Arts/Animation/Cartoons/Titles/T/Thundarr_the_Barbarian/>.

       Tubert-Brohman,	I.  Perl  and  Chemistry.  The	Perl  Journal  2004-06
       (<http://www.tpj.com/documents/s=7618/tpj0406/>).

       The PerlMol project site	at <http://www.perlmol.org>.

VERSION
       0.25

AUTHOR
       Ivan Tubert-Brohman <itub@cpan.org>

COPYRIGHT
       Copyright  (c)  2005  Ivan  Tubert-Brohman.  All	 rights	reserved. This
       program is free software; you can  redistribute	it  and/or  modify  it
       under the same terms as Perl itself.

perl v5.32.1			  2005-05-16				MOK(1)
