.TH g_tune_pme 1 "Mon 4 Apr 2011" "" "GROMACS suite, VERSION 4.5.4-dev-20110404-bc5695c" .SH NAME g_tune_pme - time mdrun as a function of PME nodes to optimize settings .B VERSION 4.5.4-dev-20110404-bc5695c .SH SYNOPSIS \f3g_tune_pme\fP .BI "\-p" " perf.out " .BI "\-err" " errors.log " .BI "\-so" " tuned.tpr " .BI "\-s" " topol.tpr " .BI "\-o" " traj.trr " .BI "\-x" " traj.xtc " .BI "\-cpi" " state.cpt " .BI "\-cpo" " state.cpt " .BI "\-c" " confout.gro " .BI "\-e" " ener.edr " .BI "\-g" " md.log " .BI "\-dhdl" " dhdl.xvg " .BI "\-field" " field.xvg " .BI "\-table" " table.xvg " .BI "\-tablep" " tablep.xvg " .BI "\-tableb" " table.xvg " .BI "\-rerun" " rerun.xtc " .BI "\-tpi" " tpi.xvg " .BI "\-tpid" " tpidist.xvg " .BI "\-ei" " sam.edi " .BI "\-eo" " sam.edo " .BI "\-j" " wham.gct " .BI "\-jo" " bam.gct " .BI "\-ffout" " gct.xvg " .BI "\-devout" " deviatie.xvg " .BI "\-runav" " runaver.xvg " .BI "\-px" " pullx.xvg " .BI "\-pf" " pullf.xvg " .BI "\-mtx" " nm.mtx " .BI "\-dn" " dipole.ndx " .BI "\-bo" " bench.trr " .BI "\-bx" " bench.xtc " .BI "\-bcpo" " bench.cpt " .BI "\-bc" " bench.gro " .BI "\-be" " bench.edr " .BI "\-bg" " bench.log " .BI "\-beo" " bench.edo " .BI "\-bdhdl" " benchdhdl.xvg " .BI "\-bfield" " benchfld.xvg " .BI "\-btpi" " benchtpi.xvg " .BI "\-btpid" " benchtpid.xvg " .BI "\-bjo" " bench.gct " .BI "\-bffout" " benchgct.xvg " .BI "\-bdevout" " benchdev.xvg " .BI "\-brunav" " benchrnav.xvg " .BI "\-bpx" " benchpx.xvg " .BI "\-bpf" " benchpf.xvg " .BI "\-bmtx" " benchn.mtx " .BI "\-bdn" " bench.ndx " .BI "\-[no]h" "" .BI "\-[no]version" "" .BI "\-nice" " int " .BI "\-xvg" " enum " .BI "\-np" " int " .BI "\-npstring" " enum " .BI "\-nt" " int " .BI "\-r" " int " .BI "\-max" " real " .BI "\-min" " real " .BI "\-npme" " enum " .BI "\-fix" " int " .BI "\-upfac" " real " .BI "\-downfac" " real " .BI "\-ntpr" " int " .BI "\-four" " real " .BI "\-steps" " step " .BI "\-resetstep" " int " .BI "\-simsteps" " step " .BI "\-[no]launch" "" .BI "\-deffnm" " string " .BI "\-ddorder" " enum " .BI "\-[no]ddcheck" "" .BI "\-rdd" " real " .BI "\-rcon" " real " .BI "\-dlb" " enum " .BI "\-dds" " real " .BI "\-gcom" " int " .BI "\-[no]v" "" .BI "\-[no]compact" "" .BI "\-[no]seppot" "" .BI "\-pforce" " real " .BI "\-[no]reprod" "" .BI "\-cpt" " real " .BI "\-[no]cpnum" "" .BI "\-[no]append" "" .BI "\-maxh" " real " .BI "\-multi" " int " .BI "\-replex" " int " .BI "\-reseed" " int " .BI "\-[no]ionize" "" .SH DESCRIPTION \&For a given number \fB \-np\fR or \fB \-nt\fR of processors/threads, this program systematically \× \fB mdrun\fR with various numbers of PME\-only nodes and determines \&which setting is fastest. It will also test whether performance can \&be enhanced by shifting load from the reciprocal to the real space \&part of the Ewald sum. \&Simply pass your \fB .tpr\fR file to \fB g_tune_pme\fR together with other options \&for \fB mdrun\fR as needed. \&Which executables are used can be set in the environment variables \&MPIRUN and MDRUN. If these are not present, 'mpirun' and 'mdrun' \&will be used as defaults. Note that for certain MPI frameworks you \&need to provide a machine\- or hostfile. This can also be passed \&via the MPIRUN variable, e.g. \&\fB export MPIRUN="/usr/local/mpirun \-machinefile hosts"\fR \&Please call \fB g_tune_pme\fR with the normal options you would pass to \&\fB mdrun\fR and add \fB \-np\fR for the number of processors to perform the \&tests on, or \fB \-nt\fR for the number of threads. You can also add \fB \-r\fR \&to repeat each test several times to get better statistics. \&\fB g_tune_pme\fR can test various real space / reciprocal space workloads \&for you. With \fB \-ntpr\fR you control how many extra \fB .tpr\fR files will be \&written with enlarged cutoffs and smaller fourier grids respectively. \&Typically, the first test (number 0) will be with the settings from the input \&\fB .tpr\fR file; the last test (number \fB ntpr\fR) will have cutoffs multiplied \&by (and at the same time fourier grid dimensions divided by) the scaling \&factor \fB \-fac\fR (default 1.2). The remaining \fB .tpr\fR files will have about \&equally\-spaced values in between these extremes. \fB Note\fR that you can set \fB \-ntpr\fR to 1 \&if you just want to find the optimal number of PME\-only nodes; in that case \&your input \fB .tpr\fR file will remain unchanged. \&For the benchmark runs, the default of 1000 time steps should suffice for most \&MD systems. The dynamic load balancing needs about 100 time steps \&to adapt to local load imbalances, therefore the time step counters \&are by default reset after 100 steps. For large systems \&(1M atoms) you may have to set \fB \-resetstep\fR to a higher value. \&From the 'DD' load imbalance entries in the md.log output file you \&can tell after how many steps the load is sufficiently balanced. Example call: \fB g_tune_pme \-np 64 \-s protein.tpr \-launch\fR \&After calling \fB mdrun\fR several times, detailed performance information \&is available in the output file \fB perf.out.\fR \&\fB Note\fR that during the benchmarks, a couple of temporary files are written \&(options \fB \-b\fR*), these will be automatically deleted after each test. \&If you want the simulation to be started automatically with the \&optimized parameters, use the command line option \fB \-launch\fR. .SH FILES .BI "\-p" " perf.out" .B Output Generic output file .BI "\-err" " errors.log" .B Output Log file .BI "\-so" " tuned.tpr" .B Output Run input file: tpr tpb tpa .BI "\-s" " topol.tpr" .B Input Run input file: tpr tpb tpa .BI "\-o" " traj.trr" .B Output Full precision trajectory: trr trj cpt .BI "\-x" " traj.xtc" .B Output, Opt. Compressed trajectory (portable xdr format) .BI "\-cpi" " state.cpt" .B Input, Opt. Checkpoint file .BI "\-cpo" " state.cpt" .B Output, Opt. Checkpoint file .BI "\-c" " confout.gro" .B Output Structure file: gro g96 pdb etc. .BI "\-e" " ener.edr" .B Output Energy file .BI "\-g" " md.log" .B Output Log file .BI "\-dhdl" " dhdl.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-field" " field.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-table" " table.xvg" .B Input, Opt. xvgr/xmgr file .BI "\-tablep" " tablep.xvg" .B Input, Opt. xvgr/xmgr file .BI "\-tableb" " table.xvg" .B Input, Opt. xvgr/xmgr file .BI "\-rerun" " rerun.xtc" .B Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt .BI "\-tpi" " tpi.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-tpid" " tpidist.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-ei" " sam.edi" .B Input, Opt. ED sampling input .BI "\-eo" " sam.edo" .B Output, Opt. ED sampling output .BI "\-j" " wham.gct" .B Input, Opt. General coupling stuff .BI "\-jo" " bam.gct" .B Output, Opt. General coupling stuff .BI "\-ffout" " gct.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-devout" " deviatie.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-runav" " runaver.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-px" " pullx.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-pf" " pullf.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-mtx" " nm.mtx" .B Output, Opt. Hessian matrix .BI "\-dn" " dipole.ndx" .B Output, Opt. Index file .BI "\-bo" " bench.trr" .B Output Full precision trajectory: trr trj cpt .BI "\-bx" " bench.xtc" .B Output Compressed trajectory (portable xdr format) .BI "\-bcpo" " bench.cpt" .B Output Checkpoint file .BI "\-bc" " bench.gro" .B Output Structure file: gro g96 pdb etc. .BI "\-be" " bench.edr" .B Output Energy file .BI "\-bg" " bench.log" .B Output Log file .BI "\-beo" " bench.edo" .B Output, Opt. ED sampling output .BI "\-bdhdl" " benchdhdl.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-bfield" " benchfld.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-btpi" " benchtpi.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-btpid" " benchtpid.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-bjo" " bench.gct" .B Output, Opt. General coupling stuff .BI "\-bffout" " benchgct.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-bdevout" " benchdev.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-brunav" " benchrnav.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-bpx" " benchpx.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-bpf" " benchpf.xvg" .B Output, Opt. xvgr/xmgr file .BI "\-bmtx" " benchn.mtx" .B Output, Opt. Hessian matrix .BI "\-bdn" " bench.ndx" .B Output, Opt. Index file .SH OTHER OPTIONS .BI "\-[no]h" "no " Print help info and quit .BI "\-[no]version" "no " Print version info and quit .BI "\-nice" " int" " 0" Set the nicelevel .BI "\-xvg" " enum" " xmgrace" xvg plot formatting: \fB xmgrace\fR, \fB xmgr\fR or \fB none\fR .BI "\-np" " int" " 1" Number of nodes to run the tests on (must be 2 for separate PME nodes) .BI "\-npstring" " enum" " \-np" Specify the number of processors to \fB $MPIRUN\fR using this string: \fB \-np\fR, \fB \-n\fR or \fB none\fR .BI "\-nt" " int" " 1" Number of threads to run the tests on (turns MPI & mpirun off) .BI "\-r" " int" " 2" Repeat each test this often .BI "\-max" " real" " 0.5 " Max fraction of PME nodes to test with .BI "\-min" " real" " 0.25 " Min fraction of PME nodes to test with .BI "\-npme" " enum" " auto" Benchmark all possible values for \fB \-npme\fR or just the subset that is expected to perform well: \fB auto\fR, \fB all\fR or \fB subset\fR .BI "\-fix" " int" " \-2" If = \-1, do not vary the number of PME\-only nodes, instead use this fixed value and only vary rcoulomb and the PME grid spacing. .BI "\-upfac" " real" " 1.2 " Upper limit for rcoulomb scaling factor (Note that rcoulomb upscaling results in fourier grid downscaling) .BI "\-downfac" " real" " 1 " Lower limit for rcoulomb scaling factor .BI "\-ntpr" " int" " 0" Number of \fB .tpr\fR files to benchmark. Create this many files with scaling factors ranging from 1.0 to fac. If 1, automatically choose the number of \fB .tpr\fR files to test .BI "\-four" " real" " 0 " Use this fourierspacing value instead of the grid found in the \fB .tpr\fR input file. (Spacing applies to a scaling factor of 1.0 if multiple \fB .tpr\fR files are written) .BI "\-steps" " step" " 1000" Take timings for this many steps in the benchmark runs .BI "\-resetstep" " int" " 100" Let dlb equilibrate this many steps before timings are taken (reset cycle counters after this many steps) .BI "\-simsteps" " step" " \-1" If non\-negative, perform this many steps in the real run (overwrites nsteps from \fB .tpr\fR, add \fB .cpt\fR steps) .BI "\-[no]launch" "no " Launch the real simulation after optimization .BI "\-deffnm" " string" " " Set the default filename for all file options at launch time .BI "\-ddorder" " enum" " interleave" DD node order: \fB interleave\fR, \fB pp_pme\fR or \fB cartesian\fR .BI "\-[no]ddcheck" "yes " Check for all bonded interactions with DD .BI "\-rdd" " real" " 0 " The maximum distance for bonded interactions with DD (nm), 0 is determine from initial coordinates .BI "\-rcon" " real" " 0 " Maximum distance for P\-LINCS (nm), 0 is estimate .BI "\-dlb" " enum" " auto" Dynamic load balancing (with DD): \fB auto\fR, \fB no\fR or \fB yes\fR .BI "\-dds" " real" " 0.8 " Minimum allowed dlb scaling of the DD cell size .BI "\-gcom" " int" " \-1" Global communication frequency .BI "\-[no]v" "no " Be loud and noisy .BI "\-[no]compact" "yes " Write a compact log file .BI "\-[no]seppot" "no " Write separate V and dVdl terms for each interaction type and node to the log file(s) .BI "\-pforce" " real" " \-1 " Print all forces larger than this (kJ/mol nm) .BI "\-[no]reprod" "no " Try to avoid optimizations that affect binary reproducibility .BI "\-cpt" " real" " 15 " Checkpoint interval (minutes) .BI "\-[no]cpnum" "no " Keep and number checkpoint files .BI "\-[no]append" "yes " Append to previous output files when continuing from checkpoint instead of adding the simulation part number to all file names (for launch only) .BI "\-maxh" " real" " \-1 " Terminate after 0.99 times this time (hours) .BI "\-multi" " int" " 0" Do multiple simulations in parallel .BI "\-replex" " int" " 0" Attempt replica exchange every steps .BI "\-reseed" " int" " \-1" Seed for replica exchange, \-1 is generate a seed .BI "\-[no]ionize" "no " Do a simulation including the effect of an X\-ray bombardment on your system .SH SEE ALSO .BR gromacs(7) More information about \fBGROMACS\fR is available at <\fIhttp://www.gromacs.org/\fR>.