.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.42)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Chemistry::File::Formula 3pm"
.TH Chemistry::File::Formula 3pm "2022-07-14" "perl v5.34.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Chemistry::File::Formula \- Molecular formula reader/formatter
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use Chemistry::File::Formula;
\&
\& my $mol = Chemistry::Mol\->parse("H2O");
\& print $mol\->print(format => \*(Aqformula\*(Aq);
\& print $mol\->formula; # this is a shorthand for the above
\& print $mol\->print(format => \*(Aqformula\*(Aq,
\&     formula_format => "%s%d{<sub>%d</sub>}");
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module converts a molecule object to a formula string and back.
It registers the 'formula' format with Chemistry::Mol. Besides its obvious
use, it is included in the Chemistry::Mol distribution as a very
simple example of an I/O module derived from Chemistry::File.
.SS "Writing formulas"
.IX Subsection "Writing formulas"
The output format is a printf-like string passed as the formula_format
parameter to \f(CW$mol\fR\->print or \f(CW$mol\fR\->write. It may contain the
following control sequences:
.ie n .IP "%s symbol" 4
.el .IP "\f(CW%s\fR symbol" 4
.IX Item "%s symbol"
.PD 0
.ie n .IP "%D number of atoms" 4
.el .IP "\f(CW%D\fR number of atoms" 4
.IX Item "%D number of atoms"
.ie n .IP "%d number of atoms, included only when it is greater than one" 4
.el .IP "\f(CW%d\fR number of atoms, included only when it is greater than one" 4
.IX Item "%d number of atoms, included only when it is greater than one"
.ie n .IP "%d{substr} substr is only included when number of atoms is greater than one" 4
.el .IP "\f(CW%d\fR{substr} substr is only included when number of atoms is greater than one" 4
.IX Item "%d{substr} substr is only included when number of atoms is greater than one"
.ie n .IP "%j{substr} substr is inserted between the formatted string for each element. (The 'j' stands for 'joiner'.) The format should have only one joiner, but its location in the format string doesn't matter." 4
.el .IP "\f(CW%j\fR{substr} substr is inserted between the formatted string for each element. (The 'j' stands for 'joiner'.) The format should have only one joiner, but its location in the format string doesn't matter." 4
.IX Item "%j{substr} substr is inserted between the formatted string for each element. (The 'j' stands for 'joiner'.) The format should have only one joiner, but its location in the format string doesn't matter."
.IP "%% a percent sign" 4
.IX Item "%% a percent sign"
.PD
.PP
If no format is specified, the default is \*(L"%s%d\*(R". Some examples follow. Let's
assume that the formula is C2H6O, as it would be formatted by default.
.ie n .IP """%s%D""" 4
.el .IP "\f(CW%s%D\fR" 4
.IX Item "%s%D"
Like the default, but with explicit indices for all atoms.
The formula would be formatted as \*(L"C2H6O1\*(R".
.ie n .IP """%s%d{<sub>%d</sub>}""" 4
.el .IP "\f(CW%s%d{<sub>%d</sub>}\fR" 4
.IX Item "%s%d{<sub>%d</sub>}"
\&\s-1HTML\s0 format. The output would be
\&\*(L"C<sub>2</sub>H<sub>6</sub>O\*(R".
.ie n .IP """%D %s%j{, }""" 4
.el .IP "\f(CW%D %s%j{, }\fR" 4
.IX Item "%D %s%j{, }"
Use a comma followed by a space as a joiner. The output would be
\&\*(L"2 C, 6 H, 1 O\*(R".
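.PP
The control sequences can be illustrated with a short, self-contained sketch.
The \f(CW\*(C`format_formula\*(C'\fR subroutine below is hypothetical and is not the
code used by this module; it only mimics the behavior described above, and it
ignores corner cases such as a literal percent sign directly followed by
\&\f(CW\*(C`d{\*(C'\fR.
.PP
.Vb 28
\& use strict;
\& use warnings;
\&
\& # Hypothetical helper for illustration only; not part of the module.
\& sub format_formula {
\&     my ($format, $count, @symbols) = @_;
\&     # Pull out the joiner, wherever it appears in the format string.
\&     my $joiner = $format =~ s/%j[{](.*?)[}]// ? $1 : "";
\&     my @parts;
\&     for my $sym (@symbols) {
\&         my $n = $count\->{$sym};
\&         my $part = $format;
\&         # %d{substr}: keep substr only when there is more than one atom.
\&         $part =~ s/%d[{](.*?)[}]/$n > 1 ? $1 : ""/ge;
\&         # Remaining sequences: %s, %D, %d and %%.
\&         $part =~ s/%([sDd%])/
\&               $1 eq "s" ? $sym
\&             : $1 eq "D" ? $n
\&             : $1 eq "d" ? ($n > 1 ? $n : "")
\&             :             "%"/ge;
\&         push @parts, $part;
\&     }
\&     return join $joiner, @parts;
\& }
\&
\& my $count = { C => 2, H => 6, O => 1 };
\& print format_formula("%s%D", $count, qw(C H O)), "\en";        # C2H6O1
\& print format_formula("%D %s%j{, }", $count, qw(C H O)), "\en"; # 2 C, 6 H, 1 O
.Ve
.PP
The symbol order is passed in explicitly here; in the module it is determined
by the sort order described next.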
.PP
\fISymbol Sort Order\fR
.IX Subsection "Symbol Sort Order"
.PP
The elements in the formula are sorted by default in the \*(L"Hill order\*(R", which
means that:
.PP
1) if the formula contains carbon, C goes first, followed by H,
and the rest of the symbols in alphabetical order. For example, \*(L"CH2BrF\*(R".
.PP
2) if there is no carbon, all the symbols (including H) are listed
alphabetically. For example, \*(L"BrH\*(R".
.PP
It is possible to supply a custom sorting subroutine with the 'formula_sort'
option. It expects a subroutine reference that takes a hash reference
describing the formula (similar to what is returned by parse_formula, discussed
below), and that returns a list of symbols in the desired order.
.PP
For example, this will sort the symbols in reverse asciibetical order:
.PP
.Vb 7
\& my $formula = $mol\->print(
\&     format => \*(Aqformula\*(Aq,
\&     formula_sort => sub {
\&         my $formula_hash = shift;
\&         return reverse sort keys %$formula_hash;
\&     }
\& );
.Ve
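.PP
For comparison, the two Hill order rules given above could be written as a
sorting subroutine of the same shape. This is only a sketch under that
description; \f(CW\*(C`hill_order\*(C'\fR is a hypothetical name, not a function provided
by this module.
.PP
.Vb 12
\& # Hypothetical helper (not part of the module): symbols in Hill order.
\& sub hill_order {
\&     my $formula = shift;    # hash reference: symbol => count
\&     my @symbols = sort keys %$formula;
\&     # Rule 2: without carbon, everything is alphabetical (including H).
\&     return @symbols unless exists $formula\->{C};
\&     # Rule 1: carbon first, then hydrogen if present, then the rest.
\&     my @rest = grep { $_ ne "C" && $_ ne "H" } @symbols;
\&     return ("C", (exists $formula\->{H} ? "H" : ()), @rest);
\& }
\&
\& print join(" ", hill_order({ C => 1, H => 2, Br => 1, F => 1 })), "\en";  # C H Br F
\& print join(" ", hill_order({ H => 1, Br => 1 })), "\en";                  # Br H
.Ve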
.SS "Parsing Formulas"
.IX Subsection "Parsing Formulas"
Formulas can also be parsed back into Chemistry::Mol objects.
The formula may contain parentheses, square brackets, or angle (< >) brackets,
and it may use the following abbreviations:
.PP
.Vb 7
\& Me => \*(Aq(CH3)\*(Aq,
\& Et => \*(Aq(CH3CH2)\*(Aq,
\& Bu => \*(Aq(C4H9)\*(Aq,
\& Bn => \*(Aq(C6H5CH2)\*(Aq,
\& Cp => \*(Aq(C5H5)\*(Aq,
\& Ph => \*(Aq(C6H5)\*(Aq,
\& Bz => \*(Aq(C6H5CO)\*(Aq,
.Ve
.PP
The formula may also be preceded by a number, which multiplies the whole
formula. Some examples of valid formulas:
.Sp
.Vb 8
\& Formula              Equivalent to
\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\& CH3(CH2)3CH3         C5H12
\& C6H3Me3              C9H12
\& 2Cu[NH3]4(NO3)2      Cu2H24N12O12
\& 2C(C[C<C>5]4)3       C152
\& 2C(C(C(C)5)4)3       C152
\& C 1 0 H 2 2          C10H22 (whitespace is completely ignored)
.Ve
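.PP
The way nested groups, abbreviations and multipliers combine can be sketched
with a small stack-based counter. The \f(CW\*(C`count_atoms\*(C'\fR subroutine and its
helper variables are hypothetical and are not the parser used by this module;
as a simplifying assumption, the sketch treats all bracket types as
interchangeable and does not check that an opening bracket is closed by the
same type.
.PP
.Vb 42
\& use strict;
\& use warnings;
\&
\& # Atom counts for the abbreviations listed above.
\& my %ABBR = (
\&     Me => { C => 1, H => 3 },  Et => { C => 2, H => 5 },
\&     Bu => { C => 4, H => 9 },  Bn => { C => 7, H => 7 },
\&     Cp => { C => 5, H => 5 },  Ph => { C => 6, H => 5 },
\&     Bz => { C => 7, H => 5, O => 1 },
\& );
\& my $NUM = qr/[0\-9]+(?:[.][0\-9]+)?/;    # integer or decimal count
\&
\& # Hypothetical stand\-in for parse_formula; returns symbol => count pairs.
\& sub count_atoms {
\&     my $f = shift;
\&     $f =~ s/[[:space:]]+//g;               # whitespace is ignored
\&     my $mult = $f =~ s/^($NUM)// ? $1 : 1; # optional leading multiplier
\&     my @stack = ({});                      # one hash per open bracket
\&     while (length $f) {
\&         if ($f =~ s/^[(<[]//) {            # opening bracket: start a group
\&             push @stack, {};
\&         }
\&         elsif ($f =~ s/^[])>]($NUM)?//) {  # closing bracket: fold group in
\&             my $n = defined $1 ? $1 : 1;
\&             die "unbalanced bracket\en" if @stack < 2;
\&             my $group = pop @stack;
\&             $stack[\-1]{$_} += $n * $group\->{$_} for keys %$group;
\&         }
\&         elsif ($f =~ s/^([A\-Z][a\-z]?)($NUM)?//) {  # symbol or abbreviation
\&             my ($sym, $n) = ($1, defined $2 ? $2 : 1);
\&             my $part = $ABBR{$sym} || { $sym => 1 };
\&             $stack[\-1]{$_} += $n * $part\->{$_} for keys %$part;
\&         }
\&         else { die "cannot parse: $f\en" }
\&     }
\&     die "unbalanced bracket\en" if @stack != 1;
\&     my %total = %{ $stack[0] };
\&     $_ *= $mult for values %total;
\&     return %total;
\& }
\&
\& my %n = count_atoms("2Cu[NH3]4(NO3)2");
\& print join(" ", map { $_ . "=" . $n{$_} } sort keys %n), "\en";  # Cu=2 H=24 N=12 O=12
.Ve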
.PP
When a formula is parsed, a molecule object is created which consists of
the set of the atoms in the formula (no bonds or coordinates, of course).
The atoms are created in alphabetical order, so the molecule object for C2H5Br
would have the atoms in the following sequence: Br, C, C, H, H, H, H, H.
.PP
If you don't want to create a molecule object, but would rather have a simple
hash with the number of atoms for each element, use the \f(CW\*(C`parse_formula\*(C'\fR
method:
.PP
.Vb 3
\& my %formula = Chemistry::File::Formula\->parse_formula("C2H6O");
\& use Data::Dumper;
\& print Dumper \e%formula;
.Ve
.PP
which prints something like
.PP
.Vb 5
\& $VAR1 = {
\&           \*(AqH\*(Aq => 6,
\&           \*(AqO\*(Aq => 1,
\&           \*(AqC\*(Aq => 2
\&         };
.Ve
.PP
The \f(CW\*(C`parse_formula\*(C'\fR method is called internally by the \f(CW\*(C`parse_string\*(C'\fR method.
.PP
\fINon-integer numbers in formulas\fR
.IX Subsection "Non-integer numbers in formulas"
.PP
The \f(CW\*(C`parse_formula\*(C'\fR method can also accept formulas that contain
floating-point numbers, such as H1.5N0.5. The numbers must be positive, and
numbers smaller than one should include a leading zero (e.g., 0.9, not .9).
.PP
When formulas with non-integer numbers of atoms are turned into molecule
objects as described in the previous section, the number of atoms is always
\&\fBrounded up\fR. For example, H1.5N0.5 will produce a molecule object with two
hydrogen atoms and one nitrogen atom.
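.PP
As a small worked example of this rounding, assuming that \*(L"rounded up\*(R" means
taking the ceiling of each count (the hash contents are written out by hand
here rather than produced by the module):
.PP
.Vb 5
\& use POSIX qw(ceil);
\&
\& my %formula = (H => 1.5, N => 0.5);    # counts corresponding to H1.5N0.5
\& my %atoms = map { $_ => ceil($formula{$_}) } keys %formula;
\& print "$_: $atoms{$_}\en" for sort keys %atoms;    # H: 2, then N: 1
.Ve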
.PP
There is currently no way of \fIproducing\fR formulas with non-integer numbers;
perhaps a future version will include an \*(L"occupancy\*(R" property for atoms that
will result in non-integer formulas.
.SH "SOURCE CODE REPOSITORY"
.IX Header "SOURCE CODE REPOSITORY"
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Chemistry::Mol, Chemistry::File
.PP
For discussion about Hill order, search the web for \f(CW\*(C`formula "hill
order"\*(C'\fR. The original reference is \fIJ. Am. Chem. Soc.\fR \fB1900\fR, \fI22\fR,
478\-494.
.SH "AUTHOR"
.IX Header "AUTHOR"
Ivan Tubert-Brohman
.PP
Formula parsing code contributed by Brent Gregersen.
.PP
Patch for non-integer formulas by Daniel Scott.
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same terms as
Perl itself.