.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Chemistry::OpenSMILES 3pm"
.TH Chemistry::OpenSMILES 3pm "2023-10-26" "perl v5.36.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Chemistry::OpenSMILES \- OpenSMILES format reader and writer
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use Chemistry::OpenSMILES::Parser;
\&
\&    my $parser = Chemistry::OpenSMILES::Parser\->new;
\&    my @moieties = $parser\->parse( \*(AqC#C.c1ccccc1\*(Aq );
\&
\&    $\e = "\en";
\&    for my $moiety (@moieties) {
\&        #  $moiety is a Graph::Undirected object
\&        print scalar $moiety\->vertices;
\&        print scalar $moiety\->edges;
\&    }
\&
\&    use Chemistry::OpenSMILES::Writer qw(write_SMILES);
\&
\&    print write_SMILES( \e@moieties );
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Chemistry::OpenSMILES provides support for \s-1SMILES\s0 chemical identifiers
conforming to the OpenSMILES v1.0 specification
(<http://opensmiles.org/opensmiles.html>).
.PP
Chemistry::OpenSMILES::Parser reads \s-1SMILES\s0 strings and returns them
parsed into arrays of Graph::Undirected objects. Each
atom is represented by a hash.
.PP
Chemistry::OpenSMILES::Writer performs the inverse operation. Generated
\&\s-1SMILES\s0 strings are by no means optimal.
.SS "Molecular graph"
.IX Subsection "Molecular graph"
Disconnected parts of a compound are represented as separate
Graph::Undirected objects. Atoms are represented
as vertices, and bonds are represented as edges.
.PP
\fIAtoms\fR
.IX Subsection "Atoms"
.PP
Atoms, or vertices of a molecular graph, are represented as hash
references:
.PP
.Vb 9
\&    {
\&        "symbol"    => "C",
\&        "isotope"   => 13,
\&        "chirality" => "@@",
\&        "hcount"    => 3,
\&        "charge"    => 1,
\&        "class"     => 0,
\&        "number"    => 0,
\&    }
.Ve
.PP
Except for \f(CW\*(C`symbol\*(C'\fR, \f(CW\*(C`class\*(C'\fR and \f(CW\*(C`number\*(C'\fR, all keys of the hash are
optional. Per the OpenSMILES specification, the default values for \f(CW\*(C`hcount\*(C'\fR
and \f(CW\*(C`class\*(C'\fR are 0.
.PP
For a chiral atom, the order of its neighbours in the input is preserved in
an array stored under the \f(CW\*(C`chirality_neighbours\*(C'\fR key of the atom hash.
.PP
\fIBonds\fR
.IX Subsection "Bonds"
.PP
Bonds, or edges of a molecular graph, rely completely on the internal
representation of Graph::Undirected. Bond
orders other than single (\f(CW\*(C`\-\*(C'\fR, which is also the default) are represented
as values of the edge attribute \f(CW\*(C`bond\*(C'\fR. They correspond to the symbols used
in the OpenSMILES specification.
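.PP
As a minimal sketch (not taken from the module's own documentation),
the atom hashes and bond attributes described above could be inspected
with the generic vertex and edge methods of Graph::Undirected. The
output format is illustrative only, and \f(CW\*(C`\-\*(C'\fR is printed here for
bonds that carry no \f(CW\*(C`bond\*(C'\fR attribute:
.PP
.Vb 19
\&    use Chemistry::OpenSMILES::Parser;
\&
\&    my $parser = Chemistry::OpenSMILES::Parser\->new;
\&    my @moieties = $parser\->parse( \*(AqC#C.c1ccccc1\*(Aq );
\&
\&    for my $moiety (@moieties) {
\&        # Each vertex is an atom hash reference
\&        for my $atom ($moiety\->vertices) {
\&            print "atom $atom\->{number}: $atom\->{symbol}\en";
\&        }
\&        # Edges without a "bond" attribute are treated as single bonds here
\&        for my $edge ($moiety\->edges) {
\&            my( $u, $v ) = @$edge;
\&            my $order = $moiety\->has_edge_attribute( $u, $v, \*(Aqbond\*(Aq )
\&                      ? $moiety\->get_edge_attribute( $u, $v, \*(Aqbond\*(Aq )
\&                      : \*(Aq\-\*(Aq;
\&            print "bond $u\->{number} $order $v\->{number}\en";
\&        }
\&    }
.Ve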
.SS "Options"
.IX Subsection "Options"
\&\f(CW\*(C`parse\*(C'\fR accepts the following options as key-value pairs in an
anonymous hash passed as its second parameter:
.ie n .IP """max_hydrogen_count_digits""" 4
.el .IP "\f(CWmax_hydrogen_count_digits\fR" 4
.IX Item "max_hydrogen_count_digits"
In the OpenSMILES specification the number of attached hydrogen atoms for
atoms in square brackets is limited to 9. \s-1IUPAC SMILES+\s0 has increased
this number to 99. The value of \f(CW\*(C`max_hydrogen_count_digits\*(C'\fR
instructs the parser to allow a number of digits other than 1 for the attached
hydrogen count.
.ie n .IP """raw""" 4
.el .IP "\f(CWraw\fR" 4
.IX Item "raw"
With \f(CW\*(C`raw\*(C'\fR set to anything evaluating to true, the parser will
convert neither implicit nor explicit hydrogen atoms in square brackets
to atom hashes of their own. Moreover, it will not attempt to unify the
representations of chirality. Note, though, that many
subroutines of Chemistry::OpenSMILES expect non-raw data structures,
so processing raw output may produce distorted results.
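.PP
A minimal sketch of passing these options follows. It assumes that
\f(CW\*(C`max_hydrogen_count_digits\*(C'\fR takes the maximum number of digits as an
integer; the input strings are chosen for illustration and the first
one is chemically contrived:
.PP
.Vb 9
\&    use Chemistry::OpenSMILES::Parser;
\&
\&    # Assumption: the value is the maximum number of digits allowed
\&    my @moieties = Chemistry::OpenSMILES::Parser\->new\->parse(
\&        \*(Aq[CH10]\*(Aq, { max_hydrogen_count_digits => 2 } );
\&
\&    # Hydrogen atoms in square brackets are not turned into atom hashes
\&    my @raw = Chemistry::OpenSMILES::Parser\->new\->parse(
\&        \*(Aq[NH4+]\*(Aq, { raw => 1 } );
.Ve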
.SH "CAVEATS"
.IX Header "CAVEATS"
Element symbols in square brackets are not limited to the ones known to
chemistry. Currently any one- or two-letter symbol is allowed.
.PP
Deprecated charge notations (\f(CW\*(C`\-\-\*(C'\fR and \f(CW\*(C`++\*(C'\fR) are supported.
.PP
The OpenSMILES specification mandates a strict order of ring bonds and
branches:
.PP
.Vb 1
\&    branched_atom ::= atom ringbond* branch*
.Ve
.PP
Chemistry::OpenSMILES::Parser supports both the mandated structure and the
inverted one, in which ring bonds follow branch descriptions.
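.PP
As a hedged illustration, the two \s-1SMILES\s0 strings below (chosen for this
example, not taken from the module) describe the same ring with a
one-atom branch. The first follows the mandated order, the second
places the branch before the ring bond:
.PP
.Vb 9
\&    use Chemistry::OpenSMILES::Parser;
\&
\&    # atom ringbond branch (order mandated by the specification)
\&    my @mandated = Chemistry::OpenSMILES::Parser\->new\->parse( \*(AqC1(C)CCCC1\*(Aq );
\&    # atom branch ringbond (inverted order)
\&    my @inverted = Chemistry::OpenSMILES::Parser\->new\->parse( \*(AqC(C)1CCCC1\*(Aq );
\&
\&    # Both calls are expected to describe a single connected moiety
\&    print scalar( @mandated ), " ", scalar( @inverted ), "\en";
.Ve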
.PP
Whitespace is not supported yet. \s-1SMILES\s0 descriptors must be stripped of
it before being read with Chemistry::OpenSMILES::Parser.
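.PP
One way to do this, shown as a minimal sketch that assumes the input
holds nothing but the \s-1SMILES\s0 descriptor, is to strip all whitespace with
a substitution before parsing:
.PP
.Vb 5
\&    my $smiles = " C#C . c1ccccc1\en";
\&
\&    # Remove every whitespace character, including the trailing newline
\&    ( my $clean = $smiles ) =~ s/\es+//g;
\&    # $clean can now be passed to the parse method
.Ve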
.PP
The derivation of implicit hydrogen counts for aromatic atoms is not
unambiguously defined in the OpenSMILES specification. Thus only
aromatic carbon is accounted for, and it is treated as having a valence of 3.
.PP
Chiral atoms with three neighbours are interpreted as having a lone
pair of electrons as the fourth chiral neighbour. The lone pair is
always understood as being the second in the order of neighbour
enumeration, except when the atom with the lone pair starts a chain. In
that case the lone pair is the first.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBperl\fR\|(1)
.SH "AUTHORS"
.IX Header "AUTHORS"
Andrius Merkys, <merkys@cpan.org>