.TH erl_scan 3erl "stdlib 2.2" "Ericsson AB" "Erlang Module Definition"
.SH NAME
erl_scan \- The Erlang Token Scanner
.SH DESCRIPTION
.LP
This module contains functions for tokenizing characters into Erlang tokens\&.
.SH DATA TYPES
.nf

\fBattribute_info()\fR\& = {column, \fBcolumn()\fR\&}
.br
                 | {length, integer() >= 1}
.br
                 | {line, \fBinfo_line()\fR\&}
.br
                 | {location, \fBinfo_location()\fR\&}
.br
                 | {text, string()}
.br
.fi
.nf

\fBattributes()\fR\& = \fBline()\fR\& | \fBattributes_data()\fR\&
.br
.fi
.nf

\fBattributes_data()\fR\& = [{column, \fBcolumn()\fR\&} |
.br
                     {line, \fBinfo_line()\fR\&} |
.br
                     {text, string()}]
.br
                  | {\fBline()\fR\&, \fBcolumn()\fR\&}
.br
.fi
.nf

\fBcategory()\fR\& = atom()
.br
.fi
.nf

\fBcolumn()\fR\& = integer() >= 1
.br
.fi
.nf

\fBerror_description()\fR\& = term()
.br
.fi
.nf

\fBerror_info()\fR\& = {\fBlocation()\fR\&, module(), \fBerror_description()\fR\&}
.br
.fi
.nf

\fBinfo_line()\fR\& = integer() | term()
.br
.fi
.nf

\fBinfo_location()\fR\& = \fBlocation()\fR\& | term()
.br
.fi
.nf

\fBline()\fR\& = integer()
.br
.fi
.nf

\fBlocation()\fR\& = \fBline()\fR\& | {\fBline()\fR\&, \fBcolumn()\fR\&}
.br
.fi
.nf

\fBoption()\fR\& = return
.br
         | return_white_spaces
.br
         | return_comments
.br
         | text
.br
         | {reserved_word_fun, \fBresword_fun()\fR\&}
.br
.fi
.nf

\fBoptions()\fR\& = \fBoption()\fR\& | [\fBoption()\fR\&]
.br
.fi
.nf

\fBsymbol()\fR\& = atom() | float() | integer() | string()
.br
.fi
.nf

\fBresword_fun()\fR\& = fun((atom()) -> boolean())
.br
.fi
.nf

\fBtoken()\fR\& = {\fBcategory()\fR\&, \fBattributes()\fR\&, \fBsymbol()\fR\&}
.br
        | {\fBcategory()\fR\&, \fBattributes()\fR\&}
.br
.fi
.nf

\fBtoken_info()\fR\& = {category, \fBcategory()\fR\&}
.br
             | {symbol, \fBsymbol()\fR\&}
.br
             | \fBattribute_info()\fR\&
.br
.fi
.nf

\fBtokens()\fR\& = [\fBtoken()\fR\&]
.br
.fi
.nf

\fBtokens_result()\fR\& = {ok,
.br
                   Tokens :: \fBtokens()\fR\&,
.br
                   EndLocation :: \fBlocation()\fR\&}
.br
                | {eof, EndLocation :: \fBlocation()\fR\&}
.br
                | {error,
.br
                   ErrorInfo :: \fBerror_info()\fR\&,
.br
                   EndLocation :: \fBlocation()\fR\&}
.br
.fi
.SH EXPORTS
.LP
.nf

.B
string(String) -> Return
.br
.fi
.br
.nf

.B
string(String, StartLocation) -> Return
.br
.fi
.br
.nf

.B
string(String, StartLocation, Options) -> Return
.br
.fi
.br
.RS
.LP
Types:

.RS 3
String = string()
.br
Options = \fBoptions()\fR\&
.br
Return = {ok, Tokens :: \fBtokens()\fR\&, EndLocation}
.br
       | {error, ErrorInfo :: \fBerror_info()\fR\&, ErrorLocation}
.br
StartLocation = EndLocation = ErrorLocation = \fBlocation()\fR\&
.br
.RE
.RE
.RS
.LP
Takes the list of characters \fIString\fR\& and tries to scan (tokenize) them\&. Returns \fI{ok, Tokens, EndLocation}\fR\&, where \fITokens\fR\& are the Erlang tokens from \fIString\fR\&\&. \fIEndLocation\fR\& is the first location after the last token\&.
.LP
\fI{error, ErrorInfo, ErrorLocation}\fR\& is returned if an error occurs\&. \fIErrorLocation\fR\& is the first location after the erroneous token\&.
.LP
\fIstring(String)\fR\& is equivalent to \fIstring(String, 1)\fR\&, and \fIstring(String, StartLocation)\fR\& is equivalent to \fIstring(String, StartLocation, [])\fR\&\&.
.LP
\fIStartLocation\fR\& indicates the initial location when scanning starts\&. If \fIStartLocation\fR\& is a line, \fIattributes()\fR\& as well as \fIEndLocation\fR\& and \fIErrorLocation\fR\& will be lines\&. If \fIStartLocation\fR\& is a pair of a line and a column, \fIattributes()\fR\& takes the form of an opaque compound data type, and \fIEndLocation\fR\& and \fIErrorLocation\fR\& will be pairs of a line and a column\&. The \fItoken attributes\fR\& contain information about the column and the line where the token begins, as well as the text of the token (if the \fItext\fR\& option is given)\&. All of this information can be accessed by calling \fBtoken_info/1,2\fR\& or \fBattributes_info/1,2\fR\&\&.
.LP
A \fItoken\fR\& is a tuple containing the syntactic category, the token attributes, and the actual terminal symbol\&. For punctuation characters (e\&.g\&. \fI;\fR\&, \fI|\fR\&) and reserved words, the category and the symbol coincide, and the token is represented by a two-tuple\&. Three-tuples have one of the following forms: \fI{atom, Info, atom()}\fR\&, \fI{char, Info, integer()}\fR\&, \fI{comment, Info, string()}\fR\&, \fI{float, Info, float()}\fR\&, \fI{integer, Info, integer()}\fR\&, \fI{string, Info, string()}\fR\&, \fI{var, Info, atom()}\fR\&, and \fI{white_space, Info, string()}\fR\&\&.
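.LP
As an illustration of the two token shapes, the following minimal sketch scans one form with \fIstring/1\fR\&\&. The module name \fIscan_shapes\fR\& and the function \fIshapes/0\fR\& are hypothetical names chosen for this example; the start location defaults to line 1, so the attributes are plain line numbers\&.
.LP
.nf

-module(scan_shapes)\&.
-export([shapes/0])\&.

%% Scan "foo(X) -> ok\&." and split the tokens by tuple size\&.
shapes() ->
    {ok, Tokens, _EndLocation} = erl_scan:string("foo(X) -> ok\&."),
    %% Punctuation, reserved words and dot are two-tuples\&.
    Two = [T || T <- Tokens, tuple_size(T) =:= 2],
    %% Atoms, variables and literals carry a symbol as third element\&.
    Three = [T || T <- Tokens, tuple_size(T) =:= 3],
    {Two, Three}\&.
.fi
.LP
Here \fIThree\fR\& would be \fI[{atom,1,foo},{var,1,\&'X\&'},{atom,1,ok}]\fR\&, and \fITwo\fR\& would hold the tokens for \fI(\fR\&, \fI)\fR\&, \fI->\fR\& and \fIdot\fR\&\&.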
.LP
The valid options are:
.RS 2
.TP 2
.B
\fI{reserved_word_fun, resword_fun()}\fR\&:
A callback function that is called when the scanner has found an unquoted atom\&. If the function returns \fItrue\fR\&, the unquoted atom itself will be the category of the token; if the function returns \fIfalse\fR\&, \fIatom\fR\& will be the category of the unquoted atom\&.
.TP 2
.B
\fIreturn_comments\fR\&:
Return comment tokens\&.
.TP 2
.B
\fIreturn_white_spaces\fR\&:
Return white space tokens\&. By convention, if there is a newline character, it is always the first character of the text (there cannot be more than one newline in a white space token)\&.
.TP 2
.B
\fIreturn\fR\&:
Short for \fI[return_comments, return_white_spaces]\fR\&\&.
.TP 2
.B
\fItext\fR\&:
Include the token\&'s text in the token attributes\&. The text is the part of the input corresponding to the token\&.
.RE
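.LP
The effect of the \fIreturn\fR\& options can be sketched as follows\&. The module \fIscan_opts\fR\& and its function \fIcategories/2\fR\& are hypothetical helpers, not part of \fIerl_scan\fR\&; the sketch assumes that the input scans without error\&.
.LP
.nf

-module(scan_opts)\&.
-export([categories/2])\&.

%% Return the category of every token found in String\&.
categories(String, Options) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String, 1, Options),
    [element(1, Token) || Token <- Tokens]\&.
.fi
.LP
With these definitions, \fIcategories("a % c", [])\fR\& would yield \fI[atom]\fR\&, \fIcategories("a % c", [return_comments])\fR\& would yield \fI[atom,comment]\fR\&, and \fIcategories("a % c", return)\fR\& would yield \fI[atom,white_space,comment]\fR\&\&.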
.RE

.LP
.nf

.B
tokens(Continuation, CharSpec, StartLocation) -> Return
.br
.fi
.br
.nf

.B
tokens(Continuation, CharSpec, StartLocation, Options) -> Return
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Continuation = \fBreturn_cont()\fR\& | []
.br
CharSpec = \fBchar_spec()\fR\&
.br
StartLocation = \fBlocation()\fR\&
.br
Options = \fBoptions()\fR\&
.br
Return = {done,
.br
          Result :: \fBtokens_result()\fR\&,
.br
          LeftOverChars :: \fBchar_spec()\fR\&}
.br
       | {more, Continuation1 :: \fBreturn_cont()\fR\&}
.br
.nf
\fBchar_spec()\fR\& = string() | eof
.fi
.br
.nf
\fBreturn_cont()\fR\&
.fi
.br
.RS 2
An opaque continuation
.RE
.RE
.RE
.RS
.LP
This is the re-entrant scanner, which scans characters until a \fIdot\fR\& (\&'\&.\&' followed by white space) or \fIeof\fR\& has been reached\&. It returns:
.RS 2
.TP 2
.B
\fI{done, Result, LeftOverChars}\fR\&:
This return indicates that there is sufficient input data to get a result\&. \fIResult\fR\& is:
.RS 2
.TP 2
.B
\fI{ok, Tokens, EndLocation}\fR\&:
The scanning was successful\&. \fITokens\fR\& is the list of tokens including \fIdot\fR\&\&.
.TP 2
.B
\fI{eof, EndLocation}\fR\&:
End of file was encountered before any more tokens\&.
.TP 2
.B
\fI{error, ErrorInfo, EndLocation}\fR\&:
An error occurred\&. \fILeftOverChars\fR\& is the remaining characters of the input data, starting from \fIEndLocation\fR\&\&.
.RE
.TP 2
.B
\fI{more, Continuation1}\fR\&:
More data is required for building a term\&. \fIContinuation1\fR\& must be passed in a new call to \fItokens/3,4\fR\& when more data is available\&.
.RE
.LP
The \fICharSpec\fR\& \fIeof\fR\& signals end of file\&. \fILeftOverChars\fR\& will then take the value \fIeof\fR\& as well\&.
.LP
\fItokens(Continuation, CharSpec, StartLocation)\fR\& is equivalent to \fItokens(Continuation, CharSpec, StartLocation, [])\fR\&\&.
.LP
See \fBstring/3\fR\& for a description of the various options\&.
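.LP
The calling protocol described above can be sketched as a loop that feeds input in chunks\&. The module \fIscan_chunks\fR\& and its functions are hypothetical; the sketch assumes the input arrives as a list of strings and always starts at line 1\&.
.LP
.nf

-module(scan_chunks)\&.
-export([scan/1])\&.

%% Feed the chunks to erl_scan:tokens/3 one at a time until a
%% result is ready\&. The continuation of the first call must be []\&.
scan(Chunks) ->
    scan([], Chunks)\&.

scan(Cont, [Chunk | Rest]) ->
    case erl_scan:tokens(Cont, Chunk, 1) of
        {done, Result, LeftOverChars} ->
            %% Rest holds the chunks that were never passed to the scanner\&.
            {Result, LeftOverChars, Rest};
        {more, Cont1} ->
            scan(Cont1, Rest)
    end;
scan(Cont, []) ->
    %% Input exhausted: signal end of file to force a result\&.
    {done, Result, LeftOverChars} = erl_scan:tokens(Cont, eof, 1),
    {Result, LeftOverChars, []}\&.
.fi
.LP
For example, \fIscan(["foo(X) ", "-> ok\&. "])\fR\& needs both chunks before the \fIdot\fR\& is seen and then returns an \fI{ok, Tokens, EndLocation}\fR\& result, whereas \fIscan([])\fR\& returns \fI{eof, 1}\fR\& as its result\&.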
.RE

.LP
.nf

.B
reserved_word(Atom :: atom()) -> boolean()
.br
.fi
.br
.RS
.LP
Returns \fItrue\fR\& if \fIAtom\fR\& is an Erlang reserved word, otherwise \fIfalse\fR\&\&.
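.LP
A short sketch of how \fIreserved_word/1\fR\& might be combined with the \fIreserved_word_fun\fR\& option of \fBstring/3\fR\&\&. The module \fIscan_resword\fR\& and the extra keyword \fIunless\fR\& are assumptions made for this example only\&.
.LP
.nf

-module(scan_resword)\&.
-export([scan/1])\&.

%% Treat unless as reserved in addition to the standard reserved words\&.
is_reserved(unless) -> true;
is_reserved(Atom) -> erl_scan:reserved_word(Atom)\&.

scan(String) ->
    erl_scan:string(String, 1, [{reserved_word_fun, fun is_reserved/1}])\&.
.fi
.LP
\fIscan("unless X")\fR\& would return \fI{ok,[{unless,1},{var,1,\&'X\&'}],1}\fR\&, whereas \fIerl_scan:string("unless X")\fR\& scans \fIunless\fR\& as \fI{atom,1,unless}\fR\&\&.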
.RE

.LP
.nf

.B
token_info(Token) -> TokenInfo
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = \fBtoken()\fR\&
.br
TokenInfo = [TokenInfoTuple :: \fBtoken_info()\fR\&]
.br
.RE
.RE
.RS
.LP
Returns a list containing information about the token \fIToken\fR\&\&. The order of the \fITokenInfoTuple\fR\&s is not defined\&. See \fBtoken_info/2\fR\& for information about specific \fITokenInfoTuple\fR\&s\&.
.LP
Note that if \fItoken_info(Token, TokenItem)\fR\& returns \fIundefined\fR\& for some \fITokenItem\fR\&, the item is not included in \fITokenInfo\fR\&\&.
.RE

.LP
.nf

.B
token_info(Token, TokenItem) -> TokenInfoTuple | undefined
.br
.fi
.br
.nf

.B
token_info(Token, TokenItems) -> TokenInfo
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = \fBtoken()\fR\&
.br
TokenItems = [TokenItem :: \fBtoken_item()\fR\&]
.br
TokenInfo = [TokenInfoTuple :: \fBtoken_info()\fR\&]
.br
.nf
\fBtoken_item()\fR\& = category | symbol | \fBattribute_item()\fR\&
.fi
.br
.nf
\fBattribute_item()\fR\& = column | length | line | location | text
.fi
.br
.RE
.RE
.RS
.LP
Returns information about the token \fIToken\fR\&\&. If a single \fITokenItem\fR\& is given, the returned value is the corresponding \fITokenInfoTuple\fR\&, or \fIundefined\fR\& if the \fITokenItem\fR\& has no value\&. If a list of \fITokenItem\fR\&s is given, the result is a list of \fITokenInfoTuple\fR\&s in the same order as the corresponding \fITokenItem\fR\&s\&. \fITokenItem\fR\&s with no value are not included in the result\&.
.LP
The following \fITokenInfoTuple\fR\&s with corresponding \fITokenItem\fR\&s are valid:
.RS 2
.TP 2
.B
\fI{category, \fB category()\fR\&}\fR\&:
The category of the token\&.
.TP 2
.B
\fI{column, \fB column()\fR\&}\fR\&:
The column where the token begins\&.
.TP 2
.B
\fI{length, integer() > 0}\fR\&:
The length of the token\&'s text\&.
.TP 2
.B
\fI{line, \fB line()\fR\&}\fR\&:
The line where the token begins\&.
.TP 2
.B
\fI{location, \fB location()\fR\&}\fR\&:
The line and column where the token begins, or just the line if the column is unknown\&.
.TP 2
.B
\fI{symbol, \fB symbol()\fR\&}\fR\&:
The token\&'s symbol\&.
.TP 2
.B
\fI{text, string()}\fR\&:
The token\&'s text\&.
.RE
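.LP
A minimal sketch of querying several items at once\&. The module \fIscan_info\fR\& and the function \fIfirst_token_info/1\fR\& are hypothetical; the sketch assumes that \fIString\fR\& contains at least one token and relies on \fItoken_info/2\fR\& as documented on this page\&.
.LP
.nf

-module(scan_info)\&.
-export([first_token_info/1])\&.

%% Scan String with column tracking and the text option, then
%% query the first token\&.
first_token_info(String) ->
    {ok, [Token | _], _EndLocation} =
        erl_scan:string(String, {1, 1}, [text]),
    erl_scan:token_info(Token, [category, symbol, location, text, length])\&.
.fi
.LP
\fIfirst_token_info("foo\&.")\fR\& would return \fI[{category,atom},{symbol,foo},{location,{1,1}},{text,"foo"},{length,3}]\fR\&\&. Had the string been scanned without the \fItext\fR\& option, the \fItext\fR\& and \fIlength\fR\& items would have no value and would be left out of the list\&.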
.RE

.LP
.nf

.B
attributes_info(Attributes) -> AttributesInfo
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Attributes = \fBattributes()\fR\&
.br
AttributesInfo = [AttributeInfoTuple :: \fBattribute_info()\fR\&]
.br
.RE
.RE
.RS
.LP
Returns a list containing information about the token attributes \fIAttributes\fR\&\&. The order of the \fIAttributeInfoTuple\fR\&s is not defined\&. See \fBattributes_info/2\fR\& for information about specific \fIAttributeInfoTuple\fR\&s\&.
.LP
Note that if \fIattributes_info(Attributes, AttributeItem)\fR\& returns \fIundefined\fR\& for some \fIAttributeItem\fR\&, the item is not included in \fIAttributesInfo\fR\&\&.
.RE

.LP
.nf

.B
attributes_info(Attributes, AttributeItem) ->
.B
                   AttributeInfoTuple | undefined
.br
.fi
.br
.nf

.B
attributes_info(Attributes, AttributeItems) -> AttributeInfo
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Attributes = \fBattributes()\fR\&
.br
AttributeItems = [AttributeItem :: \fBattribute_item()\fR\&]
.br
AttributeInfo = [AttributeInfoTuple :: \fBattribute_info()\fR\&]
.br
.nf
\fBattribute_item()\fR\& = column | length | line | location | text
.fi
.br
.RE
.RE
.RS
.LP
Returns information about the token attributes \fIAttributes\fR\&\&. If a single \fIAttributeItem\fR\& is given, the returned value is the corresponding \fIAttributeInfoTuple\fR\&, or \fIundefined\fR\& if the \fIAttributeItem\fR\& has no value\&. If a list of \fIAttributeItem\fR\&s is given, the result is a list of \fIAttributeInfoTuple\fR\&s in the same order as the corresponding \fIAttributeItem\fR\&s\&. \fIAttributeItem\fR\&s with no value are not included in the result\&.
.LP
The following \fIAttributeInfoTuple\fR\&s with corresponding \fIAttributeItem\fR\&s are valid:
.RS 2
.TP 2
.B
\fI{column, \fB column()\fR\&}\fR\&:
The column where the token begins\&.
.TP 2
.B
\fI{length, integer() > 0}\fR\&:
The length of the token\&'s text\&.
.TP 2
.B
\fI{line, \fB line()\fR\&}\fR\&:
The line where the token begins\&.
.TP 2
.B
\fI{location, \fB location()\fR\&}\fR\&:
The line and column where the token begins, or just the line if the column is unknown\&.
.TP 2
.B
\fI{text, string()}\fR\&:
The token\&'s text\&.
.RE
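.LP
Because the attributes may be either a plain line or an opaque compound value, code that needs a location is better off asking for it than matching on the representation\&. A minimal sketch, in which the module \fIscan_attrs\fR\& and the function \fIwhere/1\fR\& are hypothetical names:
.LP
.nf

-module(scan_attrs)\&.
-export([where/1])\&.

%% Return the location of a token and whether a column is known\&.
where(Token) ->
    Attributes = element(2, Token),
    {location, Location} = erl_scan:attributes_info(Attributes, location),
    HasColumn = erl_scan:attributes_info(Attributes, column) =/= undefined,
    {Location, HasColumn}\&.
.fi
.LP
For a token scanned with a line as start location, such as \fI{atom,1,foo}\fR\&, \fIwhere/1\fR\& would return \fI{1,false}\fR\&; for a token scanned with a line and column pair as start location, it would return the pair together with \fItrue\fR\&\&.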
.RE

.LP
.nf

.B
set_attribute(AttributeItem, Attributes, SetAttributeFun) ->
.B
                 Attributes
.br
.fi
.br
.RS
.LP
Types:

.RS 3
AttributeItem = line
.br
Attributes = \fBattributes()\fR\&
.br
SetAttributeFun = fun((\fBinfo_line()\fR\&) -> \fBinfo_line()\fR\&)
.br
.RE
.RE
.RS
.LP
Sets the value of the \fIline\fR\& attribute of the token attributes \fIAttributes\fR\&\&.
.LP
The \fISetAttributeFun\fR\& is called with the value of the \fIline\fR\& attribute, and is to return the new value of the \fIline\fR\& attribute\&.
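.LP
One possible use is to renumber tokens that were scanned out of context\&. The following sketch assumes integer line numbers; the module \fIscan_shift\fR\& and the function \fIshift/2\fR\& are hypothetical\&.
.LP
.nf

-module(scan_shift)\&.
-export([shift/2])\&.

%% Add Offset to the line of every token\&. The attributes are always
%% the second element, for two-tuples and three-tuples alike\&.
shift(Tokens, Offset) ->
    AddOffset = fun(Line) -> Line + Offset end,
    [setelement(2, Token,
                erl_scan:set_attribute(line, element(2, Token), AddOffset))
     || Token <- Tokens]\&.
.fi
.LP
With tokens scanned from line 1, \fIshift([{atom,1,foo},{dot,1}], 10)\fR\& would return \fI[{atom,11,foo},{dot,11}]\fR\&\&.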
.RE

.LP
.nf

.B
format_error(ErrorDescriptor) -> string()
.br
.fi
.br
.RS
.LP
Types:

.RS 3
ErrorDescriptor = \fBerror_description()\fR\&
.br
.RE
.RE
.RS
.LP
Takes an \fIErrorDescriptor\fR\& and returns a string which describes the error or warning\&. This function is usually called implicitly when processing an \fIErrorInfo\fR\& structure (see below)\&.
.RE

.SH "ERROR INFORMATION"

.LP
The \fIErrorInfo\fR\& mentioned above is the standard \fIErrorInfo\fR\& structure which is returned from all IO modules\&. It has the following format:
.LP
.nf

{ErrorLocation, Module, ErrorDescriptor}
.fi
.LP
A string which describes the error is obtained with the following call:
.LP
.nf

Module:format_error(ErrorDescriptor)
.fi
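.LP
A minimal sketch of turning a scan error into a readable message\&. The module \fIscan_errors\fR\& and the function \fIcheck/1\fR\& are hypothetical; \fIlists:flatten/1\fR\& is applied on the assumption that the returned string may be a deep list\&.
.LP
.nf

-module(scan_errors)\&.
-export([check/1])\&.

%% Return ok if String scans cleanly, otherwise the error location
%% and a flat description of the error\&.
check(String) ->
    case erl_scan:string(String) of
        {ok, _Tokens, _EndLocation} ->
            ok;
        {error, {ErrorLocation, Module, ErrorDescriptor}, _After} ->
            Message = lists:flatten(Module:format_error(ErrorDescriptor)),
            {error, ErrorLocation, Message}
    end\&.
.fi
.LP
\fIcheck("foo\&.")\fR\& would return \fIok\fR\&, whereas an unterminated quoted atom such as \fIcheck("\&'abc")\fR\& would return \fI{error, 1, Message}\fR\& with a message describing the problem\&.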
.SH "NOTES"

.LP
The continuation of the first call to the re-entrant input functions must be \fI[]\fR\&\&. Refer to Armstrong, Virding and Williams, \&'Concurrent Programming in Erlang\&', Chapter 13, for a complete description of how the re-entrant input scheme works\&.
.SH "SEE ALSO"

.LP
\fBio(3erl)\fR\&, \fBerl_parse(3erl)\fR\&