.TH erl_scan 3erl "stdlib 4.3.1.4" "Ericsson AB" "Erlang Module Definition"
.SH NAME
erl_scan \- The Erlang token scanner.
.SH DESCRIPTION
.LP
This module contains functions for tokenizing (scanning) characters into Erlang tokens\&.
.SH DATA TYPES
.nf

\fBcategory()\fR\& = atom()
.br
.fi
.nf

\fBerror_description()\fR\& = term()
.br
.fi
.nf

\fBerror_info()\fR\& = 
.br
    {erl_anno:location(), module(), error_description()}
.br
.fi
.nf

\fBoption()\fR\& = 
.br
    return | return_white_spaces | return_comments | text |
.br
    {reserved_word_fun, resword_fun()} |
.br
    {text_fun, text_fun()}
.br
.fi
.nf

\fBoptions()\fR\& = option() | [option()]
.br
.fi
.nf

\fBsymbol()\fR\& = atom() | float() | integer() | string()
.br
.fi
.nf

\fBresword_fun()\fR\& = fun((atom()) -> boolean())
.br
.fi
.nf

\fBtoken()\fR\& = 
.br
    {category(), Anno :: erl_anno:anno(), symbol()} |
.br
    {category(), Anno :: erl_anno:anno()}
.br
.fi
.nf

\fBtokens()\fR\& = [token()]
.br
.fi
.nf

\fBtokens_result()\fR\& = 
.br
    {ok, Tokens :: tokens(), EndLocation :: erl_anno:location()} |
.br
    {eof, EndLocation :: erl_anno:location()} |
.br
    {error,
.br
     ErrorInfo :: error_info(),
.br
     EndLocation :: erl_anno:location()}
.br
.fi
.nf

\fBtext_fun()\fR\& = fun((atom(), string()) -> boolean())
.br
.fi
.SH EXPORTS
.LP
.nf

.B
category(Token) -> category()
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = token()
.br
.RE
.RE
.RS
.LP
Returns the category of \fIToken\fR\&\&.
.RE

.LP
.nf

.B
column(Token) -> erl_anno:column() | undefined
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = token()
.br
.RE
.RE
.RS
.LP
Returns the column of \fIToken\fR\&\&'s collection of annotations\&.
.RE

.LP
.nf

.B
end_location(Token) -> erl_anno:location() | undefined
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = token()
.br
.RE
.RE
.RS
.LP
Returns the end location of the text of \fIToken\fR\&\&'s collection of annotations\&. If there is no text, \fIundefined\fR\& is returned\&.
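.LP
As a minimal sketch of the behavior described above, the following expressions can be pasted into the Erlang shell\&. The input string and the variable names are illustrative only, and the pattern matches act as assertions\&. The expected end location assumes that scanning starts at line 1, column 1:
.LP
.nf

%% With option text, the end location follows from the token text.
{ok, [WithText | _], _} = erl_scan:string("foo.", {1, 1}, [text]).
{1, 4} = erl_scan:end_location(WithText).

%% Without text in the annotation there is nothing to measure.
{ok, [NoText | _], _} = erl_scan:string("foo.", {1, 1}).
undefined = erl_scan:end_location(NoText).
.fi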
.RE

.LP
.nf

.B
format_error(ErrorDescriptor) -> string()
.br
.fi
.br
.RS
.LP
Types:

.RS 3
ErrorDescriptor = error_description()
.br
.RE
.RE
.RS
.LP
Takes an \fIErrorDescriptor\fR\& and returns a string that describes the error or warning\&. This function is usually called implicitly when an \fIErrorInfo\fR\& structure is processed (see section Error Information)\&.
.RE

.LP
.nf

.B
line(Token) -> erl_anno:line()
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = token()
.br
.RE
.RE
.RS
.LP
Returns the line of \fIToken\fR\&\&'s collection of annotations\&.
.RE

.LP
.nf

.B
location(Token) -> erl_anno:location()
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = token()
.br
.RE
.RE
.RS
.LP
Returns the location of \fIToken\fR\&\&'s collection of annotations\&.
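.LP
The location accessors \fIline/1\fR\&, \fIcolumn/1\fR\&, and \fIlocation/1\fR\& can be compared side by side in a short shell session\&. This is a sketch only: the input (two spaces followed by \fIfoo\fR\& and a dot) and the variable names are made up for illustration\&.
.LP
.nf

%% Start at line 1, column 1, so that columns are tracked.
{ok, [Tok | _], End} = erl_scan:string("  foo.", {1, 1}).
1 = erl_scan:line(Tok).
3 = erl_scan:column(Tok).
{1, 3} = erl_scan:location(Tok).
{1, 7} = End.

%% Starting from a plain line number, no column is recorded.
{ok, [LineOnly | _], 1} = erl_scan:string("  foo.").
undefined = erl_scan:column(LineOnly).
1 = erl_scan:location(LineOnly).
.fi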
.RE

.LP
.nf

.B
reserved_word(Atom :: atom()) -> boolean()
.br
.fi
.br
.RS
.LP
Returns \fItrue\fR\& if \fIAtom\fR\& is an Erlang reserved word, otherwise \fIfalse\fR\&\&.
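.LP
For illustration, a reserved word and an ordinary atom can be checked as follows\&. The last expression is an assumption-laden sketch of how the same distinction shows up in scanner output when scanning starts at line 1:
.LP
.nf

true = erl_scan:reserved_word('case').
false = erl_scan:reserved_word(foo).

%% A reserved word is scanned as a two-tuple whose category is the word.
{ok, [{'case', 1}, {var, 1, 'X'}, {'of', 1}], 1} = erl_scan:string("case X of").
.fi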
.RE

.LP
.nf

.B
string(String) -> Return
.br
.fi
.br
.nf

.B
string(String, StartLocation) -> Return
.br
.fi
.br
.nf

.B
string(String, StartLocation, Options) -> Return
.br
.fi
.br
.RS
.LP
Types:

.RS 3
String = string()
.br
Options = options()
.br
Return = 
.br
    {ok, Tokens :: tokens(), EndLocation} |
.br
    {error, ErrorInfo :: error_info(), ErrorLocation}
.br
StartLocation = EndLocation = ErrorLocation = erl_anno:location()
.br
.RE
.RE
.RS
.LP
Takes the list of characters \fIString\fR\& and tries to scan (tokenize) them\&. Returns one of the following:
.RS 2
.TP 2
.B
\fI{ok, Tokens, EndLocation}\fR\&:
\fITokens\fR\& are the Erlang tokens from \fIString\fR\&\&. \fIEndLocation\fR\& is the first location after the last token\&.
.TP 2
.B
\fI{error, ErrorInfo, ErrorLocation}\fR\&:
An error occurred\&. \fIErrorLocation\fR\& is the first location after the erroneous token\&.
.RE
.LP
\fIstring(String)\fR\& is equivalent to \fIstring(String, 1)\fR\&, and \fIstring(String, StartLocation)\fR\& is equivalent to \fIstring(String, StartLocation, [])\fR\&\&.
.LP
\fIStartLocation\fR\& indicates the initial location when scanning starts\&. If \fIStartLocation\fR\& is a line, \fIAnno\fR\&, \fIEndLocation\fR\&, and \fIErrorLocation\fR\& are lines\&. If \fIStartLocation\fR\& is a pair of a line and a column, \fIAnno\fR\& takes the form of an opaque compound data type, and \fIEndLocation\fR\& and \fIErrorLocation\fR\& are pairs of a line and a column\&. The \fItoken annotations\fR\& contain information about the column and the line where the token begins, as well as the text of the token (if option \fItext\fR\& is specified), all of which can be accessed by calling \fIcolumn/1\fR\&, \fIline/1\fR\&, \fIlocation/1\fR\&, and \fItext/1\fR\&\&.
.LP
A \fItoken\fR\& is a tuple containing information about syntactic category, the token annotations, and the terminal symbol\&. For punctuation characters (such as \fI;\fR\& and \fI|\fR\&) and reserved words, the category and the symbol coincide, and the token is represented by a two-tuple\&. Three-tuples have one of the following forms:
.RS 2
.TP 2
*
\fI{atom, Anno, atom()}\fR\&
.LP
.TP 2
*
\fI{char, Anno, char()}\fR\&
.LP
.TP 2
*
\fI{comment, Anno, string()}\fR\&
.LP
.TP 2
*
\fI{float, Anno, float()}\fR\&
.LP
.TP 2
*
\fI{integer, Anno, integer()}\fR\&
.LP
.TP 2
*
\fI{string, Anno, string()}\fR\&
.LP
.TP 2
*
\fI{var, Anno, atom()}\fR\&
.LP
.TP 2
*
\fI{white_space, Anno, string()}\fR\&
.LP
.RE
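.LP
The mix of two-tuples and three-tuples is easiest to see on a small input\&. The following shell expressions are a sketch with an arbitrary example string\&. Because the default start location is the line number 1, every annotation here is the integer 1:
.LP
.nf

{ok, Tokens, EndLocation} = erl_scan:string("foo(X) -> 42.").
%% Punctuation and the dot are two-tuples; the rest carry a symbol.
[{atom, 1, foo}, {'(', 1}, {var, 1, 'X'}, {')', 1},
 {'->', 1}, {integer, 1, 42}, {dot, 1}] = Tokens.
1 = EndLocation.
.fi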

.LP
Valid options:
.RS 2
.TP 2
.B
\fI{reserved_word_fun, resword_fun()}\fR\&:
A callback function that is called when the scanner has found an unquoted atom\&. If the function returns \fItrue\fR\&, the unquoted atom itself becomes the category of the token\&. If the function returns \fIfalse\fR\&, the category of the token is \fIatom\fR\&\&.
.TP 2
.B
\fIreturn_comments\fR\&:
Return comment tokens\&.
.TP 2
.B
\fIreturn_white_spaces\fR\&:
Return white space tokens\&. By convention, a newline character, if present, is always the first character of the text (there cannot be more than one newline in a white space token)\&.
.TP 2
.B
\fIreturn\fR\&:
Short for \fI[return_comments, return_white_spaces]\fR\&\&.
.TP 2
.B
\fItext\fR\&:
Include the token text in the token annotation\&. The text is the part of the input corresponding to the token\&. See also \fItext_fun\fR\&\&.
.TP 2
.B
\fI{text_fun, text_fun()}\fR\&:
A callback function that determines whether the full text of the token is included in the token annotation\&. Its arguments are the category of the token and the full token string\&. The function is used only when option \fItext\fR\& is not present\&. If neither option is present, the text is not saved in the token annotation\&.
.RE
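.LP
The effect of some of these options can be sketched in the shell as follows\&. The input strings and the callback \fIIsReserved\fR\& are hypothetical and serve only as illustration\&. Option \fItext_fun\fR\& is not covered here\&.
.LP
.nf

%% Comments are dropped unless return_comments (or return) is given.
{ok, [{atom, 1, a}], 1} = erl_scan:string("a % note").
{ok, [{atom, 1, a}, {comment, 1, "% note"}], 1} =
    erl_scan:string("a % note", 1, [return_comments]).

%% With text, the annotation keeps the original spelling of the token.
{ok, [Int | _], _} = erl_scan:string("16#ff.", 1, [text]).
255 = erl_scan:symbol(Int).
"16#ff" = erl_scan:text(Int).

%% A hypothetical callback that treats the atom foo as a reserved word.
IsReserved = fun(A) -> A =:= foo orelse erl_scan:reserved_word(A) end.
{ok, [{foo, 1}, {atom, 1, bar}], 1} =
    erl_scan:string("foo bar", 1, [{reserved_word_fun, IsReserved}]).
.fi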
.RE

.LP
.nf

.B
symbol(Token) -> symbol()
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = token()
.br
.RE
.RE
.RS
.LP
Returns the symbol of \fIToken\fR\&\&.
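.LP
A brief sketch of how \fIcategory/1\fR\& and \fIsymbol/1\fR\& relate for the two token shapes, using an arbitrary input and illustrative variable names:
.LP
.nf

{ok, [Atom, Dot], _} = erl_scan:string("foo.").
atom = erl_scan:category(Atom).
foo = erl_scan:symbol(Atom).

%% For a two-tuple token the category doubles as the symbol.
dot = erl_scan:category(Dot).
dot = erl_scan:symbol(Dot).
.fi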
.RE

.LP
.nf

.B
text(Token) -> erl_anno:text() | undefined
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Token = token()
.br
.RE
.RE
.RS
.LP
Returns the text of \fIToken\fR\&\&'s collection of annotations\&. If there is no text, \fIundefined\fR\& is returned\&.
.RE

.LP
.nf

.B
tokens(Continuation, CharSpec, StartLocation) -> Return
.br
.fi
.br
.nf

.B
tokens(Continuation, CharSpec, StartLocation, Options) -> Return
.br
.fi
.br
.RS
.LP
Types:

.RS 3
Continuation = return_cont() | []
.br
CharSpec = char_spec()
.br
StartLocation = erl_anno:location()
.br
Options = options()
.br
Return = 
.br
    {done,
.br
     Result :: tokens_result(),
.br
     LeftOverChars :: char_spec()} |
.br
    {more, Continuation1 :: return_cont()}
.br
.nf
\fBchar_spec()\fR\& = string() | eof
.fi
.br
.nf
\fBreturn_cont()\fR\&
.fi
.br
.RS 2
An opaque continuation\&.
.RE
.RE
.RE
.RS
.LP
This is the re-entrant scanner, which scans characters until either a \fIdot\fR\& (\&'\&.\&' followed by white space) or \fIeof\fR\& is reached\&. It returns:
.RS 2
.TP 2
.B
\fI{done, Result, LeftOverChars}\fR\&:
Indicates that there is sufficient input data to get a result\&. \fIResult\fR\& is:
.RS 2
.TP 2
.B
\fI{ok, Tokens, EndLocation}\fR\&:
The scanning was successful\&. \fITokens\fR\& is the list of tokens including \fIdot\fR\&\&.
.TP 2
.B
\fI{eof, EndLocation}\fR\&:
End of file was encountered before any more tokens\&.
.TP 2
.B
\fI{error, ErrorInfo, EndLocation}\fR\&:
An error occurred\&. \fILeftOverChars\fR\& is the remaining characters of the input data, starting from \fIEndLocation\fR\&\&.
.RE
.TP 2
.B
\fI{more, Continuation1}\fR\&:
More data is required for building a term\&. \fIContinuation1\fR\& must be passed in a new call to \fItokens/3,4\fR\& when more data is available\&.
.RE
.LP
The \fICharSpec\fR\& \fIeof\fR\& signals end of file\&. \fILeftOverChars\fR\& then takes the value \fIeof\fR\& as well\&.
.LP
\fItokens(Continuation, CharSpec, StartLocation)\fR\& is equivalent to \fItokens(Continuation, CharSpec, StartLocation, [])\fR\&\&.
.LP
For a description of the options, see \fIstring/3\fR\&\&.
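.LP
The re-entrant protocol can be sketched by feeding one form to the scanner in two chunks\&. The chunk boundaries and variable names are chosen arbitrarily for illustration, and the first call passes \fI[]\fR\& as the continuation\&.
.LP
.nf

%% First chunk: no dot yet, so the scanner asks for more input.
{more, Cont} = erl_scan:tokens([], "foo(", 1).

%% Second chunk completes the form; characters after the dot are left over.
{done, {ok, FormTokens, 1}, "baz"} = erl_scan:tokens(Cont, "bar). baz", 1).
[{atom, 1, foo}, {'(', 1}, {atom, 1, bar}, {')', 1}, {dot, 1}] = FormTokens.

%% Signalling end of file on a fresh continuation.
{done, {eof, 1}, eof} = erl_scan:tokens([], eof, 1).
.fi
.LP
In a reader loop, the left-over characters would typically be passed, together with newly read data, to the next call that starts again with the continuation \fI[]\fR\&\&.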
.RE

.SH "ERROR INFORMATION"

.LP
\fIErrorInfo\fR\& is the standard \fIErrorInfo\fR\& structure that is returned from all I/O modules\&. The format is as follows:
.LP
.nf

{ErrorLocation, Module, ErrorDescriptor}
.fi
.LP
A string describing the error is obtained with the following call:
.LP
.nf

Module:format_error(ErrorDescriptor)
.fi
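.LP
As a hedged illustration, the following shell expressions provoke a scanner error with an unterminated quoted atom and print the resulting message\&. The input is arbitrary, and the exact shape of the error descriptor and the wording of the message are deliberately not assumed here\&.
.LP
.nf

{error, {Loc, Mod, Desc}, _ErrorLocation} = erl_scan:string("'abc").
1 = Loc.
erl_scan = Mod.

%% Print the human-readable message for the error descriptor.
io:format("~ts~n", [Mod:format_error(Desc)]).
.fi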
.SH "NOTES"

.LP
The continuation of the first call to the re-entrant input functions must be \fI[]\fR\&\&. For a complete description of how the re-entrant input scheme works, see Armstrong, Virding and Williams: \&'Concurrent Programming in Erlang\&', Chapter 13\&.
.SH "SEE ALSO"

.LP
\fIerl_anno(3erl)\fR\&, \fIerl_parse(3erl)\fR\&, \fIio(3erl)\fR\&