.TH erl_scan 3erl "stdlib 2.2" "Ericsson AB" "Erlang Module Definition" .SH NAME erl_scan \- The Erlang Token Scanner .SH DESCRIPTION .LP This module contains functions for tokenizing characters into Erlang tokens\&. .SH DATA TYPES .nf \fBattribute_info()\fR\& = {column, \fBcolumn()\fR\&} .br | {length, integer() >= 1} .br | {line, \fBinfo_line()\fR\&} .br | {location, \fBinfo_location()\fR\&} .br | {text, string()} .br .fi .nf \fBattributes()\fR\& = \fBline()\fR\& | \fBattributes_data()\fR\& .br .fi .nf \fBattributes_data()\fR\& = [{column, \fBcolumn()\fR\&} | .br {line, \fBinfo_line()\fR\&} | .br {text, string()}] .br | {\fBline()\fR\&, \fBcolumn()\fR\&} .br .fi .nf \fBcategory()\fR\& = atom() .br .fi .nf \fBcolumn()\fR\& = integer() >= 1 .br .fi .nf \fBerror_description()\fR\& = term() .br .fi .nf \fBerror_info()\fR\& = {\fBlocation()\fR\&, module(), \fBerror_description()\fR\&} .br .fi .nf \fBinfo_line()\fR\& = integer() | term() .br .fi .nf \fBinfo_location()\fR\& = \fBlocation()\fR\& | term() .br .fi .nf \fBline()\fR\& = integer() .br .fi .nf \fBlocation()\fR\& = \fBline()\fR\& | {\fBline()\fR\&, \fBcolumn()\fR\&} .br .fi .nf \fBoption()\fR\& = return .br | return_white_spaces .br | return_comments .br | text .br | {reserved_word_fun, \fBresword_fun()\fR\&} .br .fi .nf \fBoptions()\fR\& = \fBoption()\fR\& | [\fBoption()\fR\&] .br .fi .nf \fBsymbol()\fR\& = atom() | float() | integer() | string() .br .fi .nf \fBresword_fun()\fR\& = fun((atom()) -> boolean()) .br .fi .nf \fBtoken()\fR\& = {\fBcategory()\fR\&, \fBattributes()\fR\&, \fBsymbol()\fR\&} .br | {\fBcategory()\fR\&, \fBattributes()\fR\&} .br .fi .nf \fBtoken_info()\fR\& = {category, \fBcategory()\fR\&} .br | {symbol, \fBsymbol()\fR\&} .br | \fBattribute_info()\fR\& .br .fi .nf \fBtokens()\fR\& = [\fBtoken()\fR\&] .br .fi .nf \fBtokens_result()\fR\& = {ok, .br Tokens :: \fBtokens()\fR\&, .br EndLocation :: \fBlocation()\fR\&} .br | {eof, EndLocation :: \fBlocation()\fR\&} .br | {error, .br ErrorInfo :: \fBerror_info()\fR\&, .br EndLocation :: \fBlocation()\fR\&} .br .fi .SH EXPORTS .LP .nf .B string(String) -> Return .br .fi .br .nf .B string(String, StartLocation) -> Return .br .fi .br .nf .B string(String, StartLocation, Options) -> Return .br .fi .br .RS .LP Types: .RS 3 String = string() .br Options = \fBoptions()\fR\& .br Return = {ok, Tokens :: \fBtokens()\fR\&, EndLocation} .br | {error, ErrorInfo :: \fBerror_info()\fR\&, ErrorLocation} .br StartLocation = EndLocation = ErrorLocation = \fBlocation()\fR\& .br .RE .RE .RS .LP Takes the list of characters \fIString\fR\& and tries to scan (tokenize) them\&. Returns \fI{ok, Tokens, EndLocation}\fR\&, where \fITokens\fR\& are the Erlang tokens from \fIString\fR\&\&. \fIEndLocation\fR\& is the first location after the last token\&. .LP \fI{error, ErrorInfo, ErrorLocation}\fR\& is returned if an error occurs\&. \fIErrorLocation\fR\& is the first location after the erroneous token\&. .LP \fIstring(String)\fR\& is equivalent to \fIstring(String, 1)\fR\&, and \fIstring(String, StartLocation)\fR\& is equivalent to \fIstring(String, StartLocation, [])\fR\&\&. .LP \fIStartLocation\fR\& indicates the initial location when scanning starts\&. If \fIStartLocation\fR\& is a line \fIattributes()\fR\& as well as \fIEndLocation\fR\& and \fIErrorLocation\fR\& will be lines\&. If \fIStartLocation\fR\& is a pair of a line and a column \fIattributes()\fR\& takes the form of an opaque compound data type, and \fIEndLocation\fR\& and \fIErrorLocation\fR\& will be pairs of a line and a column\&. The \fItoken attributes\fR\& contain information about the column and the line where the token begins, as well as the text of the token (if the \fItext\fR\& option is given), all of which can be accessed by calling \fBtoken_info/1,2\fR\& or \fBattributes_info/1,2\fR\&\&. .LP A \fItoken\fR\& is a tuple containing information about syntactic category, the token attributes, and the actual terminal symbol\&. For punctuation characters (e\&.g\&. \fI;\fR\&, \fI|\fR\&) and reserved words, the category and the symbol coincide, and the token is represented by a two-tuple\&. Three-tuples have one of the following forms: \fI{atom, Info, atom()}\fR\&, \fI{char, Info, integer()}\fR\&, \fI{comment, Info, string()}\fR\&, \fI{float, Info, float()}\fR\&, \fI{integer, Info, integer()}\fR\&, \fI{var, Info, atom()}\fR\&, and \fI{white_space, Info, string()}\fR\&\&. .LP The valid options are: .RS 2 .TP 2 .B \fI{reserved_word_fun, reserved_word_fun()}\fR\&: A callback function that is called when the scanner has found an unquoted atom\&. If the function returns \fItrue\fR\&, the unquoted atom itself will be the category of the token; if the function returns \fIfalse\fR\&, \fIatom\fR\& will be the category of the unquoted atom\&. .TP 2 .B \fIreturn_comments\fR\&: Return comment tokens\&. .TP 2 .B \fIreturn_white_spaces\fR\&: Return white space tokens\&. By convention, if there is a newline character, it is always the first character of the text (there cannot be more than one newline in a white space token)\&. .TP 2 .B \fIreturn\fR\&: Short for \fI[return_comments, return_white_spaces]\fR\&\&. .TP 2 .B \fItext\fR\&: Include the token\&'s text in the token attributes\&. The text is the part of the input corresponding to the token\&. .RE .RE .LP .nf .B tokens(Continuation, CharSpec, StartLocation) -> Return .br .fi .br .nf .B tokens(Continuation, CharSpec, StartLocation, Options) -> Return .br .fi .br .RS .LP Types: .RS 3 Continuation = \fBreturn_cont()\fR\& | [] .br CharSpec = \fBchar_spec()\fR\& .br StartLocation = \fBlocation()\fR\& .br Options = \fBoptions()\fR\& .br Return = {done, .br Result :: \fBtokens_result()\fR\&, .br LeftOverChars :: \fBchar_spec()\fR\&} .br | {more, Continuation1 :: \fBreturn_cont()\fR\&} .br .nf \fBchar_spec()\fR\& = string() | eof .fi .br .nf \fBreturn_cont()\fR\& .fi .br .RS 2 An opaque continuation .RE .RE .RE .RS .LP This is the re-entrant scanner which scans characters until a \fIdot\fR\& (\&'\&.\&' followed by a white space) or \fIeof\fR\& has been reached\&. It returns: .RS 2 .TP 2 .B \fI{done, Result, LeftOverChars}\fR\&: This return indicates that there is sufficient input data to get a result\&. \fIResult\fR\& is: .RS 2 .TP 2 .B \fI{ok, Tokens, EndLocation}\fR\&: The scanning was successful\&. \fITokens\fR\& is the list of tokens including \fIdot\fR\&\&. .TP 2 .B \fI{eof, EndLocation}\fR\&: End of file was encountered before any more tokens\&. .TP 2 .B \fI{error, ErrorInfo, EndLocation}\fR\&: An error occurred\&. \fILeftOverChars\fR\& is the remaining characters of the input data, starting from \fIEndLocation\fR\&\&. .RE .TP 2 .B \fI{more, Continuation1}\fR\&: More data is required for building a term\&. \fIContinuation1\fR\& must be passed in a new call to \fItokens/3,4\fR\& when more data is available\&. .RE .LP The \fICharSpec\fR\& \fIeof\fR\& signals end of file\&. \fILeftOverChars\fR\& will then take the value \fIeof\fR\& as well\&. .LP \fItokens(Continuation, CharSpec, StartLocation)\fR\& is equivalent to \fItokens(Continuation, CharSpec, StartLocation, [])\fR\&\&. .LP See \fBstring/3\fR\& for a description of the various options\&. .RE .LP .nf .B reserved_word(Atom :: atom()) -> boolean() .br .fi .br .RS .LP Returns \fItrue\fR\& if \fIAtom\fR\& is an Erlang reserved word, otherwise \fIfalse\fR\&\&. .RE .LP .nf .B token_info(Token) -> TokenInfo .br .fi .br .RS .LP Types: .RS 3 Token = \fBtoken()\fR\& .br TokenInfo = [TokenInfoTuple :: \fBtoken_info()\fR\&] .br .RE .RE .RS .LP Returns a list containing information about the token \fIToken\fR\&\&. The order of the \fITokenInfoTuple\fR\&s is not defined\&. See \fBtoken_info/2\fR\& for information about specific \fITokenInfoTuple\fR\&s\&. .LP Note that if \fItoken_info(Token, TokenItem)\fR\& returns \fIundefined\fR\& for some \fITokenItem\fR\&, the item is not included in \fITokenInfo\fR\&\&. .RE .LP .nf .B token_info(Token, TokenItem) -> TokenInfoTuple | undefined .br .fi .br .nf .B token_info(Token, TokenItems) -> TokenInfo .br .fi .br .RS .LP Types: .RS 3 Token = \fBtoken()\fR\& .br TokenItems = [TokenItem :: \fBtoken_item()\fR\&] .br TokenInfo = [TokenInfoTuple :: \fBtoken_info()\fR\&] .br .nf \fBtoken_item()\fR\& = category | symbol | \fBattribute_item()\fR\& .fi .br .nf \fBattribute_item()\fR\& = column | length | line | location | text .fi .br .RE .RE .RS .LP Returns a list containing information about the token \fIToken\fR\&\&. If one single \fITokenItem\fR\& is given the returned value is the corresponding \fITokenInfoTuple\fR\&, or \fIundefined\fR\& if the \fITokenItem\fR\& has no value\&. If a list of \fITokenItem\fR\&s is given the result is a list of \fITokenInfoTuple\fR\&\&. The \fITokenInfoTuple\fR\&s will appear with the corresponding \fITokenItem\fR\&s in the same order as the \fITokenItem\fR\&s appear in the list of \fITokenItem\fR\&s\&. \fITokenItem\fR\&s with no value are not included in the list of \fITokenInfoTuple\fR\&\&. .LP The following \fITokenInfoTuple\fR\&s with corresponding \fITokenItem\fR\&s are valid: .RS 2 .TP 2 .B \fI{category, \fB category()\fR\&}\fR\&: The category of the token\&. .TP 2 .B \fI{column, \fB column()\fR\&}\fR\&: The column where the token begins\&. .TP 2 .B \fI{length, integer() > 0}\fR\&: The length of the token\&'s text\&. .TP 2 .B \fI{line, \fB line()\fR\&}\fR\&: The line where the token begins\&. .TP 2 .B \fI{location, \fB location()\fR\&}\fR\&: The line and column where the token begins, or just the line if the column unknown\&. .TP 2 .B \fI{symbol, \fB symbol()\fR\&}\fR\&: The token\&'s symbol\&. .TP 2 .B \fI{text, string()}\fR\&: The token\&'s text\&. .RE .RE .LP .nf .B attributes_info(Attributes) -> AttributesInfo .br .fi .br .RS .LP Types: .RS 3 Attributes = \fBattributes()\fR\& .br AttributesInfo = [AttributeInfoTuple :: \fBattribute_info()\fR\&] .br .RE .RE .RS .LP Returns a list containing information about the token attributes \fIAttributes\fR\&\&. The order of the \fIAttributeInfoTuple\fR\&s is not defined\&. See \fBattributes_info/2\fR\& for information about specific \fIAttributeInfoTuple\fR\&s\&. .LP Note that if \fIattributes_info(Token, AttributeItem)\fR\& returns \fIundefined\fR\& for some \fIAttributeItem\fR\& in the list above, the item is not included in \fIAttributesInfo\fR\&\&. .RE .LP .nf .B attributes_info(Attributes, AttributeItem) -> .B AttributeInfoTuple | undefined .br .fi .br .nf .B attributes_info(Attributes, AttributeItems) -> AttributeInfo .br .fi .br .RS .LP Types: .RS 3 Attributes = \fBattributes()\fR\& .br AttributeItems = [AttributeItem :: \fBattribute_item()\fR\&] .br AttributeInfo = [AttributeInfoTuple :: \fBattribute_info()\fR\&] .br .nf \fBattribute_item()\fR\& = column | length | line | location | text .fi .br .RE .RE .RS .LP Returns a list containing information about the token attributes \fIAttributes\fR\&\&. If one single \fIAttributeItem\fR\& is given the returned value is the corresponding \fIAttributeInfoTuple\fR\&, or \fIundefined\fR\& if the \fIAttributeItem\fR\& has no value\&. If a list of \fIAttributeItem\fR\& is given the result is a list of \fIAttributeInfoTuple\fR\&\&. The \fIAttributeInfoTuple\fR\&s will appear with the corresponding \fIAttributeItem\fR\&s in the same order as the \fIAttributeItem\fR\&s appear in the list of \fIAttributeItem\fR\&s\&. \fIAttributeItem\fR\&s with no value are not included in the list of \fIAttributeInfoTuple\fR\&\&. .LP The following \fIAttributeInfoTuple\fR\&s with corresponding \fIAttributeItem\fR\&s are valid: .RS 2 .TP 2 .B \fI{column, \fB column()\fR\&}\fR\&: The column where the token begins\&. .TP 2 .B \fI{length, integer() > 0}\fR\&: The length of the token\&'s text\&. .TP 2 .B \fI{line, \fB line()\fR\&}\fR\&: The line where the token begins\&. .TP 2 .B \fI{location, \fB location()\fR\&}\fR\&: The line and column where the token begins, or just the line if the column unknown\&. .TP 2 .B \fI{text, string()}\fR\&: The token\&'s text\&. .RE .RE .LP .nf .B set_attribute(AttributeItem, Attributes, SetAttributeFun) -> .B Attributes .br .fi .br .RS .LP Types: .RS 3 AttributeItem = line .br Attributes = \fBattributes()\fR\& .br SetAttributeFun = fun((\fBinfo_line()\fR\&) -> \fBinfo_line()\fR\&) .br .RE .RE .RS .LP Sets the value of the \fIline\fR\& attribute of the token attributes \fIAttributes\fR\&\&. .LP The \fISetAttributeFun\fR\& is called with the value of the \fIline\fR\& attribute, and is to return the new value of the \fIline\fR\& attribute\&. .RE .LP .nf .B format_error(ErrorDescriptor) -> string() .br .fi .br .RS .LP Types: .RS 3 ErrorDescriptor = \fBerror_description()\fR\& .br .RE .RE .RS .LP Takes an \fIErrorDescriptor\fR\& and returns a string which describes the error or warning\&. This function is usually called implicitly when processing an \fIErrorInfo\fR\& structure (see below)\&. .RE .SH "ERROR INFORMATION" .LP The \fIErrorInfo\fR\& mentioned above is the standard \fIErrorInfo\fR\& structure which is returned from all IO modules\&. It has the following format: .LP .nf {ErrorLocation, Module, ErrorDescriptor} .fi .LP A string which describes the error is obtained with the following call: .LP .nf Module:format_error(ErrorDescriptor) .fi .SH "NOTES" .LP The continuation of the first call to the re-entrant input functions must be \fI[]\fR\&\&. Refer to Armstrong, Virding and Williams, \&'Concurrent Programming in Erlang\&', Chapter 13, for a complete description of how the re-entrant input scheme works\&. .SH "SEE ALSO" .LP \fBio(3erl)\fR\&, \fBerl_parse(3erl)\fR\&