.TH "Genlex" 3o 2023-09-20 OCamldoc "OCaml library" .SH NAME Genlex \- A generic lexical analyzer. .SH Module Module Genlex .SH Documentation .sp Module .BI "Genlex" : .B sig end .sp A generic lexical analyzer\&. .sp This module implements a simple \&'standard\&' lexical analyzer, presented as a function from character streams to token streams\&. It implements roughly the lexical conventions of OCaml, but is parameterized by the set of keywords of your language\&. .sp Example: a lexer suitable for a desk calculator is obtained by .EX .ft B let lexer = make_lexer ["+"; "\-"; "*"; "/"; "let"; "="; "("; ")"] .ft R .EE .sp The associated parser would be a function from .ft B token stream .ft R to, for instance, .ft B int .ft R , and would have rules such as: .sp .EX .ft B .br \& let rec parse_expr = parser .br \& | [< n1 = parse_atom; n2 = parse_remainder n1 >] \-> n2 .br \& and parse_atom = parser .br \& | [< \&'Int n >] \-> n .br \& | [< \&'Kwd "("; n = parse_expr; \&'Kwd ")" >] \-> n .br \& and parse_remainder n1 = parser .br \& | [< \&'Kwd "+"; n2 = parse_expr >] \-> n1 + n2 .br \& | [< >] \-> n1 .br \& .ft R .EE .sp One should notice that the use of the .ft B parser .ft R keyword and associated notation for streams are only available through camlp4 extensions\&. This means that one has to preprocess its sources e\&. g\&. by using the .ft B "\-pp" .ft R command\-line switch of the compilers\&. .sp .sp .sp .I type token = | Kwd .B of .B string | Ident .B of .B string | Int .B of .B int | Float .B of .B float | String .B of .B string | Char .B of .B char .sp The type of tokens\&. The lexical classes are: .ft B Int .ft R and .ft B Float .ft R for integer and floating\-point numbers; .ft B String .ft R for string literals, enclosed in double quotes; .ft B Char .ft R for character literals, enclosed in single quotes; .ft B Ident .ft R for identifiers (either sequences of letters, digits, underscores and quotes, or sequences of \&'operator characters\&' such as .ft B + .ft R , .ft B * .ft R , etc); and .ft B Kwd .ft R for keywords (either identifiers or single \&'special characters\&' such as .ft B ( .ft R , .ft B } .ft R , etc)\&. .sp .I val make_lexer : .B string list -> char Stream.t -> token Stream.t .sp Construct the lexer function\&. The first argument is the list of keywords\&. An identifier .ft B s .ft R is returned as .ft B Kwd s .ft R if .ft B s .ft R belongs to this list, and as .ft B Ident s .ft R otherwise\&. A special character .ft B s .ft R is returned as .ft B Kwd s .ft R if .ft B s .ft R belongs to this list, and cause a lexical error (exception .ft B Stream\&.Error .ft R with the offending lexeme as its parameter) otherwise\&. Blanks and newlines are skipped\&. Comments delimited by .ft B (* .ft R and .ft B *) .ft R are skipped as well, and can be nested\&. A .ft B Stream\&.Failure .ft R exception is raised if end of stream is unexpectedly reached\&. .sp