.TH "Str" 3o source: 2019-01-25 OCamldoc "OCaml library"
.SH NAME
Str \- Regular expressions and high-level string processing
.SH Module
Module Str
.SH Documentation
.sp
Module
.BI "Str"
:
.B sig end
.sp
Regular expressions and high\-level string processing
.sp
.sp
.sp
.PP
.B ===
.B Regular expressions
.B ===
.PP
.I type regexp
.sp
The type of compiled regular expressions\&.
.sp
.I val regexp
:
.B string -> regexp
.sp
Compile a regular expression\&. The following constructs are recognized:
.sp
\-
.B \&.
Matches any character except newline\&.
.sp
\-
.B *
(postfix) Matches the preceding expression zero, one or several times
.sp
\-
.B +
(postfix) Matches the preceding expression one or several times
.sp
\-
.B ?
(postfix) Matches the preceding expression once or not at all
.sp
\-
.B [\&.\&.]
Character set\&. Ranges are denoted with
.B \-
, as in
.B [a\-z]
\&. An initial
.B ^
, as in
.B [^0\-9]
, complements the set\&. To include a
.B ]
character in a set, make it the first character of the set\&. To include a
.B \-
character in a set, make it the first or the last character of the set\&.
.sp
\-
.B ^
Matches at beginning of line: either at the beginning of the matched string, or just after a \&'\(rsn\&' character\&.
.sp
\-
.B $
Matches at end of line: either at the end of the matched string, or just before a \&'\(rsn\&' character\&.
.sp
\-
.B \(rs|
(infix) Alternative between two expressions\&.
.sp
\-
.B \(rs(\&.\&.\(rs)
Grouping and naming of the enclosed expression\&.
.sp
\-
.B \(rs1
The text matched by the first
.B \(rs(\&.\&.\&.\(rs)
expression (
.B \(rs2
for the second expression, and so on up to
.B \(rs9
)\&.
.sp
\-
.B \(rsb
Matches word boundaries\&.
.sp
\-
.B \(rs
Quotes special characters\&. The special characters are
.B $^\(rs\&.*+?[]
\&.
.sp
Note: the argument to
.B regexp
is usually a string literal\&. In this case, any backslash character in the regular expression must be doubled to make it past the OCaml string parser\&.
For example, the following expression:
.B let r = Str\&.regexp "hello \(rs\(rs([A\-Za\-z]+\(rs\(rs)" in
.B Str\&.replace_first r "\(rs\(rs1" "hello world"
returns the string
.B "world"
\&.
.sp
In particular, if you want a regular expression that matches a single backslash character, you need to quote it in the argument to
.B regexp
(according to the last item of the list above) by adding a second backslash\&. Then you need to quote both backslashes (according to the syntax of string constants in OCaml) by doubling them again, so you need to write four backslash characters:
.B Str\&.regexp "\(rs\(rs\(rs\(rs"
\&.
.sp
.I val regexp_case_fold
:
.B string -> regexp
.sp
Same as
.B regexp
, but the compiled expression will match text in a case\-insensitive way: uppercase and lowercase letters will be considered equivalent\&.
.sp
.I val quote
:
.B string -> string
.sp
.B Str\&.quote s
returns a regexp string that matches exactly
.B s
and nothing else\&.
.sp
.I val regexp_string
:
.B string -> regexp
.sp
.B Str\&.regexp_string s
returns a regular expression that matches exactly
.B s
and nothing else\&.
.sp
.I val regexp_string_case_fold
:
.B string -> regexp
.sp
.B Str\&.regexp_string_case_fold
is similar to
.B Str\&.regexp_string
, but the regexp matches in a case\-insensitive way\&.
.sp
.PP
.B ===
.B String matching and searching
.B ===
.PP
.I val string_match
:
.B regexp -> string -> int -> bool
.sp
.B string_match r s start
tests whether a substring of
.B s
that starts at position
.B start
matches the regular expression
.B r
\&. The first character of a string has position
.B 0
, as usual\&.
.sp
.I val search_forward
:
.B regexp -> string -> int -> int
.sp
.B search_forward r s start
searches the string
.B s
for a substring matching the regular expression
.B r
\&. The search starts at position
.B start
and proceeds towards the end of the string\&. Return the position of the first character of the matched substring\&.
.sp
.B "Raises Not_found"
if no substring matches\&.
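.sp
The backslash-doubling rule and the difference between anchored matching and searching are easier to see in a short program. The following is a minimal sketch, not part of the library: the names greeting, greeting_quoted, backslash, literal_dot and replaced are made up for illustration, and the quoted-string literal {|...|} assumes a compiler recent enough to support it (OCaml 4.02 or later).

```ocaml
(* Build note: Str is a separate library and must be linked explicitly,
   e.g. `ocamlfind ocamlopt -package str -linkpkg example.ml`. *)

(* Inside an ordinary string literal every regexp backslash is doubled. *)
let greeting = Str.regexp "hello \\([A-Za-z]+\\)"

(* Assumption: quoted-string literals are available; they need no doubling. *)
let greeting_quoted = Str.regexp {|hello \([A-Za-z]+\)|}

(* Four backslashes in the literal yield a regexp matching one backslash. *)
let backslash = Str.regexp "\\\\"

(* Str.quote escapes special characters, so the dot matches only a dot. *)
let literal_dot = Str.regexp (Str.quote "a.b")

(* The whole match is replaced by the text of group 1. *)
let replaced = Str.replace_first greeting "\\1" "hello world"

let () = print_endline replaced  (* prints "world" *)
```

With these definitions, Str.string_match greeting "say hello world" 4 is true while the same call with start position 0 is false, because string_match only tries the given position; Str.search_forward greeting "say hello world" 0 scans forward and returns 4.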
.sp
.I val search_backward
:
.B regexp -> string -> int -> int
.sp
.B search_backward r s last
searches the string
.B s
for a substring matching the regular expression
.B r
\&. The search first considers substrings that start at position
.B last
and proceeds towards the beginning of the string\&. Return the position of the first character of the matched substring\&.
.sp
.B "Raises Not_found"
if no substring matches\&.
.sp
.I val string_partial_match
:
.B regexp -> string -> int -> bool
.sp
Similar to
.B Str\&.string_match
, but also returns true if the argument string is a prefix of a string that matches\&. This includes the case of a true complete match\&.
.sp
.I val matched_string
:
.B string -> string
.sp
.B matched_string s
returns the substring of
.B s
that was matched by the last call to one of the following matching or searching functions:
.sp
\-
.B Str\&.string_match
.sp
\-
.B Str\&.search_forward
.sp
\-
.B Str\&.search_backward
.sp
\-
.B Str\&.string_partial_match
.sp
\-
.B Str\&.global_substitute
.sp
\-
.B Str\&.substitute_first
.sp
provided that none of the following functions was called in between:
.sp
\-
.B Str\&.global_replace
.sp
\-
.B Str\&.replace_first
.sp
\-
.B Str\&.split
.sp
\-
.B Str\&.bounded_split
.sp
\-
.B Str\&.split_delim
.sp
\-
.B Str\&.bounded_split_delim
.sp
\-
.B Str\&.full_split
.sp
\-
.B Str\&.bounded_full_split
.sp
Note: in the case of
.B global_substitute
and
.B substitute_first
, a call to
.B matched_string
is only valid within the
.B subst
argument, not after
.B global_substitute
or
.B substitute_first
returns\&.
.sp
The user must make sure that the parameter
.B s
is the same string that was passed to the matching or searching function\&.
.sp
.I val match_beginning
:
.B unit -> int
.sp
.B match_beginning()
returns the position of the first character of the substring that was matched by the last call to a matching or searching function (see
.B Str\&.matched_string
for details)\&.
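.sp
Because matched_string and match_beginning read state left behind by the last matching or searching call, one cautious pattern is to read the match data immediately and return it as an ordinary value. The sketch below illustrates this under that assumption; find_first, find_last, digits and is_prefix_of_hello are hypothetical helpers, not library functions.

```ocaml
(* Requires the str library to be linked. *)

let digits = Str.regexp "[0-9]+"

(* Hypothetical helper: position and text of the first match at or after
   start, captured right away so later Str calls cannot invalidate it. *)
let find_first re s start =
  match Str.search_forward re s start with
  | pos -> Some (pos, Str.matched_string s)
  | exception Not_found -> None

(* Hypothetical helper: search from the end of s towards its beginning.
   search_backward reports the last position at which a match can START,
   so the matched text may be shorter than the full run of digits. *)
let find_last re s =
  match Str.search_backward re s (String.length s) with
  | pos -> Some (pos, Str.matched_string s)
  | exception Not_found -> None

(* True when s could still grow into a string matching "hello". *)
let is_prefix_of_hello s =
  Str.string_partial_match (Str.regexp "hello") s 0
```

For example, find_first digits "ab 12 cd 345" 0 gives Some (3, "12"), whereas find_last digits "ab 12 cd 345" gives Some (11, "5"): the backward search stops at the first starting position that matches, which is the final digit.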
.sp
.I val match_end
:
.B unit -> int
.sp
.B match_end()
returns the position of the character following the last character of the substring that was matched by the last call to a matching or searching function (see
.B Str\&.matched_string
for details)\&.
.sp
.I val matched_group
:
.B int -> string -> string
.sp
.B matched_group n s
returns the substring of
.B s
that was matched by the
.BR n th
group
.B \(rs(\&.\&.\&.\(rs)
of the regular expression that was matched by the last call to a matching or searching function (see
.B Str\&.matched_string
for details)\&. The user must make sure that the parameter
.B s
is the same string that was passed to the matching or searching function\&.
.sp
.B "Raises Not_found"
if the
.BR n th
group of the regular expression was not matched\&. This can happen with groups inside alternatives
.B \(rs|
, options
.B ?
or repetitions
.B *
\&. For instance, the empty string will match
.B \(rs(a\(rs)*
, but
.B matched_group 1 ""
will raise
.B Not_found
because the first group itself was not matched\&.
.sp
.I val group_beginning
:
.B int -> int
.sp
.B group_beginning n
returns the position of the first character of the substring that was matched by the
.BR n th
group of the regular expression that was matched by the last call to a matching or searching function (see
.B Str\&.matched_string
for details)\&.
.sp
.B "Raises Not_found"
if the
.BR n th
group of the regular expression was not matched\&.
.sp
.B "Raises Invalid_argument"
if there are fewer than
.B n
groups in the regular expression\&.
.sp
.I val group_end
:
.B int -> int
.sp
.B group_end n
returns the position of the character following the last character of the substring that was matched by the
.BR n th
group of the regular expression that was matched by the last call to a matching or searching function (see
.B Str\&.matched_string
for details)\&.
.sp
.B "Raises Not_found"
if the
.BR n th
group of the regular expression was not matched\&.
.sp
.B "Raises Invalid_argument"
if there are fewer than
.B n
groups in the regular expression\&.
.sp
.PP
.B ===
.B Replacement
.B ===
.PP
.I val global_replace
:
.B regexp -> string -> string -> string
.sp
.B global_replace regexp templ s
returns a string identical to
.B s
, except that all substrings of
.B s
that match
.B regexp
have been replaced by
.B templ
\&. The replacement template
.B templ
can contain
.B \(rs1
,
.B \(rs2
, etc\&.; these sequences will be replaced by the text matched by the corresponding group in the regular expression\&.
.B \(rs0
stands for the text matched by the whole regular expression\&.
.sp
.I val replace_first
:
.B regexp -> string -> string -> string
.sp
Same as
.B Str\&.global_replace
, except that only the first substring matching the regular expression is replaced\&.
.sp
.I val global_substitute
:
.B regexp -> (string -> string) -> string -> string
.sp
.B global_substitute regexp subst s
returns a string identical to
.B s
, except that all substrings of
.B s
that match
.B regexp
have been replaced by the result of the function
.B subst
\&. The function
.B subst
is called once for each matching substring, and receives
.B s
(the whole text) as argument\&.
.sp
.I val substitute_first
:
.B regexp -> (string -> string) -> string -> string
.sp
Same as
.B Str\&.global_substitute
, except that only the first substring matching the regular expression is replaced\&.
.sp
.I val replace_matched
:
.B string -> string -> string
.sp
.B replace_matched repl s
returns the replacement text
.B repl
in which
.B \(rs1
,
.B \(rs2
, etc\&. have been replaced by the text matched by the corresponding groups in the regular expression that was matched by the last call to a matching or searching function (see
.B Str\&.matched_string
for details)\&.
.B s
must be the same string that was passed to the matching or searching function\&.
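.sp
Group access and the two replacement styles (template string versus callback) can be compared side by side. This is an illustrative sketch only: kv, parse_pair, swap_pairs, double_numbers and first_group_opt are hypothetical names chosen for the example, and the key=value input format is an assumption.

```ocaml
(* Requires the str library to be linked. *)

(* Group 1 is a lowercase key, group 2 a decimal value. *)
let kv = Str.regexp "\\([a-z]+\\)=\\([0-9]+\\)"

(* Read both groups right after a successful anchored match. *)
let parse_pair s =
  if Str.string_match kv s 0 then
    Some (Str.matched_group 1 s, int_of_string (Str.matched_group 2 s))
  else None

(* Template replacement: \1 and \2 refer to the groups of each match. *)
let swap_pairs s = Str.global_replace kv "\\2=\\1" s

(* Callback replacement: subst receives the WHOLE text, so the matched
   part is recovered with matched_string inside the callback. *)
let double_numbers s =
  Str.global_substitute (Str.regexp "[0-9]+")
    (fun whole -> string_of_int (2 * int_of_string (Str.matched_string whole)))
    s

(* A group inside a repetition may not take part in the match at all;
   Not_found is turned into None here. *)
let first_group_opt re s =
  if Str.string_match re s 0 then
    (match Str.matched_group 1 s with
     | g -> Some g
     | exception Not_found -> None)
  else None
```

Under these assumptions, parse_pair "width=80" yields Some ("width", 80), swap_pairs "a=1, b=2" yields "1=a, 2=b", and first_group_opt (Str.regexp "\\(a\\)*") "" yields None because the regular expression matches the empty string without ever entering the group.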
.sp
.PP
.B ===
.B Splitting
.B ===
.PP
.I val split
:
.B regexp -> string -> string list
.sp
.B split r s
splits
.B s
into substrings, taking as delimiters the substrings that match
.B r
, and returns the list of substrings\&. For instance,
.B split (regexp "[ \(rst]+") s
splits
.B s
into blank\-separated words\&. An occurrence of the delimiter at the beginning or at the end of the string is ignored\&.
.sp
.I val bounded_split
:
.B regexp -> string -> int -> string list
.sp
Same as
.B Str\&.split
, but splits into at most
.B n
substrings, where
.B n
is the extra integer parameter\&.
.sp
.I val split_delim
:
.B regexp -> string -> string list
.sp
Same as
.B Str\&.split
, but occurrences of the delimiter at the beginning and at the end of the string are recognized and returned as empty strings in the result\&. For instance,
.B split_delim (regexp " ") " abc "
returns
.B [""; "abc"; ""]
, while
.B split
with the same arguments returns
.B ["abc"]
\&.
.sp
.I val bounded_split_delim
:
.B regexp -> string -> int -> string list
.sp
Same as
.B Str\&.bounded_split
, but occurrences of the delimiter at the beginning and at the end of the string are recognized and returned as empty strings in the result\&.
.sp
.I type split_result
=
 | Text
.B of
.B string
 | Delim
.B of
.B string
.sp
.sp
.I val full_split
:
.B regexp -> string -> split_result list
.sp
Same as
.B Str\&.split_delim
, but returns the delimiters as well as the substrings contained between delimiters\&. The former are tagged
.B Delim
in the result list; the latter are tagged
.B Text
\&. For instance,
.B full_split (regexp "[{}]") "{ab}"
returns
.B [Delim "{"; Text "ab"; Delim "}"]
\&.
.sp
.I val bounded_full_split
:
.B regexp -> string -> int -> split_result list
.sp
Same as
.B Str\&.bounded_split_delim
, but returns the delimiters as well as the substrings contained between delimiters\&. The former are tagged
.B Delim
in the result list; the latter are tagged
.B Text
\&.
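.sp
The splitting functions differ mainly in how they treat leading and trailing delimiters and whether delimiters are returned. A small sketch, with hypothetical names (blanks, words, head_and_rest, upcase_text) introduced only for this example:

```ocaml
(* Requires the str library to be linked. *)

let blanks = Str.regexp "[ \t]+"

(* Leading and trailing blanks are ignored by split. *)
let words s = Str.split blanks s

(* At most two pieces: the first field and everything after the
   first comma, left unsplit. *)
let head_and_rest s = Str.bounded_split (Str.regexp ",") s 2

(* full_split keeps the delimiters, so the text can be rebuilt after
   transforming only the Text parts. *)
let upcase_text re s =
  Str.full_split re s
  |> List.map (function
       | Str.Text t -> String.uppercase_ascii t
       | Str.Delim d -> d)
  |> String.concat ""
```

For instance, words "  the quick\tbrown  fox " gives ["the"; "quick"; "brown"; "fox"], head_and_rest "a,b,c,d" gives ["a"; "b,c,d"], and upcase_text (Str.regexp "[{}]") "{ab}" gives "{AB}". String.uppercase_ascii is assumed to be available (OCaml 4.03 or later).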
.sp
.PP
.B ===
.B Extracting substrings
.B ===
.PP
.I val string_before
:
.B string -> int -> string
.sp
.B string_before s n
returns the substring of all characters of
.B s
that precede position
.B n
(excluding the character at position
.B n
)\&.
.sp
.I val string_after
:
.B string -> int -> string
.sp
.B string_after s n
returns the substring of all characters of
.B s
that follow position
.B n
(including the character at position
.B n
)\&.
.sp
.I val first_chars
:
.B string -> int -> string
.sp
.B first_chars s n
returns the first
.B n
characters of
.B s
\&. This is the same function as
.B Str\&.string_before
\&.
.sp
.I val last_chars
:
.B string -> int -> string
.sp
.B last_chars s n
returns the last
.B n
characters of
.B s
\&.
.sp
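.sp
The extraction functions pair naturally with match_beginning and match_end. The sketch below cuts a string around the first match of a regular expression; split_at_first is a hypothetical helper written for this example and the header-like input is an assumption.

```ocaml
(* Requires the str library to be linked. *)

(* Hypothetical helper: text before and after the first match of re,
   or None when re does not match anywhere in s. *)
let split_at_first re s =
  match Str.search_forward re s 0 with
  | _ ->
    (* Read the match bounds before any other Str call can reset them. *)
    let b = Str.match_beginning () and e = Str.match_end () in
    Some (Str.string_before s b, Str.string_after s e)
  | exception Not_found -> None

let () =
  match split_at_first (Str.regexp ": *") "Subject: hello: world" with
  | Some (name, value) -> Printf.printf "%s -> %s\n" name value
  | None -> print_endline "no separator"
```

With the string "hello", Str.string_before "hello" 2 and Str.first_chars "hello" 2 both give "he", Str.string_after "hello" 2 gives "llo", and Str.last_chars "hello" 3 gives "llo".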