NAME
string::token::shell - Parsing of shell command line
SYNOPSIS
package require Tcl 8.5
package require string::token::shell ?1.1?
package require string::token ?1?
package require fileutil
::string token shell ?-indices? ?-partial? string
DESCRIPTION
This package provides a command which parses a line of text using basic
sh-syntax into a list of words.
The complete set of procedures is described below.
- ::string token shell ?-indices? ?-partial? string
- This command parses the input string, assuming that it follows basic
sh-syntax. The result of the command is a list of the words in the
string. An error is thrown if the input does not follow the allowed
syntax. The behaviour can be modified by specifying either or both of
the options -indices and -partial.
- -indices
- When specified, the output is not a list of words but a list of 4-tuples
describing the words. Each tuple contains the type of the word, its start
and end indices in the input, and the actual text of the word.
Note that the length of the word as given by the indices can differ from the
length of the word found in the last element of the tuple. The indices
describe the word's extent in the input, including delimiters, intra-word
quoting, etc., whereas in the actual text of the word the delimiters are
stripped, intra-word quoting is decoded, etc.
The possible token types are
- PLAIN
- Plain word, not quoted.
- D:QUOTED
- Word is delimited by double-quotes.
- S:QUOTED
- Word is delimited by single-quotes.
- D:QUOTED:PART
- S:QUOTED:PART
- Like the previous types, but the word has no closing quote, i.e. is
incomplete. These token types can occur if and only if the option
-partial was specified, and only for the last word of the result.
If the option -partial was not specified, such incomplete words
cause the command to throw an error instead.
- -partial
- When specified, the parser accepts an incomplete quoted word (i.e. one
without a closing quote) at the end of the line as valid instead of throwing
an error.
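The following is a minimal sketch of the three calling modes described above, using the invocation form shown in the synopsis. It assumes Tcllib is installed so that the package can be loaded; the sample input lines and variable names are made up for illustration.

```tcl
package require Tcl 8.5
package require string::token::shell

# Illustrative input: one unquoted word, one option, two quoted words.
set line {ls -l "My Documents" 'a b'}

# Default mode: the result is a plain list of decoded words.
puts [string token shell $line]

# -indices: the result is a list of {type start end text} tuples.
foreach tok [string token shell -indices $line] {
    lassign $tok type start end text
    # start/end address the raw input, so the raw range still
    # contains the quote characters that are stripped from text.
    set raw [string range $line $start $end]
    puts [format "%-8s %2d..%-2d raw=<%s> text=<%s>" \
        $type $start $end $raw $text]
}

# An unterminated quoted word is an error by default ...
set open {echo "unfinished arg}
if {[catch {string token shell $open} msg]} {
    puts "rejected: $msg"
}

# ... but is accepted with -partial and reported with a :PART type.
puts [string token shell -indices -partial $open]
```

Combining -indices with -partial, as in the last call, is what makes the incomplete word distinguishable from a complete one, since the token type is only visible in the tuple form.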
The basic shell syntax accepted here consists of unquoted, single-quoted,
and double-quoted words, separated by whitespace. Leading and trailing
whitespace is permitted and is stripped. Shell variables in their various
forms are not recognized, nor are sub-shells. The recognized forms of words
are specified in detail below.
- single-quoted word
- A single-quoted word begins with a single-quote character, i.e. '
(ASCII 39), followed by zero or more Unicode characters other than a
single-quote, and is closed by a single-quote.
The word must be followed by either the end of the string or whitespace.
Another word cannot directly follow it.
- double-quoted word
- A double-quoted word begins with a double-quote character, i.e.
" (ASCII 34), followed by zero or more Unicode characters other than an
unescaped double-quote, and is closed by a double-quote.
In contrast to single-quoted words, a double-quote can be embedded in the
word by escaping it with a preceding backslash character \ (ASCII 92).
Similarly, a backslash character must be escaped with another backslash
to be inserted literally.
- unquoted word
- Unquoted words are not delimited by quotes and thus cannot contain
whitespace or single-quote characters. Double-quote and backslash
characters can be put into unquoted words by escaping them with a
backslash, as in double-quoted words.
- whitespace
- Whitespace is any Unicode space character. This is equivalent to string
is space, or the regular expression \s.
Whitespace may occur before the first word, or after the last word.
Whitespace must occur between adjacent words.
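The word forms above can be exercised with a short script. This is only a sketch under the assumption that Tcllib is installed; try_parse is a hypothetical helper introduced here for illustration and is not part of the package.

```tcl
package require Tcl 8.5
package require string::token::shell

# Hypothetical helper (not part of the package): return the list of
# words, or the parser's error message prefixed with "error:".
proc try_parse {line} {
    if {[catch {string token shell $line} res]} {
        return "error: $res"
    }
    return $res
}

# Single-quoted word: double-quotes and backslashes inside are literal.
puts [try_parse {'a "b" \c'}]

# Double-quoted word containing escaped double-quotes and a backslash.
puts [try_parse {"say \"hi\" \\ done"}]

# Unquoted words containing an escaped double-quote and backslash.
puts [try_parse {a\"b c\\d}]

# Leading and trailing whitespace is stripped; a tab separates words.
puts [try_parse "  one\ttwo  "]

# Two quoted words with no whitespace between them are rejected.
puts [try_parse {'a''b'}]

# Shell variables are not recognized, so $HOME is an ordinary word.
puts [try_parse {echo $HOME}]
```

The inputs are written in braces so that Tcl itself performs no substitution on the backslashes, quotes, and dollar sign before the parser sees them.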
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and
other problems. Please report such in the category textutil of the
Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist]. Please also
report any ideas for enhancements you may have for either package and/or
documentation.
KEYWORDS
bash, lexing, parsing, shell, string, tokenization
CATEGORY
Text processing