Name
lua-uri - Lua module for manipulating URIs
Loading the module
The URI module doesn't alter any global variables when it loads, so you can
decide what name you want to use to access it. You will probably want to load
it like this:
local URI = require "uri"
You can use a variable called something other than "URI" if you'd
like, or you could assign the table returned by "require" to a
global variable. In this documentation we'll assume you're using a variable
called "URI".
Parsing, validating and normalizing URIs
When you create a URI object, the string you supply is checked to make sure it
conforms to the appropriate standards. If everything is OK, the new object
will be returned, otherwise nil and an error message will be returned. You can
convert any errors into Lua exceptions using the "assert" function.
local URI = require "uri"
local uri = assert(URI:new("http://example.com/foo"))
-- Both lines print the original string, because it was
-- already in normalized form.
print(tostring(uri))
print(uri:uri())
You can extract individual parts of the URI with various accessor methods:
print(uri:scheme()) -- http
print(uri:host()) -- example.com
print(uri:path()) -- /foo
Some URIs will be 'normalized' automatically to produce an equivalent canonical
version. Normalization never changes how the URI is interpreted. For example:
local uri = assert(URI:new("HTTP://EXAMPLE.COM:80/FOO"))
print(tostring(uri)) -- http://example.com/FOO
In this case the scheme and hostname were both converted to lowercase (but not
the path part, because that's case sensitive). The port number was also
removed because port 80 is the default anyway for HTTP URIs.
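The following plain-Lua sketch shows the kind of normalization described above. It is not the library's code: `normalize_authority` and the `DEFAULT_PORTS` table are hypothetical names, only the HTTP and HTTPS default ports are filled in, and the function takes components that have already been split apart rather than parsing a string.

```lua
-- Hypothetical table of default ports, keyed by lowercase scheme name.
local DEFAULT_PORTS = { http = 80, https = 443 }

-- Lowercase the scheme and host, and drop the port when it is the
-- scheme's default. The path is left alone because it is case sensitive.
local function normalize_authority(scheme, host, port, path)
    scheme = scheme:lower()
    host = host:lower()
    if port == DEFAULT_PORTS[scheme] then
        port = nil
    end
    local authority = host
    if port then
        authority = authority .. ":" .. port
    end
    return scheme .. "://" .. authority .. path
end

print(normalize_authority("HTTP", "EXAMPLE.COM", 80, "/FOO"))  -- http://example.com/FOO
```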
If you just want to make sure a URI is correct, but without throwing an
exception, use code like this:
local uri, err = URI:new(uri_to_test)
if uri then
print("valid, normalized to " .. tostring(uri))
else
print("invalid, error message is " .. err)
end
(Note that many invalid URIs will get processed as relative URI references, so
if you're expecting an absolute URI it's also a good idea to check that the
"is_relative" method returns false.)
Cloning URIs
To make a copy of a URI object, pass it to the constructor:
local original = URI:new("http://www/foo")
local copy = URI:new(original)
The two objects will contain the same information, but can be changed
independently.
Relative URIs
A relative URI reference is not a complete URI. It doesn't have a scheme, so it
doesn't really mean anything until it is resolved against an absolute URI. For
this reason, when you create a URI object from a relative URI, it will belong
to the special class "uri._relative". There is very little you can
do with a relative URI object other than get and set its path, query string,
and fragment identifier.
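Whether a string counts as an absolute URI or as a relative reference depends on whether it starts with a scheme followed by a colon. A minimal sketch of that test, using a Lua pattern built from the RFC 3986 scheme grammar, is below; `scheme_of` is a hypothetical helper, and the library's own parser may apply further checks.

```lua
-- Return the scheme of an absolute URI string, or nil if the string has
-- no scheme and so would be treated as a relative reference.
-- RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
local function scheme_of(s)
    return s:match("^(%a[%w+%-%.]*):")
end

print(scheme_of("http://example.com/"))   -- http
print(scheme_of("../path?query#fragment")) -- nil (relative reference)
print(scheme_of("1http://example.com/"))  -- nil (a scheme must start with a letter)
```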
Relative URI objects can be created in the same way as absolute ones:
local uri = assert(URI:new("../path?query#fragment"))
print(uri:is_relative()) -- true
print(uri._NAME) -- uri._relative
There are two ways to resolve a relative URI reference against an absolute URI
to get another absolute URI. One is to create a new URI object, passing the
base URI as a second argument to the constructor:
local rel = assert(URI:new("../quux.html"))
local base = assert(URI:new("http://example.com/foo/bar/"))
local abs = assert(URI:new(rel, base))
print(tostring(abs)) -- http://example.com/foo/quux.html
You can also do this by passing strings to "new", instead of objects:
local abs = assert(URI:new("../quux.html",
"http://example.com/foo/bar/"))
print(tostring(abs)) -- http://example.com/foo/quux.html
Alternatively, a URI object containing a relative URI can be made absolute
without creating a new object using the "resolve" method:
local uri = assert(URI:new("../quux.html"))
local base = assert(URI:new("http://example.com/foo/bar/"))
uri:resolve(base)
print(tostring(uri)) -- http://example.com/foo/quux.html
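Resolving the path follows the algorithm in RFC 3986 section 5.2: merge the reference path with the base path, then remove "." and ".." segments. The sketch below covers only that path step, for a reference with a non-empty path and no scheme or authority, against a base that has an authority (as HTTP URIs do). Query and fragment handling are left out, and `merge`, `remove_dot_segments` and `resolve_path` are hypothetical helpers rather than part of this library.

```lua
-- Remove "." and ".." segments from a path (RFC 3986, section 5.2.4).
local function remove_dot_segments(input)
    local output = {}  -- each entry is one segment, with its leading "/" if any
    while input ~= "" do
        if input:sub(1, 3) == "../" then
            input = input:sub(4)
        elseif input:sub(1, 2) == "./" then
            input = input:sub(3)
        elseif input:sub(1, 3) == "/./" then
            input = input:sub(3)              -- "/./x" becomes "/x"
        elseif input == "/." then
            input = "/"
        elseif input:sub(1, 4) == "/../" then
            input = input:sub(4)              -- "/../x" becomes "/x"
            table.remove(output)              -- and the last output segment is dropped
        elseif input == "/.." then
            input = "/"
            table.remove(output)
        elseif input == "." or input == ".." then
            input = ""
        else
            -- move the first segment (and its leading "/", if any) to the output
            local seg = input:match("^/?[^/]*")
            output[#output + 1] = seg
            input = input:sub(#seg + 1)
        end
    end
    return table.concat(output)
end

-- Merge a reference path with the base path (RFC 3986, section 5.2.3).
local function merge(base_path, ref_path, base_has_authority)
    if base_has_authority and base_path == "" then
        return "/" .. ref_path
    end
    -- keep the base path up to and including its last "/"
    return (base_path:match("^(.*/)") or "") .. ref_path
end

-- Resolve a reference path against the path of a base URI that has an
-- authority component.
local function resolve_path(base_path, ref_path)
    if ref_path:sub(1, 1) == "/" then
        return remove_dot_segments(ref_path)
    end
    return remove_dot_segments(merge(base_path, ref_path, true))
end

print(resolve_path("/foo/bar/", "../quux.html"))  -- /foo/quux.html
```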
The reverse process can be carried out with the "relativize" method,
creating a relative URI from an absolute one, where the relative URI can be
later resolved against a particular base URI:
local uri = assert(URI:new("http://example.com/foo/quux.html"))
local base = assert(URI:new("http://example.com/foo/bar/"))
uri:relativize(base)
print(tostring(uri)) -- ../quux.html
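For two absolute paths under the same scheme and authority, a relative path can be found by skipping their common leading directories, climbing out of the rest of the base directory with "../" segments, and then descending to the target. This is a simplified sketch with a hypothetical `relative_path` helper: it ignores queries and fragments, and does not handle cases such as a first segment containing ":", which a real implementation has to guard against.

```lua
-- Directory segments of an absolute path: everything before its last "/".
-- "/foo/bar/" gives { "", "foo", "bar" }.
local function dir_segments(path)
    local dir = path:match("^(.*/)") or ""
    local t = {}
    for seg in dir:gmatch("([^/]*)/") do
        t[#t + 1] = seg
    end
    return t
end

-- Relative path that resolves against base_path to give target_path.
local function relative_path(target_path, base_path)
    local tdir, bdir = dir_segments(target_path), dir_segments(base_path)
    -- count the directory segments the two paths share at the start
    local n = 0
    while n < #tdir and n < #bdir and tdir[n + 1] == bdir[n + 1] do
        n = n + 1
    end
    local parts = {}
    for _ = n + 1, #bdir do                    -- climb out of the base's other directories
        parts[#parts + 1] = "../"
    end
    for i = n + 1, #tdir do                    -- then descend into the target's directories
        parts[#parts + 1] = tdir[i] .. "/"
    end
    parts[#parts + 1] = target_path:match("[^/]*$")  -- final segment of the target
    return table.concat(parts)
end

print(relative_path("/foo/quux.html", "/foo/bar/"))  -- ../quux.html
```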
It is possible for a relative URI to have an authority part, although this is
very rare in practice. It is unlikely that you'll ever need to do this, but
you can create a URI like this:
local uri = assert(URI:new("//example.com/path"))
Methods
This is a complete list of the methods you can call on a generic "URI"
object once created by calling "new". Some URIs are created in more
specific classes (listed in the "URI schemes" section), which may have
additional methods. Arguments shown in square brackets below are optional.
Note that all the accessor methods, like "path" and "uri",
can be used just to return the current value (if they are called without an
argument), or can set a new value while returning the old value. Passing nil
as the argument is generally different from not passing an argument at all, or
from passing an empty string.
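In Lua, a function can only tell an explicit nil argument from a missing one by counting its arguments with `select("#", ...)`. The sketch below shows how an accessor following this convention can be written; the `Obj` class and its `_fragment` field are hypothetical and not part of the library.

```lua
-- A hypothetical class with one accessor following the get/set convention.
local Obj = {}
Obj.__index = Obj

function Obj.new()
    return setmetatable({}, Obj)
end

-- Always returns the old value. Only changes the value when an argument
-- was actually passed, even if that argument is nil.
function Obj:fragment(...)
    local old = self._fragment
    if select("#", ...) > 0 then
        self._fragment = (...)   -- first argument: a string, or nil to remove
    end
    return old
end

local obj = Obj.new()
obj:fragment("top")
print(obj:fragment())     -- top  (no argument: acts as a getter only)
print(obj:fragment(nil))  -- top  (returns the old value, then removes it)
print(obj:fragment())     -- nil
```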
- uri:default_port()
- Returns the default port used for this type of URI when no port number is
supplied in the authority part. This will be nil if the standard for the
URI's current scheme doesn't specify a default port, or if the scheme is
one which this library doesn't have any special understanding of.
local uri = assert(URI:new("http://example.com:123/"))
print(uri:default_port()) -- 80
- uri:eq(other)
- Returns true if the two URI objects contain the same URI.
"other" can also be a string, which will be converted to a URI
object (in order for the normalization to be done).
This can also be called as a stand-alone function if you don't know whether
either URI is an object or a string. For example:
print(URI.eq("http://example.com",
"HTTP://EXAMPLE.COM/"))
If either value is a string which isn't a valid URI, this will throw an
exception. It will however accept relative URIs, and they will be compared
as normal. A relative URI is never equal to an absolute one.
There is no less-than comparison function, as URIs don't have any particular
ordering. If you want to sort URI objects your best bet is probably just
to compare the string versions:
local function urisort (a, b)
    return a:uri() < b:uri()
end
-- "t" is an array of URI objects
table.sort(t, urisort)
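Because Lua's colon syntax makes `a:eq(b)` the same call as `eq(a, b)`, one function can serve both as a method and as a stand-alone function. The sketch below illustrates this with a hypothetical `ToyURI` class whose constructor only lowercases the scheme and host and treats an empty path as "/" (as for HTTP). It assumes absolute URIs with an authority and no userinfo, and is far simpler than the library's real normalization.

```lua
-- Hypothetical stand-in for a URI class, with very limited normalization.
local ToyURI = {}
ToyURI.__index = ToyURI

function ToyURI.new(s)
    local scheme, host, rest = s:match("^(%a[%w+%-%.]*)://([^/?#]*)(.*)$")
    if not scheme then
        error("not an absolute URI with an authority: " .. s)
    end
    if rest == "" then rest = "/" end  -- assumption: empty path means "/", as for HTTP
    return setmetatable({ str = scheme:lower() .. "://" .. host:lower() .. rest }, ToyURI)
end

-- Works as a:eq(b) and as ToyURI.eq(a, b); strings are converted first
-- so that both values are normalized before comparing.
function ToyURI.eq(a, b)
    if type(a) == "string" then a = ToyURI.new(a) end
    if type(b) == "string" then b = ToyURI.new(b) end
    return a.str == b.str
end

print(ToyURI.eq("http://example.com", "HTTP://EXAMPLE.COM/"))  -- true
local u = ToyURI.new("http://Example.com/a")
print(u:eq("http://example.com/A"))                            -- false (paths differ in case)
```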
- uri:fragment([newvalue])
- Returns the current fragment part of the URI (the part after the
"#" character), or nil if the URI has no fragment part. Note
that an empty fragment (zero characters long) is different from one which
is completely missing.
If "newvalue" is supplied, changes the fragment to the new value,
percent encoding any characters which would not be valid in a fragment
part. Any percent encoding already done on the string will be left in
place (not double encoded). If "newvalue" is nil then any
existing fragment will be removed.
The syntax of fragments is meaningful only for particular media types of
resources, so there is no special behaviour for different URI
schemes.
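Percent encoding a fragment without double encoding can be sketched as a single `gsub` over the characters that RFC 3986 does not allow in a fragment. The `encode_fragment` name is hypothetical, and because this sketch simply leaves every "%" in place, a stray "%" that is not followed by two hex digits passes through unchecked.

```lua
-- Allowed in a fragment (RFC 3986): unreserved characters, sub-delims,
-- ":", "@", "/" and "?". "%" is also left alone so that existing
-- percent encoding is not encoded a second time.
local function encode_fragment(s)
    return (s:gsub("[^%w%-%._~!%$&'%(%)%*%+,;=:@/%?%%]", function(c)
        return string.format("%%%02X", c:byte())
    end))
end

print(encode_fragment("a b#c%20d"))  -- a%20b%23c%20d
```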
- uri:host([newvalue])
- Get and set the host part of the authority in a URI. This can be a domain
name, an IPv4 address (four numbers separated by dots), or an IPv6 address
(which must include the enclosing square brackets used in URIs).
When setting a new host, the value is normalized to lowercase. An invalid
value will cause an exception to be thrown. The value can be an empty
string to indicate the default host.
Setting the value to nil will cause the host to be removed altogether,
leaving the URI with no authority component. This will throw an exception
if there is a userinfo or port component in the URI, because it is
impossible to represent a URI with no host when there is an authority
component.
Some URI schemes may throw an exception when setting the host to nil or the
empty string, and others when setting it to anything other than nil, if
those schemes require or disallow authority components.
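The three host forms can be told apart roughly as shown below. This sketch uses hypothetical helpers: `is_ipv4` follows the RFC 3986 rule that each of the four numbers is 0 to 255 with no leading zeros, while `host_kind` only checks for the square brackets of an IP literal and does not validate the address inside them or the syntax of a domain name.

```lua
-- True if host is a dotted-quad IPv4 address as RFC 3986 defines it:
-- four numbers from 0 to 255, without leading zeros.
local function is_ipv4(host)
    local parts = { host:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$") }
    if #parts ~= 4 then
        return false
    end
    for _, p in ipairs(parts) do
        if (#p > 1 and p:sub(1, 1) == "0") or tonumber(p) > 255 then
            return false
        end
    end
    return true
end

-- Rough classification of a host value.
local function host_kind(host)
    if host:match("^%[.*%]$") then
        return "ip-literal"   -- IPv6 (or future) address; contents not checked
    end
    if is_ipv4(host) then
        return "ipv4"
    end
    return "name"             -- treated as a domain name; syntax not checked
end

print(host_kind("[::1]"))      -- ip-literal
print(host_kind("192.0.2.1"))  -- ipv4
print(host_kind("256.0.0.1"))  -- name
```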
- uri:init()
- This method is called internally to make a URI object belong to the right
class and do any scheme-specific validation and normalization. It is only
of interest if you want to write a new "uri" subclass for
particular types of URIs.
The implementation in the "uri" class itself changes the class of
the object to the one appropriate to the scheme (if there is a more
specific class available). It also removes the port number from the
authority component if it is unnecessary because the scheme defines it as
the default port. Finally, if there is a more specific class available it
calls the "init" method in that.
"init" is called after the URI has been split into components
according to the generic syntax, so it can use the accessor methods to get
at them. It should return the same values as "new", either the
new URI object (the object it was called on), or nil and an error
message.
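One way to implement the class switching described above in Lua is to change the object's metatable. The sketch below is hypothetical throughout (`Base`, `Http`, `classes` and the plain-table fields are not the library's structures): `Base:init` looks up a scheme-specific class, switches the object to it with `setmetatable`, drops a redundant default port, and then calls the subclass `init`.

```lua
-- Hypothetical generic class and one scheme-specific subclass.
local Base = {}
Base.__index = Base

local Http = setmetatable({}, { __index = Base })  -- Http inherits from Base
Http.__index = Http
Http.default_port = 80

-- Scheme-specific normalization: an empty path becomes "/".
function Http:init()
    if self.path == "" then self.path = "/" end
    return self
end

-- Registry of more specific classes, keyed by scheme.
local classes = { http = Http }

-- Generic init: switch class, drop a redundant default port, then
-- delegate to the specific class's own init.
function Base:init()
    local class = classes[self.scheme]
    if not class then
        return self
    end
    setmetatable(self, class)
    if self.port == class.default_port then
        self.port = nil
    end
    return class.init(self)
end

function Base.new(scheme, host, port, path)
    local obj = setmetatable({ scheme = scheme, host = host, port = port, path = path }, Base)
    return obj:init()
end

local u = Base.new("http", "example.com", 80, "")
print(getmetatable(u) == Http)  -- true
print(u.port)                   -- nil
print(u.path)                   -- /
```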
- uri:is_relative()
- Returns true if this is a relative URI reference, false otherwise. All
relative URIs belong to the class "uri._relative". All the other
URI classes are for absolute URIs.
- uri:path([newvalue])
- Get or set the path component of the URI. Throws an exception if the new
value is not valid in the context of the rest of the URI.
local uri = assert(URI:new("http://example.com/foo"))
local old = uri:path("/bar/")
print(old) -- /foo
print(uri:path()) -- /bar/
When a new path value is supplied, it can already be percent encoded, but
any characters which aren't allowed are encoded as well. Percent
characters are not encoded themselves, because they are assumed to be part
of the existing encoding. The existing percent encoding is normalized, and
any invalid encoding will cause an exception.
There are certain paths which cannot be expressed in the URI syntax. A path
which does not start with a "/" character (unless it's
completely empty) cannot be represented when there is an authority
component, so this will cause an exception to be thrown. A path which
starts with "//" when there is no authority component would be
misinterpreted, so the second slash is percent encoded.
Some URI schemes may impose further restrictions on what is allowed in a
path, so other path values may cause exceptions in certain cases.
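The two structural rules for paths can be sketched as a single check. `check_path` is a hypothetical helper that receives the new path and whether the URI has an authority component; it does not attempt the percent encoding or the scheme-specific checks described above.

```lua
-- Apply the generic-syntax rules for a path, given whether the URI has
-- an authority component.
local function check_path(path, has_authority)
    if has_authority then
        -- with an authority, the path must be empty or start with "/"
        if path ~= "" and path:sub(1, 1) ~= "/" then
            error("path must be empty or start with '/' when there is an authority")
        end
    elseif path:sub(1, 2) == "//" then
        -- "//" would be read as the start of an authority: encode the second slash
        path = "/%2F" .. path:sub(3)
    end
    return path
end

print(check_path("//x", false))  -- /%2Fx
-- check_path("a", true) raises an error
```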
- uri:port([newvalue])
- Get or set the port number in a URI. The value returned is always an
integer number or nil.
If "newvalue" is supplied it should be a non-negative integer
number, or a string containing only digits, or nil to remove any existing
port number. An exception is thrown if it is an invalid value, or if the
URI scheme doesn't allow port numbers to be specified. If there is
currently no authority part in the URI, then an empty host will be added
to create one.
If the port number is the default for a URI scheme (the same as the number
returned from the "default_port" method), then the
"port" method will return that number, but the number won't
actually be shown in the URI when it is represented as a string, because
it would be redundant. Setting the port number to nil has the same effect
as setting it to the default port number.
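A sketch of these port rules, under the assumption that validation and display are handled by two separate hypothetical helpers: `parse_port` accepts the forms of "newvalue" listed above, and `port_suffix` produces the text that appears after the host, which is empty for the default port.

```lua
-- Accept a non-negative integer or a string of digits; nil means "remove".
local function parse_port(value)
    if value == nil then
        return nil
    end
    if type(value) == "string" and value:match("^%d+$") then
        return tonumber(value)
    end
    if type(value) == "number" and value >= 0 and value % 1 == 0 then
        return value
    end
    error("invalid port number: " .. tostring(value))
end

-- Text shown after the host in the string form: nothing for the default.
local function port_suffix(port, default_port)
    if port == nil or port == default_port then
        return ""
    end
    return ":" .. string.format("%d", port)
end

print(port_suffix(parse_port("8080"), 80))  -- :8080
```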
- uri:query([newvalue])
- Get or set the query part of a URI.
If "newvalue" is supplied it should be the new string, or nil to
remove any existing query part. The query part can be an empty string,
which is different from it not being present at all (the "?"
character will still be included to indicate that there is a query part,
even if it is not followed by anything else). Any characters which would
not be valid in a query part will be percent encoded, but any percent
encoding already done on the string will be left in place (not double
encoded).
The base-class implementation of this method never throws exceptions, but
some scheme-specific classes may throw exceptions if they impose
constraints on the syntax of query parts.
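The difference between an absent and an empty query comes from the way a URI string is reassembled from its components (RFC 3986 section 5.3). In the sketch below, a hypothetical `recompose` function represents an absent component as nil; since the empty string is a true value in Lua, an empty query still produces a "?". The same applies to the fragment.

```lua
-- Rebuild a URI string from its components; nil means "not present".
local function recompose(c)
    local parts = {}
    if c.scheme then parts[#parts + 1] = c.scheme .. ":" end
    if c.authority then parts[#parts + 1] = "//" .. c.authority end
    parts[#parts + 1] = c.path or ""
    if c.query then parts[#parts + 1] = "?" .. c.query end
    if c.fragment then parts[#parts + 1] = "#" .. c.fragment end
    return table.concat(parts)
end

print(recompose{ scheme = "http", authority = "example.com", path = "/", query = "" })
-- http://example.com/?
```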
- uri:resolve(base)
- Given an object representing a relative URI, resolve it against the base
URI "base" (which can be a URI object or string) and update the
"uri" object to contain an absolute URI.
Has no effect if "uri" is already an absolute URI. Throws an
exception if "base" is not an absolute URI, or if the new URI
formed by combining them would be invalid for the given scheme.
See also the section Relative URIs and the
"uri:relativize(base)" method.
- uri:scheme([newvalue])
- Get and set the scheme of the URI. Altering the scheme of an existing URI
is very unlikely to be useful.
Throws an exception if "newvalue" is nil or not a valid scheme, or
if the rest of the URI is not valid when interpreted with the new scheme.
After calling this method the class of the object may have been changed,
if the old class is not appropriate for the new value.
- uri:relativize(base)
- If possible, update the absolute URI "uri" to contain a relative
URI which, when resolved again against "base", will yield the
original URI value. This doesn't return anything, just modifies the
object.
Has no effect if "uri" is already relative, or if there is no way
to create an appropriate relative URI (so the URI will remain absolute for
example if "base" has a different scheme from "uri").
Throws an exception if "base" is not absolute.
This method will never result in a network-path reference (a relative URI
which includes an authority part). In cases where that would be possible
the value in "uri" will be left as an absolute URI, which is
less likely to cause problems.
See also the section Relative URIs and the
"uri:resolve(base)" method.
- uri:uri([newvalue])
- Returns the URI value as a string. The return value is the same as you'll
get from "tostring(uri)".
If an argument is supplied, this replaces the URI in the "uri"
object with a different one. "newvalue" must be a complete new
URI or relative URI reference in a string, or a URI object.
This is equivalent to creating a new URI object by calling
"URI:new", except that instead of creating a new object the
existing object is updated with the new information. It is also not
possible to pass a base URI to the "uri" method.
Throws an exception if "newvalue" is nil or if there is any error
in parsing the new URI string. After calling this method the class of the
object may have been changed, if the old class is not appropriate for the
new value.
- uri:userinfo([newvalue])
- Get or set the userinfo part of the URI. If "newvalue" is
supplied then it is expected to be percent encoded already. Percent
encoding is normalized. An exception will be thrown if the new value is
invalid, or if the URI scheme does not allow a userinfo part (for example
if it is an HTTP URI). If there is currently no authority part in the URI,
then an empty host will be added to create one.
If "newvalue" is nil then any existing userinfo part is
removed.
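Normalizing existing percent encoding, as described for paths and userinfo values, can be sketched following RFC 3986 section 6.2.2: hexadecimal digits are uppercased and escapes of unreserved characters are decoded. `normalize_percent` is a hypothetical helper, and the library's exact rules may differ in detail.

```lua
-- Normalize percent encoding, raising an error for malformed escapes.
local function normalize_percent(s)
    -- after removing every valid "%XX", any remaining "%" is invalid
    if s:gsub("%%%x%x", ""):find("%", 1, true) then
        error("invalid percent encoding in " .. s)
    end
    return (s:gsub("%%(%x%x)", function(hex)
        local c = string.char(tonumber(hex, 16))
        if c:match("^[%w%-%._~]$") then
            return c                  -- unreserved character: decode it
        end
        return "%" .. hex:upper()     -- anything else: keep it, in uppercase
    end))
end

print(normalize_percent("%7euser%3a"))  -- ~user%3A
```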
URI schemes
The following Lua modules provide classes which implement extra validation and
normalization, or provide extra methods, for URIs with specific schemes:
- uri.data
- uri.file
- uri.ftp
- uri.http and uri.https
- uri.pop
- uri.rtsp and uri.rtspu
- uri.telnet
- uri.urn
Other modules
Other Lua modules provide additional functionality used in the library, or act
as base classes for the scheme-specific classes:
- uri._login
- Base class for URI schemes which use a username and password in their
userinfo part, separated by a colon (for example FTP).
- uri._util
- Utility functions used by the rest of the library. Contains the
"uri_encode" and "uri_decode" functions, which might be
useful elsewhere.
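The exact signatures of the "uri._util" functions aren't given here, so the following is an independent sketch rather than a use of that module. It shows the kind of processing a username-and-password scheme needs: a hypothetical `split_login` splits a userinfo value at its first colon, and a hypothetical `pct_decode` decodes the percent encoding in each half.

```lua
-- Decode every %XX sequence into the byte it represents.
local function pct_decode(s)
    return (s:gsub("%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

-- Split "user:password" at the first colon and decode both halves.
-- The password is nil when there is no colon at all.
local function split_login(userinfo)
    local user, password = userinfo:match("^([^:]*):(.*)$")
    if not user then
        return pct_decode(userinfo), nil
    end
    return pct_decode(user), pct_decode(password)
end

local user, password = split_login("anonymous:me%40example.com")
print(user)      -- anonymous
print(password)  -- me@example.com
```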
References
The parsing of URI syntax is based primarily on "RFC 3986".
Copyright
This software and documentation is Copyright (c) 2007 Geoff Richards
<geoff@geoffrichards.co.uk>. It is free software; you can redistribute
it and/or modify it under the terms of the Lua 5.0 license. The full
terms are given in the file COPYRIGHT supplied with the source code
package, and are also available here: <http://www.lua.org/license.html>
An older unreleased version of this library was created as a direct port of the
Perl URI library, by Gisle Aas and others. It has since been rewritten with a
somewhat different design.