NAME
code2html - Converts program source code to HTML
SYNOPSIS
(1) code2html [options] [input-file [output-file]]
(2) code2html -p [file [alternate-outfile]]
(3) code2html (as a CGI script; see the section on CGI)
DESCRIPTION
code2html is a Perl script which converts program source code to syntax-highlighted
HTML, or to any other output format for which rules are defined.
(1) OPTIONS
- input-file
- Is the file which contains the program source code to be formatted. If not
specified or a minus (-) is given, the code will be read from STDIN.
- output-file
- Is the file to write the formatted code to. If not specified or a minus
(-) is given, the code will be written to STDOUT.
- -l, --language-mode
- Specify the set of regular expressions to use. These have to be defined in
a language file (see FILES below). To find out which language modes
are defined, run code2html --modes.
- The mode name is treated case-insensitively.
- If this option is not given, heuristics are used to determine the language of the file (see NOTES).
- -v, --verbose
- Prints progress information to STDERR.
- -n, --linenumbers
- Print out the source code with line numbers.
- -N, --linknumbers
- Print out the source code with line numbers. The line numbers link to
themselves, which makes it easy to send links to specific lines.
- -P, --prefix
- Optional prefix to use for line number anchors.
- -t, --replace-tabs[=TABSTOP-WIDTH]
- Replace each occurrence of a <TAB> character with the right amount of
spaces to get to the next tabstop. The default tabstop width is 8
characters.
- -L, --language-file=LANGUAGE-FILE
- Specify an alternate file to take the language and output-format
definitions from (see the section on FILES below).
- -m, --modes
- Print all language modes and output formats currently defined to STDOUT
and exit successfully. Also prints the modes from a LANGUAGE-FILE given
by --language-file, if applicable.
- --fallback=LANG
- If the language mode given with --language-mode cannot be found,
use this mode instead.
- --fallback plain, for instance, is useful when code2html is
called from a script, to ensure that output is always created.
- -h, --help
- Print a short help text and exit successfully.
- -V, --version
- Print the program version and exit successfully.
- -c, --content-type
- Prints “Content-Type: text/html\n\n” (or whatever content type the
output format defines) before the rest of the output.
Useful if the script is invoked as a CGI script.
- -o, --output-format
- Selects the output format. html is the default. To find out which
output formats are defined, run code2html --modes.
- -H, --no-header
- Do not use the template defined by the output format. For HTML
this means that there will be no <html>, <head>, or <pre> tags
(typical for patch and CGI modes).
- --template=FILE
- Overrides the default template for the given output format. This has
no effect if --no-header is also given, since the template
is ignored in that case.
- -T, --title
- Set the title of the produced output file. This only works if the template
supports setting the title.
- -w, --linewidth=LINEWIDTH
- Wrap lines after LINEWIDTH characters. Default is to not wrap lines
at all.
- -b, --linebreakprefix=LINEPREFIX
- Use LINEPREFIX at the start of wrapped lines. The default is
"» ".
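To make the effect of --replace-tabs and of --linewidth together with --linebreakprefix concrete, here is a small sketch. It is written in Python and is not code2html's own Perl code; the helper names replace_tabs and wrap_line are hypothetical, and the assumption that the prefix does not count toward LINEWIDTH is ours.

```python
from __future__ import annotations

# Hypothetical Python sketch; code2html itself is written in Perl and its
# internals may differ. It mimics the documented effect of --replace-tabs
# and of --linewidth combined with --linebreakprefix.


def replace_tabs(line: str, tabstop: int = 8) -> str:
    """Replace each TAB with enough spaces to reach the next tabstop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            pad = tabstop - (column % tabstop)  # distance to the next tabstop
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


def wrap_line(line: str, width: int, prefix: str = "» ") -> list[str]:
    """Cut a line into pieces of at most `width` characters.

    Every piece after the first starts with `prefix`. Assumption: the prefix
    is not counted toward `width`.
    """
    pieces = [line[i:i + width] for i in range(0, len(line), width)] or [""]
    return [pieces[0]] + [prefix + p for p in pieces[1:]]


print(repr(replace_tabs("a\tb", 4)))  # 'a   b'
print(wrap_line("abcdefgh", 3))       # ['abc', '» def', '» gh']
```

In this sketch tabs should be replaced before wrapping, so that the width is measured in the spaces that actually end up in the output.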
(2) HTML patching
code2html -p [file [alternate-outfile]]
code2html can also process HTML files that contain inline source code: it reads
such a file and inserts the syntax-highlighted code into it.
If no file is given, code2html reads from STDIN and writes to STDOUT. If
just one file is given, it replaces this file with the output. If two files are
provided, the first one is read from and the second one written to.
To use this feature, insert a line like this into your HTML file:
- <!-- code2html add [options] <file> -->
The syntax-highlighted file will be inserted at this position, enclosed in
<pre> tags.
All options that can be given on the command line, such as --linenumbers,
work here too. --help, --version, etc. work as well, although it is not very
sensible to use them :). Using --output-format to choose a non-HTML
output format is not advisable. --content-type is ignored.
You may also write the program's source code directly in the HTML file with the
following syntax:
- <!-- code2html add [options]
<your program source code here>
-->
It is usually a good idea to at least give the --language-mode option to
specify the language.
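The two marker forms above can be told apart by whether anything follows the first line of the comment. The Python sketch below is only an illustration, not code2html's own Perl parser; it assumes options (and possibly a file name) sit on the marker's first line and inline source on the following lines, and the name find_markers is hypothetical.

```python
import re
import shlex

# Hypothetical sketch in Python, not code2html's own Perl parser.
# It matches "<!-- code2html add ... -->" comments, which may span lines.
MARKER = re.compile(r"<!--\s*code2html\s+add\b(.*?)-->", re.DOTALL)


def find_markers(html):
    """Return (args, inline_source) for every code2html marker in `html`.

    args: words on the marker's first line (options, possibly a file name).
    inline_source: text on the following lines, or None when the marker
    fits on one line and names a file instead.
    """
    found = []
    for match in MARKER.finditer(html):
        first_line, newline, rest = match.group(1).partition("\n")
        args = shlex.split(first_line)
        found.append((args, rest if newline else None))
    return found


page = (
    "<p>Example:</p>\n"
    "<!-- code2html add -l c hello.c -->\n"
    "<!-- code2html add -l perl\n"
    "print 1;\n"
    "-->\n"
)
for args, source in find_markers(page):
    print(args, repr(source))
```

Because the match is non-greedy, inline source that itself contains "-->" would be cut short by this sketch.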
(3) CGI
If the script is used as a CGI script (the GATEWAY_INTERFACE environment
variable is set and no command line arguments are given), code2html reads its
arguments either from the query string (method GET) or from STDIN (method POST).
--content-type is switched on automatically and the output always goes to
STDOUT.
The following parameters/options are accepted:
- language-mode - optional
- `c', `cc', `pas', etc.
- if not given, some heuristics are used to find out the language.
- fallback - optional
- `plain', `c', etc. If language-mode cannot be found, this one is used instead.
- input-selector - optional
- either `file', `cgi-input1', `cgi-input2', or `REDIRECT_URL'
- default: file
- filename
- file to read from if input-selector is `file'
- cgi-input1
- The source code to syntax highlight, for example from a <textarea>
or from an upload. See input-selector.
- cgi-input2
- The source code to syntax highlight, for example from a <textarea>
or from an upload. See input-selector.
- line-numbers - optional
- `yes', `no' or `link'
- default: no
- replace-tabs - optional
- If 0, tabs are not replaced; otherwise each occurrence of a
<TAB> character is replaced with the right amount of spaces to get to the next
tabstop.
- default: 0
- title - optional
- Sets the title of the file.
- no-encoding - optional
- By default code2html tries to compress the output with bz2, gz, or Z
if the client supports this (HTTP_ACCEPT_ENCODING) and the needed program
is available on the server. You may need to modify @CGI_ENCODING in the
script to match your program locations.
- If no-encoding is defined as “true” code2html
does not try to encode the output.
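As an illustration of the parameters listed above, the following Python sketch builds a GET request URL for the CGI script. The script location BASE_URL and the helper build_request_url are hypothetical; only the parameter names and values come from the list above.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical location of the installed script; adjust to your server.
BASE_URL = "http://www.example.org/cgi-bin/code2html"


def build_request_url(source_code, language=None, line_numbers="no",
                      replace_tabs=0, title=None):
    """Build a GET URL that passes `source_code` via cgi-input1."""
    params = {
        "input-selector": "cgi-input1",  # read the code from cgi-input1
        "cgi-input1": source_code,
        "line-numbers": line_numbers,    # 'yes', 'no' or 'link'
        "replace-tabs": str(replace_tabs),
    }
    if language is not None:
        params["language-mode"] = language
    if title is not None:
        params["title"] = title
    return BASE_URL + "?" + urlencode(params)


url = build_request_url("int main(void) { return 0; }", language="c",
                        line_numbers="link")
print(url)
# Decoding the query string recovers the original parameter values.
print(parse_qs(url.split("?", 1)[1]))
```

Long files quickly exceed practical URL lengths, so for real input a <form> using method POST is the more robust choice.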
Why two cgi-inputs, you may ask? This allows your users to choose, via a
<form> interface, whether they want to paste their file into a
<textarea> or use a <browse> button to select their file. See the
example on my homepage.
Note that if $FILES_DISALLOWED_IN_CGI is 0, your users can read all the
files the httpd can read (unless you run a CGI wrapper or something similar).
By default this value is set to 1, so reading files via CGI is not allowed.
You can allow it by setting $FILES_DISALLOWED_IN_CGI to 0 at the top of the script.
The input selector REDIRECT_URL needs a special explanation. The file
name is formed from the two environment variables DOCUMENT_ROOT and
REDIRECT_URL.
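A minimal Python sketch of forming that file name follows, assuming it is a plain concatenation of the two variables. The function name is hypothetical, and the check that the result stays inside DOCUMENT_ROOT is an extra safeguard added for this sketch, not a statement about what code2html does.

```python
import os


def redirect_source_path(environ):
    """Return DOCUMENT_ROOT + REDIRECT_URL as the file to highlight.

    The containment check below is an extra safeguard added in this sketch;
    it is not a description of what code2html does.
    """
    root = environ["DOCUMENT_ROOT"]
    path = root + environ["REDIRECT_URL"]  # REDIRECT_URL starts with "/"
    real_root = os.path.realpath(root)
    if os.path.commonpath([real_root, os.path.realpath(path)]) != real_root:
        raise ValueError("requested file lies outside DOCUMENT_ROOT")
    return path


env = {"DOCUMENT_ROOT": "/var/www", "REDIRECT_URL": "/source/hello.c"}
print(redirect_source_path(env))  # /var/www/source/hello.c
```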
If you want Apache to automatically call code2html for all program source
code files, you can do so by adding these two lines to your srm.conf:
- AddHandler text/x-sourcecode .c .cc .cpp .pas .h .p
Action text/x-sourcecode /cgi-bin/code2html?input-selector=REDIRECT_URL&foo=
or something similar. In the AddHandler line you can choose which
extensions to pass through code2html.
WARNING: Do not add .pl to this line if the script is named
“code2html.pl”; this will result in a loop.
Also make sure that you load the Action module (srm.conf).
Replace /cgi-bin/code2html with the virtual location under which the script can be
accessed. Note the “foo=” part. Apache appends the URL of the
file to display to the end of the action part. We do not need this, since we
use the environment variable REDIRECT_URL, but we do not want the URL to be
appended to the input-selector value. Therefore we append the
“&foo=” part.
Thanks to Kevin Burton <burton@relativity.yi.org> for the idea. He also
states that
> It is more powerful if you use it in an Apache
> <Directory> tag
>
> <Directory /source>
>
> #with your Action tag here... this way you can
> #still have regular .java files on your server.
>
> </Directory>
>
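The effect of the “&foo=” trick described above can be checked by decoding the two possible query strings. This Python sketch takes the description above at face value (Apache appending the file's URL verbatim to the Action target); the appended URL and the helper query_params are hypothetical.

```python
from urllib.parse import parse_qs


def query_params(url):
    """Decode the query-string part of `url` into a dict of value lists."""
    return parse_qs(url.split("?", 1)[1])


appended = "/source/hello.c"  # hypothetical URL that Apache appends

with_foo = "/cgi-bin/code2html?input-selector=REDIRECT_URL&foo=" + appended
without_foo = "/cgi-bin/code2html?input-selector=REDIRECT_URL" + appended

print(query_params(with_foo))
# {'input-selector': ['REDIRECT_URL'], 'foo': ['/source/hello.c']}
print(query_params(without_foo))
# {'input-selector': ['REDIRECT_URL/source/hello.c']}
```

Without the dummy parameter, the appended URL would corrupt the input-selector value; with it, the URL lands in foo, which code2html can ignore.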
EXAMPLE
Assuming code2html is in the current directory, you may type
code2html -l perl code2html.pl code2html.html
to convert the script into an HTML file.
FILES
code2html looks for its configuration in several places:
- the file specified by -L or --language-file, if any
- the files listed in the environment variable CODE2HTML_CONFIG,
separated by colons
- the user's $HOME/.code2html.config
- /etc/code2html.config
- the built-in default languages
Entries in a file that is mentioned earlier in this list override rules from
later files.
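The lookup order and the override rule can be sketched as follows. This is a hypothetical Python illustration: the real configuration files are Perl code that fills %LANGUAGE and %STYLESHEET, the helper names config_files and merge_rules are invented here, and the sketch assumes overriding happens per top-level key such as a language name.

```python
import os


def config_files(language_file=None, environ=os.environ):
    """List configuration files in priority order, highest priority first."""
    files = []
    if language_file:                      # -L / --language-file
        files.append(language_file)
    # CODE2HTML_CONFIG holds a colon-separated list of files.
    files += [p for p in environ.get("CODE2HTML_CONFIG", "").split(":") if p]
    if environ.get("HOME"):
        files.append(os.path.join(environ["HOME"], ".code2html.config"))
    files.append("/etc/code2html.config")
    return files


def merge_rules(rule_sets):
    """Merge rule dicts given highest priority first (built-ins last).

    Applying them lowest priority first lets entries from earlier files
    overwrite entries with the same key from later ones.
    """
    merged = {}
    for rules in reversed(rule_sets):
        merged.update(rules)
    return merged


env = {"CODE2HTML_CONFIG": "/opt/a.config:/opt/b.config", "HOME": "/home/me"}
print(config_files("my.config", env))
user = {"c": "user rules for c"}
builtin = {"c": "built-in c", "perl": "built-in perl"}
print(merge_rules([user, builtin]))  # user's 'c' wins, built-in 'perl' stays
```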
The file must contain valid Perl code. The global variables %LANGUAGE and
%STYLESHEET are already defined, so you should not redeclare them using “my”.
When you are looking for a model configuration to serve as a basis for your own
configuration file, it is probably best to start by checking the built-in
definitions at the bottom of code2html.
If your pattern includes back references, as many Perl patterns do, then
you have to use \2 instead of \1, \3 instead of \2, and so on. I
really don't like this hack, but it is a lot faster.
Example:
<<([^\n]*).*?^\2$
In this example the Perl << (here-document) construct is matched, i.e. everything
from a << up to a line that consists of exactly the same string as the one
following the <<. The \2 references the characters matched inside the parentheses.
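Why the numbering shifts by one can be illustrated in Python, assuming (this page does not confirm it) that code2html encloses each rule in one extra pair of capturing parentheses, and that the rule is matched with multi-line and dot-matches-newline semantics. Under that assumption the example pattern above behaves as follows.

```python
import re

# The rule as it would be written for code2html: \2 means "the first group".
rule = r"<<([^\n]*).*?^\2$"

# Assumption: the rule is enclosed in one extra capturing group, which
# shifts the rule's own groups up by one. Multi-line (^ and $ per line) and
# dot-matches-newline behaviour are assumed as well.
wrapped = re.compile("(" + rule + ")", re.MULTILINE | re.DOTALL)

text = "print <<EOT\nhello\nEOT\nrest\n"
match = wrapped.search(text)
print(repr(match.group(1)))  # the whole here-document: '<<EOT\nhello\nEOT'
print(match.group(2))        # the terminator string: EOT
```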
If you ever write language-specific rule files yourself, I'd be grateful if you
could send them to me, so I can make them available (with full credits, of
course) on my homepage for anyone to grab whose needs they suit. Before you do
so, you might also have a look at my site to check whether someone has already
written a rule file for your favourite language.
NOTES
The language recognition mechanism relies on specific patterns within the file
name and the content of the processed file, such as file name extensions and
shebangs (#!). This means that if the input is a pipe or a socket, if the file
name does not follow traditional naming conventions, or if the content of the
processed file is incomplete, the input language should be specified
using the --language-mode command line option.
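The kind of heuristic described above can be sketched in Python: look at the file name extension first, then at a #! line, and otherwise fall back. The extension and interpreter tables, and the function guess_language, are hypothetical and heavily abbreviated; code2html's real rules live in the script itself.

```python
import os
import re

# Hypothetical, heavily abbreviated tables; code2html's real heuristics
# are defined in the script and are more extensive.
EXTENSIONS = {".c": "c", ".h": "c", ".cc": "cc", ".cpp": "cc",
              ".pas": "pas", ".p": "pas", ".pl": "perl", ".pm": "perl"}
INTERPRETERS = {"perl": "perl"}


def guess_language(filename, content, fallback="plain"):
    """Guess a language mode from the file name, then from a #! line."""
    extension = os.path.splitext(filename or "")[1].lower()
    if extension in EXTENSIONS:
        return EXTENSIONS[extension]
    first_line = content.split("\n", 1)[0]
    shebang = re.match(r"#!\s*(\S+)(?:\s+(\S+))?", first_line)
    if shebang:
        interpreter = os.path.basename(shebang.group(1))
        if interpreter == "env" and shebang.group(2):
            interpreter = os.path.basename(shebang.group(2))  # #!/usr/bin/env perl
        interpreter = re.sub(r"[\d.]+$", "", interpreter)  # perl5.30 -> perl
        if interpreter in INTERPRETERS:
            return INTERPRETERS[interpreter]
    return fallback  # used when nothing matches


print(guess_language("hello.c", ""))                        # c
print(guess_language("-", "#!/usr/bin/perl -w\nprint 1;"))  # perl
print(guess_language(None, "no hints here"))                # plain
```

The last call shows the situation the paragraph above warns about: input without a recognizable name or shebang gives the heuristic nothing to work with.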
BUGS
Please report bugs to code2html@palfrader.org. This program is still a beta
release, so you should expect to find some.
Also have a look at my web site; perhaps a new version is already available at
http://www.palfrader.org/code2html/.
AUTHOR
Peter Palfrader, <code2html@palfrader.org>, and a lot of other people. See the
list of contributors in the file itself.
LICENSE
Copyright (c) 1999, 2000 by Peter Palfrader & others.
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the
“Software”), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.