NAME
goaccess - fast web log analyzer and interactive viewer.
SYNOPSIS
goaccess [-f input-file][-c][-r][-d][-m][-q][-o][-h][...]
DESCRIPTION
goaccess is a free (GPL) real-time web log analyzer and interactive
viewer that runs in a terminal on *nix systems. It provides fast and valuable
HTTP statistics for system administrators who require a visual server report
on the fly. GoAccess parses the specified web log file and outputs the data to
the X terminal. Features include:
- General Statistics:
- Number of valid requests, number of invalid requests, time to analyze the
data, unique visitors, unique requested files, unique static files (css,
ico, jpg, js, swf, gif, png), unique HTTP referrers (URLs), unique 404s
(not found), size of the parsed log file, bandwidth consumption.
- Unique visitors:
- HTTP requests having the same IP, same date and same agent will be
considered a unique visit. This includes crawlers.
- Requested files
- Hit totals are based on total requests. This module will display hits,
percent, bandwidth, [time served], [protocol] and [method].
- Requested static files
- Hit totals are based on total requests. Includes files such as: jpg, css,
swf, js, gif, png etc. This module will display hits, percent, bandwidth,
[time served], [protocol] and [method].
- 404 or Not Found
- Hit totals are based on total requests. This module will display hits,
percent, bandwidth, [time served], [protocol] and [method].
- Hosts
- Hit totals are based on total requests. This module will display hits,
percent, [bandwidth, time served]. The expanded module can display extra
information such as reverse DNS and country. If -a is enabled, a list of
user agents will be displayed by selecting the IP and hitting the return
key.
- Operating Systems
- Hit totals are based on unique visitors. This module will display hits and
percent. The expanded module shows all available versions of the parent
node.
- Browsers
- Hit totals are based on unique visitors. This module will display hits and
percent. The expanded module shows all available versions of the parent
node.
- Referrers URLs
- The URL where the request came from. Hit totals are based on total
requests. This module will display hits and percent.
- Referring Sites
- This module will display only the host of the URL where the request came
from, not the whole URL. Hit totals are based on total requests. This
module will display hits and percent.
- Keyphrases
- This module will report keyphrases used on Google search, Google cache,
and Google translate. Hit totals are based on total requests. This module
will display hits and percent.
- Geo Location
- Determines where an IP address is geographically located. It outputs the
continent and country. If it's unable to determine the country, location
will be marked as unknown.
- HTTP Status Codes
- The numeric status codes returned for HTTP requests. Hit totals are
based on total requests. This module will display hits and percent.
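The unique-visitor rule described above (same IP, same date, same agent) can be approximated outside GoAccess with standard tools. The following is only a sketch, not GoAccess's implementation: it assumes a Combined Log Format file, and the function name unique_visitors and the sample lines are made up for illustration.

```shell
# Count distinct (IP, date, user agent) tuples in a Combined Log Format file.
# Hypothetical helper for illustration; not part of GoAccess.
unique_visitors() {
  awk -F'"' '{
    split($1, head, " ")        # head[1] = client IP, head[4] = "[05/Dec/2010:10:00:00"
    split(head[4], stamp, ":")  # stamp[1] = "[05/Dec/2010" (date without the time)
    print head[1] "|" stamp[1] "|" $6   # $6 = user agent (third quoted field)
  }' "$1" | sort -u | wc -l | tr -d ' '
}

# Invented sample data: one browser seen on two days, plus one crawler.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
10.0.0.1 - - [05/Dec/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.1 - - [05/Dec/2010:11:30:00 +0000] "GET /a.css HTTP/1.1" 200 90 "-" "Mozilla/5.0"
10.0.0.1 - - [06/Dec/2010:09:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.2 - - [05/Dec/2010:10:05:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"
EOF

unique_visitors "$sample_log"
```

For the four sample requests this prints 3: the two requests from 10.0.0.1 on 05/Dec collapse into one visit, while the next-day request and the crawler each count separately.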
STORAGE
There are three storage options that can be used with GoAccess. Choosing one
will depend on your environment and needs.
- GLib Hash Tables
- By default GoAccess uses GLib Hash Tables. If your dataset can fit in
memory, then this will perform fine. It has average memory usage and
pretty good performance. For better performance at the cost of more memory,
see the Tokyo Cabinet on-memory hash database.
- Tokyo Cabinet On-Disk B+ Tree
- Use this storage method for large datasets where it is not possible to fit
everything in memory. The B+ tree database is slower than any of the hash
databases since it has to hit the disk. However, using an SSD greatly
increases the performance. You may also use this storage method if you
need data persistence to quickly load statistics at a later date.
- Tokyo Cabinet On-Memory Hash Database
- Although this may vary across different systems, in general the on-memory
hash database should perform slightly better than GLib Hash Tables.
CONFIGURATION
Multiple options can be used to configure GoAccess. For a complete up-to-date
list of configure options, run
./configure --help
- --enable-debug
- Compile with debugging symbols and turn off compiler optimizations.
- --enable-utf8
- Compile with wide character support. Ncursesw is required.
- --enable-geoip
- Compile with GeoLocation support. MaxMind's GeoIP is required.
- --enable-tcb=<memhash|btree>
- Compile with Tokyo Cabinet storage support. memhash will utilize
Tokyo Cabinet's on-memory hash database. btree will utilize Tokyo
Cabinet's on-disk B+ Tree database.
- --disable-zlib
- Disable zlib compression on B+ Tree database.
- --disable-bzip
- Disable bzip2 compression on B+ Tree database.
OPTIONS
The following options can be supplied on the command line; the long options
can also be set in the configuration file.
- --date-format=<dateformat>
- The date_format variable, followed by a space, specifies the date format
of the log. It may contain any combination of regular characters and special
format specifiers, all of which begin with a percent (%) sign. See `man strftime`.
Note that there is no need to use time specifiers since they are not used by
GoAccess. It's recommended to use only date specifiers, e.g.,
%Y-%m-%d.
- --log-format=<logformat>
- The log_format variable, followed by a space or \t for
tab-delimited, specifies the log format string.
Note that if there are spaces within the format, the string needs to be
enclosed in double quotes. Inner quotes need to be escaped.
- -c --config-dialog
- Prompt log/date configuration window on program start.
- --color-scheme=<1|2>
- Choose among color schemes. 1 for the default grey scheme. 2
for the green scheme.
- --no-color
- Turn off colored output. This is the default output on terminals that do
not support colors.
- -f --log-file=<logfile>
- Specify the path to the input log file. If set in the config file, it will
take priority over -f from the command line.
- --debug-file=<debugfile>
- Send all debug messages to the specified file. Needs to be configured with
--enable-debug
- --config-file=<configfile>
- Specify a custom configuration file to use. If set, it will take priority
over the global configuration file (if any).
- --no-global-config
- Do not load the global configuration file. This file normally resides in
/usr/local/etc, unless another directory was specified with --sysconfdir=/dir.
- -e --exclude-ip=<IP|IP-range>
- Exclude one or multiple IPv4/6 addresses, including IP ranges, e.g.,
192.168.0.1-192.168.0.10
- -a --agent-list
- Enable a list of user-agents by host. For faster parsing, do not enable
this flag.
- -M --http-method
- Include HTTP request method if found. This will create a request key
containing the request method + the actual request.
- -H --http-protocol
- Include HTTP request protocol if found. This will create a request key
containing the request protocol + the actual request.
- -q --no-query-string
- Ignore the request's query string, e.g., www.google.com/page.htm?query =>
www.google.com/page.htm
- -r --no-term-resolver
- Disable IP resolver on terminal output.
- -o --output-format=<json|csv>
- Write output to stdout given one of the following formats:
csv : Comma-separated values (CSV)
json : JSON (JavaScript Object Notation)
- --real-os
- Display real OS names, e.g., Windows XP, Snow Leopard.
- --static-file=<extension>
- Add a static file extension, e.g., .mp3. Extensions are case
sensitive.
- --ignore-crawlers
- Ignore crawlers.
- --no-progress
- Disable progress metrics [total requests/requests per second].
- -m --with-mouse
- Enable mouse support on main dashboard.
- -d --with-output-resolver
- Enable IP resolver on HTML|JSON output.
- -g --std-geoip
- Standard GeoIP database for less memory usage.
- --geoip-city-data=<geocityfile>
- Specify the path to the GeoIP City database file, e.g., GeoLiteCity.dat.
The file needs to be downloaded from maxmind.com.
- --keep-db-files
- Persist parsed data to disk. This should be set for the first dataset
prior to using `load-from-disk`. Setting it to false will delete all
database files when exiting the program.
Only if configured with --enable-tcb=btree
- --load-from-disk
- Load previously stored data from disk. Database files need to exist. See
keep-db-files.
Only if configured with --enable-tcb=btree
- --db-path=<dir>
- Path where the on-disk database files are stored. The default value is the
/tmp directory.
Only if configured with --enable-tcb=btree
- --xmmap=<num>
- Set the size in bytes of the extra mapped memory. The default value is 0.
Only if configured with --enable-tcb=btree
- --cache-lcnum=<num>
- Specifies the maximum number of leaf nodes to be cached. If it is not more
than 0, the default value is used. The default value is 1024. Setting
a larger value will increase speed at the cost of higher memory
consumption. A lower value will decrease memory consumption.
Only if configured with --enable-tcb=btree
- --cache-ncnum=<num>
- Specifies the maximum number of non-leaf nodes to be cached. If it is not
more than 0, the default value is specified. The default value is 512.
Only if configured with --enable-tcb=btree
- --tune-lmemb=<num>
- Specifies the number of members in each leaf page. If it is not more than
0, the default value is specified. The default value is 128.
Only if configured with --enable-tcb=btree
- --tune-nmemb=<num>
- Specifies the number of members in each non-leaf page. If it is not more
than 0, the default value is specified. The default value is 256.
Only if configured with --enable-tcb=btree
- --tune-bnum=<num>
- Specifies the number of elements of the bucket array. If it is not more
than 0, the default value is specified. The default value is 32749.
The suggested size of the bucket array is about 1 to 4 times the
number of all pages to be stored.
Only if configured with --enable-tcb=btree
- --compression=<zlib|bz2>
- Specifies that each page is compressed with ZLIB|BZ2 encoding.
Only if configured with --enable-tcb=btree
- -h --help
- Display the help text.
- -V --version
- Display version information and exit.
- -s --storage
- Display current storage method, e.g., B+ Tree, Hash.
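To make the start-end range syntax accepted by -e more concrete, the sketch below tests whether an IPv4 address falls inside such a range. It is only an illustration and not how GoAccess itself evaluates ranges: the helper names ip_to_int and ip_in_range are hypothetical, only IPv4 is handled, and treating both endpoints as inclusive is an assumption.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if address $1 lies within range $2 ("start-end").
# Assumption: both endpoints are inclusive.
ip_in_range() {
  addr=$(ip_to_int "$1")
  first=$(ip_to_int "${2%-*}")   # text before the dash
  last=$(ip_to_int "${2#*-}")    # text after the dash
  [ "$addr" -ge "$first" ] && [ "$addr" -le "$last" ]
}

if ip_in_range 192.168.0.5 192.168.0.1-192.168.0.10; then
  echo "192.168.0.5 falls inside the range"
fi
```

A helper like this can be handy for checking which hosts a range such as 192.168.0.1-192.168.0.10 would cover before passing it to -e.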
CUSTOM LOG/DATE FORMAT
GoAccess can parse virtually any web log format.
Predefined options include Common Log Format (CLF), Combined Log Format
(XLF/ELF) including virtual host, Amazon CloudFront (Download Distribution),
and W3C format (IIS).
GoAccess allows any custom format string as well.
There are two ways to configure the log format. The easiest is to run GoAccess
with -c to prompt a configuration window. Otherwise, it can be
configured under ~/.goaccessrc.
- date_format
- The date_format variable, followed by a space, specifies the date format
of the log. It may contain any combination of regular characters and special
format specifiers, all of which begin with a percent (%) sign. See
http://linux.die.net/man/3/strftime
Note that there is no need to use time specifiers since they are not used by
GoAccess. It's recommended to use only date specifiers, e.g.,
%Y-%m-%d.
- log_format
- The log_format variable, followed by a space or \t,
specifies the log format string.
- %d
- date field matching the date_format variable.
- %h
- host (the client IP address, either IPv4 or IPv6)
- %r
- The request line from the client. This requires specific delimiters around
the request (such as single quotes, double quotes, or anything else) to be
parsable. If there are none, use a combination of special format
specifiers such as %m %U %H.
- %m
- The request method.
- %U
- The URL path requested (including any query string).
- %H
- The request protocol.
- %s
- The status code that the server sends back to the client.
- %b
- The size of the object returned to the client.
- %R
- The "Referer" HTTP request header.
- %u
- The user-agent HTTP request header.
- %D
- The time taken to serve the request, in microseconds.
- %T
- The time taken to serve the request, in seconds or milliseconds.
Note: %D will take priority over %T if both are used.
- %^
- Ignore this field.
GoAccess requires the following fields:
- %h a valid IPv4/6
- %d a valid date
- %s server status code
- %r the request
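As an illustration of how the specifiers combine, the sketch below writes a configuration for Apache's Combined Log Format to a scratch file. The format strings are a common choice for that layout rather than the only valid one, and the scratch file and the variable name rc_file are assumptions made for the example. It uses the underscore variable names described above.

```shell
# Write to a temporary file first so an existing ~/.goaccessrc is not clobbered.
rc_file=$(mktemp)

cat > "$rc_file" <<'EOF'
date_format %d/%b/%Y
log_format %h %^[%d:%^] "%r" %s %b "%R" "%u"
EOF

# Show the result for review.
cat "$rc_file"
```

Read against a line such as `10.0.0.1 - - [05/Dec/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"`, %h takes the client address, the first %^ skips the identity fields up to the bracket, %d takes the date up to the colon, and the second %^ skips the time and zone. The string covers all four required fields (%h, %d, %s, %r). Once reviewed, the file could be copied to ~/.goaccessrc or passed with --config-file.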
INTERACTIVE MENU
- F1 or h
- Main help.
- F5
- Redraw main window.
- q
- Quit the program, current window or collapse active module
- o or ENTER
- Expand selected module or open window
- 0-9 and Shift + 0
- Set selected module to active
- j
- Scroll down within expanded module
- k
- Scroll up within expanded module
- c
- Set or change the color scheme.
- TAB
- Forward iteration of modules. Starts from current active module.
- SHIFT + TAB
- Backward iteration of modules. Starts from current active module.
- ^ f
- Scroll forward one screen within an active module.
- ^ b
- Scroll backward one screen within an active module.
- s
- Sort options for active module
- /
- Search across all modules (regex allowed)
- n
- Find the position of the next occurrence across all modules.
- g
- Move to the first item or top of screen.
- G
- Move to the last item or bottom of screen.
EXAMPLES
The simplest and fastest usage would be:
- # goaccess -f access.log
That will generate an interactive text-only output.
To generate full statistics we can run GoAccess as:
- # goaccess -f access.log -a
To generate an HTML report:
- # goaccess -f access.log -a > report.html
To generate a JSON file:
- # goaccess -f access.log -a -d -o json > report.json
To generate a CSV file:
- # goaccess -f access.log -o csv > report.csv
The -a flag indicates that we want to process an agent-list for every
host parsed.
The -d flag indicates that we want to enable the IP resolver on the HTML
| JSON output. (It will take longer to output since it has to resolve all
queries.)
The -c flag will prompt the date and log format configuration window,
but only when curses is initialized.
Now if we want to add more flexibility to GoAccess, we can do a series of pipes.
For instance:
If we would like to process all access.log.*.gz we can do:
- # zcat access.log.*.gz | goaccess
OR
- # zcat -f access.log* | goaccess
Another useful pipe would be filtering dates out of the web log.
The following will get all HTTP requests starting on 05/Dec/2010 until the end
of the file:
- # sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a
If we want to parse only a certain time-frame from DATE a to DATE b, we can do:
- # sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' access.log | goaccess -a
Note that this could take longer to parse depending on the speed of
sed.
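The behaviour of such sed address ranges can be checked on a small log before piping real data to GoAccess. The sketch below uses invented sample lines, and it anchors each pattern on the opening bracket, an optional refinement so that 05/Nov cannot also match inside another field. Note that a sed range ends at the first line matching the end pattern, so only the first request of the end date is selected.

```shell
# Invented sample log covering dates before, inside and after the range.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
10.0.0.1 - - [04/Nov/2010:23:59:59 +0000] "GET /a HTTP/1.1" 200 10
10.0.0.1 - - [05/Nov/2010:00:00:01 +0000] "GET /b HTTP/1.1" 200 10
10.0.0.1 - - [20/Nov/2010:12:00:00 +0000] "GET /c HTTP/1.1" 200 10
10.0.0.1 - - [05/Dec/2010:08:00:00 +0000] "GET /d HTTP/1.1" 200 10
10.0.0.1 - - [05/Dec/2010:09:00:00 +0000] "GET /e HTTP/1.1" 200 10
10.0.0.1 - - [06/Dec/2010:09:00:00 +0000] "GET /f HTTP/1.1" 200 10
EOF

# Print from the first 05/Nov line through the first 05/Dec line.
selected=$(sed -n '/\[05\/Nov\/2010/,/\[05\/Dec\/2010/ p' "$sample_log")
printf '%s\n' "$selected"
```

With this sample, the /b, /c and /d requests are selected; the second 05/Dec request (/e) falls outside the range because the range has already closed.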
To exclude a list of virtual hosts you can do the following:
- # grep -v "`cat exclude_vhost_list_file`" vhost_access.log | goaccess
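An equivalent way to write this filter is grep's standard -f option, which reads the patterns from the file directly; adding -F treats them as fixed strings, so the dots in host names are not interpreted as regex wildcards. The sketch below demonstrates the filter on invented data; the directory variable, host names, and log lines are placeholders.

```shell
work_dir=$(mktemp -d)

# One virtual host to exclude per line (no blank lines).
cat > "$work_dir/exclude_vhost_list_file" <<'EOF'
staging.example.com
EOF

cat > "$work_dir/vhost_access.log" <<'EOF'
www.example.com:80 10.0.0.1 - - [05/Dec/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512
staging.example.com:80 10.0.0.2 - - [05/Dec/2010:10:00:01 +0000] "GET / HTTP/1.1" 200 512
www.example.com:80 10.0.0.3 - - [05/Dec/2010:10:00:02 +0000] "GET /about HTTP/1.1" 200 256
EOF

# Keep only the lines that match none of the listed hosts.
kept=$(grep -v -F -f "$work_dir/exclude_vhost_list_file" "$work_dir/vhost_access.log")
printf '%s\n' "$kept"
```

The filtered lines can then be piped to goaccess in the same way as the grep command above.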
Also, it is worth pointing out that if we want to run GoAccess at lower
priority, we can run it as:
- # nice -n 19 goaccess -f access.log -a
If you don't want to install it on your server, you can still run it from
your local machine:
- # ssh root@server 'cat /var/log/apache2/access.log' | goaccess -a
NOTES
For now, each active window has a total of 300 items. Eventually this will be
customizable.
Piping a log to GoAccess will disable the real-time functionality. This is due
to a portability issue in determining the actual size of STDIN. However, a
future release *might* include this feature.
BUGS
If you think you have found a bug, please send an email to
goaccess@prosoftcorp.com or use the issue tracker at
https://github.com/allinurl/goaccess/issues
AUTHOR
Gerardo Orellana <goaccess@prosoftcorp.com>. For more details about GoAccess,
or for new releases, please visit
http://goaccess.prosoftcorp.com