v1.42 (2012-04-05)

Name

Xtables-addons — additional extensions for iptables, ip6tables, etc.

Targets

ACCOUNT

The ACCOUNT target is a high-performance accounting system for large local networks. It allows per-IP accounting in whole prefixes of IPv4 addresses of up to /8 in size, without the need to add an individual accounting rule for each IP address.

ACCOUNT is designed to be queried for data every second, or at least every ten seconds. It is written as a kernel module to handle high bandwidths without packet loss.

The largest possible subnet spans 24 bits of host addresses, for example the 10.0.0.0/8 network. ACCOUNT uses fixed internal data structures, which speeds up the processing of each packet. Furthermore, the accounting data for one complete 192.168.1.X/24 network takes 4 KB of memory. Memory for 16- or 24-bit networks is only allocated when needed.

To further optimize the kernel<->userspace data transfer, the kernel module only transfers information about IPs whose src/dst packet counter is not 0. This saves precious kernel time.

There is no /proc interface, as it would be too slow for continuous access. The read-and-flush query operation is the fastest, as no internal data snapshot needs to be created and copied for all data. Use the "read" operation without flush only for debugging purposes!

Usage:

ACCOUNT takes two mandatory parameters:

--addr network/netmask: where network/netmask is the subnet to account for, in CIDR syntax

--tname NAME: where NAME is the name of the table where the accounting information should be stored

The subnet 0.0.0.0/0 is a special case: all data are then stored in the src_bytes and src_packets structure of slot "0". This is useful if you want to account the overall traffic to/from your internet provider.
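The size limit can be made concrete with a little arithmetic: a /24 covers 256 addresses, while a /8 (the 24-bit maximum) covers 16,777,216. The sketch below uses a hypothetical helper, account_slots, which is not part of iptaccount or libxt_ACCOUNT_cl; it only illustrates the rule stated above, treating 0.0.0.0/0 as the single slot "0".

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the ACCOUNT --addr size rule:
# a prefix may span at most 24 host bits (/8); 0.0.0.0/0 is special.
# Prints how many per-IP slots the prefix covers.
account_slots() {
  local len=${1#*/}   # prefix length after the "/"
  if (( len == 0 )); then
    echo 1            # 0.0.0.0/0: all traffic is summed in slot "0"
  elif (( len >= 8 && len <= 32 )); then
    echo $(( 1 << (32 - len) ))
  else
    echo "unsupported prefix length /$len (largest allowed is /8)" >&2
    return 1
  fi
}

account_slots 192.168.1.0/24   # 256
account_slots 10.0.0.0/8       # 16777216
```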

The data can be queried using the userspace libxt_ACCOUNT_cl library, or with iptaccount(8), the reference tool that demonstrates how to use this library.

Here is an example of use:

iptables -A FORWARD -j ACCOUNT --addr 0.0.0.0/0 --tname all_outgoing

iptables -A FORWARD -j ACCOUNT --addr 192.168.1.0/24 --tname sales

This creates two tables called "all_outgoing" and "sales" which can be queried using the userspace library/iptaccount tool.

Note that this target is non-terminating: a packet that reaches it will continue traversing the chain in which the rule is used.

Also note that once a table has been defined for a specific CIDR address/netmask block, it can be referenced multiple times using -j ACCOUNT, provided that both the original table name and address/netmask block are specified.

For more information go to http://www.intra2net.com/en/developer/ipt_ACCOUNT/

CHAOS

Causes confusion on the other end by doing odd things with incoming packets. CHAOS will randomly reply (or not) with one of its configurable subtargets:

--delude: Use the REJECT and DELUDE targets as a base to do a sudden or deferred connection reset, fooling some network scanners into returning non-deterministic (randomly open/closed) results; a port that is reported as open is actually closed/filtered.

--tarpit: Use the REJECT and TARPIT targets as a base to hold the connection until it times out. This consumes conntrack entries when connection tracking is loaded (as it is on most machines), and routers in between you and the Internet may fail to do their connection tracking if they have to handle more connections than they are able to.

The randomness factor of not replying vs. replying can be set at load time of the xt_CHAOS module or at runtime in /sys/module/xt_CHAOS/parameters.

See http://jengelh.medozas.de/projects/chaostables/ for more information about CHAOS, DELUDE and lscan.

CHECKSUM

This target allows you to selectively work around broken/old applications. It can only be used in the mangle table.

--checksum-fill: Compute and fill in the checksum in a packet that lacks a checksum. This is particularly useful if you need to work around old applications, such as DHCP clients, that do not work well with checksum offloads, but don't want to disable checksum offload in your device.

DELUDE

The DELUDE target will reply to a SYN packet with SYN-ACK, and to all other packets with an RST. This will terminate the connection much like REJECT, but network scanners doing TCP half-open discovery can be spoofed into believing the port is open rather than closed/filtered.

DHCPMAC

In conjunction with ebtables, DHCPMAC can be used to completely change all MAC addresses from and to a VMware-based virtual machine. This is needed because VMware does not allow setting a non-VMware MAC address before an operating system is booted (only then can the MAC be changed with `ip link set eth0 address aa:bb..`).

--set-mac aa:bb:cc:dd:ee:ff[/mask]: Replace the client host MAC address field in the DHCP message with the given MAC address. This option is mandatory. The mask parameter specifies the prefix length of bits to change.

EXAMPLE: replacing all MAC addresses from one of VMware's assigned vendor IDs (00:50:56) with something else:

iptables -t mangle -A FORWARD -p udp --dport 67 -m physdev --physdev-in vmnet1 -m dhcpmac --mac 00:50:56:00:00:00/24 -j DHCPMAC --set-mac ab:cd:ef:00:00:00/24

iptables -t mangle -A FORWARD -p udp --dport 68 -m physdev --physdev-out vmnet1 -m dhcpmac --mac ab:cd:ef:00:00:00/24 -j DHCPMAC --set-mac 00:50:56:00:00:00/24

(This assumes there is a bridge interface that has vmnet1 as a port. You will also need to add appropriate ebtables rules to change the MAC address of the Ethernet headers.)
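The /mask arithmetic of --set-mac can be illustrated with a short shell sketch. It assumes, based on the description of the option, that the leading mask bits of the client MAC come from the given address while the remaining bits are kept; dhcpmac_rewrite and int_to_mac are hypothetical helpers, not part of the module or its tools.

```shell
#!/usr/bin/env bash
# Sketch of the --set-mac addr[/mask] rewrite (assumed semantics): the top
# <mask> bits of the client MAC are taken from addr, the rest are kept.

# Print a 48-bit integer as a colon-separated MAC address.
int_to_mac() {
  local hex
  hex=$(printf '%012x' "$1")
  echo "${hex:0:2}:${hex:2:2}:${hex:4:2}:${hex:6:2}:${hex:8:2}:${hex:10:2}"
}

# dhcpmac_rewrite ORIGINAL_MAC NEW_MAC[/PREFIXLEN]   (no /PREFIXLEN means /48)
dhcpmac_rewrite() {
  local orig=$1 new=${2%/*} len=48
  [[ $2 == */* ]] && len=${2#*/}
  local o=$(( 16#${orig//:/} ))
  local n=$(( 16#${new//:/} ))
  # Mask with the top $len of the 48 address bits set.
  local m=$(( ((1 << 48) - 1) ^ ((1 << (48 - len)) - 1) ))
  int_to_mac $(( (n & m) | (o & ~m) ))
}

dhcpmac_rewrite 00:50:56:12:34:56 ab:cd:ef:00:00:00/24   # ab:cd:ef:12:34:56
dhcpmac_rewrite ab:cd:ef:12:34:56 00:50:56:00:00:00/24   # 00:50:56:12:34:56
```

The two calls mirror the two iptables rules above: outgoing requests get the ab:cd:ef prefix, and replies get the VMware prefix back while the lower 24 bits survive unchanged.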

DNETMAP

The DNETMAP target allows dynamic two-way 1:1 mapping of IPv4 subnets. A single rule can map a private subnet to a smaller public subnet, creating and maintaining unambiguous private-public IP bindings. A second rule can be used to map new flows to the private subnet according to the maintained bindings. The target allows efficient use of public IPv4 space and unambiguous NAT at the same time.

The target can only be used in the nat table: in the POSTROUTING or OUTPUT chains for SNAT, and in PREROUTING for DNAT. Only flows directed to bound IPs will be DNATed. The packet continues chain traversal if there is no free postnat-ip to be assigned to the prenat-ip. The default binding ttl is 10 minutes and can be changed using the default_ttl module option. The default IP hash size is 256 and can be changed using the hash_size module option.

--prefix addr/mask: Network subnet to map to. If not specified, all existing prefixes are used.

--reuse: Reuse the entry for a given prenat-ip from any prefix, even if the binding's ttl is < 0.

--ttl seconds: Regenerate the binding's ttl value to the given number of seconds. If a negative value is specified, the binding's ttl is kept unchanged. If not specified, the default ttl value (600s) is used.

* /proc interface

The module creates the following entries for each newly specified subnet:

/proc/net/xt_DNETMAP/subnet_mask: Contains the binding table for subnet/mask. Each line contains prenat-ip, postnat-ip, ttl (seconds until the entry times out) and lasthit (time of the last entry hit, in seconds relative to system boot time).

/proc/net/xt_DNETMAP/subnet_mask_stat: Contains statistics for the given subnet/mask. The line contains three numerical values separated by spaces: the number of currently used addresses (bindings with negative ttl excluded), the number of all usable addresses in the subnet, and the mean ttl value of all active entries.

Entries are removed if the last iptables rule for a specific subnet is deleted.
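Since the _stat file holds just three space-separated numbers, it is easy to turn into a utilization figure. A minimal sketch, assuming the field order described above; the function name and output wording are hypothetical, and the exact file name depends on the configured subnet, so it is left as a placeholder in the comment.

```shell
#!/usr/bin/env bash
# Summarize one line of a DNETMAP *_stat file: "used usable mean_ttl".
dnetmap_stat_summary() {
  local used usable mean_ttl
  read -r used usable mean_ttl <<< "$1"
  (( usable > 0 )) || return 1   # avoid dividing by zero on malformed input
  echo "$used/$usable addresses bound ($(( used * 100 / usable ))%), mean ttl ${mean_ttl}s"
}

# Typical use (file name is a placeholder for the subnet_mask_stat entry):
#   dnetmap_stat_summary "$(cat /proc/net/xt_DNETMAP/<subnet>_<mask>_stat)"
dnetmap_stat_summary "12 64 421"   # 12/64 addresses bound (18%), mean ttl 421s
```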

* Logging

The module logs binding add/timeout events to klog. This behaviour can be disabled using the disable_log module parameter.

* Examples

1. Map subnet 192.168.0.0/24 to subnet 20.0.0.0/26, SNAT only:

iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26

Active hosts from the 192.168.0.0/24 subnet are mapped to 20.0.0.0/26. If a packet from a not-yet-bound prenat-ip hits the rule and there are no free or timed-out (ttl<0) entries in prefix 20.0.0.0/26, a notice is logged to klog and chain traversal continues. If a packet from an already bound prenat-ip hits the rule, the binding's ttl value is regenerated to default_ttl and SNAT is performed.

2. Use of --reuse and --ttl switches, multiple rule interaction:

iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26 --reuse --ttl 200

iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 30.0.0.0/26

Active hosts from the 192.168.0.0/24 subnet are mapped to 20.0.0.0/26 with ttl = 200 seconds. If there are no free addresses in the first prefix, the next one (30.0.0.0/26) is used with the default ttl. It is important to note that the first rule SNATs all flows whose source IP is already actively (ttl>0) bound to ANY prefix. The --reuse parameter makes this work even for inactive (ttl<0) entries.

If both subnets are exhausted, chain traversal continues.

3. Map 192.168.0.0/24 to subnet 20.0.0.0/26 bidirectionally:

iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26

iptables -t nat -A PREROUTING -j DNETMAP

If host 192.168.0.10 generates some traffic, it gets bound to the first free IP in the subnet, 20.0.0.0. From then on, any traffic directed to 20.0.0.0 gets DNATed to 192.168.0.10 as long as there is an active (ttl>0) binding. There is no need to specify the --prefix parameter in the PREROUTING rule, because without it the rule DNATs traffic to all active prefixes. You could specify a prefix if you would like DNAT to work for a specific prefix only.
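The binding behaviour in these examples can be modelled in a few lines. This is a hypothetical, heavily simplified sketch (a single prefix, no --reuse, no hashing, explicit timestamps), not the kernel code; it only shows first-free allocation, reuse of timed-out (ttl<0) entries, and the reverse lookup used for DNAT. All names are made up for illustration. Results are returned in the global variable binding rather than printed, so that the state survives (command substitution would run the functions in a subshell).

```shell
#!/usr/bin/env bash
# Hypothetical, simplified model of DNETMAP bindings (not the kernel code).
declare -A pre2post=() post2pre=() expires=()
PREFIX=20.0.0.0 PREFIX_LEN=26 DEFAULT_TTL=600
binding=""   # result of the last successful lookup

ip4_to_int() { local IFS=.; set -- $1; echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 )); }
int_to_ip4() { echo "$(( $1 >> 24 & 255 )).$(( $1 >> 16 & 255 )).$(( $1 >> 8 & 255 )).$(( $1 & 255 ))"; }

# SNAT side: bind prenat-ip $1 at time $2 and set $binding to its postnat-ip.
# Returns 1 when the prefix has no free or timed-out entry left.
dnetmap_snat() {
  local pre=$1 now=$2 base size i post old
  if [[ -n ${pre2post[$pre]:-} ]]; then
    post=${pre2post[$pre]}
    if (( ${expires[$pre]} > now )); then
      expires[$pre]=$(( now + DEFAULT_TTL ))   # active binding: refresh its ttl
      binding=$post; return 0
    fi
    unset "pre2post[$pre]" "post2pre[$post]" "expires[$pre]"   # timed out
  fi
  base=$(ip4_to_int "$PREFIX"); size=$(( 1 << (32 - PREFIX_LEN) ))
  for (( i = 0; i < size; i++ )); do
    post=$(int_to_ip4 $(( base + i )))
    old=${post2pre[$post]:-}
    if [[ -n $old ]] && (( ${expires[$old]} > now )); then
      continue   # still actively bound to another host
    fi
    if [[ -n $old ]]; then unset "pre2post[$old]" "expires[$old]"; fi
    pre2post[$pre]=$post; post2pre[$post]=$pre; expires[$pre]=$(( now + DEFAULT_TTL ))
    binding=$post; return 0
  done
  return 1   # exhausted: the packet would continue down the chain
}

# DNAT side: set $binding to the prenat-ip behind postnat-ip $1 if active at time $2.
dnetmap_dnat() {
  local pre=${post2pre[$1]:-}
  [[ -n $pre ]] && (( ${expires[$pre]} > $2 )) || return 1
  binding=$pre
}

dnetmap_snat 192.168.0.10 0 && echo "SNAT 192.168.0.10 -> $binding"   # 20.0.0.0
dnetmap_snat 192.168.0.11 5 && echo "SNAT 192.168.0.11 -> $binding"   # 20.0.0.1
dnetmap_dnat 20.0.0.0 100 && echo "DNAT 20.0.0.0 -> $binding"         # 192.168.0.10
```

In this model, once the binding for 192.168.0.10 expires (time 600), DNAT to 20.0.0.0 stops and the address becomes available to the next unbound host.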

ECHO

The ECHO target will send back all packets it receives. It serves as an example of an Xtables target.

ECHO takes no options.

IPMARK

Allows you to mark a received packet based on its IP address. This can replace many mangle/mark entries with only one, if you use a firewall-based classifier.

This target is to be used inside the mangle table.

--addr {src|dst}: Select source or destination IP address as a basis for the mark.

--and-mask mask: Perform bitwise AND on the IP address and this bitmask.

--or-mask mask: Perform bitwise OR on the IP address and this bitmask.

--shift value: Shift addresses to the right by the given number of bits before using them as a mark. (This is done before ANDing or ORing.) This option is needed to select part of an IPv6 address, because marks are only 32 bits in size.

The IP address is taken in "human order of bytes" (most significant byte first): 192.168.0.1 is 0xc0a80001. The "AND" operation is performed first, then "OR".
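For IPv4, the whole computation is mark = ((address >> shift) & and-mask) | or-mask. A minimal shell sketch of that arithmetic follows; ipmark and ip4_to_int are hypothetical helpers, since the module does this internally. It reproduces the marks used in the examples below.

```shell
#!/usr/bin/env bash
# Sketch of the IPv4 IPMARK arithmetic: ((addr >> shift) & and) | or,
# with the address in most-significant-byte-first order.

ip4_to_int() {
  local IFS=.
  set -- $1
  # 10# forces decimal so octets are never read as octal.
  echo $(( (10#$1 << 24) | (10#$2 << 16) | (10#$3 << 8) | 10#$4 ))
}

# ipmark ADDRESS SHIFT AND_MASK OR_MASK
ipmark() {
  local addr
  addr=$(ip4_to_int "$1")
  printf '0x%x\n' $(( ((addr >> $2) & $3) | $4 ))
}

ipmark 192.168.0.1  0 0xffffffff 0      # 0xc0a80001
ipmark 192.168.5.2  0 0xffff 0x10000    # 0x10502
ipmark 192.168.5.12 0 0xffff 0x10000    # 0x1050c
```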

Examples:

We create a queue for each user; the queue number corresponds to the IP address of the user, e.g.: all packets going to/from 192.168.5.2 are directed to the 1:0502 queue, 192.168.5.12 -> 1:050c, etc.

We have one classifier rule:

: tc filter add dev eth3 parent 1:0 protocol ip fw

Earlier we had many rules just like below:

: iptables -t mangle -A POSTROUTING -o eth3 -d 192.168.5.2 -j MARK --set-mark 0x10502

: iptables -t mangle -A POSTROUTING -o eth3 -d 192.168.5.3 -j MARK --set-mark 0x10503

Using IPMARK target we can replace all the mangle/mark rules with only one:

: iptables -t mangle -A POSTROUTING -o eth3 -j IPMARK --addr dst --and-mask 0xffff --or-mask 0x10000

On routers with hundreds of users, this should decrease the load significantly (e.g. by half).

(IPv6 example) If the source address is of the form 2001:db8:45:1d:20d:93ff:fe9b:e443 and the resulting mark should be 0x93ff, then a right-shift of 16 is needed first:

: -t mangle -A PREROUTING -s 2001:db8::/32 -j IPMARK --addr src --shift 16 --and-mask 0xFFFF

LOGMARK

The LOGMARK target will log packet and connection marks to syslog.

--log-level level: A logging level between 0 and 8 (inclusive).

--log-prefix string: Prefix log messages with the specified prefix; up to 29 bytes long, and useful for distinguishing messages in the logs.

RAWDNAT

The RAWDNAT target will rewrite the destination address in the IP header, much like the NETMAP target.

--to-destination addr[/mask]: Network address to map to. The resulting address will be constructed the following way: All 'one' bits in the mask are filled in from the new address. All bits that are zero in the mask are filled in from the original address.

See the RAWSNAT help entry for examples and constraints.

RAWSNAT

The RAWSNAT and RAWDNAT targets provide stateless network address translation.

The RAWSNAT target will rewrite the source address in the IP header, much like the NETMAP target. RAWSNAT (and RAWDNAT) may only be used in the raw or rawpost tables, but can be used in all chains, which makes it possible to change the source address either when the packet enters the machine or when it leaves it. The reason for this table constraint is that RAWNAT must happen outside of connection tracking.

--to-source addr[/mask]: Network address to map to. The resulting address will be constructed the following way: All 'one' bits in the mask are filled in from the new address. All bits that are zero in the mask are filled in from the original address.
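The address construction rule for --to-source and --to-destination is result = (new & mask) | (original & ~mask). A short shell sketch of that rule follows; rawnat_map and its helpers are hypothetical, and treating a missing /mask as /32 is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the RAWSNAT/RAWDNAT address construction:
# one-bits of the mask come from the new address, zero-bits from the original.

ip4_to_int() {
  local IFS=.
  set -- $1
  echo $(( (10#$1 << 24) | (10#$2 << 16) | (10#$3 << 8) | 10#$4 ))
}

int_to_ip4() {
  echo "$(( $1 >> 24 & 255 )).$(( $1 >> 16 & 255 )).$(( $1 >> 8 & 255 )).$(( $1 & 255 ))"
}

# rawnat_map ORIGINAL_ADDR NEW_ADDR[/PREFIXLEN]   (no /PREFIXLEN assumed to mean /32)
rawnat_map() {
  local orig new len=32 mask
  orig=$(ip4_to_int "$1")
  new=$(ip4_to_int "${2%/*}")
  [[ $2 == */* ]] && len=${2#*/}
  # Netmask with the top $len bits set, limited to 32 bits.
  mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
  int_to_ip4 $(( (new & mask) | (orig & ~mask & 0xffffffff) ))
}

rawnat_map 192.168.1.77 10.5.0.0/16          # 10.5.1.77
rawnat_map 212.201.100.135 199.181.132.250   # 199.181.132.250
```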

As an example, changing the destination for packets forwarded from an internal LAN to the internet:

: -t raw -A PREROUTING -i lan0 -d 212.201.100.135 -j RAWDNAT --to-destination 199.181.132.250

: -t rawpost -A POSTROUTING -o lan0 -s 199.181.132.250 -j RAWSNAT --to-source 212.201.100.135

Note that changing addresses may influence the route selection! Specifically, it statically NATs packets rather than connections, unlike the normal DNAT/SNAT targets. Also note that it can transform already-NATed connections: as said, it is completely external to Netfilter's connection tracking/NAT.

If the machine itself generates packets that are to be rawnat'ed, you need a rule in the OUTPUT chain instead, just like you would with the stateful NAT targets.

In doing so, you may also need an extra RAWSNAT rule to override the automatic source address selection that the routing code does before passing packets to iptables. If the connecting socket has not been explicitly bound to an address, as is the common mode of operation, the address that will be chosen is the primary address of the device through which the packet would be routed with its initial destination address, i.e. the address as seen before any RAWNAT takes place.

STEAL

Like the DROP target, but does not throw an error like DROP when used in the OUTPUT chain.

SYSRQ

The SYSRQ target allows remotely triggering sysrq on the local machine over the network. This can be useful when vital parts of the machine hang, for example an oops in a filesystem causing locks to not be released and processes to get stuck as a result. If it is still possible, use /proc/sysrq-trigger instead. Even when processes are stuck, interrupts are likely to still be processed, and as such, sysrq can be triggered through incoming network packets.

The xt_SYSRQ implementation uses a salted hash and a sequence number to prevent network sniffers from either guessing the password or replaying earlier requests. The initial sequence number comes from the time of day, so you will have a small window of vulnerability should the time go backwards at a reboot. However, the file /sys/module/xt_SYSRQ/parameters/seqno can be used to both query and update the current sequence number. You should also restrict who can issue commands using -s and/or -m mac, and check that the destination is correct using -d (to protect against potential broadcast packets), bearing in mind that this still does not protect against MAC/IP spoofing:

: -A INPUT -s 10.10.25.1 -m mac --mac-source aa:bb:cc:dd:ee:ff -d 10.10.25.7 -p udp --dport 9 -j SYSRQ

: (with IPsec) -A INPUT -s 10.10.25.1 -d 10.10.25.7 -m policy --dir in --pol ipsec --proto esp --tunnel-src 10.10.25.1 --tunnel-dst 10.10.25.7 -p udp --dport 9 -j SYSRQ

You should also limit the rate at which requests are accepted, to limit the CPU time taken by illegal requests, for example:

: -A INPUT -s 10.10.25.1 -m mac --mac-source aa:bb:cc:dd:ee:ff -d 10.10.25.7 -p udp --dport 9 -m limit --limit 5/minute -j SYSRQ

This extension does not take any options. The -p udp option is required.

The SYSRQ password can be changed through /sys/module/xt_SYSRQ/parameters/password, for example:

: echo -n "password" >/sys/module/xt_SYSRQ/parameters/password

The module will not respond to sysrq requests until a password has been set.

Alternatively, the password may be specified at modprobe time, but this is insecure, as people can possibly see it through ps(1). You can use an option line in e.g. /etc/modprobe.d/xt_sysrq if it is properly guarded, that is, only readable by root.

: options xt_SYSRQ password=cookies

The hash algorithm can also be specified as a module option, for example, to use SHA-256 instead of the default SHA-1:

: options xt_SYSRQ hash=sha256

The xt_SYSRQ module is normally silent unless a successful request is received, but the debug module parameter can be used to find exactly why a seemingly correct request is not being processed.

To trigger SYSRQ from a remote host, just use socat:

sysrq_key="s"  # the SysRq key(s)
password="password"
seqno="$(date +%s)"
salt="$(dd bs=12 count=1 if=/dev/urandom 2>/dev/null |
    openssl enc -base64)"
ipaddr=10.10.25.7
req="$sysrq_key,$seqno,$salt"
req="$req,$(echo -n "$req,$ipaddr,$password" | sha1sum | cut -c1-40)"

echo "$req" | socat stdin udp-sendto:$ipaddr:9
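The script above hard-codes SHA-1. Since the hash module option can switch the algorithm to SHA-256, the same request construction can be wrapped in a reusable function. This is a sketch under an assumption: that with hash=sha256 the module expects the full hex digest of the same comma-separated string (64 hex characters instead of 40). The function name sysrq_request is hypothetical, and it only builds the request; sending it works as in the socat line above.

```shell
#!/usr/bin/env bash
# Build a SYSRQ request string: keys,seqno,salt,hexdigest
# where hexdigest = HASH("keys,seqno,salt,ipaddr,password").
# sysrq_request KEYS SEQNO SALT IPADDR PASSWORD [sha1|sha256]
sysrq_request() {
  local keys=$1 seqno=$2 salt=$3 ipaddr=$4 password=$5 hash=${6:-sha1}
  local req="$keys,$seqno,$salt" digest
  case $hash in
    sha1)   digest=$(printf '%s' "$req,$ipaddr,$password" | sha1sum | cut -c1-40) ;;
    sha256) digest=$(printf '%s' "$req,$ipaddr,$password" | sha256sum | cut -c1-64) ;;
    *) echo "unsupported hash: $hash" >&2; return 1 ;;
  esac
  echo "$req,$digest"
}

# Example: build (but do not send) a sync request for 10.10.25.7 using SHA-256.
salt=$(head -c 12 /dev/urandom | base64)
sysrq_request s "$(date +%s)" "$salt" 10.10.25.7 password sha256
```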

See the Linux kernel documentation for possible sysrq keys. Important ones are: re(b)oot, power(o)ff, (s)ync filesystems, (u)mount and remount read-only. More than one sysrq key can be used at once, but bear in mind that, for example, a sync may not complete before a subsequent reboot or poweroff.

An IPv4 address should have no leading zeros, an IPv6 address should be in the form recommended by RFC 5952. The debug option will log the correct form of the address.
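Because the address is part of the hashed string, a client that writes 010.010.025.007 computes a different hash than one that writes 10.10.25.7. A small, hypothetical helper for the IPv4 case follows (RFC 5952 normalization of IPv6 addresses is more involved and not covered here).

```shell
#!/usr/bin/env bash
# Strip leading zeros from each octet of a dotted-quad IPv4 address.
normalize_ipv4() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  # 10# forces base 10, so "010" becomes 10 rather than octal 8.
  echo "$(( 10#$a )).$(( 10#$b )).$(( 10#$c )).$(( 10#$d ))"
}

normalize_ipv4 010.010.025.007   # 10.10.25.7
```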

The hashing scheme should be enough to prevent mis-use of SYSRQ in many environments, but it is not perfect: take reasonable precautions to protect your machines.

TARPIT

Captures and holds incoming TCP connections using no local per-connection resources.

TARPIT only works at the TCP level, and is totally application agnostic. This module will answer a TCP request and play along like a listening server, but aside from sending an ACK or RST, no data is sent. Incoming packets are ignored and dropped. The attacker will terminate the session eventually. This module allows the initial packets of an attack to be captured by other software for inspection. In most cases this is sufficient to determine the nature of the attack.

This offers similar functionality to LaBrea <http://www.hackbusters.net/LaBrea/> but does not require dedicated hardware or IPs. Any TCP port that you would normally DROP or REJECT can instead become a tarpit.

--tarpit: This mode completes a connection with the attacker but limits the window size to 0, thus keeping the attacker waiting for long periods of time. While the attacker maintains the state of the connection and tries to continue every 60-240 seconds, we keep none, so it is very lightweight. Attempts to close the connection are ignored, forcing the remote side to time out the connection in 12-24 minutes. This mode is the default.

--honeypot: This mode completes a connection with the attacker, but signals a normal window size, so that the remote side will attempt to send data, often with some very nasty exploit attempts. We can capture these packets for decoding and further analysis. The module does not send any data, so if the remote expects an application level response, the game is up.

--reset: This mode is handy because we can send an inline RST (reset). It has no other function.

To tarpit connections to TCP port 80 destined for the current machine:

: -A INPUT -p tcp -m tcp --dport 80 -j TARPIT

To significantly slow down Code Red/Nimda-style scans of unused address space, forward unused IP addresses to a Linux box not acting as a router (e.g. "ip route 10.0.0.0 255.0.0.0 ip.of.linux.box" on a Cisco), enable IP forwarding on the Linux box, and add:

: -A FORWARD -p tcp -j TARPIT

: -A FORWARD -j DROP

NOTE: If you use the conntrack module while you are using TARPIT, you should also disable connection tracking for these packets, or the kernel will unnecessarily allocate resources for each TARPITted connection. To TARPIT incoming connections to the standard IRC port while using conntrack, you could:

: -t raw -A PREROUTING -p tcp --dport 6667 -j CT --notrack

: -A INPUT -p tcp --dport 6667 -j NFLOG

: -A INPUT -p tcp --dport 6667 -j TARPIT

TEE

The TEE target will clone a packet and redirect this clone to another machine on the local network segment. In other words, the nexthop must be the target, or you will have to configure the nexthop to forward it further if so desired.

--gateway ipaddr: Send the cloned packet to the host reachable at the given IP address. Use of 0.0.0.0 (for IPv4 packets) or :: (IPv6) is invalid.

To forward all incoming traffic on eth0 to a Network Layer logging box:

-t mangle -A PREROUTING -i eth0 -j TEE --gateway 2001:db8::1