UML as a honeypot
A honeypot is a sacrificial system that's (usually) made vulnerable,
and put on the net for nasty people to break into. A properly
constructed honeypot is put on a network which closely monitors the
traffic to and from the honeypot. This data can be used for a variety
of purposes:
- Forensics - analyzing new attacks and exploits
- Trend analysis - looking for changes over time in the types of
  attacks, techniques, etc.
- Identification - tracking the bad guys back to their home machines to
  figure out who they are
- Sociology - learning about the bad guys as a group by snooping on
  email, IRC traffic, etc. which happens to traverse the honeypot
Traditionally, honeypots have been physical systems on a dedicated
network which also contains multiple machines for monitoring the
honeypot and collecting logs from it. This was a huge logistical pain
and kept honeypots from becoming a common network security tool.
The advent of virtual machines such as UML has made setting up
honeypots far easier. Instead of a set of physical machines, the
honeypot is now a virtual machine with the host filtering and
monitoring network traffic and collecting logs. Even better, one host
can have multiple honeypots running on it, those honeypots can be
configured in a realistic virtual network, and they can be distributed
on a CD rather than a truck.
For more information about honeypots (and honeynets) in general, see
honeynet.org.
Because of the interest from the honeypot people in UML (and support
from Dartmouth ISTS), a number of features have been added to UML in
order to make it more useful as a honeypot. These include:
- tty logging - secure logging of all UML tty traffic to the host
- hppfs - a UML filesystem which allows entries in the UML /proc to be
  arbitrarily rewritten from the host, making it possible to make the
  UML pretend to be a physical box
- skas mode - UML can operate in a mode which creates process address
  spaces which are identical to those of processes on the host
A problem with physical honeypots is that it's hard to capture
keystrokes. If the bad guys are using ssh to reach the honeypot (and
they probably are since they tend to be very security-conscious),
sniffing the network doesn't help since that traffic is encrypted.
So, you need to capture keystrokes by running something on the
honeypot. This is problematic since you have to assume that it has
been thoroughly compromised, so the logging mechanism may also have
been compromised.
There are various kernel patches and other kludges (such as an
instrumented bash) to implement tty logging on physical honeypots.
They all suffer from the problem that they can be subverted or
disabled if their presence becomes known to the intruder.
UML solves this problem with a patch to the tty driver which logs all
traffic through tty devices out to the host. In contrast to the
physical honeypot logging mechanisms, this is undetectable and
unsubvertable. It causes no network traffic or anything else which
can be detected from within the honeypot. It's also in the UML
kernel, which means it can't be defeated by anything the intruder
might do.
For more information on UML's tty logging, see
this page.
The single largest weakness of UML as a honeypot is that it is easy
for a user to tell that it is a UML rather than a physical machine.
The best source of information for distinguishing UML from a physical
honeypot is /proc. Files such as /proc/mounts, /proc/interrupts, and
/proc/cmdline contain very UML-specific information.
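As a rough illustration of how little effort this kind of fingerprinting
takes, here is a minimal sketch of a check an intruder (or a honeypot
operator testing their own setup) might run. The marker strings are
assumptions chosen for illustration (UML block devices are named ubd, so
"ubd" tends to show up in the mount table and on the command line); they
are not a complete or authoritative list, and the names UML_MARKERS,
uml_hints, and read_proc are made up for this example.

```python
# Hypothetical marker substrings, chosen for illustration only.
# They assume the UML instance boots from a ubd block device.
UML_MARKERS = {
    "/proc/cmdline": ("ubd0=",),
    "/proc/mounts": ("/dev/ubd",),
}


def uml_hints(proc_files):
    """Return a sorted list of (path, marker) pairs that were found.

    proc_files maps a /proc path to the text content of that file.
    """
    hits = []
    for path, markers in UML_MARKERS.items():
        text = proc_files.get(path, "")
        for marker in markers:
            if marker in text:
                hits.append((path, marker))
    return sorted(hits)


def read_proc(paths):
    """Read each path, treating unreadable files as empty."""
    contents = {}
    for path in paths:
        try:
            with open(path) as f:
                contents[path] = f.read()
        except OSError:
            contents[path] = ""
    return contents


if __name__ == "__main__":
    for path, marker in uml_hints(read_proc(UML_MARKERS)):
        print(f"{path}: contains {marker!r}")
```

An empty result proves nothing, of course; it only means these
particular markers were not present in the files that were checked.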
To lessen the danger of a UML honeypot being fingerprinted by an
intruder in this way, the UML /proc can be covered over by a special
filesystem which allows /proc entries to be changed from the host.
The UML hppfs ("HoneyPot ProcFS") filesystem allows /proc entries to
be added or deleted, and for existing entries to be replaced or
modified.
See this page for
information on setting up and customizing a fake /proc using hppfs.
A major weakness of UML in security-related applications, including
honeypots, has been its overall architecture. UML has traditionally
loaded itself into the top 0.5G of each of its processes' address
spaces, leaving the remainder to the process. Thus, the UML kernel,
including all of its data, is accessible to UML processes. And, by
default, that data is also writeable by UML processes. For any
security-related application in which UML must be a secure jail for
root, the fact that the data can be written is a big problem. For
honeypots, the mere fact that the UML kernel is visible is a big
problem. An intruder can test whether a honeypot is a UML simply by
looking at the top of its address space.
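To make "the top of its address space" concrete, here is a small
arithmetic sketch. It assumes a 32-bit host on which each process has
3 GiB of user address space; that layout is an assumption of this
example, not something stated above, and the function names are
invented for illustration.

```python
GIB = 1 << 30

# Assumption: 32-bit host, 3 GiB of user address space per process.
USER_SPACE_TOP = 3 * GIB        # 0xC0000000
UML_REGION_SIZE = GIB // 2      # the "top .5G" occupied by the UML kernel


def uml_region(user_top=USER_SPACE_TOP, size=UML_REGION_SIZE):
    """Return (start, end) of the region the UML kernel would occupy."""
    return user_top - size, user_top


def in_uml_region(addr, user_top=USER_SPACE_TOP, size=UML_REGION_SIZE):
    """True if addr falls inside the half-open range [start, end)."""
    start, end = uml_region(user_top, size)
    return start <= addr < end


if __name__ == "__main__":
    start, end = uml_region()
    print(f"UML kernel region: {start:#x}-{end:#x}")
```

Under that assumption the region starts at 0xa0000000, so anything a
process finds mapped there that it didn't map itself is a strong hint.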
To address this problem, UML was recently reworked to allow it to run
in a mode in which the UML kernel is in a totally separate host
address space from its processes. This makes the UML kernel binary
and data totally invisible to its processes, and to anyone logged in
to it. It also makes UML kernel data secure from tampering by its
processes.
This new mode (called "skas", for "Separate Kernel Address Space" -
the old mode is retrospectively called "tt", for "Tracing Thread")
therefore makes a far better honeypot than tt mode. The one
disadvantage is that it requires a patch to the host kernel in order
to run. This patch is available as the latest host-skas patch from
here.
UML must be built with CONFIG_MODE_SKAS enabled. It will check for the
presence of the patch on the host and use skas mode if possible. If
you see the following messages at the start of the boot log, UML is
running in skas mode:
Checking for the skas3 patch in the host...found
Checking for /proc/mm...found
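If you are bringing up a number of honeypots, you may want to check for
these messages automatically. The following is a minimal sketch that
assumes you have already captured the UML boot output as text; how you
capture it is up to you, and the names SKAS_CHECKS and running_in_skas
are made up for this example.

```python
# The two boot messages quoted above, split just before the result word.
SKAS_CHECKS = (
    "Checking for the skas3 patch in the host...",
    "Checking for /proc/mm...",
)


def running_in_skas(boot_log):
    """True only if both skas checks are reported as 'found'."""
    lines = [line.strip() for line in boot_log.splitlines()]
    return all((check + "found") in lines for check in SKAS_CHECKS)


if __name__ == "__main__":
    import sys

    # Usage: pipe the captured boot output into this script.
    ok = running_in_skas(sys.stdin.read())
    print("skas mode" if ok else "skas mode not confirmed")
```

The comparison is against whole lines, so a line that merely ends in
"found" after some other text (for example a negative result) does not
count as a match.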
As a side benefit, you will notice that skas mode, which is secure, is
noticeably faster than tt mode, which is not. Furthermore, skas mode
is roughly an order of magnitude faster than tt's "jail" mode, which
was previously the only way to get the level of security that skas
mode offers.
More information on skas mode is available
here.