UML as a honeypot

What is a honeypot?

A honeypot is a sacrificial system that's (usually) made vulnerable and put on the net for nasty people to break into. A properly constructed honeypot sits on a network where traffic to and from it is closely monitored. This data can be used for a variety of purposes:

Forensics - analyzing new attacks and exploits

Trend analysis - looking for changes over time in the types of attacks, techniques, etc.

Identification - tracking the bad guys back to their home machines to figure out who they are

Sociology - learning about the bad guys as a group by snooping on email, IRC traffic, etc. that happens to traverse the honeypot

Traditionally, honeypots have been physical systems on a dedicated network which also contains multiple machines for monitoring the honeypot and collecting logs from it. This is a huge logistical pain and has prevented honeypots from becoming a common network security tool.
The advent of virtual machines such as UML has made setting up honeypots far easier. Instead of a set of physical machines, the honeypot is now a virtual machine with the host filtering and monitoring network traffic and collecting logs. Even better, one host can have multiple honeypots running on it, those honeypots can be configured in a realistic virtual network, and they can be distributed on a CD rather than a truck.
For more information about honeypots (and honeynets) in general, see honeynet.org.

UML honeypot support

Because of the interest from the honeypot people in UML (and support from Dartmouth ISTS), a number of features have been added to UML in order to make it more useful as a honeypot.
These include:

tty logging - secure logging of all UML tty traffic to the host

hppfs - a UML filesystem which allows entries in the UML /proc to be arbitrarily rewritten from the host, making it possible to make the UML pretend to be a physical box

skas mode - UML can operate in a mode in which its process address spaces look identical to those of processes on the host

tty logging

A problem with physical honeypots is that it's hard to capture keystrokes. If the bad guys are using ssh to reach the honeypot (and they probably are since they tend to be very security-conscious), sniffing the network doesn't help since that traffic is encrypted. So, you need to capture keystrokes by running something on the honeypot. This is problematic since you have to assume that it has been thoroughly compromised, so the logging mechanism may also have been compromised.
There are various kernel patches and other kludges (such as an instrumented bash) to implement tty logging on physical honeypots. They all suffer from the problem that they can be subverted or disabled if their presence becomes known to the intruder.
UML solves this problem with a patch to the tty driver which logs all traffic through tty devices out to the host. In contrast to the physical honeypot logging mechanisms, this cannot be detected or subverted from inside the honeypot. It causes no network traffic or anything else that can be detected from within the honeypot, and because it lives in the UML kernel, it can't be defeated by anything the intruder might do.
For more information on UML's tty logging, see this page.

hppfs

The single largest weakness of UML as a honeypot is that it is easy for a user to tell that it is a UML rather than a physical machine. The best source of information for distinguishing a UML from a physical honeypot is /proc. Files such as /proc/mounts, /proc/interrupts, and /proc/cmdline contain very UML-specific information.
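To make the fingerprinting risk concrete, here is a minimal Python sketch of the kind of check an intruder might run inside the guest. The marker strings in `ASSUMED_MARKERS` are assumptions chosen for illustration (they are not taken from this page), and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical marker strings, for illustration only. They are assumptions
# about what a UML guest might expose, not a list given by this page.
ASSUMED_MARKERS: Dict[str, List[str]] = {
    "/proc/cmdline": ["ubd0=", "ubda="],
    "/proc/mounts": ["/dev/ubd"],
    "/proc/interrupts": ["ubd"],
}


def find_uml_hints(proc_text: Dict[str, str],
                   markers: Dict[str, List[str]] = ASSUMED_MARKERS
                   ) -> List[Tuple[str, str]]:
    """Return (path, marker) pairs for every marker found in the given text."""
    hits = []
    for path, needles in markers.items():
        text = proc_text.get(path, "")
        for needle in needles:
            if needle in text:
                hits.append((path, needle))
    return hits


def read_proc(paths: Iterable[str]) -> Dict[str, str]:
    """Read the listed files, skipping any that cannot be opened."""
    contents = {}
    for path in paths:
        try:
            with open(path, "r", errors="replace") as handle:
                contents[path] = handle.read()
        except OSError:
            continue
    return contents
```

Run inside a guest, `find_uml_hints(read_proc(ASSUMED_MARKERS))` returns an empty list when none of the assumed markers appear, which is the outcome a well-disguised honeypot would aim for.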
To lessen the danger of a UML honeypot being fingerprinted by an intruder in this way, the UML /proc can be covered over by a special filesystem which allows /proc entries to be changed from the host. The UML hppfs ("HoneyPot ProcFS") filesystem allows /proc entries to be added or deleted, and for existing entries to be replaced or modified.
See this page for information on setting up and customizing a fake /proc using hppfs.
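The effect of covering /proc with host-controlled entries can be modelled in a few lines of Python. This is a conceptual sketch only: it does not use or describe the real hppfs interface, and `overlay_proc` and the `REMOVE` sentinel are hypothetical names introduced for the example.

```python
from typing import Dict, Optional

# Sentinel for this sketch: an override of None hides the entry entirely.
REMOVE = None


def overlay_proc(real: Dict[str, str],
                 overrides: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return the /proc view a guest would see after applying host overrides.

    An override with content replaces an existing entry or adds a new one;
    an override of REMOVE deletes the entry. The input is left unchanged.
    """
    view = dict(real)
    for name, content in overrides.items():
        if content is REMOVE:
            view.pop(name, None)
        else:
            view[name] = content
    return view
```

For example, overriding `cmdline` with text copied from a physical machine and removing an entry that only exists under UML yields a view in which neither giveaway is visible, while untouched entries pass through as they are.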

skas mode

A major weakness of UML in security-related applications, including honeypots, has been its overall architecture. UML has loaded itself into the top 0.5G of each of its processes' address spaces, leaving the remainder to the process. Thus, the UML kernel, including all of its data, is accessible to UML processes and, by default, writeable by them. For any security-related application in which UML must be a secure jail for root, the fact that the data can be written is a big problem. For honeypots, the mere fact that the UML kernel is visible is a big problem: an intruder can test whether a honeypot is a UML simply by looking at the top of its address space.
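As a worked example of where that region sits, the following Python sketch computes the address range covered by the top 0.5G of a process address space. The 3G user address space used in the usage note is an assumption (a common layout on 32-bit x86 hosts), not something stated on this page, and the function names are hypothetical.

```python
GIB = 1 << 30


def tt_kernel_region(user_space_top: int, kernel_size: int = GIB // 2):
    """Return (start, end) of the top `kernel_size` bytes below `user_space_top`."""
    return user_space_top - kernel_size, user_space_top


def in_kernel_region(address: int, user_space_top: int,
                     kernel_size: int = GIB // 2) -> bool:
    """True if `address` falls inside the region the UML kernel would occupy."""
    start, end = tt_kernel_region(user_space_top, kernel_size)
    return start <= address < end
```

Under the assumed 3G layout, `tt_kernel_region(3 * GIB)` gives `(0xA0000000, 0xC0000000)`, so any readable address in that range would be a hint that the process is running under a tt-mode UML.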
To address this problem, UML was recently reworked to allow it to run in a mode in which the UML kernel is in a totally separate host address space from its processes. This makes the UML kernel binary and data totally invisible to its processes, and to anyone logged in to it. It also makes UML kernel data secure from tampering by its processes.
This new mode (called "skas", for "Separate Kernel Address Space" - the old mode is retrospectively called "tt", for "Tracing Thread") therefore makes a far better honeypot than tt mode. The one disadvantage is that it requires a patch to the host kernel in order to run. This patch is available as the latest host-skas patch from here.
UML must be built with CONFIG_MODE_SKAS enabled. At boot, it checks for the presence of the patch on the host and uses skas mode if possible. If you see the following messages at the start of the boot log, UML is running in skas mode:
                
Checking for the skas3 patch in the host...found
Checking for /proc/mm...found
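If you start honeypots from a script, the check for these two messages can be automated. The Python sketch below assumes the boot log has already been captured as text; the function name is hypothetical, and the message strings are the two shown above.

```python
# The two boot messages quoted above; both must appear for skas mode.
SKAS_BOOT_MESSAGES = (
    "Checking for the skas3 patch in the host...found",
    "Checking for /proc/mm...found",
)


def booted_in_skas_mode(boot_log: str) -> bool:
    """Return True only if every skas boot message appears in the log text."""
    return all(message in boot_log for message in SKAS_BOOT_MESSAGES)
```

A launcher could refuse to expose the honeypot to the network when `booted_in_skas_mode` returns False, since that suggests the instance did not come up in skas mode.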

              
As a side benefit, you will notice that skas mode, which is secure, is noticeably faster than tt mode, which is not. Furthermore, skas mode is roughly an order of magnitude faster than tt's "jail" mode, which was previously the only way to get the level of security that skas mode offers.
More information on skas mode is available here.
