The UML roadmap
Here, I describe my view of the future of UML. It is almost always
seen as a standard virtual machine technology, since that is the only
way it can currently be used. However, because it is designed as a
virtual OS rather than a virtual machine, it is a general-purpose
virtualization technology which can be put to a much wider array of
uses.
I'm dividing the descriptions below roughly by their timescale,
starting with ongoing stuff which will continue for the foreseeable
future, and ending with things which are so far into the future that
they depend on enabling technologies which don't yet exist for Linux.
Work that has been happening, and will continue to happen for the
foreseeable future, includes:
- bug fixes
- functional enhancements - Small features will go in occasionally,
as time permits, or as they are implemented by other people who send
me patches. Examples of this include:
- Finishing the hotplug support, including adding CPU and memory hotplug
- Integrating sysemu support, and pushing the host patch to Linus
- Adding support for valgrind, including describing the memory
allocators so it can track kernel memory usage
- keeping up with 2.4, 2.6, and -mm - UML will be updated to the latest
2.4 and 2.6 trees, with 2.4 maybe being dropped when it's clear that
the world has moved on to 2.6. I will pick up 2.7/2.8 when they
start. I'm also tracking the -mm kernels as a staging area for stuff
that's destined for Linus.
- Performance improvements - Profiling both the host and UML, looking
for ways to make UML more efficient as a guest and to make the host a
better host.
externfs is the result of restructuring hostfs to better support
mounting arbitrary outside resources as UML filesystems. Currently,
there are two users of it, hostfs and humfs, both of which import host
directory hierarchies as UML filesystems. However, externfs allows
anything which can implement a standard set of file operations to be
mounted as a UML filesystem.
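The shape of such a plugin can be sketched in a few lines. This is not
the real externfs interface - the method names and the DictFs plugin
below are invented for illustration - but it shows the idea: anything
that can implement a standard set of file operations can be presented
as a filesystem.

```python
# Hypothetical sketch of an externfs-style plugin interface. The method
# names are invented; the real externfs interface is a set of C file
# operations inside UML.

class ExternfsOps:
    """The operations an externfs plugin would need to provide."""
    def lookup(self, path): raise NotImplementedError
    def read(self, path, offset, length): raise NotImplementedError
    def write(self, path, offset, data): raise NotImplementedError
    def readdir(self, path): raise NotImplementedError

class DictFs(ExternfsOps):
    """A toy plugin that 'imports' a Python dict as a filesystem."""
    def __init__(self, files):
        self.files = files                      # path -> bytes
    def lookup(self, path):
        return path in self.files
    def read(self, path, offset, length):
        return self.files[path][offset:offset + length]
    def write(self, path, offset, data):
        cur = bytearray(self.files.get(path, b""))
        cur[offset:offset + len(data)] = data
        self.files[path] = bytes(cur)
    def readdir(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.files if p.startswith(prefix))

fs = DictFs({"/etc/motd": b"hello from externfs"})
assert fs.read("/etc/motd", 0, 5) == b"hello"
```

A hostfs-like plugin would implement the same operations against a host
directory; a sqlfs-like plugin would implement them against a database.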
For example, a SQL database could be imported as a filesystem into a
UML. The sqlfs plug-in to externfs would define the mapping from
tables, rows, and columns to UML directories and files. Reads from
these files would trigger SQL lookups, and writes to them would
trigger updates. Since the data is stored in the database, the
database and its indexes will always be up to date.
Also, a special file (or directory) could be made
available which would allow SQL access to the underlying database.
So, if /mnt is a sqlfs mount,
ls /mnt/sql/"select * from table where foo=3"
could list the set of rows matching the select criteria, and changing
to that directory and looking at the files in it could show the data
in those rows.
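The core of this trick - treating a path component as a query and its
results as directory entries - can be sketched with sqlite standing in
for the host database. The path format and the sql_ls name are invented
for illustration:

```python
# A sketch of the sqlfs "query directory" idea, with sqlite3 standing in
# for the host SQL database. The sql/ path convention is hypothetical.
import sqlite3

def sql_ls(db, path):
    """Treat everything after 'sql/' as a query and return the matching
    rows, the way 'ls /mnt/sql/"select ..."' would list entries."""
    prefix = "sql/"
    if not path.startswith(prefix):
        raise ValueError("not a sql/ path")
    return db.execute(path[len(prefix):]).fetchall()

db = sqlite3.connect(":memory:")
db.execute("create table t (name text, foo integer)")
db.executemany("insert into t values (?, ?)",
               [("a", 3), ("b", 4), ("c", 3)])
rows = sql_ls(db, "sql/select name from t where foo=3")
# rows holds the entries for 'a' and 'c', the rows matching foo=3
```

In a real sqlfs, each returned row would appear as a directory, with the
row's columns appearing as files inside it.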
I see this particular case as being more of a cute trick than
something that will be generally useful. However, a special case
could be very useful. If a Linux filesystem were stored in the
database, UML could boot from the sqlfs filesystem, and then the raw
SQL access would start being useful. Assuming that file metadata was
stored reasonably in the database, you could do queries such as
ls /sql/"select name from files where setuid=1"
ls /sql/"select name from files where modified=today"
ls /sql/"select name from files where created=today and owner='root'"
These have obvious security uses, and can be done casually and
quickly, since the file metadata is indexed in the underlying
database, rather than having to run a find over the entire
filesystem.
This idea could be extended to mounting specialized databases
containing certain types of data. For example, a UML user could store
email in a filesystem which is backed by a glimpse database on the
host. This would allow rapid searching of email using whatever
searching capabilities the filesystem makes available inside UML.
For some time, I have been planning to do away with the double-caching
that UML causes on the host. When UML reads data, it does so in a way
that the data is stored in the host's page cache. That data also ends
up separately in the UML's page cache, so that there are two copies of
the data in the host's memory, which is an obvious waste. My solution
to this is two-fold:
- use O_DIRECT IO on private data
- mmap shared data into the UML page cache
Using O_DIRECT IO bypasses the host's page cache and causes the data
to be read (or written) directly into (or from) the UML's page cache.
For data which is private to the UML, this is perfect. However, if
there's shared data, such as a backing file with multiple COW files,
this will create one copy of the data in each UML which uses it. The
host copy is eliminated, which is an improvement, but it's possible to
do better in this case.
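The O_DIRECT half of this plan can be sketched from userspace. O_DIRECT
requires aligned buffers, so the sketch below uses a page-aligned
anonymous mmap as the read buffer; some filesystems (tmpfs, for one)
refuse O_DIRECT entirely, so that case is reported rather than treated
as an error:

```python
# A sketch of reading a file while bypassing the host's page cache.
# O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned,
# so os.preadv() into it satisfies the alignment rules.
import mmap, os

def direct_read(path, length=4096):
    """Return the number of bytes read with O_DIRECT, or None if the
    filesystem doesn't support O_DIRECT."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError:
        return None
    try:
        buf = mmap.mmap(-1, length)      # anonymous mapping: page-aligned
        try:
            return os.preadv(fd, [buf], 0)
        except OSError:
            return None
    finally:
        os.close(fd)
```

Inside UML, the ubd driver would do the equivalent against its backing
file, so the data lands only in the UML's page cache.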
For shared data, the solution is to let the data go through the host's
page cache, and have the UMLs mmap it from there. This eliminates the
copying since all the UMLs have the same page of memory mapped into
their page caches.
This could obviously be done in the case of private data as well, but
mmap is expensive - mapping a page seems to be somewhat slower than
copying it. So, using mmap in the case of private data would just
slow it down. Of course, this is true of shared data as well, but
there are offsetting gains in the form of memory reductions, and
speedups for the second and later UMLs which access the data.
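The shared-data half of the plan rests on a basic property of
MAP_SHARED: every mapping of the same file region refers to the same
physical pages. The sketch below demonstrates it with two mappings of
one file standing in for two UMLs sharing a backing file:

```python
# Two MAP_SHARED mappings of the same host file share physical pages:
# a write through one mapping is immediately visible through the other,
# with no second copy of the data in memory.
import mmap

def shared_mappings(path, size=4096):
    with open(path, "wb") as f:
        f.write(b"\x00" * size)
    f1 = open(path, "r+b")
    f2 = open(path, "r+b")
    m1 = mmap.mmap(f1.fileno(), size, mmap.MAP_SHARED)
    m2 = mmap.mmap(f2.fileno(), size, mmap.MAP_SHARED)
    return m1, m2

m1, m2 = shared_mappings("shared_page.bin")
m1[0:5] = b"hello"
assert m2[0:5] == b"hello"   # same physical page, seen through both maps
```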
A related but separate project is the use of the AIO support in 2.6
hosts to improve UML's IO performance. humfs has AIO support, and
will use it if it's present on the host. This needs to be extended to
include the ubd driver and hostfs.
Also on the list are ports, bringing UML up on more host
architectures, and swsusp, the kernel's software suspend support.
It is possible to configure UML to run inside the host kernel rather
than in userspace. This involves making it use internal kernel calls
rather than the libc system call wrappers. This would turn UML into a
kernel module.
The most obvious advantage to doing this is speed. It would eliminate
a lot of the context switching that UML currently has to do. In
particular, it could greatly reduce the UML system call overhead.
Personally, I don't favor moving UML into the kernel just for
performance - there are much better reasons for doing it. My goal is
to get UML's userspace performance close enough to the hardware that
it would require measurement to tell the difference.
That being said, comparing the performance of a kernel UML instance
and a userspace instance would indicate where the overhead is, and
possibly point at new interfaces which would allow a userspace
instance to achieve performance closer to a kernel instance.
There are better reasons for loading UML into the kernel:
- Direct hardware access
- Resource control
Giving UML the ability to directly control hardware would allow
partitioning the server between UML instances. In the most extreme
case, the host kernel would be not much more than a hypervisor which
just starts the UML instances, gives them hardware, and gets out of
the way. This would be a sort of soft partitioning, which wouldn't
require hardware partitioning support, but would also not provide the
protection offered by hardware support.
So, there's the full range of configurations, from fully virtualized
hardware, as with a default UML instance in the kernel, to fully
physical hardware, as just described, with any combination of
physical and virtual hardware in between. So, two instances could each be given
half of the physical memory (minus whatever the host needs), and half
the disks, but if there's only one physical NIC, then the host would
retain control of that, and provide each instance a virtual NIC in the
form of a TUN/TAP device.
Once UML can be loaded into the kernel, the next step will be to break
it into pieces, so that a subsystem can be loaded by itself, without
the rest of UML. These guest subsystems can be used as compartments
in which processes can be restricted to whatever resources were
provided to the compartment.
For example, loading a guest VM system into the kernel and giving it
512M of memory to manage, and then sticking some processes in it will
restrict those processes to that 512M. When they start using up that
memory, the guest VM system will start swapping or dropping clean
pages in order to free memory within the compartment. It will do this
even if there is plenty of free memory on the system outside the
compartment.
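Such loadable guest subsystems don't exist yet. Linux's RLIMIT_AS gives
a crude stand-in for the "restrict these processes to 512M" part,
though it fails allocations outright rather than swapping within the
compartment the way a guest VM system would:

```python
# A rough analogue of a memory compartment: run code in a child process
# whose address space is capped with RLIMIT_AS. Unlike the guest VM
# system described above, the cap simply fails allocations rather than
# reclaiming or swapping inside the compartment.
import resource, subprocess, sys

def run_capped(code, limit_bytes):
    """Run Python code in a child with a capped address space; return
    the child's exit status (non-zero means the cap was hit)."""
    def cap():
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    return subprocess.run([sys.executable, "-c", code],
                          preexec_fn=cap, capture_output=True).returncode

# A 1 GiB allocation fails under a 256 MiB cap; a trivial child succeeds.
```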
Similarly, a guest scheduler can be used to restrict the CPU usage of
a set of processes, and a guest network stack can be used to restrict
their network bandwidth.
Going back to userspace, the next step is to make UML configurable as
a captive virtual machine. By this, I mean UML would be linked into
an application, which would gain an internal virtual machine for its
own use.
There are a number of possible uses for a captive UML:
- a general-purpose internal programming environment
- a platform for managing a service from the "inside"
- an internal authentication and authorization service
I'll use Apache to illustrate these, but they also apply to other
services and applications. Apache already has internal scripting
environments in the form of mod_perl and mod_python (and probably
others), which embed Perl and Python interpreters inside Apache. An
embedded UML could be used for the same purpose by installing the
normal Perl or Python environments inside it, and running the scripts
there, rather than inside Apache itself.
This handles the actual execution of the scripts, but we need a way of
communicating between Apache and the scripts inside the UML. Apache
has to pass HTTP requests in, and receive HTTP responses back. One
way of doing this is with an Apache-specific (call it apachefs)
filesystem which is mounted inside the UML. Reads, writes, and other
file operations within this filesystem would be implemented by Apache.
So, one of the files would be read by the script inside the UML, and
whenever an HTTP request came in, it would be parsed, and sent to the
script via this special file. The script would block reading this
file until a request came in for it to handle. When it had the
results, it would write them back to Apache through a similar (and
possibly the same) file.
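The script's side of this protocol is a simple loop. In the sketch
below, ordinary files stand in for the apachefs special files (apachefs
doesn't exist; in the real thing, the read would block until Apache
delivered a request):

```python
# A sketch of the script running inside the captive UML: read a request
# from one special file, handle it, write the response to another.
# Plain files stand in for the hypothetical apachefs files here.

def handle_one(request_path, response_path, handler):
    with open(request_path, "rb") as req:
        request = req.read()          # apachefs would block here until a request arrived
    response = handler(request)
    with open(response_path, "wb") as resp:
        resp.write(response)

def hello_handler(request):
    """A trivial handler producing a fixed HTTP response."""
    body = b"<html>hello</html>"
    return (b"HTTP/1.0 200 OK\r\nContent-Length: "
            + str(len(body)).encode() + b"\r\n\r\n" + body)
```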
This has a bunch of advantages over the current mod_perl/mod_python
arrangement:
- The scripts are jailed, so they can't access the host, and their
resource consumption is restricted. This is a major advantage, since
currently anyone who wants a dynamically generated web site needs a
dedicated server for it. The less expensive shared server
arrangements offer, at best, CGI. This would allow dynamic HTML from
within a shared server. There would be a UML instance for each such
web site, but they could be fairly small ones, so they wouldn't
necessarily increase host resource consumption drastically.
- The scripts can be written in any language and run in any environment
supported by Linux. There doesn't need to be an appropriate mod_foo
for Apache, just the ability for Linux to execute the script.
- The scripts can be monitored, debugged, and modified from inside the
live server by anyone who can ssh in to the UML. Of course, doing
this on a production server is risky, but the capability would be
there when it's really needed.
Once there's a way of communicating HTTP requests and responses back
and forth between Apache and something running inside its captive UML
via apachefs, it's relatively easy to communicate other information
back and forth via other files within that filesystem. In particular,
it could be used as something analogous to procfs for the kernel.
procfs is a window on the kernel's internal state, and procfs files
can be used to read and modify that state. Similarly, apachefs could
be used as a window on Apache's internal state. Statistics would be
available from read-only files, and configuration variables would be
available from read-write files.
This would allow the Apache user to configure that web site from
within Apache. This would obviously need to be controlled so that
only variables relevant to a single web site could be configured from
within that site's captive UML. Doing this would relieve the host's
system manager of the need to handle configuration changes.
If this were done on a server running multiple services, and all the
services had this "management from the inside" arrangement with a
captive UML, then the server's admin wouldn't normally need to deal
with the services. The managers of the services wouldn't even need
accounts on the server - all they would need is an account on their
service's captive UML. So, this would be a nice security arrangement
for the server.
Finally, just as we can load UML subsystems into the kernel with
useful results, we can link UML subsystems into applications and get
something that's similarly useful. Consider linking a captive VM
system into an application, which then uses it for its memory
allocation. The biggest difference between this and malloc is that
the Linux VM system is restricted to using a certain amount of memory,
and won't go beyond that. In contrast, malloc will allocate as much
memory as the system will allow. When the Linux VM system starts
running out of free memory, it will start releasing memory that's not
needed any more. In the context of an application, that means that
its memory consumption is limited, and when it starts approaching that
limit, it will start swapping.
This can make the application much better behaved with respect to the
system. If it starts getting overloaded, it will slow down, but it
won't affect the rest of the system and whatever else is running on
it. If everything else on the system is similarly configured, then
the memory usage will be very predictable. There won't be any
possibility of one service going crazy and grabbing all the memory on
the system.
As another example, consider linking a filesystem into an application
which uses it to store its internal state. The filesystem would flush
this data out to storage on the host, and the consistency semantics
provided by the popular filesystems would guarantee a level of
consistency for this data in the event of an application crash. So,
it could resume after the crash more or less where it was when it
crashed. This would provide automatic checkpointing, so that it could
be shut down in a controlled way, and resumed later from where it left
off.
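The consistency guarantee being relied on here is the one applications
already get from the write-fsync-rename pattern, which the sketch below
reduces the idea to (the checkpoint/resume names and JSON format are
just for illustration):

```python
# A sketch of filesystem-backed application state: the state is flushed
# to stable storage and atomically renamed into place, so after a crash
# the application finds either the old checkpoint or the new one intact.
import json, os

def checkpoint(state, path):
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())       # force the data to stable storage
    os.rename(tmp, path)           # atomically replace the old checkpoint

def resume(path):
    with open(path) as f:
        return json.load(f)
```

A filesystem linked into the application would do this transparently,
with its journaling providing the same either-old-or-new guarantee.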
Application clustering and SMP with UML
As Linux is made to run well on increasingly large systems, its SMP
scalability is improving. Large multi-threaded applications have many
of the same problems as an OS kernel running on a large SMP system.
So, the efforts made to improve the SMP scalability of the Linux
kernel can benefit a multi-threaded application which links against
UML and uses whatever kernel facilities it can.
Similarly, some applications are going to need built-in clustering.
An example of this is Oracle clustering with its ocfs filesystem. As
the Linux kernel gains clustering abilities, they will automatically
become available to applications which wish to use them via UML. If
an application needs to distribute its data across many instances of
itself, it can link against whatever UML subsystems it needs -
minimally, a cluster filesystem such as gfs or ocfs and the network
stack. The instances will then store their data in the filesystem,
which will take care of the clustering.