Current patches


The purpose of this page is to keep people better informed about ongoing work between UML releases by making the patches currently in my working pool visible to the public. This should alleviate several issues with UML development:

- Not infrequently, someone finds a bug in UML, chases it down, and submits a patch, not knowing that the bug has already been fixed in my tree. Since my working tree isn't public until a release, there was no way for anyone to know that the bug was already fixed.
- Also not infrequently, the fixes in my tree are incomplete or wrong in some other way. Having those patches available before the release gives UML developers a way to test and sanity-check the patches before they are released to the public.
- Having the patches in a release split out makes it easier to fix new bugs by allowing users to back out patches until the bug disappears. Then we know which patch was responsible and can probably figure out the problem quickly. This also allows non-expert users to help track things down since the only expertise needed is the ability to run patch and build UML from source.
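
The back-out procedure in the last point can be sketched as a small search loop. This is only an illustration: bug_present_fn and demo_bug_present are hypothetical stand-ins for a real cycle of reversing one patch (for example with quilt pop or patch -R), rebuilding UML, and testing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if the bug reproduces in a UML built with
 * only the first `applied` patches of the series. */
typedef int (*bug_present_fn)(size_t applied);

/* Stand-in for a real build-and-test cycle: here the bug arrives with patch 5. */
int demo_bug_present(size_t applied)
{
    return applied >= 5;
}

/*
 * Back patches out from the top of the series until the bug disappears.
 * Returns the 1-based position of the patch whose removal made the bug go
 * away, 0 if the bug is present even with no patches applied, or -1 if it
 * does not reproduce with the full series.
 */
long find_responsible_patch(size_t total, bug_present_fn bug_present)
{
    size_t applied = total;

    assert(bug_present != NULL);
    if (!bug_present(applied))
        return -1;
    while (applied > 0 && bug_present(applied - 1))
        applied--;
    return (long)applied;
}
```

Because the patches apply in order, they also have to be backed out in reverse order, which is why the loop only ever removes the topmost patch.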

To this end, I've started using quilt to manage patches, and will publish the unreleased patches in my current tree here. It will be updated frequently so that there will only be a short window between me putting a patch in my tree and it appearing here.

So, here are the patches pending in my 2.4 and 2.6 trees. The version is the last public release of UML to which these are applied. They apply in order.

2.4.27-3um

Patches tarball : last modified - Wed Apr 26 17:56:03 EDT 2006

notes

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This removes some useless ioctls from the ubd driver.

ifup-flush

Last Changed - Thu Oct 13 18:19:23 EDT 2005

From Gerd Knorr - this avoids a network deadlock that can happen when the
host side of an interface is full when the UML interface is brought up.
In this case, SIGIOs will never be delivered since no new data is ever
queued to the host side.

build-cleanups

Last Changed - Thu Oct 13 18:19:23 EDT 2005

Keeping 2.4 in sync with 2.6 - this moves the linker scripts and main.c from
arch/um to arch/um/kernel.

sysemu

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This is Blaisorblade's sysemu patch for UML, cleaned up some, and with
support added for tt mode. This adds support to UML for Laurent Vivier's
context-switch-reducing sysemu patch.

crypto

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This pulls the crypto stuff into the UML config.

scheduler

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This fixes a use-after-free bug in the context switching. A process going
out of context after exiting wakes up the next process and then kills
itself. The problem is that when it gets around to killing itself is up to
the host and can happen a long time later, including after the incoming
process has freed its stack, and that memory is possibly being used for
something else.
The fix is to have the incoming process kill the exiting process just to
make sure it can't be running at the point that its stack is freed.

eintr

Last Changed - Thu Oct 13 18:19:23 EDT 2005

Add some more EINTR safety with some more uses of CATCH_EINTR.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
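
For readers unfamiliar with the idiom, EINTR safety usually means retrying a call that was interrupted by a signal. The sketch below shows the general technique in user space; RETRY_ON_EINTR and read_retry are names made up for this example and are not claimed to match the definition of CATCH_EINTR in the UML tree.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Re-evaluate expr for as long as it fails with EINTR. */
#define RETRY_ON_EINTR(expr) \
    while (((expr) < 0) && (errno == EINTR))

/* read() that is restarted when a signal interrupts it. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    RETRY_ON_EINTR(n = read(fd, buf, len));
    return n;
}
```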

no-mo-ghash

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This removes the much-hated ghash.h. physmem now makes do with an rbtree
instead.

need-bash

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This forces make to use bash rather than whatever /bin/sh is linked to.
There are some bash extensions used in the build (and maybe this needs
fixing) and when /bin/sh isn't bash, then the build fails mysteriously.

skas-flush-tlb

Last Changed - Thu Oct 13 18:19:23 EDT 2005

Flush only the requested range in SKAS mode instead of the whole kernel
page table:
loop from start to end rather than from start_vm to end_vm. This needs a
lot of testing, since it could well be wrong, or some callers could be
passing wrong parameters (they were ignored!). Anyway, it seems that this
is safe and that most callers are in arch-independent code (i.e. correct
code). But actually I did not test modules well.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>

I eyeballed all the callers, and they seem to be doing the right
thing - jdike

tmp-exec

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This adds a check that /tmp is not mounted noexec. UML needs to be able
to do PROT_EXEC mmaps of temp files. Previously, a noexec /tmp would
cause an early mysterious UML crash.
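
A check of this kind can be approximated in user space by trying an executable mapping of a scratch file. The following is a sketch under that assumption, not the code from the patch; dir_allows_exec_mmap and the probe file name are invented for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a file in dir can be mapped PROT_EXEC, 0 if the mapping is
 * refused (as on a noexec mount), -1 if the probe file cannot be set up. */
int dir_allows_exec_mmap(const char *dir)
{
    char path[4096];
    long page = sysconf(_SC_PAGESIZE);
    void *addr;
    int fd, n;

    assert(dir != NULL);
    n = snprintf(path, sizeof(path), "%s/exec-probe-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path) || page <= 0)
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the open descriptor keeps the file alive */

    if (ftruncate(fd, page) < 0) {
        close(fd);
        return -1;
    }

    addr = mmap(NULL, (size_t)page, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return 0;
    munmap(addr, (size_t)page);
    return 1;
}
```

Calling dir_allows_exec_mmap("/tmp") early at startup and printing a clear message on a 0 result would replace the mysterious crash with an explanation.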

2.4.27

Last Changed - Thu Oct 13 18:19:23 EDT 2005

This is the update to 2.4.27.
Upgrading to 2.4.27 is probably better done by applying uml-patch-2.4.27-1.

no-unit-at-a-time

Last Changed - Thu Oct 13 18:19:23 EDT 2005

From: <blaisorblade_spam@yahoo.it>

Prevent gcc from breaking UML with its "unit at a time" compilation mode.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>

move-hostfs

Last Changed - Thu Oct 13 18:19:23 EDT 2005

To make patch backports from 2.6 easier, this patch moves hostfs to fs,
where it is in 2.6, from arch/um/fs.

kill-warnings

Last Changed - Thu Oct 13 18:19:23 EDT 2005

From: <blaisorblade_spam@yahoo.it>

Fixes some little warnings about "Defined but not used ..." by #ifdef'ing
things

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>

tkill

Last Changed - Thu Oct 13 18:19:23 EDT 2005

From: <blaisorblade_spam@yahoo.it>

Avoids compile failure when host misses tkill(), by simply using kill() in
that case.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
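
The fallback can be pictured as a compile-time switch on whether the host headers know the tkill system call. This is a user-space sketch of the idea, with signal_thread as a made-up name; the patch itself may detect the missing call differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Send sig to one thread if the host headers define tkill, otherwise fall
 * back to kill(), which targets the whole process. */
int signal_thread(pid_t tid, int sig)
{
    assert(tid > 0);
#ifdef SYS_tkill
    return (int)syscall(SYS_tkill, tid, sig);
#else
    return kill(tid, sig);
#endif
}
```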

disable-sysemu

Last Changed - Thu Oct 13 18:19:23 EDT 2005

From: <blaisorblade_spam@yahoo.it>

Adds the "nosysemu" command line parameter to disable SYSEMU

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>

proc-sysemu

Last Changed - Thu Oct 13 18:19:23 EDT 2005

From: <blaisorblade_spam@yahoo.it>

Adds /proc/sysemu to toggle SYSEMU usage.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>

sysemu-fixes

Last Changed - Thu Oct 13 18:19:23 EDT 2005

From: <blaisorblade_spam@yahoo.it>

- Correct some silly errors (dereferencing a pointer before checking if it's
!= NULL when creating /proc/sysemu, some error messages)

- separate using_sysemu from sysemu_supported (so as to refuse to activate
sysemu if it is not supported, avoiding panics)

- do not probe sysemu if in tt mode.

Signed-off-by: <blaisorblade_spam@yahoo.it>

stack-overflow

Last Changed - Thu Oct 13 18:19:23 EDT 2005

Remove the stack overflow check in the page fault handler, which was just
wrong, and could be triggered by a process.

bh-update

Last Changed - Thu Oct 13 18:19:23 EDT 2005

2.6.17-rc2

Patches tarball : last modified - Wed Apr 26 17:56:03 EDT 2006

fix-iomem

Last Changed - Thu Apr 13 13:25:07 EDT 2006

From "Victor V. Vengerov" <Victor.Vengerov@oktetlabs.ru>
We need to walk the region list properly.

jmpbuf

Last Changed - Tue Mar 21 11:55:01 EST 2006

Newer libcs don't define the JB_* jmp_buf access macros. If this is
the case, we provide values ourselves.

devshm

Last Changed - Tue Apr 25 12:00:07 EDT 2006

UML really wants shared memory semantics from its physical memory map file,
and the place for that is /dev/shm. So move the default, and fix the error
messages to recognize that this value can be overridden.

Signed-off-by: Rob Landley <rob@landley.net>

fix-ubd-lock1

Last Changed - Mon Apr 3 08:54:12 EDT 2006

I noticed ubd_lock being used to protect crazy amounts of code. This patch
gets rid of the worst offenders.

punctuation-fixes

Last Changed - Tue Mar 21 11:56:32 EST 2006

I'm inconsistent enough about using dashes or underscores for punctuation
within filenames. However, sometimes I use both within the same name, and
I got sick of this. So, this fixes those cases.

ubd-release-akpm

Last Changed - Mon Apr 3 08:55:44 EDT 2006

Define a release method for the ubd driver so that sysfs doesn't complain
when one is removed.

o-direct-field

Last Changed - Mon Apr 3 08:56:21 EDT 2006

This patch pulls the addition of the openflags.d field from externfs.
This will be merged with the o_direct patch when it is sent to mainline.

externfs-aio

Last Changed - Wed Mar 22 15:20:24 EST 2006

These are the AIO changes needed by the ubd driver and humfs.

externfs

Last Changed - Tue Apr 25 12:00:20 EDT 2006

This is the externfs/new hostfs/humfs patch. hostfs now seems to be stable.
The old hostfs will continue to exist until this one is as functional and
stable as it.

delete-hostfs

Last Changed - Tue Apr 25 12:00:22 EDT 2006

This deletes the old hostfs. This will be sent to mainline when the
externfs-based hostfs seems stable.

switch-pipe

Last Changed - Mon Apr 3 09:01:40 EDT 2006

This fixes the interface of make_pipe, which doesn't need to initialize
filehandles. Instead, it is just a wrapper around pipe which just reclaims
descriptors if the initial call to pipe fails with -EMFILE.
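
The shape of such a wrapper, as a user-space sketch: pipe_with_reclaim and the reclaim_fn hook are hypothetical names, and how descriptors are actually reclaimed is left to the hook.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical hook that closes descriptors which are no longer needed. */
typedef void (*reclaim_fn)(void);

/* Returns 0 on success or a negative errno value. */
int pipe_with_reclaim(int fds[2], reclaim_fn reclaim)
{
    assert(fds != NULL);
    if (pipe(fds) == 0)
        return 0;
    if (errno != EMFILE || reclaim == NULL)
        return -errno;

    reclaim();                  /* free some descriptors, then retry once */
    if (pipe(fds) == 0)
        return 0;
    return -errno;
}
```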

fork-not-clone

Last Changed - Mon Apr 3 09:01:44 EDT 2006

Convert the boot-time host ptrace testing from clone to fork. They were
essentially doing fork anyway. This cleans up the code a bit, and makes
valgrind a bit happier about grinding it.

fp-state

Last Changed - Mon Apr 3 09:28:26 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

This patch is for testing only!

It's a quick and dirty patch to verify, that wrong fp-context
saving and restore really are the cause of errors in "memtest.c"
It fixes the problem by:
- making have_fpx_regs accessible for other modules instead of
declaring it "static" in arch/um/os-Linux/sys-i386/registers.c
- Adding 1 to HOST_FP_SIZE to have room for the "status" (and magic)
field. Now skas.fp + skas.xfp combined have the size of
struct _fpstatus
- in arch/um/sys-i386/signal.c adding some code, that handles
the _fpstatus in sigcontext differently, depending on have_fpx_regs.
For (have_fpx_regs == 0), the _fpstatus simply is copied to/from
user from/to skas.fp. When writing to user, the status field is
created also.
For (have_fpx_regs == 1), when writing to user, the full _fpstatus
is created in skas.fp and skas.xfp, from the data found in skas.xfp.
Then it is copied to user. When reading, the full _fpstatus is read
to skas.fp and skas.xfp. Then, skas.xfp is reconstructed from this
data.

x11-fb

Last Changed - Tue Apr 25 12:00:28 EDT 2006

X11 framebuffer driver from Gerd Knorr.
You have to enable CONFIG_FB (UML-specific options/Graphics support/
Support for frame buffer devices), disable CONFIG_VGA_CONSOLE
(UML-specific options/Graphics support/Console display driver support/
VGA text console), and enable Framebuffer Console support (in the same
place), plus some fonts. You also seem to have to put 'x11=<width>x<height>'
on the command line.

logging

Last Changed - Mon Apr 3 09:05:12 EDT 2006

This is a little logger which dumps stuff out to a host file. Used for
tracking down otherwise intractable bugs.

fix-get_user_pages

Last Changed - Mon Apr 10 23:00:54 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

Fix of a wrong condition.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

add-stub-vmas.patch

Last Changed - Mon Apr 10 23:02:17 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

This patch adds vm_area structs for stub-code and stub-data.
So, stub-area is displayed in /proc/XXX/maps. Also, stub-pages
are accessible for debuggers via ptrace now.

Linux has a gate-vma concept, that unfortunately supports one
gate-vma only. Thus, there need to be done some changes in
mm/memory.c and fs/proc/task_mmu.c.
This patch avoids the mainline changes by using some dirty tricks.
So, the patch is for testing only, mainline should be changed
to support more than one gate-vma.

fix-jiffies.patch

Last Changed - Mon Apr 3 15:00:12 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

To support different subarches, UML must not use the same
address for jiffies and jiffies_64 in a hardcoded way.
I added JIFFIES_OFFSET to handle different arches. For
current arches, it is set to 0, for s390 it will be set to 4.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
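
One way to read the 0 versus 4 values, offered here as an assumption rather than something the patch states: s390 is big-endian, so on a 32-bit build the low 32 bits of a 64-bit counter sit 4 bytes into it, while on little-endian machines they sit at offset 0. The sketch below demonstrates that layout in portable C; low_word_offset and read_low_word are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of the low 32 bits inside a 64-bit integer on this machine:
 * 0 on little-endian, 4 on big-endian. */
size_t low_word_offset(void)
{
    const uint64_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1 ? 0 : 4;
}

/* Read the low 32 bits of a 64-bit counter through that offset. */
uint32_t read_low_word(const uint64_t *counter)
{
    uint32_t low;

    assert(counter != NULL);
    memcpy(&low, (const unsigned char *)counter + low_word_offset(),
           sizeof(low));
    return low;
}
```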

insert-TIF_RESTART_SVC

Last Changed - Tue Mar 21 11:57:39 EST 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

s390 syscalls might be done by a "svc X" instruction (2 bytes
in size) or a "exec X,Y" instruction (4 bytes in size).
There is no way to read the size of the instruction via ptrace,
so UML/s390 can't do syscall-restarting by resetting instruction
pointer to the value before the syscall.
Also, in most cases syscall number is hardcoded in the "SVC X"
instruction, so there is no way to handle ERESTART_RESTARTBLOCK
correctly by *really* restarting the syscall.
s390 host has implemented TIF_RESTART_SVC-flag to handle the
latter case.
In UML we have to use TIF_RESTART_SVC for both cases.
This patch implements TIF_RESTART_SVC in UML.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

390-syscall-restart

Last Changed - Mon Apr 3 09:07:25 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

s390 normally doesn't support a method, that allows us to force
the host to skip its syscall restart handling.
I implemented a new method in the host, which also is posted to
LKML to hopefully be inserted in s390 mainline.
To check availability of this change, I added a new check, which
is done in a slightly different way for the other arches, too.
Success in check_ptrace() and success in the new check are
absolutely necessary for UML to run in any mode.
So I changed the sequence of checks to:
1) check_ptrace() being called at startup very early
2) check_ptrace() calls the new check, too
3) can_do_skas() is called after check_ptrace()
check_ptrace() will never return, if it fails, but it now uses
printf() and exit() instead of panic().

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

fix-tt-USR1-handlers

Last Changed - Mon Apr 3 09:07:28 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

UML: change tt-mode's USR1 handlers to be subarch independent

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

add-PTRACE_AREA

Last Changed - Mon Apr 3 09:09:43 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

s390 doesn't have PTRACE_GETREGS and friends, but has
PTRACE_[PEEK|POKE]USR_AREA to let user of ptrace() read or write
struct user as he wants.
So we need to support this operation conditionally.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

execve1-add-SUBARCH_EXECVE1

Last Changed - Mon Apr 3 14:05:01 EDT 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

UML/s390 needs to reset FP control register in execve1, as
Linux s390 does.
I chose to use a macro, as this doesn't need any
changes in the other subarches.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

boot_timer_handler-no-prototye

Last Changed - Tue Mar 21 11:57:44 EST 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

On UML/s390, signal-handlers are defined to have 4 parameters,
while the standard definition for signal-handlers only has one.
So we must not have prototypes of the handlers using one param
only in headers, that are included in the source, that defines
the handlers.
Just that conflict occurs with the incremental patches for
2.6.12-rc3. arch/um/os-Linux/signal.c defines boot_timer_handler
to have 4 parameters (ARCH_SIGHDLR_PARAM), while
arch/um/include/kern_util.h holds a prototype with one int as
param only.
With the latest patches, signal.c includes kern_util.h
indirectly via os.h: build fails.
So, I simply remove the prototype from kern_util.h and give each
modules calling it a separate prototype.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

factor-handler-param

Last Changed - Mon Apr 3 09:09:53 EDT 2006

This puts the int sig back in the signal handler declarations. Before,
with it in ARCH_SIGHDLR_PARAM, there was a use of sig, but no visible
declaration.

stub-arch-optimization

Last Changed - Tue Mar 21 11:57:45 EST 2006

From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>

In s390, fpregs are not reset in signal handlers.
Thus we may stop stub_segv_handler on s390 with a breakpoint instruction
instead of calling getpid, kill and sigreturn.
To make this run, we must not mask any signals in stub_segv_handler.

So I added conditional execution of set_handler in userspace_tramp
depending on ARCH_STUB_NO_SIGRETURN. If this macro isn't defined,
the code remains unchanged, else no signals for sa_mask are defined
and SA_NODEFER is added to flags.

Using the change, we also no longer need to care about correct stack
pointer for sigreturn, which would cause some nasty code on s390.

Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
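
The two handler configurations can be sketched with plain sigaction. This is an illustration only: install_stub_handler is a made-up name, the choice is made at run time here rather than by the ARCH_STUB_NO_SIGRETURN macro, and the signals blocked in the default branch are picked arbitrarily for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Install handler for sig. With no_sigreturn set, nothing is masked and
 * SA_NODEFER is added, as described for ARCH_STUB_NO_SIGRETURN. */
int install_stub_handler(int sig, void (*handler)(int), int no_sigreturn)
{
    struct sigaction sa;

    assert(handler != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);

    if (no_sigreturn) {
        /* sig itself stays deliverable while the handler runs. */
        sa.sa_flags |= SA_NODEFER;
    } else {
        /* Illustrative choice of signals to block during the handler. */
        sigaddset(&sa.sa_mask, SIGIO);
        sigaddset(&sa.sa_mask, SIGWINCH);
    }
    return sigaction(sig, &sa, NULL);
}
```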

test_stub_kill

Last Changed - Tue Mar 21 11:57:46 EST 2006

A small stub optimization. I forget what the reasoning was, needs
more thought.

s390

Last Changed - Tue Mar 21 11:57:46 EST 2006

This patch adds s390 (31-bit) to UML.

SKAS0 and SKAS3 have been tested a bit: at least the system boots and
shuts down correctly, network (tun/tap) works and we could even
start YaST on it.

Notes:
We use a host running SuSE SLES8 with a "private" kernel named
2.4.21-fsc.11, that contains special adaptions and drivers for
Fujitsu-Siemens mainframes. This means, our current SKAS3-patch
doesn't fit to vanilla kernels. We will create a reworked patch for
vanilla later (2.4 and 2.6).
Our 2.4 kernel also contains two fixes and one enhancement, that all
are essential to make UML run, even in SKAS0. The enhancement is to
support PT_TRACESYSGOOD, that generally is available in 2.6 kernels
but not in 2.4 for s390. So I would suggest to use 2.6 host for the
moment.
The fixes meanwhile are included into mainline. I don't know precisely
the first version containing them, in 2.6.13-rc5 they are present.
As those two patches are very small, they are inserted here as a
comment (AFAICS, it's easy to do the changes by hand on older kernel
versions):

First patch to fix signal stack handling:
--- a/arch/s390/kernel/signal.c 2005-03-22 11:07:39.000000000 +0100
+++ b/arch/s390/kernel/signal.c 2005-03-22 11:08:44.000000000 +0100
@@ -285,7 +285,7 @@

        /* This is the X/Open sanctioned signal stack switching.  */
        if (ka->sa.sa_flags & SA_ONSTACK) {
-               if (! on_sig_stack(sp))
+               if (! sas_ss_flags(sp))
                        sp = current->sas_ss_sp + current->sas_ss_size;
        }

Second patch to allow skipping of syscall restart:
--- a/arch/s390/kernel/ptrace.c 2005-05-07 07:20:31.000000000 +0200
+++ b/arch/s390/kernel/ptrace.c 2005-08-02 06:45:48.000000000 +0200
@@ -723,6 +761,13 @@
                                 ? 0x80 : 0));

        /*
+        * If the debugger has set an invalid system call number,
+        * we prepare to skip the system call restart handling.
+        */
+       if (!entryexit && regs->gprs[2] >= NR_syscalls)
+               regs->trap = -1;
+
+       /*
         * this isn't the same as continuing with a signal, but it will do
         * for normal use. strace only continues with a signal if the
         * stopping signal is not SIGTRAP. -brl

ubd-aio

Last Changed - Mon Apr 3 13:40:46 EDT 2006

This adds AIO support to the ubd driver.

ubd-drop-lock

Last Changed - Mon Apr 3 13:43:37 EDT 2006

This patch changes the ubd I/O submission process to avoid some sleeping.
When the host returns -EAGAIN from io_submit, do_ubd_request returns to
its caller, saving the current state of the request submission in the
struct ubd. This state consists of the request structure and the range
of sg entries which have not yet been submitted. If the request queue is
drained, then this state is reset to indicate that, the next time it is
called, a new request needs to be pulled from the request queue.
When do_ubd_request returns because the host can handle no more requests,
it is necessary to rerun the queue after some completions have been handled.
This is done by adding the device to the restart list. ubd_intr walks
this list before returning, calling do_ubd_request for each device.
In addition, the queues and queue locks are now per-device, rather than
having a single queue and lock for all devices.
Note that kmalloc is still called, and can sleep. This is fixed in a
future patch.
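
The save-and-resume behaviour can be modelled in a few lines. Everything named here (dev_state, submit_fn, demo_submit, demo_slots) is hypothetical and stands in for the real struct ubd, io_submit, and the host's queue depth; locking and the restart list are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device state kept across calls. */
struct dev_state {
    size_t next_sg;   /* first sg entry not yet submitted */
    size_t nr_sg;     /* entries in the current request, 0 if none */
};

/* Hypothetical host hook: 0 on success, -EAGAIN when the host is full. */
typedef int (*submit_fn)(size_t sg_index);

static int demo_slots;          /* how many submissions the demo host accepts */

int demo_submit(size_t sg_index)
{
    (void)sg_index;
    if (demo_slots == 0)
        return -EAGAIN;
    demo_slots--;
    return 0;
}

/* Returns 0 when the whole request went out, or -EAGAIN if the caller has
 * to rerun it after some completions; progress is remembered in dev. */
int submit_request(struct dev_state *dev, submit_fn submit)
{
    assert(dev != NULL && submit != NULL);
    while (dev->next_sg < dev->nr_sg) {
        int err = submit(dev->next_sg);

        if (err < 0)
            return err;         /* on -EAGAIN the saved state stays intact */
        dev->next_sg++;
    }
    dev->next_sg = 0;           /* reset: the next call needs a new request */
    dev->nr_sg = 0;
    return 0;
}
```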

ubd-atomic

Last Changed - Mon Apr 3 13:43:45 EDT 2006

To ensure that I/O can always make progress, even when there is no
memory, we provide static buffers which are to be used when dynamic
ones can't be allocated. These buffers are protected by flags which
are set when they are currently in use. The use of these flags is
protected by the queue lock, which is held for the duration of the
do_ubd_request call.

There is an allocation failure emulation
mechanism here - setting fail_start and fail_end will cause
allocations in that range (fail_start <= allocations < fail_end) to
fail, invoking the emergency mechanism.
When this is happening, I/O requests proceed one at a time,
essentially synchronously, until allocations start succeeding again.

This currently doesn't handle the bitmap array, since that can be of
any length, so we can't have a static version of it at this point.
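
A single-threaded model of the emergency buffer and the failure-emulation window might look like the following. The names get_buf, put_buf, and BUF_SIZE are invented for the sketch, malloc stands in for the kernel allocator, and the queue lock that protects the in-use flag in the driver is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_SIZE 512

static char emergency_buf[BUF_SIZE];  /* static fallback buffer */
static int emergency_in_use;          /* flag guarding the fallback */
static unsigned long alloc_count;     /* number of allocations attempted */

/* Allocations numbered fail_start <= n < fail_end are forced to fail. */
unsigned long fail_start, fail_end;

static void *try_alloc(void)
{
    unsigned long n = alloc_count++;

    if (n >= fail_start && n < fail_end)
        return NULL;
    return malloc(BUF_SIZE);
}

/* Returns a dynamic buffer, the emergency buffer, or NULL if both are
 * unavailable and the caller has to wait for a completion. */
void *get_buf(void)
{
    void *buf = try_alloc();

    if (buf != NULL)
        return buf;
    if (emergency_in_use)
        return NULL;
    emergency_in_use = 1;
    return emergency_buf;
}

void put_buf(void *buf)
{
    assert(buf != NULL);
    if (buf == (void *)emergency_buf)
        emergency_in_use = 0;
    else
        free(buf);
}
```

With only one emergency buffer, a second request cannot start until the first one releases it, which is the one-at-a-time behaviour described above.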

bitmap-atomic

Last Changed - Mon Apr 3 14:02:09 EDT 2006

This patch completes the robustness and deadlock avoidance work by
handling the writing of the bitmap. The existing method of dealing
with low-memory situations by having an emergency structure for use
when memory can't be allocated won't work here because of the
variable size of the bitmap buffer, and the unknown (to me) limit of
a contiguous I/O request.
The allocation is avoided by writing directly from the bitmap rather
than allocating a buffer and copying the relevant chunk of the
bitmap into it.
This has a number of consequences. First, since the bitmap is
written directly from the device's bitmap, bits should not be set in
it until the I/O is just about to start. This is because reads
would see that and possibly race with the outgoing writes, returning
data from a section of the COW file which has never been written.
To prevent this, reads that overlap a pending bitmap write are
stalled until the write is finished. Modifying the bitmap bits as
late as possible shrinks the window in which this could happen.
Second, a section of bitmap that's being written out should not be
modified again until the write has finished. Otherwise, a bit might
be set and picked up by a pending I/O, resulting in it being on disk
too soon.
So, there are a couple new lists. Bitmap writes which have been
issued, but not finished are on the pending_bitmaps list. Any
subsequent bitmap writes which overlap a pending write have to
wait. These are put on the waiting_bitmaps list. Whenever a
pending bitmap write finishes, any overlapping waiting writes are
tried. They may continue waiting because they overlap an earlier
waiter, but at least one will proceed. Third, reads which overlap a
pending or waiting bitmap write will wait until those writes have
finished. This is done by do_io returning -EAGAIN, causing the
queue to wait until some requests have finished.
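
The overlap test that decides whether a read or a bitmap write has to wait can be sketched as below. The half-open range representation, the array in place of the pending_bitmaps and waiting_bitmaps lists, and the names bitmap_range and must_wait are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* A span of the bitmap, half-open: [start, end). */
struct bitmap_range {
    unsigned long start;
    unsigned long end;
};

int ranges_overlap(const struct bitmap_range *a, const struct bitmap_range *b)
{
    return a->start < b->end && b->start < a->end;
}

/* Nonzero if io overlaps any of the n outstanding bitmap writes and so has
 * to wait until they finish. */
int must_wait(const struct bitmap_range *io,
              const struct bitmap_range *outstanding, size_t n)
{
    size_t i;

    assert(io != NULL && (outstanding != NULL || n == 0));
    for (i = 0; i < n; i++)
        if (ranges_overlap(io, &outstanding[i]))
            return 1;
    return 0;
}
```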

non-aio-deadlock

Last Changed - Tue Mar 21 11:57:50 EST 2006

The pipe to the AIO thread needs to be non-blocking so that we know when
to return -EAGAIN and process some completions before trying some more
requests.
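
In user-space terms, the behaviour relied on here is that a write to a full non-blocking pipe fails with EAGAIN instead of sleeping. A sketch, with set_nonblocking and queue_request as names made up for the example:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 or a negative errno value. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -errno;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -errno;
    return 0;
}

/* Queue one request on the pipe. Returns 0 if it was written, -EAGAIN if
 * the pipe is full, or another negative errno value on failure. */
int queue_request(int fd, const void *req, size_t len)
{
    ssize_t n;

    assert(req != NULL);
    n = write(fd, req, len);
    if (n < 0)
        return -errno;
    return (size_t)n == len ? 0 : -EIO;
}
```

A caller that sees -EAGAIN would go and process completions, then retry, rather than block with requests still outstanding.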

ubd-no-count

Last Changed - Mon Apr 3 13:50:36 EDT 2006

This patch eliminates the atomic count associated with a bitmap_io
struct. The original thinking was that there would be a number of
aio structures associated with the bitmap_io, since different chunks
of the sg element could go to different layers. The count was
needed to know when the full sg segment reached disk and it was safe
to write the bitmap.
However, the flaw in that thinking is that a bitmap_io struct is
only needed for writes, and writes always go to the COW layer.
Hence, there will only be one bitmap_io per aio, and the counting is
unnecessary.
This patch makes it possible to merge the aio and bitmap_io structs,
which would be a good cleanup.

init_aio_err

Last Changed - Tue Mar 21 11:57:51 EST 2006

Tidy the error handling in the AIO initialization.

aio-batching

Last Changed - Mon Apr 3 14:06:59 EDT 2006

I noticed that the common case in io_submit is an immediate context
switch to the AIO thread when it returns from io_getevents, followed
by a switch back. This patch changes that by having the AIO thread
wait on a pipe before calling io_getevents. When the kernel
finishes a batch of I/O, it writes the number of requests down the
pipe, and the AIO thread waits for that number, and goes back to
sleeping on the pipe.
This probably shouldn't reach mainline, as O_DIRECT I/O should have
the property of causing switching on every I/O request. Also, the
wakeup mechanism should be only used when the other side might be
sleeping.

o_direct

Last Changed - Mon Apr 3 13:53:03 EDT 2006

This enables O_DIRECT on ubd devices. This needs work, as it will die
when creating a COW file. It also needs to do buffered I/O on backing
files.

init-io-req

Last Changed - Mon Apr 3 13:53:05 EDT 2006

This uses the C99 syntax to initialize an io_thread_req. Given how this
is compiled, it may not be a good idea, as it will consume more stack
than it should.

no-o-direct

Last Changed - Mon Apr 3 09:18:14 EDT 2006

This is the reversion of the o_direct patch so I can make COW files.

no-fakehd

Last Changed - Fri Mar 24 14:48:53 EST 2006

The fakehd switch lost its implementation at some point. Since no one is
screaming for it, we might as well remove it.

cow-odirect

Last Changed - Mon Apr 3 14:02:43 EDT 2006

Start fixing the problems with aligned access to COW files when O_DIRECT
is enabled.

fuse

Last Changed - Thu Apr 13 13:18:24 EDT 2006

This is the start of the FUSE server support, which will export the UML
filesystem to the host as a FUSE filesystem.

no-cow-odirect

Last Changed - Mon Apr 3 14:03:02 EDT 2006

Back out the odirect stuff temporarily.

fix-humfs

Last Changed - Mon Apr 3 08:47:32 EDT 2006

Fixes to humfs, which haven't been merged back into externfs yet.

no-sigjmpbuf

Last Changed - Mon Apr 3 15:46:53 EDT 2006

Clean up the jmpbuf code. Since softints, we no longer use sig_setjmp, so
the UML_SIGSETJMP wrapper now has a misleading name. Also, I forgot to
change the buffers from sigjmp_buf to jmp_buf.

fix-errno

Last Changed - Wed Apr 26 17:50:44 EDT 2006

Blaisorblade noticed some confusion between a system call's return
value and errno. This patch fixes a number of related bugs:
- using errno instead of a return value
- using a return value instead of errno
- forgetting to negate an error return to get a positive error code
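
The two conventions being mixed up are the libc one (return -1 and set errno) and the kernel-style one (return the negated error code directly). A minimal sketch of converting between them, using hypothetical helpers open_ro and error_code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Kernel-style wrapper: returns a descriptor or a negative errno value. */
int open_ro(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;          /* libc returned -1; the reason is in errno */
    return fd;
}

/* Turn a negative return from such a wrapper into a positive error code,
 * or 0 if the call succeeded. */
int error_code(int ret)
{
    return ret < 0 ? -ret : 0;
}
```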

2g

Last Changed - Wed Apr 26 17:55:40 EDT 2006

From Joris van Rantwijk <jvrantwijk@xs4all.nl>:
A quick hack to allow skas0 mode to run on 2G/2G hosts.

jesper-cleanups

Last Changed - Wed Apr 26 15:40:48 EDT 2006

Remove redundant NULL checks before [kv]free + small CodingStyle cleanup
for arch/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
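
The cleanup relies on the free routines accepting NULL. The same holds for free() in user space, which this sketch uses as an analogue of kfree and vfree; the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct buffers {
    char *name;
    char *data;
};

void release_buffers(struct buffers *b)
{
    assert(b != NULL);
    /* No "if (b->name != NULL)" guard is needed: free(NULL) does nothing. */
    free(b->name);
    free(b->data);
    b->name = NULL;
    b->data = NULL;
}
```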
