RSoC: improving drivers and kernel - part 3 (largely io_uring)

By 4lDO2 on Friday, July 17, 2020

Introduction

After last week, when I was mainly held up by the bug that made init block, I've now been able to make further progress on the io_uring design. I have improved the redox-iou crate, Redox's own alternative to liburing, in three ways. It now supports a fully-featured buffer pool allocator meant for userspace-to-userspace io_urings (where the kernel can't manage memory). It works with multiple secondary rings besides the main kernel ring. And it supports task spawning, as one would expect from a proper executor such as those in tokio or async-std.

`AsyncScheme`

One of the main issues with writing a new, completely non-blocking interface is that the previous interface sometimes blocks. More precisely, it blocks at the syscall level, not the operation level. Redox does have event queues and non-blocking I/O, mainly for networking, but its syscalls aren't asynchronous in the sense that one can issue several at a time without blocking internally, which io_uring requires. As I started to implement the io_uring opcodes, I quickly realized that I would have to reimplement every syscall from scratch. That is obviously not ideal, especially for an API as complex as io_uring already is.

The Redox syscalls are mainly based on the Scheme trait (and other related traits). Userspace processes such as redoxfs or nvmed use it for their schemes. Internal kernel schemes such as event:, irq: or debug: use it too, as does the in-kernel UserScheme, which abstracts away buffer management and the like when a process is involved in handling a syscall. The Scheme trait is mainly blocking, but it does support toggling the O_NONBLOCK flag using fcntl.

However, even though the kernel schemes can be non-blocking, almost every one of them will block the current context. A context is either a process or a thread. Redox is quite flexible about the difference between the two: both are contexts, which decide what to share and what not to share with each other when forking. Currently, the kernel handles situations that require blocking by calling context::switch. This tells the scheduler to keep running other contexts until the context that was switched away from is unblocked and can continue the syscall.

So I came up with the AsyncScheme trait, which defines only a single new function, poll_handle. Its purpose is to let the scheme know beforehand that the caller does not want it to block while processing that packet. It's defined as follows:

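```rust
use core::task;
// Assumed import paths: Packet and Scheme from the redox_syscall crate.
use syscall::{Packet, Scheme};

pub trait AsyncScheme: Scheme {
    /// Non-blocking counterpart of `Scheme::handle`. The default simply
    /// forwards to the blocking `handle`, so it is always `Ready`.
    #[allow(unused_variables)]
    unsafe fn poll_handle(&self, packet: &mut Packet, cx: &mut task::Context<'_>) -> task::Poll<()> {
        task::Poll::Ready(self.handle(packet))
    }
}
```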

As you can see, this looks quite like the AsyncRead and AsyncWrite traits from futures. Rust traits currently can't return a future directly, so these traits force the implementor to keep track of the state themselves. While this is not ideal in all cases, it actually works pretty well for schemes, since most userspace schemes already store a Vec of pending syscalls to process once they can make progress.
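The following minimal, self-contained sketch shows what "keeping track of the state themselves" can look like for a scheme. Packet and Scheme are simplified stand-ins rather than the real redox_syscall types, and PipeScheme and WOULD_BLOCK are hypothetical. A read that cannot complete stores the caller's waker and returns Poll::Pending instead of blocking. Pushing data then wakes it so that the caller polls again.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::task::{self, Waker};

/// Simplified stand-in for Redox's `Packet`; only the result field `a` is modeled.
#[derive(Debug, Default)]
pub struct Packet {
    pub a: usize,
}

/// Simplified stand-in for the blocking `Scheme` trait.
pub trait Scheme {
    fn handle(&self, packet: &mut Packet);
}

/// Same shape as the trait above: by default, fall back to `handle`.
pub trait AsyncScheme: Scheme {
    #[allow(unused_variables)]
    unsafe fn poll_handle(&self, packet: &mut Packet, cx: &mut task::Context<'_>) -> task::Poll<()> {
        task::Poll::Ready(self.handle(packet))
    }
}

/// Hypothetical "would block" result code, for illustration only.
pub const WOULD_BLOCK: usize = usize::MAX;

/// Hypothetical pipe-like scheme: a read completes once a byte is available.
#[derive(Default)]
pub struct PipeScheme {
    data: RefCell<VecDeque<u8>>,
    /// Wakers of reads that returned `Pending`; woken when data arrives.
    waiting: RefCell<Vec<Waker>>,
}

impl PipeScheme {
    pub fn push(&self, byte: u8) {
        self.data.borrow_mut().push_back(byte);
        // Wake every parked reader so its executor polls it again.
        for waker in self.waiting.borrow_mut().drain(..) {
            waker.wake();
        }
    }
}

impl Scheme for PipeScheme {
    // Non-async path. A real scheme would block here; this toy has no
    // scheduler, so it just reports WOULD_BLOCK when there is no data.
    fn handle(&self, packet: &mut Packet) {
        packet.a = self.data.borrow_mut().pop_front().map_or(WOULD_BLOCK, usize::from);
    }
}

impl AsyncScheme for PipeScheme {
    unsafe fn poll_handle(&self, packet: &mut Packet, cx: &mut task::Context<'_>) -> task::Poll<()> {
        match self.data.borrow_mut().pop_front() {
            Some(byte) => {
                packet.a = usize::from(byte);
                task::Poll::Ready(())
            }
            None => {
                // Not ready: remember who to wake, and return instead of blocking.
                self.waiting.borrow_mut().push(cx.waker().clone());
                task::Poll::Pending
            }
        }
    }
}
```

The stored wakers play the role of the "Vec of pending syscalls": the scheme decides when each request can make progress, and the executor only re-polls requests that were woken.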

It also comes with the AsyncSchemeExt trait, which wraps every scheme method in a method that returns a future.
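Here is a hypothetical sketch of how such an extension trait could turn poll_handle into a future. Packet and Scheme are again simplified stand-ins. HandleFuture, handle_async and block_on are names made up for this illustration. The real AsyncSchemeExt wraps each scheme method rather than only the packet entry point shown here. The tiny block_on is only there to drive the future, and is nothing like the redox-iou executor.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{self, Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

#[derive(Debug, Default)]
pub struct Packet {
    pub a: usize,
}

pub trait Scheme {
    fn handle(&self, packet: &mut Packet);
}

pub trait AsyncScheme: Scheme {
    #[allow(unused_variables)]
    unsafe fn poll_handle(&self, packet: &mut Packet, cx: &mut task::Context<'_>) -> task::Poll<()> {
        task::Poll::Ready(self.handle(packet))
    }
}

/// Hypothetical future that keeps calling `poll_handle` until it is ready,
/// then yields the packet's result field.
pub struct HandleFuture<'a, S: ?Sized> {
    scheme: &'a S,
    packet: &'a mut Packet,
}

impl<S: AsyncScheme + ?Sized> Future for HandleFuture<'_, S> {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        // HandleFuture only holds references, so it is Unpin.
        let this = self.get_mut();
        // SAFETY: the caller of `handle_async` upholds poll_handle's contract.
        match unsafe { this.scheme.poll_handle(&mut *this.packet, cx) } {
            Poll::Ready(()) => Poll::Ready(this.packet.a),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Hypothetical extension trait, blanket-implemented for every AsyncScheme.
pub trait AsyncSchemeExt: AsyncScheme {
    /// # Safety
    /// Same contract as `poll_handle`.
    unsafe fn handle_async<'a>(&'a self, packet: &'a mut Packet) -> HandleFuture<'a, Self> {
        HandleFuture { scheme: self, packet }
    }
}

impl<S: AsyncScheme + ?Sized> AsyncSchemeExt for S {}

/// Minimal single-future executor: park the thread until woken.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

With the blanket impl, any scheme implementing AsyncScheme gets the future-returning methods for free. For example, `block_on(unsafe { scheme.handle_async(&mut packet) })` drives one request to completion.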

Asyncifying the kernel syscalls

As mentioned previously, I wanted to avoid reimplementing every syscall from scratch for io_uring. So I changed the existing syscalls into async fns and wrote a macro that defines a corresponding *_sync function for each of them. That function polls the syscall future and context switches whenever the future returns Poll::Pending. This changes nothing whatsoever for regular syscalls, except that they now block outside the syscall function rather than inside it. It also means that the io_uring kernel handler can call the async functions directly and thereby avoid blocking completely.
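The sync-wrapper idea can be sketched in ordinary userspace Rust. This is a minimal sketch under stated assumptions, not the kernel code. sync_syscall!, poll_blocking, PendingTimes and sys_dummy are hypothetical names, and std::thread::yield_now stands in for context::switch. Because macro_rules! cannot concatenate identifiers without a helper crate, the *_sync name is passed to the macro explicitly.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for the kernel's `context::switch`: let other work run.
fn context_switch() {
    std::thread::yield_now();
}

/// Waker that does nothing; the wrapper re-polls after every switch anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drive a syscall future to completion, switching away whenever it is pending.
fn poll_blocking<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => context_switch(),
        }
    }
}

/// Hypothetical macro: for `async fn name(args) -> T`, define a blocking `name_sync(args) -> T`.
macro_rules! sync_syscall {
    ($async_name:ident => $sync_name:ident($($arg:ident: $ty:ty),*) -> $ret:ty) => {
        pub fn $sync_name($($arg: $ty),*) -> $ret {
            poll_blocking($async_name($($arg),*))
        }
    };
}

/// Future that returns Pending a given number of times before completing.
struct PendingTimes(u32);

impl Future for PendingTimes {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            Poll::Ready(())
        } else {
            self.0 -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical async syscall that has to wait twice before it can complete.
/// `Result<usize, i32>` stands in for the syscall crate's `Result<usize>`.
pub async fn sys_dummy(x: usize) -> Result<usize, i32> {
    PendingTimes(2).await;
    Ok(x * 2)
}

sync_syscall!(sys_dummy => sys_dummy_sync(x: usize) -> Result<usize, i32>);
```

Here the wrapper simply re-polls after every switch, so the waker is unused. A kernel would presumably mark the context as blocked and let the waker unblock it instead, so that the scheduler does not keep re-polling.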

Since io_uring has three different modes (userspace-to-kernel, kernel-to-userspace, and userspace-to-userspace), these async syscall handlers could perhaps also use the kernel's kernel-to-userspace ring. For schemes that support io_urings, that ring could replace the regular scheme packet mechanism.

Better `pcid` IPC

I also began improving the IPC between pcid and its subdrivers, such as xhcid and nvmed, to support io_uring. This is what led to writing a buffer pool with a built-in general-purpose size+align allocator. In fact, that allocator should be able to function as the global allocator of a Rust program, although it would presumably be quite slow for that purpose. The pcid IPC is one of the main examples where io_uring gives additional benefits, mostly because pcid has to be called by another process every time an MSI interrupt is masked or unmasked.
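A general-purpose size+align allocator of this kind can be sketched as a first-fit free list of offsets into a single pool. This is a hypothetical, simplified design, not the redox-iou implementation, and BufferPool, alloc and free are illustrative names. It hands out offsets rather than pointers, which would be convenient if the pool were mapped at different addresses in different processes. It also assumes that the pool's base address is aligned to the largest alignment ever requested.

```rust
/// A free range [start, start + len) within the pool, in bytes.
#[derive(Clone, Copy, Debug)]
struct Range {
    start: usize,
    len: usize,
}

/// Hypothetical first-fit allocator handing out offsets into a shared pool.
/// Offsets are aligned relative to the pool start.
pub struct BufferPool {
    free_list: Vec<Range>, // sorted by start, non-overlapping, coalesced
}

impl BufferPool {
    pub fn new(size: usize) -> Self {
        BufferPool { free_list: vec![Range { start: 0, len: size }] }
    }

    /// Allocate `size` bytes aligned to `align` (a power of two); returns an offset.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        assert!(align.is_power_of_two() && size > 0);
        for i in 0..self.free_list.len() {
            let r = self.free_list[i];
            // Round the range start up to the requested alignment.
            let start = (r.start + align - 1) & !(align - 1);
            let padding = start - r.start;
            if padding + size > r.len {
                continue; // first fit: try the next free range
            }
            let tail = Range { start: start + size, len: r.len - padding - size };
            // Replace the used range with its leftover head padding and tail.
            self.free_list.remove(i);
            let mut at = i;
            if padding > 0 {
                self.free_list.insert(at, Range { start: r.start, len: padding });
                at += 1;
            }
            if tail.len > 0 {
                self.free_list.insert(at, tail);
            }
            return Some(start);
        }
        None
    }

    /// Return a range from `alloc` (same size) to the pool, merging neighbours.
    pub fn free(&mut self, offset: usize, size: usize) {
        let i = self.free_list.partition_point(|r| r.start < offset);
        self.free_list.insert(i, Range { start: offset, len: size });
        // Merge with the following free range if they touch.
        if i + 1 < self.free_list.len() && offset + size == self.free_list[i + 1].start {
            self.free_list[i].len += self.free_list[i + 1].len;
            self.free_list.remove(i + 1);
        }
        // Merge with the preceding free range if they touch.
        if i > 0 && self.free_list[i - 1].start + self.free_list[i - 1].len == offset {
            self.free_list[i - 1].len += self.free_list[i].len;
            self.free_list.remove(i);
        }
    }

    /// Total number of bytes currently free in the pool.
    pub fn free_bytes(&self) -> usize {
        self.free_list.iter().map(|r| r.len).sum()
    }
}
```

As with Rust's GlobalAlloc::dealloc, which takes the original Layout, the caller must pass the same size back to free. A first-fit list keeps the sketch short, but it is linear in the number of free ranges, which fits the remark that such an allocator would be slow as a global allocator.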

TODO

While the current redox-iou executor and reactor work fine for processes that use io_urings, they don't yet support processes that are the producers of an io_uring. One example is pcid, which needs to do IPC. For the most part, this should be straightforward to implement.

I also need to implement buffer pool sharing within the kernel, so that the producer process can automatically access new buffers that the consumer process needs. This is not that important for pcid, but a file system and its disk driver would certainly want a fast buffer pool between them, where the kernel would automagically mmap the buffers directly into the other process.

I should probably also update the RFC at some point.

And yes, the pcid <=> xhcid <=> usbscsid io_uring-backed IPC still remains to be done.