~yerinalexey

Operating systems and namespaces

July 20, 2022

Most operating systems today run userspace processes essentially unrestricted, with barely any built-in tools to contain them. Sandboxing untrusted software or limiting its resource use is unusual and has to be done with external tools (e.g. bubblewrap on Linux). Restricting access is especially critical for I/O devices, which can otherwise be used without the user's consent.

On mobile, this problem is mostly solved by extreme isolation and permission policies. On desktop platforms, however, this shouldn't be taken to the extreme, especially in the Unix-like space, where a lot of things rely on an unrestricted view of the system. There should be a middle ground where mainly the dangerous features (or devices) are gated behind permissions, backed by good sandboxing tools.

A lot of progress has been made in L4 microkernels and Plan 9. Both allow per-process namespaces, where certain functionality can be added or removed independently of other parts of the system.

L4 goes further with capabilities: objects that represent some action or resource and that can be created and exchanged between processes. The most notable kind is an endpoint, which can be called to execute the handler bound to it.

Because capabilities can be transferred, they can be used to implement a permission system. In a graphical environment, there could be a service that manages access to various devices and asks the user for consent before transferring a capability to the caller. An example with camera access:

+-----------+   request access   +---------+
|           | -----------------> |         |
| program a | <----------------- | service |
|           |  capability:       |         |
+-----------+  endpoint(record)  +---------+
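
The exchange in the diagram can be modeled in a few lines of Python. This is only a toy sketch, not any real L4 API: `Endpoint`, `Process`, `DeviceService`, `ask_user` and `record` are all hypothetical names, and plain Python objects cannot enforce the unforgeability that a kernel gives real capabilities.

```python
from typing import Callable, Dict


class Endpoint:
    """A capability that, when called, runs the handler bound to it."""

    def __init__(self, handler: Callable[[], str]) -> None:
        self._handler = handler

    def call(self) -> str:
        return self._handler()


class Process:
    """A program with a table of the capabilities it currently holds."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.caps: Dict[str, Endpoint] = {}


class DeviceService:
    """Owns device endpoints and hands them out only after consent."""

    def __init__(self, ask_user: Callable[[str, str], bool]) -> None:
        # ask_user(program_name, device) stands in for a consent dialog.
        self._ask_user = ask_user
        self._devices: Dict[str, Endpoint] = {}

    def register(self, device: str, endpoint: Endpoint) -> None:
        self._devices[device] = endpoint

    def request_access(self, caller: Process, device: str) -> bool:
        endpoint = self._devices.get(device)
        if endpoint is None or not self._ask_user(caller.name, device):
            return False
        # The "transfer": the caller now holds the capability itself.
        caller.caps[device] = endpoint
        return True


def record() -> str:
    """Stand-in for the camera driver's handler."""
    return "frame"


if __name__ == "__main__":
    service = DeviceService(ask_user=lambda program, device: program == "program a")
    service.register("camera", Endpoint(record))
    program_a = Process("program a")
    if service.request_access(program_a, "camera"):
        print(program_a.caps["camera"].call())
```

A program that was never granted the capability has nothing to call: there is no global "camera" name it could reach for, only entries in its own table.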

Namespaces do not formally exist in L4, but they are possible because a new process can be tightly configured before it is spawned. The parent can map only specific capabilities or freely arrange the address space, which gives granular control over what the process is allowed to do. For example, it can reroute the '/' directory to some other place without the process noticing anything (like a chroot, but without special support from the kernel), or implement an automatically wrapping ring buffer by mapping one page twice at consecutive addresses. The possibilities are endless.
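
As a rough illustration of the '/' rerouting idea, here is a toy Python model in which a process can only reach what its parent mapped in at spawn time. Every name here (`Directory`, `SandboxedProcess`, `spawn`, `read_file`) is made up for the sketch and does not correspond to an actual L4 interface.

```python
from typing import Dict, Mapping


class Directory:
    """A capability to a directory tree, modeled as path -> contents."""

    def __init__(self, files: Mapping[str, str]) -> None:
        self._files = dict(files)

    def read(self, path: str) -> str:
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]


class SandboxedProcess:
    """A process whose whole view of the system is its capability table."""

    def __init__(self, caps: Mapping[str, Directory]) -> None:
        # Copied once at spawn time; the process cannot widen it later.
        self._caps: Dict[str, Directory] = dict(caps)

    def read_file(self, path: str) -> str:
        root = self._caps.get("/")
        if root is None:
            raise PermissionError("no root directory capability was mapped")
        # The process always asks for '/', whatever the parent put there.
        return root.read(path.lstrip("/"))


def spawn(caps: Mapping[str, Directory]) -> SandboxedProcess:
    """Hypothetical spawn: the parent decides what the child can see."""
    return SandboxedProcess(caps)


if __name__ == "__main__":
    real_root = Directory({"etc/hostname": "workstation"})
    fake_root = Directory({"etc/hostname": "sandbox"})
    child = spawn({"/": fake_root})
    print(child.read_file("/etc/hostname"))
```

The child never learns that its '/' is not the real one, and a child spawned with an empty table has no filesystem at all.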

It's also worth remembering that L4 is a microkernel design, as opposed to a monolithic or modular kernel. Microkernels run services such as drivers in userspace, each with only the resources it needs. A serial driver has only the serial device registers mapped, so in case of a bug or a vulnerability it can, at worst, write something to that port instead of possibly breaking the entire system.
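
The serial driver case can be sketched the same way. In this toy model the driver holds a window onto a few bytes of simulated physical memory, and a stray write faults instead of landing somewhere else. The names `PhysicalMemory`, `Mapping`, `SerialDriver` and `Fault`, and the base address used in the demo, are invented for illustration.

```python
class Fault(Exception):
    """Raised when a process touches memory it has not been given."""


class PhysicalMemory:
    """Simulated machine memory, including device registers."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)


class Mapping:
    """A window onto physical memory: all that its holder can touch."""

    def __init__(self, memory: PhysicalMemory, base: int, length: int) -> None:
        self._memory = memory
        self._base = base
        self._length = length

    def write(self, offset: int, value: int) -> None:
        if not 0 <= offset < self._length:
            raise Fault(f"offset {offset} is outside the mapped window")
        self._memory.cells[self._base + offset] = value & 0xFF


class SerialDriver:
    """A userspace driver that was handed only the serial registers."""

    DATA_REGISTER = 0  # offset inside the window (hypothetical layout)

    def __init__(self, registers: Mapping) -> None:
        self._registers = registers

    def putc(self, char: str) -> None:
        self._registers.write(self.DATA_REGISTER, ord(char))

    def buggy_write(self, offset: int) -> None:
        # Simulates a bug: the driver scribbles at an arbitrary offset.
        self._registers.write(offset, 0xFF)


if __name__ == "__main__":
    memory = PhysicalMemory(1024)
    driver = SerialDriver(Mapping(memory, base=0x100, length=8))
    driver.putc("A")
    try:
        driver.buggy_write(512)
    except Fault as fault:
        print("contained:", fault)
```

The worst the buggy driver can do is write garbage inside its own eight-byte window, which matches the "at worst, write something to that port" outcome described above.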

I think that a capability system like this is a great feature and could be the basis for new, more secure systems.

There's also a new microkernel project, Helios, which I sometimes work on. Check it out:

=> https://sr.ht/~sircmpwn/helios