This is Simon McVittie's software development blog. Main site: pseudorandom.co.uk

GNOME Developer Experience hackfest: xdg-app + Debian

Over the last few days I've been at the GNOME Developer Experience hackfest in Brussels, looking into xdg-app and how best to use it in Debian and Debian derivatives.

xdg-app is basically a way to run "non-core" software on Linux distributions, analogous to apps on Android and iOS. It doesn't replace distributions like Debian or packaging systems, but it adds a layer above them. It's mostly aimed towards third-party apps obtained from somewhere that isn't your distribution vendor, aiming to address a few long-standing problems in that space:

  • There's no single ABI that can be called "a standard Linux system" in the same way there would be for Windows or OS X or Android or whatever, apart from LSB which is rather limited. Testing that a third-party app "works on Linux", or even "works on stable Linux releases from 2015", involves a combinatorial explosion of different distributions, desktop environments and local configurations. Steam uses the Steam Runtime, a chroot environment closely resembling Ubuntu 12.04 LTS; other vendors tend to test on a vaguely recent Ubuntu LTS and leave it at that.

  • There's no widely-supported mechanism for installing third-party applications as an ordinary user. gog.com used to distribute Ubuntu- and Debian-compatible .deb files, but installing a .deb involves running arbitrary vendor-supplied scripts as root, which should worry anyone who wants any sort of privilege-separation. (They have now switched to executable self-extracting installers, which involve running arbitrary vendor-supplied scripts as an ordinary user... better, but not perfect.)

  • Relatedly, the third-party application itself runs with the user's full privileges: a malicious or security-buggy third-party application can do more or less anything, unless you either switch to a different uid to run third-party apps, or use a carefully-written, app-specific AppArmor profile or equivalent.

To address the first point, each application uses a specified "runtime", which is available as /usr inside its sandbox. This can be used to run application bundles with multiple, potentially incompatible sets of dependencies within the same desktop environment. A runtime can be updated within its branch - for instance, if an application uses the "GNOME 3.18" runtime (consisting of a basic Linux system, the GNOME 3.18 libraries, other related libraries like Mesa, and their recursive dependencies like libjpeg), it can expect to see minor-version updates from GNOME 3.18.x (including any security updates that might be necessary for the bundled libraries), but not a jump to GNOME 3.20.

To address the second issue, the plan is for application bundles to be available as a single file, containing metadata (such as the runtime to use), the app itself, and any dependencies that are not available in the runtime (which the app vendor is responsible for updating if necessary). However, the primary way to distribute and upgrade runtimes and applications is to package them as OSTree repositories, which provide a git-like content-addressed filesystem, with efficient updates using binary deltas. The resulting files are hard-linked into place.

To address the last point, application bundles run partially isolated from the wider system, using containerization techniques such as namespaces to prevent direct access to system resources. Resources from outside the sandbox can be accessed via "portal" services, which are responsible for access control; for example, the Documents portal (the only one, so far) displays an "Open" dialog outside the sandbox, then allows the application to access only the selected file.

xdg-app for Debian

One thing I've been doing at this hackfest is improving the existing Debian/Ubuntu packaging for xdg-app (and its dependencies ostree and libgsystem), aiming to get it into a state where I can upload it to Debian experimental. Because xdg-app aims to be a general freedesktop project, I'm currently intending to make it part of the "Utopia" packaging team alongside projects like D-Bus and polkit, but I'm open to suggestions if people want to co-maintain it elsewhere.

In the process of updating xdg-app, I sent various patches to Alex, mostly fixing build and test issues, which are in the new 0.4.8 release.

I'd appreciate co-maintainers and further testing for this stuff, particularly ostree: ostree is primarily a whole-OS deployment technology, which isn't a use-case that I've tested, and in particular ostree-grub2 probably doesn't work yet.

Source code:

Binaries (no trust path, so only use these if you have a test VM):

  • deb https://people.debian.org/~smcv/xdg-app xdg-app main

The "Hello, World" of xdg-apps

Another thing I set out to do here was to make a runtime and an app out of Debian packages. Most of the test applications in and around GNOME use the "freedesktop" or "GNOME" runtimes, which consist of a Yocto base system and lots of RPMs, are rebuilt from first principles on-demand, and are extensive and capable enough that they make it somewhat non-obvious what's in an app or a runtime.

So, here's a step-by-step route through xdg-app, first using typical GNOME instructions, but then using the simplest GUI app I could find - xvt, a small xterm clone. I'm using a Debian testing (stretch) x86_64 virtual machine for all this. xdg-app currently requires systemd-logind to put users and apps in cgroups, either with systemd as pid 1 (systemd-sysv) or systemd-shim and cgmanager; I used the default systemd-sysv. In principle it could work with plain cgmanager, but nobody has contributed that support yet.

Demonstrating an existing xdg-app

Debian's kernel is currently patched to be able to allow unprivileged users to create user namespaces, but make it runtime-configurable, because there have been various security issues in that feature, making it a security risk for a typical machine (and particularly a server). Hopefully unprivileged user namespaces will soon be secure enough that we can enable them by default, but for now, we have to do one of three things to let xdg-app use them:

  • enable unprivileged user namespaces via sysctl:

    sudo sysctl kernel.unprivileged_userns_clone=1
    
  • make xdg-app root-privileged (it will keep CAP_SYS_ADMIN and drop the rest):

    sudo dpkg-statoverride --update --add root root 04755 /usr/bin/xdg-app-helper
    
  • make xdg-app slightly less privileged:

    sudo setcap cap_sys_admin+ep /usr/bin/xdg-app-helper
    

First, we'll need a runtime. The standard xdg-app tutorial would tell you to download the "GNOME Platform" version 3.18. To do that, you'd add a remote, which is a bit like a git remote, and a bit like an apt repository:

$ wget http://sdk.gnome.org/keys/gnome-sdk.gpg
$ xdg-app remote-add --user --gpg-import=gnome-sdk.gpg gnome \
    http://sdk.gnome.org/repo/

(I'm ignoring considerations like trust paths and security here, for brevity; in real life, you'd want to obtain the signing key via https and/or have a trust path to it, just like you would for a secure-apt signing key.)

You can list what's available in a remote:

$ xdg-app remote-ls --user gnome
...
org.freedesktop.Platform
...
org.freedesktop.Platform.Locale.cy
...
org.freedesktop.Sdk
...
org.gnome.Platform
...

The Platform runtimes are what we want here: they are collections of runtime libraries with which you can run an application. The Sdk runtimes add development tools, header files, etc. to be able to compile apps that will be compatible with the Platform.

For now, all we want is the GNOME 3.18 platform:

$ xdg-app install --user gnome org.gnome.Platform 3.18

Next, we can install an app that uses it, from Alex Larsson's nightly builds of a subset of GNOME. The server they're on doesn't have a great deal of bandwidth, so be nice :-)

$ wget http://209.132.179.2/keys/nightly.gpg
$ xdg-app remote-add --user --gpg-import=nightly.gpg nightly \
    http://209.132.179.2/repo/
$ xdg-app install --user nightly org.mypaint.MypaintDevel

We now have one app, and the runtime it needs:

$ xdg-app list
org.mypaint.MypaintDevel
$ xdg-app run org.mypaint.MypaintDevel
[you see a GUI window]

Digression: what's in a runtime?

Behind the scenes, xdg-app runtimes and apps are both OSTree trees. This means the ostree tool, from the package of the same name, can be used to inspect them.

$ sudo apt install ostree
$ ostree refs --repo ~/.local/share/xdg-app/repo
gnome:runtime/org.gnome.Platform/x86_64/3.18
nightly:app/org.mypaint.MypaintDevel/x86_64/master

A "ref" has roughly the same meaning as in git: something like a branch or a tag. ostree can list the directory tree that it represents:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
    runtime/org.gnome.Platform/x86_64/3.18
d00755 0 0      0 /
-00644 0 0    493 /metadata
d00755 0 0      0 /files
$ ostree ls --repo ~/.local/share/xdg-app/repo \
    runtime/org.gnome.Platform/x86_64/3.18 /files
d00755 0 0      0 /files
l00777 0 0      0 /files/local -> ../var/usrlocal
l00777 0 0      0 /files/sbin -> bin
d00755 0 0      0 /files/bin
d00755 0 0      0 /files/cache
d00755 0 0      0 /files/etc
d00755 0 0      0 /files/games
d00755 0 0      0 /files/include
d00755 0 0      0 /files/lib
d00755 0 0      0 /files/lib64
d00755 0 0      0 /files/libexec
d00755 0 0      0 /files/share
d00755 0 0      0 /files/src

You can see that /files in a runtime is basically a copy of /usr. This is not coincidental: the runtime's /files gets mounted at /usr inside the xdg-app container. There is also some metadata, which is in the ini-like syntax seen in .desktop files:

$ ostree cat --repo ~/.local/share/xdg-app/repo \
    runtime/org.gnome.Platform/x86_64/3.18 /metadata
[Runtime]
name=org.gnome.Platform/x86_64/3.16
runtime=org.gnome.Platform/x86_64/3.16
sdk=org.gnome.Sdk/x86_64/3.16

[Extension org.freedesktop.Platform.GL]
version=1.2
directory=lib/GL

[Extension org.freedesktop.Platform.Timezones]
version=1.2
directory=share/zoneinfo

[Extension org.gnome.Platform.Locale]
directory=share/runtime/locale
subdirectories=true

[Environment]
GI_TYPELIB_PATH=/app/lib/girepository-1.0
GST_PLUGIN_PATH=/app/lib/gstreamer-1.0
LD_LIBRARY_PATH=/app/lib:/usr/lib/GL

Looking at an app, the situation is fairly similar:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master
d00755 0 0      0 /
-00644 0 0    258 /metadata
d00755 0 0      0 /export
d00755 0 0      0 /files

This time, /files maps to what will become /app for the application, which was compiled with --prefix=/app:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /files
d00755 0 0      0 /files
-00644 0 0   4599 /files/manifest.json
d00755 0 0      0 /files/bin
d00755 0 0      0 /files/lib
d00755 0 0      0 /files/share

There is also a /export directory, which is made visible to the host system so that the contained app can appear as a "first-class citizen" in menus:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /export
d00755 0 0      0 /export
d00755 0 0      0 /export/share
user@debian:~$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /export/share
d00755 0 0      0 /export/share
d00755 0 0      0 /export/share/app-info
d00755 0 0      0 /export/share/applications
d00755 0 0      0 /export/share/icons
user@debian:~$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /export/share/applications
d00755 0 0      0 /export/share/applications
-00644 0 0    715 /export/share/applications/org.mypaint.MypaintDevel.desktop
$ ostree cat --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master \
    /export/share/applications/org.mypaint.MypaintDevel.desktop
[Desktop Entry]
Version=1.0
Name=(Nightly) MyPaint
TryExec=mypaint
Exec=mypaint %f
Comment=Painting program for digital artists
...
Comment[zh_HK]=藝術家的电脑绘画
GenericName=Raster Graphics Editor
GenericName[fr]=Éditeur d'Image Matricielle
MimeType=image/openraster;image/png;image/jpeg;
Type=Application
Icon=org.mypaint.MypaintDevel
StartupNotify=true
Categories=Graphics;GTK;2DGraphics;RasterGraphics;
Terminal=false

Again, there's some metadata:

$ ostree cat --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /metadata
[Application]
name=org.mypaint.MypaintDevel
runtime=org.gnome.Platform/x86_64/3.18
sdk=org.gnome.Sdk/x86_64/3.18
command=mypaint

[Context]
shared=ipc;
sockets=x11;pulseaudio;
filesystems=host;

[Extension org.mypaint.MypaintDevel.Debug]
directory=lib/debug

Building a runtime, probably the wrong way

The way in which the reference/demo runtimes and containers are generated is... involved. As far as I can tell, there's a base OS built using Yocto, and the actual GNOME bits come from RPMs. However, we don't need to go that far to get a working runtime.

In preparing this runtime I'm probably completely ignoring some best-practices and tools - but it works, so it's good enough.

First we'll need a repository:

$ sudo install -d -o$(id -nu) /srv/xdg-apps
$ ostree init --repo /srv/xdg-apps

I'm just keeping this local for this demonstration, but you could rsync it to a web server's exported directory or something - a lot like a git repository, it's just a collection of files. We want everything in /usr because that's what xdg-app expects, hence usrmerge:

$ sudo mount -t tmpfs -o mode=0755 tmpfs /mnt
$ sudo debootstrap --arch=amd64 --include=libx11-6,usrmerge \
    --variant=minbase stretch /mnt http://192.168.122.1:3142/debian
$ sudo mkdir /mnt/runtime
$ sudo mv /mnt/usr /mnt/runtime/files

This obviously has a lot of stuff in it that we don't need - most obviously init, apt and dpkg - but it's Good Enough™.

We will also need some metadata. This is sufficient:

$ sudo sh -c 'cat > /mnt/runtime/metadata'
[Runtime]
name=org.debian.Debootstrap/x86_64/8.20160130
runtime=org.debian.Debootstrap/x86_64/8.20160130

That's a runtime. We can commit it to ostree, and generate xdg-app metadata:

$ ostree commit --repo /srv/xdg-apps \
    --branch runtime/org.debian.Debootstrap/x86_64/8.20160130 \
    /mnt/runtime
$ fakeroot ostree commit --repo /srv/xdg-apps \
    --branch runtime/org.debian.Debootstrap/x86_64/8.20160130
$ fakeroot xdg-app build-update-repo /srv/xdg-apps

(I'm not sure why ostree and xdg-app report "Operation not permitted" when we aren't root or fakeroot - feedback welcome.)

build-update-repo would presumably also be the right place to GPG-sign your repository, if you were doing that.

We can add that as another xdg-app remote:

$ xdg-app remote-add --user --no-gpg-verify local file:///srv/xdg-apps
$ xdg-app remote-ls --user local
org.debian.Debootstrap

Building an app, probably the wrong way

The right way to build an app is to build a "SDK" runtime - similar to that platform runtime, but with development files and tools - and recompile the app and any missing libraries with ./configure --prefix=/app && make && make install. I'm not going to do that, because simplicity is nice, and I'm reasonably sure xvt doesn't actually hard-code /usr into the binary:

$ install -d xvt-app/files/bin
$ sudo apt-get --download-only install xvt
$ dpkg-deb --fsys-tarfile /var/cache/apt/archives/xvt_2.1-20.1_amd64.deb \
    | tar -xvf - ./usr/bin/xvt
./usr/
./usr/bin/
./usr/bin/xvt
...
$ mv usr xvt-app/files

Again, we'll need metadata, and it's much simpler than the more production-quality GNOME nightly builds:

$ cat > xvt-app/metadata
[Application]
name=org.debian.packages.xvt
runtime=org.debian.Debootstrap/x86_64/8.20160130
command=xvt

[Context]
sockets=x11;
$ fakeroot ostree commit --repo /srv/xdg-apps \
    --branch app/org.debian.packages.xvt/x86_64/2.1-20.1 xvt-app
$ fakeroot xdg-app build-update-repo /srv/xdg-apps
Updating appstream branch
No appstream data for runtime/org.debian.Debootstrap/x86_64/8.20160130
No appstream data for app/org.debian.packages.xvt/x86_64/2.1-20.1
Updating summary
$ xdg-app remote-ls --user local
org.debian.Debootstrap
org.debian.packages.xvt

The obligatory screenshot

OK, good, now we can install it:

$ xdg-app install --user local org.debian.Debootstrap 8.20160130
$ xdg-app install --user local org.debian.packages.xvt 2.1-20.1
$ xdg-app run --branch=2.1-20.1 org.debian.packages.xvt

xvt in a container

and you can play around with the shell in the xvt and see what you can and can't do in the container.

I'm sure there were better ways to do most of this, but I think there's value in having such a simplistic demo to go alongside the various GNOMEish apps.


Acknowledgements:

Thanks to all those!

Discworld Noir in a Windows 98 virtual machine on Linux

Discworld Noir was a superb adventure game, but is also notoriously unreliable, even in Windows on real hardware; using Wine is just not going to work. After many attempts at bringing it back into working order, I've settled on an approach that seems to work: now that qemu and libvirt have made virtualization and emulation easy, I can run it in a version of Windows that was current at the time of its release. Unfortunately, Windows 98 doesn't virtualize particularly well either, so this still became a relatively extensive yak-shaving exercise.

discworldnoir-boxes.jpg

These instructions assume that /srv/virt is a suitable place to put disk images, but you can use anywhere you want.

The emulated PC

After some trial and error, it seems to work if I configure qemu to emulate the following:

  • Fully emulated hardware instead of virtualization (qemu-system-i386 -no-kvm)
  • Intel Pentium III
  • Intel i440fx-based motherboard with ACPI
  • Real-time clock in local time
  • No HPET
  • 256 MiB RAM
  • IDE primary master: IDE hard disk (I used 30 GiB, which is massively overkill for this game; qemu can use sparse files so it actually ends up less than 2 GiB on the host system)
  • IDE primary slave, secondary master, secondary slave: three CD-ROM drives
  • PS/2 keyboard and mouse
  • Realtek AC97 sound card
  • Cirrus video card with 16 MiB video RAM

A modern laptop CPU is an order of magnitude faster than what Discworld Noir needs, so full emulation isn't a problem, despite being inefficient.

There is deliberately no networking, because Discworld Noir doesn't need it, and a 17 year old operating system with no privilege separation is very much not safe to use on the modern Internet!

Software needed

  • Windows 98 installation CD-ROM as a .iso file (cp /dev/cdrom windows98.iso) - in theory you could also use a real optical drive, but my laptop doesn't usually have one of those. I used the OEM disc, version 4.10.1998 (that's the original Windows 98, not the Second Edition), which came with a long-dead PC, and didn't bother to apply any patches.
  • A Windows 98 license key. Again, I used an OEM key from a past PC.
  • A complete set of Discworld Noir (English) CD-ROMs as .iso files. I used the UK "Sold Out Software" budget release, on 3 CDs.
  • A multi-platform Realtek AC97 audio driver.

Windows 98 installation

It seems to be easiest to do this bit by running qemu-system-i386 manually:

qemu-img create -f qcow2 /srv/virt/discworldnoir.qcow2 30G
qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \
    -drive media=cdrom,format=raw,file=/srv/virt/windows98.iso \
    -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime

Don't start the installation immediately. Instead, boot the installation CD to a DOS prompt with CD-ROM support. From here, run

fdisk

and create a single partition filling the emulated hard disk. When finished, hard-reboot the virtual machine (press Ctrl+C on the qemu-system-i386 process and run it again).

The DOS FORMAT.COM utility is on the Windows CD-ROM but not in the root directory or the default %PATH%, so you'll have to run:

d:\win98\format c:

to create the FAT filesystem. You might have to reboot again at this point.

The reason for doing this the hard way is that the Windows 98 installer doesn't detect qemu as supporting ACPI. You want ACPI support, so that Windows will issue IDLE instructions from its idle loop, instead of occupying a CPU core with a busy-loop. To get that, boot to a DOS prompt again, and use:

setup /p j /iv

/p j forces ACPI support (Thanks to "Richard S" on the Virtualbox forums for this tip.) /iv is unimportant, but it disables the annoying "billboards" during installation, which advertised once-exciting features like support for dial-up modems and JPEG wallpaper.

I used a "Typical" installation; there didn't seem to be much point in tweaking the installed package set when everything is so small by modern standards.

Windows 98 has built-in support for the Cirrus VGA card that we're emulating, so after a few reboots, it should be able to run in a semi-modern resolution and colour depth. Discworld Noir apparently prefers a 640 × 480 × 16-bit video mode, so right-click on the desktop background, choose Properties and set that up.

Audio drivers

This is the part that took me the longest to get working. Of the sound cards that qemu can emulate, Windows 98 only supports the SoundBlaster 16 out of the box. Unfortunately, the Soundblaster 16 emulation in qemu is incomplete, and in particular version 2.1 (as shipped in Debian 8) has a tendency to make Windows lock up during boot.

I've seen advice in various places to emulate an Eqsonic ES1370 (SoundBlaster AWE 64), but that didn't work for me: one of the drivers I tried caused Windows to lock up at a black screen during boot, and the other didn't detect the emulated hardware.

The next-oldest sound card that qemu can emulate is a Realtek AC97, which was often found integrated into motherboards in the late 1990s. This one seems to work, with the "A400" driver bundle linked above. For Windows 98 first edition, you need a driver bundle that includes the old "VXD" drivers, not just the "WDM" drivers supported by Second Edition and newer.

The easiest way to get that into qemu seems to be to turn it into a CD image:

genisoimage -o /srv/virt/discworldnoir-drivers.iso WDM_A400.exe
qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \
    -drive media=cdrom,format=raw,file=/srv/virt/windows98.iso \
    -drive media=cdrom,format=raw,file=/srv/virt/discworldnoir-drivers.iso \
    -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime -soundhw ac97

Run the installer from E:, then reboot with the Windows 98 CD inserted, and Windows should install the driver.

Installing Discworld Noir

Boot up the virtual machine with CD 1 in the emulated drive:

qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \
    -drive media=cdrom,format=raw,file=/srv/virt/DWN_ENG_1.iso \
    -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime -soundhw ac97

You might be thinking "... why not insert all three CDs into D:, E: and F:?" but the installer expects subsequent disks to appear in the same drive where CD 1 was initially, so that won't work. Instead, when prompted for a new CD, switch to the qemu monitor with Ctrl+Alt+2 (note that this is 2, not F2). At the (qemu) prompt, use info block to see a list of emulated drives, then issue a command like

change ide0-cd1 /srv/virt/DWN_ENG_2.iso

to swap the CD. Then switch back to Windows' console with Ctrl+Alt+1 and continue installation. I used a Full installation of Discworld Noir.

Transferring the virtual machine to GNOME Boxes

Having finished the "control freak" phase of installation, I wanted a slightly more user-friendly way to run this game, so I transferred the virtual machine to be used by libvirtd, which is the backend for both GNOME Boxes and virt-manager:

virsh create discworldnoir.xml

Here is the configuration I used. It's a mixture of automatic configuration from virt-manager, and hand-edited configuration to make it match the qemu-system-i386 command-line.

Running the game

If all goes well, you should now see a discworldnoir virtual machine in GNOME Boxes, in which you can boot Windows 98 and play the game. Have fun!

Why polkit (or, how to mount a disk on modern Linux)

I've recently found myself explaining polkit (formerly PolicyKit) to one of Collabora's clients, and thought that blogging about the same topic might be useful for other people who are confused by it; so, here is why udisks2 and polkit are the way they are.

As always, opinions in this blog are my own, not Collabora's.

Privileged actions

Broadly, there are two ways a process can do something: it can do it directly (i.e. ask the kernel directly), or it can use inter-process communication to ask a service to do that operation on its behalf. If it does it directly, the components that say whether it can succeed are the Linux kernel's normal permissions checks (DAC), and if configured, AppArmor, SELinux or a similar MAC layer. All very simple so far.

Unfortunately, the kernel's relatively coarse-grained checks are not sufficient to express the sorts of policies that exist on a desktop/laptop/mobile system. My favourite example for this sort of thing is mounting filesystems. If I plug in a USB stick with a FAT filesystem, it's reasonable to expect my chosen user interface to either mount it automatically, or let me press a button to mount it. Similarly, to avoid data loss, I should be able to unmount it when I'm finished with it. However, mounting and unmounting a USB stick is fundamentally the same system call as mounting and unmounting any other filesystem - and if ordinary users can do arbitrary mount system calls, they can cause all sorts of chaos, for instance by mounting a filesystem that contains setuid executables (privilege escalation), or umounting a critical OS filesystem like /usr (denial of service). Something needs to arbitrate: “you can mount filesystems, but only under certain conditions”.

The kernel developer motto for this sort of thing is “mechanism, not policy”: they are very keen to avoid encoding particular environments' policies (the sort of thing you could think of as “business rules”) in the kernel, because that makes it non-generic and hard to maintain. As a result, direct mount/unmount actions are only allowed by privileged processes, and it's up to user-space processes to arrange for a privileged process to make the desired mount syscall.

Here are some other privileged actions which laptop/desktop users can reasonably expect to “just work”, with or without requiring a sysadmin-like (root-equivalent) user:

  • reconfiguring networking (privileged because, in the general case, it's an availability and potentially integrity issue)
  • installing, upgrading or removing packages (privileged because, in the general case, it can result in arbitrary root code execution)
  • suspending or shutting down the system (privileged because you wouldn't want random people doing this on your server, but should normally be allowed on e.g. laptops for people with physical access, because they could just disconnect the power anyway)

In environments that use a MAC framework like AppArmor, actions that would normally be allowed can become privileged: for instance, in a framework for sandboxed applications, most apps shouldn't be allowed to record audio. This prevents carrying out these actions directly, again resulting in the only way to achieve them being to ask a service to carry out the action.

Ask a system service to do it

On to the next design, then: I can submit a request to a privileged process, which does some checks to make sure I'm not trying to break the system (or alternatively, that I have enough sysadmin rights that I'm allowed to break the system if I want to), and then does the privileged action for me.

You might think I'm about to start discussing D-Bus and daemons, but actually, a prominent earlier implementation of this was mount(8), which is normally setuid root:

% ls -l /bin/mount
-rwsr-xr-x 1 root root 40000 May 22 11:37 /bin/mount

If you look at it from an odd angle, this is inter-process communication across a privilege boundary: I run the setuid executable, creating a process. Because the executable has the setuid bit set, the kernel makes the process highly privileged: its effective uid is root, and it has all the necessary capabilities to mount filesystems. I submit the request by passing it in the command-line arguments. mount does some checks - specifically, it looks in /etc/fstab to see whether the filesystem I'm trying to mount has the “user” or “users” flag - then carries out the mount system call.

There are a few obvious problems with this:

  • When machines had a static set of hardware devices (and a sysadmin who knew how to configure them), it might have made sense to list them all in /etc/fstab; but this is not a useful solution if you can plug in any number of USB drives, or if you are a non-expert user with Linux on your laptop. The decision ought to be based on general attributes of devices, such as “is removable?”, and on the role of the machine.
  • Setuid executables are alarmingly easy to get wrong so it is not necessarily wise to assume that mount(8) is safe to be setuid.
  • One fact that a reasonable security policy might include is “users who are logged in remotely should have less control over physically present devices than those who are physically present” - but that sort of thing can't be checked by mount(8) without specifically teaching the mount binary about it.

Ask a system service to do it, via D-Bus or other IPC

To avoid the issues of setuid, we could use inter-process communication in the traditional sense: run a privileged daemon (on boot or on-demand), make it listen for requests, and use the IPC channel as our privilege boundary.

udisks2 is one such privileged daemon, which uses D-Bus as its IPC channel. D-Bus is a commonly-used inter-process system; one of its intended/designed uses is to let user processes and system services communicate, especially this sort of communication between a privileged daemon and its less-privileged clients.

People sometimes criticize D-Bus as not doing anything you couldn't do yourself with some AF_UNIX sockets. Well, no, of course it doesn't - the important bit of the reference implementation and the various interoperable reimplementations consists of a daemon and some AF_UNIX sockets, and the rest is a simple matter of programming. However, it's sufficient for most uses in its problem space, and is usually better than inventing your own.

The advantage of D-Bus over doing your own thing is precisely that you are not doing your own thing: good IPC design is hard, and D-Bus makes some structural decisions so that fewer application authors have to think about them. For instance, it has a central “hub” daemon (the dbus-daemon, or “message bus”) so that n communicating applications don't need O(n²) sockets; it uses the dbus-daemon to provide a total message ordering so you don't have to think about message reordering; it has a distributed naming model (which can also be used as a distributed mutex) so you don't have to design that; it has a serialization format and a type system so you don't have to design one of those; it has a framework for “activating" run-on-demand daemons so they don't have to use resources initially, implemented using a setuid helper and/or systemd; and so on.

If you have religious objections to D-Bus, you can mentally replace “D-Bus” with “AF_UNIX or something” and most of this article will still be true.

Is this OK?

In either case - exec'ing a privileged helper, or submitting a request to a privileged daemon via IPC - the privileged process has two questions that it needs to answer before it does its work:

  • what am I being asked to do?
  • should I do it?

It needs to make some sort of decision on the latter based on the information available to it. However, before we even get there, there is another layer:

  • did the request get there at all?

In the setuid model, there is a simple security check that you can apply: you can make /bin/mount only executable by a particular group, or only executable by certain AppArmor profiles, or similar. That works up to a point, but cannot distinguish between physically-present and not-physically-present users, or other facts that might be interesting to your local security policy. Similarly, in the IPC model, you can make certain communication channels impossible, for instance by using dbus-daemon's ability to decide which messages to deliver, or AF_UNIX sockets' filesystem permissions, or a MAC framework like AppArmor.

Both of these are quite “coarse-grained” checks which don't really understand the finer details of what is going on. If the answer to "is this safe?” is something of the form “maybe, it depends on...”, then they can't do the right thing: they must either let it through and let the domain-specific privileged process do the check, or deny it and lose potentially useful functionality.

For instance, in an AppArmor environment, some applications have absolutely no legitimate reason to talk to udisks2, so the AppArmor policy can just block it altogether. However, once again, this is a coarse-grained check: the kernel has mechanism, not policy, and it doesn't know what the service does or why. If the application does need to be able to talk to the service at all, then finer-grained access control (obeying some, but not all, requests) has to be the service's job.

dbus-daemon does have the ability to match messages in a relatively fine-grained way, based on the object path, interface and member in the message, as well as the routing information that it uses itself (i.e. the source and destination). However, it is not clear that this makes a great deal of sense conceptually: these are facts about the mechanics of the IPC, not facts about the domain-specific request (because the mechanics of the IPC are all that dbus-daemon understands). For instance, taking the udisks2 example again, dbus-daemon can't distinguish between an attempt to adjust mount options for a USB stick (probably fine) and an attempt to adjust mount options for /usr (not good).

To have a domain-specific security policy, we need a domain-specific component, for instance udisks2, to get involved. Unlike dbus-daemon, udisks2 knows that not all disks are equal, knows which categories make sense to distinguish, and can identify which categories a particular disk is in. So udisks2 can make a more informed decision.

So, a naive approach might be to write a function in udisks2 that looks something like this pseudocode:

may_i_mount_this_disk (user, disk, mount options) → boolean
{
    if (user is root || user is root-equivalent)
        return true;

    if (disk is not removable)
       return false;

    if (mount options are scary)
       return false;

    if (user is in “manipulate non-local disks” group)
        return true;

    if (user is not logged-in locally)
       return false;

    # https://en.wikipedia.org/wiki/Multiseat_configuration
    if (user is not logged-in on the same seat where the disk is
            plugged in)
       return false;

    return true;
}

Delegating the security policy to something central

The pseudocode security policy outlined above is reasonably complicated already, and doesn't necessarily cover everything that you might want to consider.

Meanwhile, not every system is the same. A general-purpose Linux distribution like Debian might run on server/mainframe systems with only remote users, personal laptops/desktops with one root-equivalent user, locked-down corporate laptops/desktops, mobile devices and so on; these systems should not necessarily all have the same security policy.

Another interesting factor is that for some privileged operations, you might want to carry out interactive authorization: ask the requesting user to confirm that the action (which might have come from a background process) should take place (like Windows' UAC), or to prove that the person currently at the keyboard is the same as the person who logged in by giving their password (like sudo).

We could in principle write code for all of this in udisks2, and in NetworkManager, and in systemd, ... - but that clearly doesn't scale, particularly if you want the security policy to be configurable. Enter polkit (formerly PolicyKit), a system service for applying security policies to actions.

The way polkit works is that the application does its domain-specific analysis of the request - in the case of udisks2, whether the device to be mounted is removable, whether the mount options are reasonable, etc. - and converts it into an action. The action gives polkit a way to distinguish between things that are conceptually different, without needing to know the specifics. For instance, udisks2 currently divides up filesystem-mounting into org.freedesktop.udisks2.filesystem-mount, org.freedesktop.udisks2.filesystem-mount-fstab, org.freedesktop.udisks2.filesystem-mount-system and org.freedesktop.udisks2.filesystem-mount-other-seat.

The application also finds the identity of the user making the request. Next, the application sends the action, the identity of the requesting user, and any other interesting facts to polkit. As currently implemented, polkit is a D-Bus service, so this is an IPC request via D-Bus. polkit consults its database of policies in order to choose one of several results:

  • yes, allow it
  • no, do not allow it
  • ask the user to either authenticate as themselves or as a privileged (sysadmin) user to allow it, or cancel authentication to not allow it
  • ask the user to authenticate the first time, but if they do, remember that for a while and don't ask again

So how does polkit decide this? The first thing is that it reads the machine-readable description of the actions, in /usr/share/polkit-1/actions, which specifies a default policy. Next, it evaluates a local security policy to see what that says. In the current version of polkit, the local security policy is configured by writing JavaScript in /etc/polkit-1/rules.d (local policy) and /usr/share/polkit-1/rules.d (OS-vendor defaults). In older versions such as the one currently shipped in Debian unstable, there was a plugin architecture; but in practice nobody wrote plugins for it, and instead everyone used the example local authority shipped with polkit, which was configured via files in /etc/polkit-1/localauthority and /etc/polkit-1/localauthority.d.

These policies can take into account useful facts like:

  • what is the action we're talking about?
  • is the user logged-in locally? are they active, i.e. they are not just on a non-current virtual console?
  • is the user in particular groups?

For instance, gnome-control-center on Debian installs this snippet:

polkit.addRule(function(action, subject) {
        if ((action.id == "org.freedesktop.locale1.set-locale" ||
             action.id == "org.freedesktop.locale1.set-keyboard" ||
             action.id == "org.freedesktop.hostname1.set-static-hostname" ||
             action.id == "org.freedesktop.hostname1.set-hostname" ||
             action.id == "org.gnome.controlcenter.datetime.configure") &&
            subject.local &&
            subject.active &&
            subject.isInGroup ("sudo")) {
                    return polkit.Result.YES;
            }
});

which is reasonably close to being pseudocode for “active local users in the sudo group may set the system locale, keyboard layout, hostname and time, without needing to authenticate”. A system administrator could of course override that by dropping a higher-priority policy for some or all of these actions into /etc/polkit-1/rules.d.

Summary

  • Kernel-based permission checks are not sufficiently fine-grained to be able to express some quite reasonable security policies
  • Fine-grained access control needs domain-specific understanding
  • The kernel doesn't have that information (and neither does dbus-daemon)
  • The privileged service that does the domain-specific thing can provide the domain-specific understanding to turn the request into an action
  • polkit evaluates a configurable policy to determine whether privileged services should carry out requested actions
still aiming to be the universal operating system

Debian's latest round of angry mailing list threads have been about some combination of init systems, future direction and project governance. The details aren't particularly important here, and pretty much everything worthwhile in favour of or against each position has already been said several times, but I think this bit is important enough that it bears repeating: the reason I voted "we didn't need this General Resolution" ahead of the other options is that I hope we can continue to use our normal technical and decision-making processes to make Debian 8 the best possible OS distribution for everyone. That includes people who like systemd, people who dislike systemd, people who don't care either way and just want the OS to work, and everyone in between those extremes.

I think that works best when we do things, and least well when a lot of time and energy get diverted into talking about doing things. I've been trying to do my small part of the former by fixing some release-critical bugs so we can release Debian 8. Please join in, and remember to write good unblock requests so our hard-working release team can get through them in a finite time. I realise not everyone will agree with my idea of which bugs, which features and which combinations of packages are highest-priority; that's fine, there are plenty of bugs to go round!

Regarding init systems specifically, Debian 'jessie' currently works with at least systemd-sysv or sysvinit-core as pid 1 (probably also Upstart, but I haven't tried that) and I'm confident that Debian developers won't let either of those regress before it's released as Debian 8.

I expect the freeze for Debian 'stretch' (presumably Debian 9) to be a couple of years away, so it seems premature to say anything about what will or won't be supported there; that depends on what upstream developers do, and what Debian developers do, between now and then. What I can predict is that the components that get useful bug reports, active maintenance, thorough testing, careful review, and similar help from contributors will work better than the things that don't; so if you like a component and want it to be supported in Debian, you can help by, well, supporting it.


PS. If you want the Debian 8 installer to leave you running sysvinit as pid 1 after the first reboot, here's a suitable incantation to add to the kernel command-line in the installer's bootloader. This one certainly worked when KiBi asked for testing a few days ago:

preseed/late_command="in-target apt-get install -y sysvinit-core"

I think that corresponds to this line in a preseeding file, if you use those:

d-i preseed/late_command string in-target apt-get install -y sysvinit-core

A similar apt-get command, without the in-target prefix, should work on an installed system that already has systemd-sysv. Depending on other installed software, you might need to add systemd-shim to the command line too, but when I tried it, apt-get was able to work that out for itself.

If you use aptitude instead of apt-get, double-check what it will do before saying "yes" to this particular switchover: its heuristic for resolving conflicts seems to be rather more trigger-happy about removing packages than the one in apt-get.

Posted
Debian activities

Most of my recent Debian work has been the usual pkg-telepathy stuff (mainly in experimental while we're frozen), and hacking on the Quake III engine.

Having started working on OpenArena DFSG-freeness as random bug-squashing a year ago, I've accidentally ended up taking over the package. The situation for Debian 6.0 (squeeze) looks something like this:

  • openarena contains both the engine (a modified ioquake3, which I modified further for Debian) and the game logic compiled to native code
  • openarena-data has had the bytecode game logic stripped out, since no Free compiler can produce it; the bytecode has been replaced with stub files that direct our modified engine to load native code

The plan for Quake III in Debian 7.0 (wheezy) has already started in experimental:

  • ioquake3 contains the ioquake3.org port of the Quake III engine, which I've modified to be able to run either Quake III Arena or OpenArena based on runtime options
  • openarena contains the game logic compiled to native code (in experimental it still contains source code for the engine, but it's no longer patched or compiled, and I'll hopefully drop it entirely after the next upstream release), and launcher scripts to run it under the ioquake3 engine
  • openarena-data is much like it is now, although it might move to data.debian.org if that happens
  • quake3 (currently in the NEW queue) contains launcher scripts to run Quake III Arena under the ioquake3 engine, requiring a non-distributable quake3-data package
  • game-data-packager version 23 or later can build the quake3-data package for local use, if you give it pak0.pk3 (from the CD-ROM) and the latest Quake III patch

I haven't been doing as much QA work as the RCBW crowd, but here's some halfway-recent bug squashing in the hope that it motivates others:

I also started looking at the series of bugs regarding Flash not compiled from source, but I mostly got distracted by all the webapps having a contrib/ directory containing a million embedded code copies...

Older posts:

Announcing gfcombinefs
Posted
RC Bugs of W48-W49
Posted
RC Bugs of W47
Posted
telepathy-qt4 0.2.0: first shared library release
Posted
DSpam and how not to do it
Posted
Converting Debian packaging from bzr to git
Posted
Why you shouldn't block on D-Bus calls
Posted
Encrypted root filesystem on a Debian laptop
Posted
Compiling against uninstalled versions of libraries
Posted
Integrating Enemies of Carlotta with Postfix
Posted
Speeding up builds with ccache and icecc
Posted
Spec-writing On A Plane
Posted
\o/
Posted
Implementing D-Bus services with telepathy-glib
Posted
Here's how it works
Posted
A magical xrandr incantation
Posted

This blog is powered by ikiwiki.