(The much-delayed part 2 in an ongoing series: shiny things in telepathy-glib)

Over a year ago I explained my first major round of work on telepathy-glib, with which it could be used to implement D-Bus services. I started to write this post shortly after, but then got distracted by a bee or something; recent discussion about having a D-Bus binding in GLib has finally prompted me to finish it.

The second major round of work on telepathy-glib dealt with implementing client code, and went sufficiently well that telepathy-qt4 (currently under development) uses essentially the same model.

Again, while telepathy-glib is intended for Telepathy implementors, I think TpProxy's ideas are generically useful, particularly for "large" D-Bus APIs.

Some background: dbus-glib and libtelepathy

GObject-CRITICAL: Assertion failed: proxy->convenient

[Figure: The life-cycle of a DBusGProxy (PD clipart credit: karderio and Andy)]

The API provided by dbus-glib for client code is, er, rather less than ideal. The basic object is DBusGProxy, which has a few problems.

There's one per interface, rather than one per object; this leads to a maze of twisty GObjects, all suspiciously similar. Our old client library, libtelepathy, got round this by subclassing DBusGProxy to make helper objects called TpChan, TpConn etc., which were simultaneously the proxy for the "main" interface (Channel, Connection etc.), and a factory for proxies for the other interfaces (like Channel.Type.Text and Channel.Interface.Password). This still wasn't ideal, but it was a start.

If you use a DBusGProxy in any way after it has emitted the destroy signal, this is considered to be invalid use of the API, and dbus-glib asserts all over the place. We'd rather it didn't do that - remote objects becoming invalid is a fact of life, and if you don't immediately discard all your references to those objects, remote operations on them should just give you a runtime error. You have to be prepared to handle such runtime errors already, because any IPC call can fail at any time.

The main API for calling methods uses varargs and is rather subtle. Notably, the calling convention for variant parameters (of D-Bus signature 'v') is not consistent with the calling convention for any other container. For any other container, for instance a GPtrArray *, you pass in a GPtrArray ** pointing to a GPtrArray * variable containing NULL; on success, a pointer to a new GPtrArray is written into that variable. For variant parameters, though, you must pass in a GValue * pointing to a zero-filled GValue, and the result will be written into that GValue.

dbus-binding-tool generates thin wrappers around this method-calling API which provide type-safety, although they can't directly specify a non-default timeout.

As for signals, before binding to a signal, you have to tell dbus-glib about its signature. This tells dbus-glib how it should push the signal arguments onto the C stack (in GObject terminology: how it should marshal them), to match the signature of the signal-handler function.

The documentation claims that this isn't necessary for objects supporting the Introspect method, but that's untrue - on the other hand, if the documented behaviour actually happened, I would consider this to be a critical bug in dbus-glib, since it would enable remote objects to crash dbus-glib clients (by causing signal arguments to be pushed onto the stack in a way that does not match the signature of the signal handler).

Of course, the signal-adding function takes varargs, and has to be called exactly once per signal per object. If you don't, dbus-glib asserts when you connect to the signal; if you call it twice, dbus-glib asserts anyway.

TpProxy

TpProxy is a GObject subclass vaguely inspired by Python's dbus.proxies.ProxyObject. It isn't a subclass of DBusGProxy, which lets us encapsulate dbus-glib almost completely - in fact, all the references to DBusGProxy are in a separate header, proxy-subclass.h, only used by subclasses of TpProxy (as opposed to normal library API users).

TpProxy instances are per-remote-object (per-object-path) like in dbus-python, rather than per-interface like in dbus-glib or QtDBus. As explained below, I believe that this is the most natural object mapping for D-Bus interfaces; in practice it tends to lead to clearer code for us, since one remote object often maps nicely to one UI element (of course, if your objects only have one interface, this approach is no better or worse than dbus-glib's).

TpProxy provides support for the following common D-Bus things:

  • Calling methods
  • Connecting to signals
  • Objects becoming invalid
  • Objects with optional interfaces
  • Extensibility

It doesn't currently provide any particular help with Properties - the Properties interface is just another interface as far as we're concerned. We find that, in practice, that's not a problem (even though we use D-Bus properties extensively).

In telepathy-qt4 we basically reinvented TpProxy in C++, as Tp::DBusProxy. I've put in some comments below illustrating how the general ideas we used in TpProxy translate into a different object mapping.

Calling methods

Method calls on TpProxy are done through auto-generated wrapper functions, much like DBusGProxy. These look something like this:

typedef void (*tp_cli_channel_callback_for_close) (TpChannel *proxy,
    /* any 'out' arguments would go here - Close() doesn't have any */
    const GError *error, gpointer user_data,
    GObject *weak_object);

TpProxyPendingCall *tp_cli_channel_call_close (TpChannel *proxy,
    gint timeout_ms,
    /* any 'in' arguments would go here - Close() doesn't have any */
    tp_cli_channel_callback_for_close callback,
    gpointer user_data,
    GDestroyNotify destroy,
    GObject *weak_object);

Methods for some interfaces, like DBus.Properties, are available on TpProxy itself; others, like Telepathy's Channel.anything, are only available on a particular subclass, like TpChannel.

Things to note here:

  • It's an asynchronous call with a callback. General design principle: "this is IPC, get over it". We don't try to hide the fact that IPC is taking place by doing pseudo-blocking calls.

  • The timeout is explicit (although you can always pass -1 to let libdbus pick a "reasonable" default for you). This is because the remote service might not respond, and API users should be at least vaguely aware of this: "this is IPC, get over it".

  • The callback isn't a GCallback; it has an explicit signature, for some semblance of type-safety (it's not much, but it's better than nothing).

  • The callback takes a GError, because any IPC call can fail for any reason.

  • A TpProxyPendingCall * is returned. This is a pointer to an object that only exists until just after the callback is called (so you can ignore it if you don't want it), with a method you can call to cancel the method call (with hindsight this is not ideal, and if I were writing this API now, I'd use gio's GCancellable). "Cancel" is perhaps slightly misleading: the method call still takes place (it was already in libdbus's outgoing queue) but the result gets ignored and the callback isn't called. This is mostly so that an object can forget all the calls it was busy making at the time that the results become irrelevant (e.g. the object is destroyed).

  • There is an extra weak_object argument, which is weakly referenced and is passed to the callback: in practice, most users of telepathy-glib use this as well as or instead of user_data. If the weak_object runs out of references, the callback is automatically cancelled, which means that in practice, explicit cancellation is almost never needed.

  • The generated code has as its first argument a suitable subclass of TpProxy, in this case TpChannel (we tell the code generator about subclasses).

  • You can't see it in the API, but the TpProxy gains an extra ref while a method call is in progress, so you never have to worry about what happens if the proxy gets unreffed during a method call.

  • There's an explicit destructor for the user_data, which is called even if you "cancel" the method call.
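To make the "cancel" and destructor semantics above concrete, here is a minimal sketch in plain C. It is a toy model, not telepathy-glib code: every toy_* name is hypothetical, weak_object is left out, and the choice to run the destructor at cancel time (rather than later) is an assumption made for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types for this sketch only. */
typedef void (*toy_callback) (int failed, void *user_data);
typedef void (*toy_destroy) (void *user_data);

struct toy_pending_call
{
  toy_callback callback;   /* NULL once cancelled or completed */
  void *user_data;
  toy_destroy destroy;     /* NULL once it has been run */
};

/* Release user_data exactly once and forget the callback. */
static void
toy_pending_call_finish (struct toy_pending_call *pc)
{
  toy_destroy destroy = pc->destroy;

  pc->callback = NULL;
  pc->destroy = NULL;

  if (destroy != NULL)
    destroy (pc->user_data);
}

/* "Cancel": the request has already been sent, so all we can do is
 * forget the callback. The destructor still runs. */
static void
toy_pending_call_cancel (struct toy_pending_call *pc)
{
  assert (pc != NULL);
  toy_pending_call_finish (pc);
}

/* Called when the reply (or an error) arrives. */
static void
toy_pending_call_complete (struct toy_pending_call *pc, int failed)
{
  assert (pc != NULL);

  if (pc->callback != NULL)
    pc->callback (failed, pc->user_data);

  toy_pending_call_finish (pc);
}

/* Helpers that count invocations, so the behaviour can be observed. */
struct toy_counters { int callbacks; int destroys; };

static void
toy_count_callback (int failed, void *user_data)
{
  (void) failed;
  ((struct toy_counters *) user_data)->callbacks++;
}

static void
toy_count_destroy (void *user_data)
{
  ((struct toy_counters *) user_data)->destroys++;
}
```

In this model a call that completes normally runs the callback and then the destructor, while a cancelled call runs only the destructor, even if the reply turns up afterwards.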

In telepathy-qt4, these methods are on small generated helper classes similar to dbus-python's dbus.Interface, one per interface (we provide accessors for them on the non-generated DBusProxy subclass). Instead of using callbacks, we return a temporary QObject per call, which emits a Qt signal on success or failure (that's how all async calls in QtDBus work).

Connecting to signals

Similarly, signals have some generated functions:

typedef void (*tp_cli_channel_signal_callback_closed) (TpChannel *proxy,
    /* any arguments would go here - Closed doesn't have any */
    gpointer user_data, GObject *weak_object);

TpProxySignalConnection *tp_cli_channel_connect_to_closed (TpChannel *proxy,
    tp_cli_channel_signal_callback_closed callback,
    gpointer user_data,
    GDestroyNotify destroy,
    GObject *weak_object,
    GError **error);

API notes for these:

  • The callback doesn't take a GError any more, because signals can't fail; the weak_object and the user_data are the same as for method calls.
  • This isn't actually a GObject signal at all: those aren't particularly typesafe, and lack namespacing. This might not be very binding-friendly (we haven't tried writing bindings yet); it might be worth introducing a signal with details, like in dbus-glib, for which these functions are "C bindings".
  • The returned TpProxySignalConnection lets you disconnect from the signal, just like the TpProxyPendingCall above; it can safely be ignored.
  • Similarly, if the weak_object dies, then the signal automatically disconnects.
  • It is possible for connecting to a signal to fail: this happens if the proxy doesn't actually have the requested interface. If this happens, it's graceful, not a crash. In practice, the flow of code is almost always such that error can safely be NULL because the interfaces are already known.

In telepathy-qt4 the per-interface generated helper classes emit Qt signals. In many cases, we end up proxying these signals by responding to them in the hand-written Tp::DBusProxy subclass by emitting a different signal, in order to give them a nicer representation.

Becoming invalid

This is IPC, and any error can happen to you at any time. We found that it was useful to have a general concept of "this object is no longer useful", so we introduced the concept of an object becoming invalidated.

The invalidation reason is stored on the object as a GError. If this GError is NULL (as it is initially), then the object is still expected to be valid; if it's non-NULL, the remote object has vanished and the local proxy no longer works.

There are several ways a TpProxy can become invalid:

  • its bus name falls off the bus (the process exited or crashed)
  • application-specific reasons (usually triggered by a D-Bus signal in a subclass, like the Telepathy Channel's Closed signal)
  • the GObject gets disposed

Method calls on an invalidated object always fail, with the invalidation reason as the error. This means that calling a method on a TpProxy should never crash your process. The method's callback has to be prepared to handle an error in any case, because this is IPC, so there's no loss in giving it another error.
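As a rough illustration of that rule, the following plain C sketch models a proxy whose method calls fail with the stored invalidation reason. All toy_* names are hypothetical, and two simplifications are assumptions rather than statements about TpProxy: only the first invalidation reason is kept, and the callback is invoked immediately instead of being deferred to the main loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GError. */
struct toy_error { int code; const char *message; };

struct toy_proxy
{
  const struct toy_error *invalidated;   /* NULL while still valid */
};

typedef void (*toy_callback) (struct toy_proxy *proxy,
    const struct toy_error *error, void *user_data);

/* Record the invalidation reason. In this sketch the first reason
 * wins and later ones are ignored (an assumption). */
static void
toy_proxy_invalidate (struct toy_proxy *proxy,
    const struct toy_error *reason)
{
  assert (reason != NULL);

  if (proxy->invalidated == NULL)
    proxy->invalidated = reason;
}

/* Returns 1 if the call would be sent, 0 if it failed locally.
 * A local failure reports the invalidation reason to the callback
 * instead of crashing or asserting. */
static int
toy_proxy_call (struct toy_proxy *proxy, toy_callback callback,
    void *user_data)
{
  if (proxy->invalidated != NULL)
    {
      callback (proxy, proxy->invalidated, user_data);
      return 0;
    }

  /* A real implementation would queue a D-Bus message here and call
   * the callback when the reply arrives. */
  return 1;
}

/* Helper callback: remember which error (if any) was delivered. */
static void
toy_store_error (struct toy_proxy *proxy, const struct toy_error *error,
    void *user_data)
{
  (void) proxy;
  *(const struct toy_error **) user_data = error;
}
```

The point of the design is that the caller needs only one error path: the callback sees a local invalidation in exactly the same way as it would see a remote failure.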

Connecting to signals on an invalidated object always fails, and any existing signal connections are disconnected when it becomes invalidated. This means that if a new remote object appears, and it happens to have the same object path, bus name etc. as the old one, you won't get its signals.

A TpProxy can either be bound to a unique bus name or a well-known bus name. In the equivalent code in telepathy-qt4, we call this "stateful" or "stateless" - these names don't capture the intention perfectly, but they're the best we could come up with.

A proxy for a stateful API like Telepathy's Channel should always bind to a unique name - when the unique name falls off the bus, the TpProxy becomes invalidated automatically, representing the fact that the Channel you had no longer exists, and nothing can be a perfect replacement for it (the VoIP call has already been terminated, you've already left the chatroom, or whatever - if you make another Channel, that's e.g. a distinct VoIP call).

A proxy for a stateless API like Telepathy's ConnectionManager, where an exiting process can disappear, be service-activated again, and still have exactly the same API, should bind to the well-known name. Proxies for well-known names aren't invalidated when the process exits.
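The distinction can be sketched as a small decision function in plain C. It relies on the D-Bus convention that unique connection names begin with ':' while well-known names never do; the function names are hypothetical and this is not how telepathy-glib is actually structured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unique connection names look like ":1.42"; well-known names look
 * like "org.freedesktop.Telepathy.ConnectionManager.gabble". */
static int
toy_is_unique_name (const char *bus_name)
{
  assert (bus_name != NULL);
  return bus_name[0] == ':';
}

/* Should a proxy bound to bound_name be invalidated when the bus
 * reports that lost_name no longer has an owner?
 *
 * Only "stateful" proxies (bound to a unique name) are invalidated;
 * a "stateless" proxy bound to a well-known name survives, because a
 * service-activated replacement offers the same API. */
static int
toy_should_invalidate (const char *bound_name, const char *lost_name)
{
  assert (lost_name != NULL);
  return toy_is_unique_name (bound_name)
      && strcmp (bound_name, lost_name) == 0;
}
```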

Qt doesn't have a natural equivalent for GError, so the invalidation reasons in telepathy-qt4 are just a pair of strings, the namespaced name and the message - basically a libdbus DBusError. The principle is the same, though.

Optional interfaces

As mentioned above, TpProxy instances are per-remote-object (per-object-path) like in dbus-python, rather than per-interface like in dbus-glib or QtDBus. I believe that this is the most natural object mapping for the D-Bus object model, because the D-Bus interfaces on a remote object can behave like any of these:

  • classes and subclasses (Telepathy.Channel.Type.Text is a Telepathy.Channel)
  • shared interfaces (some Telepathy.Channel objects, of many Types, implement Telepathy.Channel.Interface.Group; any D-Bus object can implement DBus.Properties)
  • optional features (some Telepathy.Channel.Type.Text objects implement Telepathy.Channel.Interface.ChatStates, some do not)
  • discoverable extensions (some Telepathy.Channel.Type.Text objects implement Telepathy.Channel.Interface.Messages, which is more or less Text 2.0)

The way in which interfaces are discovered is also variable. The "classic" D-Bus way to discover interfaces would be to call Introspect. Telepathy services still support that style of introspection, but we don't use it for anything beyond d-feet:

  • Introspect returns a blob of XML, which you have to parse into something useful; if it was in a D-Bus data structure, you'd already have some sensible data structure.
  • Finding out more information about the supported interfaces than their names isn't very useful: in a static language like C you already need to know the methods, signals, properties and their signatures in order to write and compile your code, and in a dynamic language like Python consumers of Introspect end up being remotely crashable by services with unexpected method signatures (this is one of dbus-python's big mistakes in my opinion).
  • If a method that you want to call turns out to be missing, the worst that can happen (in a competently written service) is that it fails with a D-Bus error - and you have to handle D-Bus errors anyway, because this is IPC and anything could fail.
  • Static bindings like dbus-glib have poor support for removing, at runtime, interfaces for which C code exists, whereas Telepathy needs features like an IM connection that might or might not support avatars (we don't get to find out until we've successfully connected to the server).

Instead, our older interfaces have a method on the "base class" (Channel, Connection etc.) called GetInterfaces, which just returns an array of strings. Newer interfaces (like Account), and older interfaces that have been ported (like Channel), have a property called Interfaces which is, again, an array of strings - this lets us combine the download of the interfaces list with downloading other basic information in a GetAll call.

To cater for all these ways to use interfaces, the TpProxy base class has very basic support for interface discovery - you can ask it whether it supports an interface, and you can tell it that it does, in fact, support an interface.

Asking which interfaces are supported is directly useful for library users; it's also used as a check by the generated method-call and signal-connection stubs, which return a canned error (Telepathy.Error.NotImplemented in Telepathy's case) if the object isn't known to have the interface.

Telling TpProxy that it does support an interface is intended for use by subclasses, and might have been protected if we were writing C++: what happens in practice is that a concrete subclass like TpChannel knows how to discover the full set of supported interfaces for this particular object, does so, and calls methods on the TpProxy to tell it which interfaces can work.
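The "ask" and "tell" halves of interface discovery, plus the check made by the generated stubs, might look roughly like this in plain C. The toy_* names and the fixed-size table are hypothetical simplifications; only the error name is taken from Telepathy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_INTERFACES 16

/* Hypothetical stand-in for a proxy: just the set of interfaces that
 * are known to be supported. Strings are borrowed, not copied, so
 * they must outlive the proxy (string literals are fine). */
struct toy_proxy
{
  const char *interfaces[TOY_MAX_INTERFACES];
  size_t n_interfaces;
};

/* "Ask": is this interface known to be supported? */
static int
toy_proxy_has_interface (const struct toy_proxy *proxy, const char *iface)
{
  for (size_t i = 0; i < proxy->n_interfaces; i++)
    if (strcmp (proxy->interfaces[i], iface) == 0)
      return 1;

  return 0;
}

/* "Tell": used by a subclass once it has discovered the interfaces.
 * Returns 0 only if the table is full. */
static int
toy_proxy_add_interface (struct toy_proxy *proxy, const char *iface)
{
  if (toy_proxy_has_interface (proxy, iface))
    return 1;

  if (proxy->n_interfaces == TOY_MAX_INTERFACES)
    return 0;

  proxy->interfaces[proxy->n_interfaces++] = iface;
  return 1;
}

/* What a generated stub might do before touching the bus: return a
 * canned error name, or NULL if the call may go ahead. */
static const char *
toy_stub_check_interface (const struct toy_proxy *proxy, const char *iface)
{
  if (!toy_proxy_has_interface (proxy, iface))
    return "org.freedesktop.Telepathy.Error.NotImplemented";

  return NULL;
}
```

A subclass would call toy_proxy_add_interface once per string returned by GetInterfaces or found in the Interfaces property; after that, stubs for those interfaces stop returning the canned error.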

Extensibility

The API stubs used by TpProxy to call methods and bind to signals are generated from XML documents containing an augmented form of D-Bus introspection data. These XML documents contain various Telepathy-specific extensions, but none of the extensions are language-specific - telepathy-glib, telepathy-python and telepathy-qt4 all operate from exactly the same XML specification. telepathy-qt4's different object-mapping did require us to tighten up some rules that were previously only conventions, but even so, the format is the same. I think this is vitally important - it just doesn't scale to have one set of language-specific annotations in a D-Bus API for each language that will have bindings.

One thing that we make heavy use of is what I call "Ugly_Case", which is camel-case with underscores at word boundaries. This gives us a simple, unambiguous and foolproof rule to use in code generators whenever we generate any of the popular conventions for identifiers:

  • CamelCase: delete underscores
  • javaCamelCase: force everything before the first underscore to lower case, then delete underscores
  • lower_case: force everything to lower case, retain underscores
  • UPPER_CASE: force everything to upper case, retain underscores
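The four rules above are simple enough to fit in one short function. This is a sketch in plain C, not the code our generators actually use (those are not written in C), and the function and enum names are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum toy_name_style
{
  TOY_CAMEL_CASE,        /* Send_DTMF -> SendDTMF */
  TOY_JAVA_CAMEL_CASE,   /* Send_DTMF -> sendDTMF */
  TOY_LOWER_CASE,        /* Send_DTMF -> send_dtmf */
  TOY_UPPER_CASE         /* Send_DTMF -> SEND_DTMF */
};

/* Convert an Ugly_Case name into the requested style.
 * out must have room for strlen (ugly) + 1 bytes. */
static void
toy_convert_ugly_case (const char *ugly, enum toy_name_style style,
    char *out)
{
  int seen_underscore = 0;
  size_t j = 0;

  assert (ugly != NULL && out != NULL);

  for (size_t i = 0; ugly[i] != '\0'; i++)
    {
      unsigned char c = (unsigned char) ugly[i];

      if (c == '_')
        {
          seen_underscore = 1;

          /* Underscores survive only in the two snake-case styles. */
          if (style == TOY_LOWER_CASE || style == TOY_UPPER_CASE)
            out[j++] = '_';

          continue;
        }

      switch (style)
        {
        case TOY_CAMEL_CASE:
          out[j++] = (char) c;
          break;
        case TOY_JAVA_CAMEL_CASE:
          /* Lower-case only the part before the first underscore. */
          out[j++] = (char) (seen_underscore ? c : tolower (c));
          break;
        case TOY_LOWER_CASE:
          out[j++] = (char) tolower (c);
          break;
        case TOY_UPPER_CASE:
          out[j++] = (char) toupper (c);
          break;
        }
    }

  out[j] = '\0';
}
```

Because the word boundaries are explicit in the input, no guessing about runs of capitals is needed: Send_DTMF comes out as send_dtmf rather than anything like send_dt_mf.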

dbus-glib, by contrast, applies a complex and subtly buggy algorithm to the CamelCase D-Bus names, which results in our SendDTMF method being mapped into GLib as send_dt_mf, and requires that telepathy-glib's Python reimplementation of dbus-binding-tool uses a bug-for-bug compatible implementation (we ended up reimplementing dbus-binding-tool in order to generate the TpProxy-based bindings).

Our code generation tools are still rather ad-hoc, because so is our spec format, but for draft interfaces it's possible to copy them into individual projects and use them to generate additional methods for TpProxy or any subclass. This is how we implement unfinished APIs like geolocation in Empathy, for instance.

The Interfaces or GetInterfaces hook described above is entirely usable for these extension interfaces, and in fact that's how they're set up.

In telepathy-qt4, the generated method stubs are genuine C++ methods, so we can't just append methods to an existing class: this is why we have one helper class per interface, so individual clients can generate a helper class for their particular version of an unfinished API, and instantiate an object of this class attached to a particular Tp::DBusProxy.

When an interface becomes stable, it can be added to the set of interfaces for which code is generated in telepathy-glib or telepathy-qt4, at which point any client or service that was already using the final draft of that interface can easily be ported to the library version using sed (the API remains the same).

Becoming ready

The concept of being "ready" does not directly exist in TpProxy, but is implemented in TpChannel, TpConnection and TpConnectionManager. The idea is that a freshly created proxy object isn't really fully usable yet - you have to connect to basic signals and recover the initial state of the remote object, as well as checking which extension interfaces are supported. In many well-designed D-Bus APIs you can do this initial setup with a few signal connections and a DBus.Properties.GetAll call.

After doing that initial setup (which has to be done asynchronously, because it's IPC), the TpProxy subclass has a local cache of the remote object's state (not necessarily in the same format as the representation on the bus), which can be accessed synchronously at any time, and will be updated in response to change-notification signals.

TpChannel and subclasses like it have explicit support for checking for readiness, and having a callback called when the object is ready or invalidated (whichever happens first). The "ready" property also supports the GObject notify signal.
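The "ready or invalidated, whichever happens first" behaviour can be modelled in a few lines of plain C. This is only a sketch: the toy_* names are hypothetical, the invalidation reason is a bare string rather than a GError, and only one waiting callback is supported.

```c
#include <assert.h>
#include <stddef.h>

struct toy_channel;

/* error is NULL if the channel became ready, or the invalidation
 * reason if it was invalidated first. */
typedef void (*toy_ready_callback) (struct toy_channel *chan,
    const char *error, void *user_data);

struct toy_channel
{
  int ready;
  const char *invalidated;        /* NULL while still valid */
  toy_ready_callback when_ready;  /* at most one waiter in this sketch */
  void *user_data;
};

/* Deliver the outcome to the waiter, if any, exactly once. */
static void
toy_channel_notify (struct toy_channel *chan)
{
  toy_ready_callback cb = chan->when_ready;

  chan->when_ready = NULL;

  if (cb != NULL)
    cb (chan, chan->invalidated, chan->user_data);
}

static void
toy_channel_call_when_ready (struct toy_channel *chan,
    toy_ready_callback cb, void *user_data)
{
  assert (cb != NULL);
  chan->when_ready = cb;
  chan->user_data = user_data;

  /* If the outcome is already known, report it straight away. */
  if (chan->ready || chan->invalidated != NULL)
    toy_channel_notify (chan);
}

/* Called by the subclass once initial state has been downloaded. */
static void
toy_channel_set_ready (struct toy_channel *chan)
{
  if (chan->ready || chan->invalidated != NULL)
    return;

  chan->ready = 1;
  toy_channel_notify (chan);
}

static void
toy_channel_invalidate (struct toy_channel *chan, const char *reason)
{
  assert (reason != NULL);

  if (chan->invalidated != NULL)
    return;

  chan->invalidated = reason;
  toy_channel_notify (chan);
}

/* Helper callback: count calls and remember the error. */
struct toy_result { int calls; const char *error; };

static void
toy_record (struct toy_channel *chan, const char *error, void *user_data)
{
  struct toy_result *r = user_data;

  (void) chan;
  r->calls++;
  r->error = error;
}
```

Either way the waiter is called exactly once, so library users can start their real work from that single callback and check the error to find out which case they are in.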

telepathy-qt4 took this concept and ran with it - there is library support for objects that can become ready, including a mixin. If I had as much time to work on telepathy-glib as I wish I had, this would be one of the first telepathy-qt4 features to be added to telepathy-glib.

Features becoming ready

Going beyond that, in an extensible framework like Telepathy it's probable that not every client understands (or wants to understand) every feature of every object, so it's highly inefficient for every proxy object to subscribe to all the change notification signals and cache all the remote state on the off-chance that the library user wants them.

In the API for TpContact (which is not actually a TpProxy for various reasons) we introduced the concept of optional features; telepathy-qt4 extended this idea throughout the library. The idea is that each library user knows which of the optional features are "interesting" to it, and makes a single asynchronous call (which expands into a series of D-Bus calls inside the library) to download the state for core functionality plus all of the selected features, and subscribe to change notification for all of those.
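One way to picture the bookkeeping behind optional features is as a set of flags. The sketch below, in plain C, works out which features still need to be fetched for a given request; the flag names and functions are hypothetical, and neither TpContact nor telepathy-qt4 necessarily represents features this way.

```c
#include <assert.h>

/* Hypothetical feature flags for this sketch only. */
enum
{
  TOY_FEATURE_CORE     = 1 << 0,   /* always needed */
  TOY_FEATURE_ALIAS    = 1 << 1,
  TOY_FEATURE_AVATAR   = 1 << 2,
  TOY_FEATURE_PRESENCE = 1 << 3
};

struct toy_object
{
  unsigned prepared;   /* features whose state is cached locally */
};

/* Which features must be downloaded (and subscribed to) to satisfy
 * this request? Core functionality is always included; anything that
 * is already prepared is skipped. */
static unsigned
toy_features_to_fetch (const struct toy_object *obj, unsigned requested)
{
  return (requested | TOY_FEATURE_CORE) & ~obj->prepared;
}

/* Called once the D-Bus calls for these features have succeeded. */
static void
toy_mark_prepared (struct toy_object *obj, unsigned features)
{
  obj->prepared |= features;
}
```

A client that only cares about aliases never pays for avatar or presence traffic, and a second request for an overlapping set only fetches what is missing.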

This is another telepathy-qt4 thing that telepathy-glib still needs to catch up with; until someone works out how to make the clone() syscall apply to programmers, I fear all libraries are doomed to lag behind how their designers want them to look :-)

This document

Since this document ended up quite long, it might be useful to adapt it into documentation. To support that, this work is licensed under the Creative Commons Attribution 3.0 Unported License. Alternatively, at your option, you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.