Cairo, Xlib, and the Shared Memory Extension

Update: As mentioned in a comment, XShm isn’t always available, for any number of reasons (most commonly a remote X server). In that case the program would fall back to the normal method of drawing images. This is why I chose to implement XShm as a separate surface: surface creation fails if XShm isn’t available, so the application can fall back to a more suitable surface. (And yes, the fact that it’s faster is bloody obvious; I’m as surprised as anyone that Cairo didn’t use it already. ;) )

Maybe I’m just naive, but designing a graphics API such that all image data had to be sent over a socket to another process every time the image needed to be drawn seems like complete idiocy. Unfortunately, that is precisely what the X Window System forces a program to do, and exactly what Cairo does when drawing images in Linux—a full copy of the image data, sent to another process no less, every time it is drawn. One would think there would be some room for improvement.

Unsurprisingly, others felt the same way about X and wrote an extension, Xlib Shm or XShm for short, that lets a program place images in a shared memory segment that the X server reads directly, avoiding the memory copy. GTK already makes use of the XShm extension, and it seems like a good idea to see whether Gecko could do the same.
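To make the idea concrete, here is a minimal sketch of the client-side half of that arrangement: allocating pixel storage in a System V shared memory segment, with a heap fallback when the segment cannot be created. It makes no X calls at all, and the names `pixel_buffer`, `pixel_buffer_create`, and `pixel_buffer_destroy` are hypothetical, not taken from the patch. A real XShm client would additionally check `XShmQueryExtension` and hand the segment to the server with `XShmAttach`, and would only mark the segment for removal once the server has attached.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper type: pixel storage that lives in a System V shared
 * memory segment when possible, and on the heap otherwise. */
typedef struct {
    uint8_t *pixels; /* width * height * 4 bytes (32-bit pixels) */
    size_t   size;   /* size of the pixel storage in bytes */
    int      shmid;  /* segment id, or -1 when the heap fallback is in use */
} pixel_buffer;

/* Returns 0 on success, -1 if the size is invalid or no memory is available. */
int pixel_buffer_create(pixel_buffer *buf, int width, int height)
{
    assert(buf != NULL);
    buf->pixels = NULL;
    buf->size = 0;
    buf->shmid = -1;
    if (width <= 0 || height <= 0)
        return -1;
    buf->size = (size_t)width * (size_t)height * 4;

    /* First choice: a private shared memory segment. */
    buf->shmid = shmget(IPC_PRIVATE, buf->size, IPC_CREAT | 0600);
    if (buf->shmid != -1) {
        void *addr = shmat(buf->shmid, NULL, 0);
        /* Mark the segment for removal; it stays alive until the last
         * detach. With a real X server this must wait until after the
         * server has attached to the segment. */
        shmctl(buf->shmid, IPC_RMID, NULL);
        if (addr != (void *)-1) {
            buf->pixels = addr;
            return 0;
        }
        buf->shmid = -1;
    }

    /* Fallback: ordinary heap memory, as for a plain image surface. */
    buf->pixels = calloc(1, buf->size);
    return buf->pixels != NULL ? 0 : -1;
}

void pixel_buffer_destroy(pixel_buffer *buf)
{
    assert(buf != NULL);
    if (buf->shmid != -1)
        shmdt(buf->pixels);   /* last detach releases the segment */
    else
        free(buf->pixels);
    buf->pixels = NULL;
}
```

A caller would decode the image into `buf.pixels` and check `buf.shmid` to learn which path was taken; when it is -1, the data has to travel to the server the ordinary way.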

An Xlib Shm Surface

There have been previous talks about putting such an extension into Cairo, but none of them have gone further than a proposed patch. So over the past couple of weeks I’ve explored XShm and Cairo to see whether a clean API can be made for Cairo and to measure how large the potential speed improvements really are.

I implemented a new back end surface for Cairo, cairo_xlib_shm_surface_t. It is designed with one specific use case in mind: the fast rendering of images loaded from files. Generally, one would create the XShm surface, and decode the image directly into the surface, then paint the surface to the screen. It is minimally invasive on the Xlib surface back end, adding only one if block into a single function, and so should have no impact on performance when compiled into Cairo but not used.

The patch was created relative to the git repository for Cairo, taken out on June 10, 2008. You can find the patch at http://www.ericbutler.net/demos/mozilla/2008/cairo-xshm-test/xshm.diff. Configure with the flag --enable-xlib-shm to build it, and include cairo-xlib-shm.h for the functions to create an XShm surface.

Testing

A C program was used to test the new backend for speed. The test compared rendering speeds of painting an image surface on an Xlib surface (the current method of rendering images in Firefox) to painting an Xlib Shm surface on an Xlib surface. The test allocates space for some number of images and fills them all with data. It then draws all the images to the screen, syncs with the X server to make sure drawing is complete, and lastly destroys all the images.

The test was run with images of various sizes, with alpha blending and without. The tests were run on an HP nc8430 laptop with an ATI card running Ubuntu 8.04 using the fglrx drivers. Two different items were measured:

  1. The total time to create, draw, and destroy all the images.
  2. Only the time to draw the images.
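The two measurements can be sketched as a small timing harness. This is an illustration of the procedure described above, not the actual test program: the `bench_ops` hooks, `bench_run`, and `bench_count_hook` are hypothetical names, and the hooks stand in for the real surface creation, drawing (including the sync with the X server), and destruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical hooks standing in for the surface operations under test. */
typedef struct {
    void (*create_all)(void *ctx, int count);
    void (*draw_all_and_sync)(void *ctx, int count); /* draw, then sync */
    void (*destroy_all)(void *ctx, int count);
    void *ctx;
} bench_ops;

typedef struct {
    double total_ms; /* create + draw + destroy, averaged over all runs */
    double draw_ms;  /* draw only, averaged over all runs */
} bench_result;

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Runs the create/draw/destroy cycle `runs` times and averages both timings. */
bench_result bench_run(const bench_ops *ops, int image_count, int runs)
{
    bench_result r = { 0.0, 0.0 };
    assert(ops != NULL);
    for (int i = 0; i < runs; i++) {
        double t0 = now_ms();
        ops->create_all(ops->ctx, image_count);
        double t1 = now_ms();
        ops->draw_all_and_sync(ops->ctx, image_count);
        double t2 = now_ms();
        ops->destroy_all(ops->ctx, image_count);
        double t3 = now_ms();
        r.total_ms += t3 - t0; /* measurement 1 */
        r.draw_ms  += t2 - t1; /* measurement 2 */
    }
    if (runs > 0) {
        r.total_ms /= runs;
        r.draw_ms  /= runs;
    }
    return r;
}

/* Placeholder hook for trying the harness out: it only counts images. */
void bench_count_hook(void *ctx, int count)
{
    *(int *)ctx += count;
}
```

Filling all three hooks with `bench_count_hook` exercises the harness without an X server; a real run would plug in one set of hooks per surface type and compare the two results.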

Results

Each chart indicates the size of the image, video driver used, measurement taken, and bit depth. The x-axis represents the number of images rendered in each test, and the y-axis is the total time of the operation in milliseconds, averaged over 100 tests. The “image” category is a Cairo image surface, and “xshm” is my Cairo Xlib Shm surface. Only a small sampling is shown here; a page with more results can be found at http://www.ericbutler.net/demos/mozilla/2008/cairo-xshm-test/results.html.

Analysis

There are a few observations that can be made instantly from the data:

  • Drawing with XShm is a lot faster. For any image size, drawing with XShm took on average somewhere between 40 and 70 percent of the time needed for a standard image surface, which is a pretty good improvement. Image-heavy sites could potentially see a noticeable decrease in page draw time.
  • Creating an XShm surface costs more than creating an image surface. There appears to be a constant (or at least sub-linear) but noticeable cost to creating a shared memory segment. The total time results for the small image are actually considerably worse for XShm, which seems to indicate that there is some lower bound below which the cost of setting up the shared memory segment outweighs the benefit gained from the draw call. However, for images as small as 200×200, XShm still provides a net gain in performance even if the image is drawn only once. And, of course, the more times an image is drawn per creation, the less this overhead matters.
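The trade-off between the higher creation cost and the cheaper draw can be written as a simple break-even calculation: XShm pays off once setup_shm + n × draw_shm ≤ setup_img + n × draw_img. The sketch below solves that for n. The function name `break_even_draws` is hypothetical, and the model assumes a fixed setup cost and a fixed per-draw cost, which is a simplification of the measurements above.

```c
#include <assert.h>

/* Returns the smallest number of draws n >= 1 at which
 *     setup_shm + n * draw_shm <= setup_img + n * draw_img,
 * or -1 if the shared memory path never catches up.
 * All four costs must be non-negative and in the same time unit. */
long break_even_draws(double setup_shm, double draw_shm,
                      double setup_img, double draw_img)
{
    assert(setup_shm >= 0.0 && draw_shm >= 0.0);
    assert(setup_img >= 0.0 && draw_img >= 0.0);

    double extra_setup    = setup_shm - setup_img; /* paid once */
    double saved_per_draw = draw_img - draw_shm;   /* gained on each draw */

    if (saved_per_draw <= 0.0) {
        /* No per-draw gain: the gap can only grow, so check a single draw. */
        return (extra_setup - saved_per_draw <= 0.0) ? 1 : -1;
    }

    /* Smallest integer n with n >= extra_setup / saved_per_draw. */
    double ratio = extra_setup / saved_per_draw;
    long n = (long)ratio;
    if ((double)n < ratio)
        n++;
    return n < 1 ? 1 : n;
}
```

With made-up costs (not measurements) of 3.0 ms setup and 0.5 ms per draw for XShm against 1.0 ms setup and 1.0 ms per draw for an image surface, `break_even_draws(3.0, 0.5, 1.0, 1.0)` returns 4: both paths cost 5.0 ms after four draws, and XShm is ahead from then on.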

So it seems that XShm is worth pursuing. Further research will be needed to see how XShm performs with other video drivers such as vesa. I will be exploring how to use this surface in Thebes and the image library. Hopefully this can result in a speedup for Firefox on Linux and for any other Cairo applications that use the Xlib backend.

X11 logo image taken from http://commons.wikimedia.org/wiki/Image:X11.svg.

15 Responses to “Cairo, Xlib, and the Shared Memory Extension”

  1. Steven Says:

    As a Firefox user on Linux, Thank you. Very promising work.

  2. Boris Says:

    Eric, I assume that when X and cairo are running on different machines we’ll just keep doing what we do now?

    Also, I was under the impression that we stored our image pixmaps on the server side precisely to avoid this sort of issue. Is that not the case?

  3. Eric Says:

    In response to the first question, if XShm is unavailable it will just fall back to using the normal method. This could happen for obvious reasons (remote X server) or for other reasons (interprocess memory segment creation failures).

    As for optimizations that already exist, even if the images were stored in a server pixmap, you still have to send the data to the server at least once. And as the results show, XShm would still result in a performance gain in that case.

  4. ohxten Says:

    Great info. Maybe this is why FF seemed laggier on *nix than Windows.

  5. Z_God Says:

    Is this why GTK over remote X11 is so ridiculously slow? Because it is never tested without XShm?

  6. Tim Says:

    Web pages often involve many small images, so I am wondering if there will be a big improvement in general? It seems you could greatly improve the performance if you did not have to set up and tear down the shared memory segment each time. Could you use a custom pool allocator that reused the same shared mem ptr?

  7. markus Says:

    This sounds like great news but I am sure some Xorg guys will hate it … :)

  8. vsync Says:

    Neat stuff. Good work, look forward to even faster Firefox and the like!

    Unrelated question, but you might be a good person to ask. I notice on my machine here that after some days of X11 usage including heavy Firefox browsing etc, my Xorg process crawls up to 900MB or so memory usage. Could this be from pixmap caching? Is there any way to tell what’s using the memory and purge it?

    It persists even after I close all apps, until I restart the X server itself.

  9. Bastiaan Says:

    Did you ever talk to any of the Cairo people about this?

    If you had, you might have noticed that they have an experimental branch with XShm support. However, the verdict is still out whether or not this actually benefits performance in the long run /for internal usage in Cairo/.

    They have done a lot of benchmarking. You might wish to simply continue that if you wish to work on this.

  10. Will Linux graphics programs become faster? « lukov blog Says:

    [...] a patch for the cairo library, which handles vector graphics rendering for Gtk+ programs as well as for Firefox. The research he carried out and its results are astonishing, since with the patched cairo library, which [...]

  11. Sean Says:

    Z_God: try changing your theme. GTK+ doesn’t do anything particularly bad for remote connections (neither does it do things particularly well, either), but a bad theme can easily slaughter your performance, as it might cause an unnecessary number of image draws and other operations to be performed.

    markus: given that the Xorg guys wrote the Xshm extension for this very reason, I doubt they will hate it. They’re not as stupid as this assholish blog post makes them out to be. The author might try to remember that X was originally designed in a day and age when image data was not anywhere close to the size it is today, and that without that network-centric design we wouldn’t have the power and flexibility of networked X that we have and use today. It isn’t “idiocy” to design X the way it is — it is at worst an oversight of the application and toolkit authors who ignore the Xshm extension.

    vsync: try xrestop, which exists for that very purpose. keep in mind that Linux is going to report a lot of memory usage for X11 that has nothing to do with your actual usage of RAM, such as video memory usage. Also keep in mind how memory allocation works, and that a large variety of memory allocations will never be released to the OS until the application ends. malloc() grows the heap for smaller block requests, and unless _all_ blocks in the end of the heap are freed, the application cannot shrink the heap. Larger allocations (like some — but not all — image data) are often allocated using a different technique that makes it possible to immediately free the memory back to the OS when the application is done with the block.

  12. William Lahti Says:

    Why do you not mention XRender… just using normal xlib pixmap surfaces to do this!? I mean, certainly XSHM surfaces would be very useful where you really must keep a few image surfaces around, but in most cases (especially for static graphics files) you can blit them from the image surface into your xlib pixmap and then destroy the original image surface. I have NO CLUE why exactly firefox isn’t using such a method— perhaps they need a cairo coach :-) ?

  13. William Lahti Says:

    Ahh I see you’ve already replied to that point. Well, sending the data one time is a HUGE difference from transferring per each change or even per each frame. In fact, XSHM looks extremely effective compared against images but how about you post a few benchmarks comparing xlib+xrender vs xshm and THEN we will see how impressive it is!

  14. Dustin Says:

    vsync: X will grow to INSANE memory usage if you have pixman 0.11.4, because it doesn’t free redrawn regions or something to that effect. Upgrade pixman and everything may be fine.

  15. Eric Butler’s Weblog » Blog Archive » Adventures in X Programming Says:

    [...] Eric Butler’s Weblog something about programming and game making and stuff like that « Cairo, Xlib, and the Shared Memory Extension [...]
