Why did I write yet another package manager?

Batis is a package manager for desktop applications on Linux. Linux users are probably already used to using several different package managers - apt, pip, npm, etc. - so why on earth do we need one more?

Firstly, having many package managers is not a problem. There are many different kinds of things we want to install. Pip, for instance, knows how to install Python packages; if you want to install a Ruby package, you'll need another tool. The Atom text editor includes a package manager, apm, purely for its own extensions, and other extensible applications, like browsers, are quietly doing their own package management too. You can try to build one package manager to rule them all, as Linux distros do, but that system has to be more complex. ‘Do one thing well’ applies to package managers too.

The best way to distribute desktop applications on Linux at present is through distro repositories. The idea is that power users from each distro who like your application will prepare packages, making it easily available for other users of the same distro. But this has several drawbacks for application developers:

  • You’re not in control of your distribution channels - if a distro doesn’t like your app, they can block their users’ easy access to it.
  • Feature releases of your app are tied to releases of the entire distro. Launch version 2.0 at the wrong time, and even the keen Ubuntu users who upgrade on the day each release comes out won’t see your new features for six months. Some users will be stuck on old versions for years. There are exceptions, but this is the norm in the major Linux distributions.
  • If you want to bypass the distros and host your own repositories, you have to deal with several different packaging systems - at a minimum, .deb and .rpm packages. Desktop Linux is a small market anyway, and this fragmentation makes it even more painful to target.
  • Your install instructions either have to be vague, or contain a table of commands for different distros (like 0 A.D. or git-cola). Neither is ideal for less technical users.
  • Distro packages can usually only be installed as root. This often doesn’t matter for desktop use, where the user is usually the owner of the computer, but it can be an annoying restriction.

The tools for packaging modules in many programming languages can also be used to package command line applications. You've probably seen utilities that you install using pip (e.g. Nikola), npm (bower), or gem (Jekyll). But these tools don't know about creating menu entries or file associations, so they're not great for distributing graphical applications.

Many applications forgo all of these packaging mechanisms, and distribute tarballs or zip files from their own website (e.g. Powder Toy, PyCharm, Visual Studio Code). This is the starting point for Batis. Batis adds a consistent way to install and uninstall applications, so that developers can focus on their applications, not on rewriting Linux install code. A Batis package is a regular tarball with some extra files, so there’s no need to build another tarball for users without Batis. You even get a free install.sh script inside your package for those users to run.

Batis adds one extra layer above tarball downloads, the index file. This is a JSON file containing the URLs of the tarballs for download, along with some basic metadata. Batis uses this to select the best build to install - for instance if you have separate builds for 64 bit and 32 bit systems. In future versions, the index will also be used to check for updates to installed applications.
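As a rough illustration of how an index like this could drive build selection, here is a sketch in Python. The field names (`builds`, `url`, `arch`), the `select_build` function and the example.com URLs are all invented for this example; they are not Batis's actual index format or code.

```python
import json
import platform

# Hypothetical index contents: the keys and URLs are illustrative only,
# not the real Batis index schema.
EXAMPLE_INDEX = """
{
  "name": "exampleapp",
  "builds": [
    {"url": "https://example.com/exampleapp-1.0-x86_64.tar.gz", "arch": "x86_64"},
    {"url": "https://example.com/exampleapp-1.0-i686.tar.gz", "arch": "i686"},
    {"url": "https://example.com/exampleapp-1.0-any.tar.gz", "arch": "any"}
  ]
}
"""

def select_build(index, machine=None):
    """Return the build matching this machine, falling back to an 'any' build."""
    if machine is None:
        machine = platform.machine()
    builds = index["builds"]
    # Prefer an exact architecture match
    for build in builds:
        if build["arch"] == machine:
            return build
    # Otherwise accept an architecture-independent build
    for build in builds:
        if build["arch"] == "any":
            return build
    raise LookupError("no suitable build for %s" % machine)

index = json.loads(EXAMPLE_INDEX)
chosen = select_build(index, machine="x86_64")
print(chosen["url"])
```

Passing `machine` explicitly keeps the example reproducible; left as `None`, it asks `platform.machine()` about the computer it is running on.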

So, Batis is a distro-agnostic way for users to get applications directly from developers. It works with the standard mechanisms to integrate applications into the desktop environment. And it's an evolutionary improvement on distributing plain tarballs.

Batis website—for more information

So you want to write a desktop app in Python

This is an overview of the best tools and the best resources for building desktop applications in Python.

First things first. You can build great desktop applications in Python, and some are widely used (like Dropbox). But you'll have to find your own way much more than you would using Microsoft's or Apple's SDKs. The upside is that, with a bit of legwork to package it appropriately, it's quite feasible to write a Python application that works on all the major platforms.

GUI toolkits

The first thing you'll need to choose is a GUI toolkit.

  • For traditional desktop UIs, Qt is a clear winner. It's powerful, looks native on all the major platforms, and has probably the biggest community. There are two different Python bindings: PyQt is older and more mature, but it's only free if your application is open source (licensing), while PySide is newer and more permissively licensed (LGPL). I refer to the main Qt docs a lot - the C++ examples mostly translate to Python quite well - but both PyQt's and PySide's docs contain some useful information. Qt Designer is a drag and drop interface to design your UI; you can compile its .ui files to Python modules with the pyuic command line tool.

  • For attractive, tablet-style interfaces, Kivy is the right choice. It's a fairly young but promising system. If you want to bring your application to tablets and smartphones, then Kivy is the only option that I'm aware of. More info
  • When you want a basic GUI and don't care about aesthetics, Tkinter is a simple option. It's installed as part of Python. Python's own tkinter documentation is rather minimal, but it links to a bunch of other resources. This site is my favourite - it hasn't been updated in years, but then neither has Tkinter (except that in Python 3, you import tkinter rather than import Tkinter).
  • pygame is popular for building simple 2D games. There are also frameworks for 3D graphics (pyglet, Panda3d), but I don't know much about them.
  • An increasingly popular option is to write your application as a local web server, and build the UI in HTML and Javascript. This lets you use Python's large ecosystem of web frameworks and libraries, but it's harder to integrate with desktop conventions for things like opening files and window management. CEF Python lets you make a window for your application, based on Google Chrome, but I haven't tried that.

A couple of alternatives I wouldn't recommend unless you have a reason to prefer them: GTK is popular on Linux, but it looks ugly on other platforms. The older pygtk bindings have excellent documentation; the newer PyGObject system, which supports recent versions of GTK and Python, doesn't (though it's getting better). wx seems to have a good community, but development is slow, and new projects that could have used it now mostly seem to pick Qt.

Packaging and Distribution

This is probably the roughest part of making an application in Python. You can easily distribute tools for developers as Python packages to be installed using pip, but end users don't generally have Python and pip already set up. Python packages also can't depend on something like Qt. There are a number of ways to package your application and its dependencies:

  • Pynsist, my own project, makes a Windows installer which installs a version of Python that you specify, and then installs your application. Unlike the other tools listed here, it doesn't try to 'freeze' your application into an exe, but makes shortcuts which launch .py files. This avoids certain kinds of bugs.
  • cx_Freeze is a freeze tool: it makes an executable out of your application. It works on Windows, Mac and Linux, but only produces the executable for the platform you run it on (you can't make a Windows exe on Linux, for example). It can make simple packages (.msi for Windows, .dmg for Mac, .rpm for Linux), or you can feed its output into NSIS or Inno Setup to have more control over building a Windows installer.
  • PyInstaller is similar to cx_Freeze. It doesn't yet support Python 3 (update: it does now, since October 2015), but it does have the ability to produce a 'single file' executable.
  • py2app is a freeze tool specifically for building Mac .app bundles.
  • py2exe is a Windows-only freeze tool. Development stopped for a long time, but at the time of writing there is some recent activity on it.

Linux packaging

Although some of the freeze tools can build Linux binaries, the preferred way to distribute software is to make a package containing just your application, which has dependencies on Python and the libraries your application uses. So your package doesn't contain everything it needs, but it tells the package manager what other pieces it needs installed.

Unfortunately, the procedures for preparing these are pretty complex, and Linux distributions still don't have a common package format. The main ones are deb packages, used by Debian, Ubuntu and Mint, and rpm packages, used by Fedora and Red Hat. I don't know of a good, simple guide to packaging Python applications for either - if you find one or write one, let me know.

You can get users to download and install your package, but if you want it to receive updates through the package manager, you'll need to host it in a repository. Submitting your package to the distribution's main repositories makes it easiest for users to install, but it has to meet the distro's quality standards, and you generally can't push new feature releases to people except when they upgrade the whole distribution. Some distributions offer hosting for personal repos: Ubuntu's PPAs, or Fedora's Fedorapeople repositories. You can also set up a repository on your own server.

If you don't want to think about all that, just make a tarball of your application, and explain to Linux users next to the download what it requires.


  • Threading: If your application does anything taking longer than about a tenth of a second, you should do it in a background thread, so your UI doesn't freeze up. Be sure to only interact with GUI elements from the main thread, or you can get segfaults. Python's GIL isn't a big issue here: the UI thread shouldn't need much Python processing time.
  • Updates: Esky is a framework for updating frozen Python applications. I haven't tried it, but it looks interesting.
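The threading advice can be sketched without committing to a toolkit. Here the worker thread never touches the UI; it only puts its result on a `queue.Queue`, which the main thread drains. `slow_task` and `poll_results` are made-up names, and in a real application you would call the polling function from the toolkit's timer, or use the toolkit's own signalling mechanism, rather than from a plain loop.

```python
import queue
import threading
import time

results = queue.Queue()

def slow_task(n):
    """Stand-in for work that would freeze the UI if run in the main thread."""
    time.sleep(0.2)
    results.put(n * n)  # Hand the result over; don't touch GUI objects here

def poll_results():
    """Called from the main thread: collect any results that are ready."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished

worker = threading.Thread(target=slow_task, args=(7,), daemon=True)
worker.start()

# The main thread stays free to handle events while the worker runs.
collected = []
while not collected:
    collected.extend(poll_results())
    time.sleep(0.01)  # A GUI would return to its event loop here

print(collected)
```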

ASTsearch - code searching that knows about code

This weekend's hack is a tool for searching Python code.

ASTsearch source code on Github

What's wrong with grep, you might ask? Let's try to find every division in IPython's codebase:

$ grep --include "*.py" -rF "/" .
config/loader.py:        after applying any insert / extend / update changes
config/configurable.py:                    # ConfigValue is a wrapper for using append / update on containers
config/tests/test_loader.py:        argv = ['--a=~/1/2/3', '--b=~', '--c=~/', '--d="~/"']
config/tests/test_loader.py:        self.assertEqual(config.a, os.path.expanduser('~/1/2/3'))
config/tests/test_loader.py:        self.assertEqual(config.c, os.path.expanduser('~/'))
config/tests/test_loader.py:        self.assertEqual(config.d, '~/')

In all, it finds 1685 lines, and very few of them are actual division. You could write a regex that tries to ignore comments and strings, but now you have two problems.

Let's do the same with ASTsearch:

$ astsearch "?/?"
 646|        shalf = int((string_max -5)/2)

1254|        return h / i

 347|        whalf = int((width -5)/2)

The output is 89 lines, and when spacing and filenames are removed, there are 46 results, all of which represent division operations.

In this case, grep produced a lot of false positives. In other cases, it will have false negatives—results that you wanted but didn't find. a=1 won't match a= 1, and "this" won't match 'this'. For simple cases, regexes can help (a\s*=\s*1), but they soon get unwieldy. ASTsearch is insensitive to how you format your code: even statements split over several lines are easy to find.
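To see why working on the syntax tree sidesteps formatting, you can compare parsed code directly with the standard library. This is only an illustration of the idea, not how ASTsearch itself compares nodes; `same_ast` is a helper invented for the example.

```python
import ast

def same_ast(code_a, code_b):
    """True if two pieces of source parse to the same syntax tree."""
    # ast.dump() leaves out line and column numbers by default,
    # so only the structure of the code is compared.
    return ast.dump(ast.parse(code_a)) == ast.dump(ast.parse(code_b))

print(same_ast("a=1", "a= 1"))                # spacing differs
print(same_ast('x = "this"', "x = 'this'"))   # quote style differs
print(same_ast("f(a, b)", "f(a,\n  b)"))      # split over two lines
print(same_ast("a = 1", "a = 2"))             # genuinely different code
```

The first three comparisons print True and the last prints False.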

How does it work?

The string pattern—?/? in the example above—is turned into an AST pattern. ASTs, or Abstract Syntax Trees, are a structured representation of a formal language such as Python source code.

? is a wildcard, so ?/? means "anything divided by anything". I picked ? for this because it's not used in Python syntax, so it doesn't stop you writing more specific search patterns.

Some more patterns:

  • a = ? - Something is assigned to a
  • class ?(TemplateExporter): ? - A subclass of TemplateExporter
  • for ? in ?: ? \nelse: ? - A for loop with an else clause

Then it walks the directory, parsing each file with a .py extension using Python's built in parser. The standard library ast module contains the tools to parse the code and walk the AST, and astcheck, another tool I wrote, can compare AST nodes against a template.
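For a feel of what the ast module provides, here is a minimal sketch that finds division operations using nothing but the standard library. It hard-codes the one pattern rather than building a template from a string, so it is a simplification of the approach, not ASTsearch's actual code; `find_divisions` and the sample source are made up for the example.

```python
import ast

SAMPLE = '''
# after applying any insert / extend / update changes
path = "~/1/2/3"
shalf = int((string_max - 5) / 2)
ratio = (h
         / i)
'''

def find_divisions(source):
    """Return the line numbers where a true-division expression starts."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # A BinOp node covers "left <operator> right"; Div is the / operator
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div)
    )

print(find_divisions(SAMPLE))
```

The slashes in the comment and the string are never reported, because neither produces a `BinOp` node, while the division split over two lines is found like any other.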

Besides the command line interface, you can also use ASTsearch as a Python module (import astsearch). It's possible to define complex search patterns in Python code that can't be written at the command line. See the docs for some more details.

What's the catch?

ASTsearch only works on Python files, and only on files that are entirely valid syntax (that's Python 3 syntax for now). If even the last line can't be parsed, it won't find any matches in that file.

It's slower than grep, because what it's doing is much more complex, and grep is highly optimised. But Python's parser is doing most of the hard work, and that's written in C. On my laptop, scanning the IPython codebase (about 100k lines of code) takes about 3.5 seconds—definitely not instant, but far faster than I can think about even a couple of results.

There are search patterns you can't express at the command line. For instance, you can't match function calls with a specific number of arguments (but you can find function definitions with a given number of arguments: def ?(?, ?): ?). I might extend the pattern mini-language once I've got a feel for what would be useful.
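Searches like that are straightforward once you are working in Python code. As a sketch of the kind of check involved, using only the standard ast module rather than ASTsearch's own pattern API, this finds calls with a given number of positional arguments. `calls_with_n_args` is a name invented for the example.

```python
import ast

SAMPLE = '''
open(path)
open(path, "w")
os.path.join(a, b)
print(a, b, sep="")
'''

def calls_with_n_args(source, n):
    """Return line numbers of calls with exactly n positional arguments."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        # node.args holds positional arguments; keyword arguments
        # live separately in node.keywords, so they are not counted.
        if isinstance(node, ast.Call) and len(node.args) == n
    )

print(calls_with_n_args(SAMPLE, 2))
```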

How do I install it?

pip install astsearch

Readable Python coroutines

Quick exercise: write a piece of code that, each time you pass it a word (a string), tells you if you've passed it that word before. If you're reading a post with a title like this, it shouldn't take you more than a few minutes. For bonus points, have an option to ignore case, so it counts 'parrot' and 'Parrot' as the same word.

What did you go for? A function with a global variable (yuck!)? A class with a method? A closure?

How about a coroutine? Here's what that would look like:

def have_seen(case_sensitive=False):
    seen = set()

    res = None
    while True:
        word = (yield res)
        if not case_sensitive:
            word = word.lower()

        res = (word in seen)
        seen.add(word)  # Remember the word for next time

And here's how you would use it:

>>> hs = have_seen()
>>> next(hs)  # prime it
>>> hs.send('Hello')
False
>>> hs.send('World')
False
>>> hs.send('hello')
True

Coroutines in Python are based on the generator machinery - see the yield keyword in there? PEP 342, "Coroutines via Enhanced Generators", added the necessary features to Python 2.5, but it's not a very well known part of the language. And it's not hard to see why - the code above isn't as clear as it should be:

  • Emitting and receiving a value happen in the same yield expression. So rather than yielding the response at the bottom of the loop, we have to store it in a variable and jump back to the top of the loop.
  • The coroutine has to emit a value before it can receive one, even though there's nothing it really wants to emit. That's why we set res = None before the loop, and why the caller has to prime it by calling next(hs) before using it. It's easy to write a decorator that calls next for you, but that doesn't make the code inside the coroutine any clearer.
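For reference, the priming decorator mentioned in the second point might look like this minimal sketch. The name `primed` is invented here; it is not part of the standard library.

```python
from functools import wraps

def primed(genfunc):
    """Decorator: advance each new generator to its first yield."""
    @wraps(genfunc)
    def wrapper(*args, **kwargs):
        gen = genfunc(*args, **kwargs)
        next(gen)  # Run up to the first yield, discarding what it emits
        return gen
    return wrapper

@primed
def have_seen(case_sensitive=False):
    seen = set()
    res = None
    while True:
        word = (yield res)
        if not case_sensitive:
            word = word.lower()
        res = (word in seen)
        seen.add(word)

hs = have_seen()  # No next(hs) needed now
print(hs.send('Hello'), hs.send('hello'))
```

This prints `False True`. The caller's code gets shorter, but the body of the coroutine is exactly as awkward as before.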

So the standard Python syntax is rather awkward. But we can make it clearer by using a bit of wrapper code. The trick is separating sending a value from receiving one:

from coromagic import coroutine, receive

@coroutine
def have_seen2(case_sensitive=False):
    seen = set()

    while True:
        word = (yield receive)
        if not case_sensitive:
            word = word.lower()

        yield (word in seen)
        seen.add(word)  # Remember the word for next time

We no longer need the res variable. Instead, we alternate between two uses of yield: a receiving yield, where we send the wrapper a token to indicate that we're ready for a new value, and a sending yield, where we don't expect to get a value back. The caller can use this in exactly the same way as the original coroutine, except that the wrapper primes it automatically, so there's no need to call next(hs).

The wrapper expects a receiving yield first, and at most one sending yield after each receiving yield. If a receiving yield is followed by another receiving yield, without a sending yield in between, None is returned to the caller, just like a function without a return statement.

Handling exceptions

If either of our coroutines above raises an exception, we can't keep using that coroutine:

>>> hs.send(12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "coro_ideas.py", line 8, in have_seen
    word = word.lower()
AttributeError: 'int' object has no attribute 'lower'
>>> hs.send('hi')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

I've got a solution of sorts for that, although it still feels a bit awkward. The coroutine can request a context manager to catch exceptions:

from coromagic import coroutine, get_exception_context, receive

@coroutine
def have_seen3(case_sensitive=False):
    exception_context = (yield get_exception_context)
    seen = set()

    while True:
        with exception_context:
            word = (yield receive)
            if not case_sensitive:
                word = word.lower()

            yield (word in seen)
            seen.add(word)  # Remember the word for next time

The context manager co-ordinates with the wrapper to suppress the exception inside the coroutine, but raise it to the caller:

>>> hs3 = have_seen3()
>>> hs3.send(12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./coromagic.py", line 28, in send
    raise self.last_exc
  File "./coro_ideas.py", line 47, in have_seen3
    word = word.lower()
AttributeError: 'int' object has no attribute 'lower'
>>> hs3.send('hi')
False

Now the error doesn't stop us processing valid input afterwards.

Who cares about coroutines?

I find them interesting on their own. But this isn't just academic - there are cases where coroutines can be the clearest way to write something.

The have_seen example could easily be written with a class or a closure. Coroutines come into their own for making state machines. With a class or a closure, the state has to be stored in a variable, and you need a lookup table to decide how to behave in each state. A coroutine can store the state as the point where its code is executing.

It's hard to come up with an example of this that's both realistic and short, but here's my attempt. We're writing a plugin for a chat application, which lets any chatter say "password foo", silencing everyone until someone guesses "foo". The application just passes us each message, and expects a True/False response saying whether it should be broadcast.

@coroutine
def password_game():
    while True:
        # Normal chatting
        while True:
            msg = (yield receive)
            if msg.startswith("password "):
                password = msg[9:]
                yield False
                break  # Move on to the guessing state
            yield True  # Broadcast

        # Waiting for someone to guess the password
        while (yield receive) != password:
            yield False # Don't send messages
        yield True   # Show everyone the password once it has been guessed
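For comparison, here is a sketch of the same plugin written as a class, with the state held in an attribute and a lookup table choosing the handler. The class and method names are made up for this comparison.

```python
class PasswordGame:
    """Class-based equivalent: the current state lives in an attribute."""
    def __init__(self):
        self.state = "chatting"
        self.password = None

    def chatting(self, msg):
        if msg.startswith("password "):
            self.password = msg[len("password "):]
            self.state = "guessing"
            return False
        return True  # Broadcast

    def guessing(self, msg):
        if msg == self.password:
            self.state = "chatting"
            return True  # Show everyone the password once it has been guessed
        return False  # Don't send messages

    def send(self, msg):
        # The lookup table that the coroutine version doesn't need
        handlers = {"chatting": self.chatting, "guessing": self.guessing}
        return handlers[self.state](msg)

game = PasswordGame()
print([game.send(m) for m in ["hi", "password foo", "bar", "foo", "bye"]])
```

This prints `[True, False, False, True, True]`. It behaves the same way, but the flow from one state to the next is spread across assignments to `self.state` instead of being visible in the order of the code.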

In IPython, we have some coroutines for input processing. For instance, the transformer to strip prompts from pasted code processes the first two lines in a prompt-detection state. Then it moves into a prompt-stripping state if it detected a prompt, or a no-op state if it didn't.

The pattern of sending and receiving is also reminiscent of writing a thread with input and output queues, and waiting for values on those queues. But threads are messy: you have to deal with synchronisation and shut them down safely. Calling a coroutine is as deterministic as calling a function: it runs, returns a value, and the calling code carries on. Of course, that means that coroutines themselves don't run in parallel. But you can use them to build clever things like tulip, which will become the asyncio module in Python 3.4. Tulip can suspend one coroutine and run others while it waits for data, and then resume it when the data it needs is ready.
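Here is a small sketch of that suspend-and-resume behaviour. Note that it uses the async/await syntax (Python 3.5) and asyncio.run (Python 3.7), both of which arrived after the version discussed above, and that `asyncio.sleep` stands in for waiting on real data. The `fetch` and `main` names are invented for the example.

```python
import asyncio

async def fetch(name, delay, log):
    log.append("start " + name)
    await asyncio.sleep(delay)  # Suspended here; other coroutines can run
    log.append("done " + name)
    return name

async def main():
    log = []
    # Run both coroutines on one event loop, in a single thread
    results = await asyncio.gather(
        fetch("slow", 0.2, log),
        fetch("fast", 0.05, log),
    )
    return log, results

log, results = asyncio.run(main())
print(log)
print(results)
```

The log shows "slow" starting first but "fast" finishing first: while one coroutine waits, the event loop resumes the other. Everything still happens in one thread, so the log can be appended to without locks.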

The best resource on coroutines in Python is this excellent course by David Beazley.

Coromagic source code

This is the module used in the examples above.


from functools import wraps

# Tokens
receive = object()
get_exception_context = object()

class CoroutineWrapper(object):
    last_exc = None

    def __init__(self, generator):
        self.gen = generator

        ready = next(self.gen)
        if ready is get_exception_context:
            ready = self.gen.send(ExceptionContext(self))
        assert ready is receive

    def send(self, arg):
        self.last_exc = None

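        res = self.gen.send(arg)
        if res is receive:
            # No sending yield before the next receiving yield
            res = None
        else:
            # A sending yield: advance to the next receiving yield
            ready = next(self.gen)
            assert ready is receive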

        if self.last_exc is not None:
            raise self.last_exc

        return res

def coroutine(genfunc):
    """Decorator for a generator function to wrap it as a coroutine."""
    @wraps(genfunc)
    def wrapped(*args, **kwargs):
        return CoroutineWrapper(genfunc(*args, **kwargs))

    return wrapped

class ExceptionContext(object):
    def __init__(self, corowrapper):
        self.corowrapper = corowrapper

    def __enter__(self):
        pass

    def __exit__(self, type, value, tb):
        if type is None:
            return  # No exception: nothing to do

        if type is GeneratorExit:
            return False

        # Pass other exceptions to the wrapper, and silence them for now
        self.corowrapper.last_exc = value
        return True