Context Resuming

Poor souls like me who work on their open source projects in their spare time sometimes suffer from a form of contextus resumis. You know, when you sit down, think of all the time you can finally spend on your favorite project, and then realize, horror-struck, that you don’t know how to resume the task you were working on two weeks ago?

This is particularly an issue when you are working on core tasks that affect most parts of the project and that have deep design implications. My guess is that they are also the kind of tasks that cause contextus resumis: small tasks can (and should) generally be completed in one coding session.

Sure, there are software solutions like Mylyn that can restore your IDE to the way it looked when you started working on your task, but I have always found that this kind of solution does not work well for system-wide tasks. What would Mylyn do? Open up all source files of Py4J? Anyway, Mylyn is not an option right now because it cannot connect to SourceForge’s Trac installations, a problem that has been known for 9 months now.

One obvious solution is to divide your big task into smaller tasks (I know, you wanted to shout this from the beginning). But I’m currently changing the network and threading models of Py4J, and these two models cannot be separated from each other. They also impact both the Java and the Python sides, and these changes are part of a bigger redesign effort to enable Java code to call back into Python code (more on this in the next post).

Do you have any tips or tricks to share?

Interacting with Eclipse through Py4J

One of the reasons I created Py4J is that I want to reuse the two Java projects I developed in recent years: Semdiff and Partial Program Analysis (PPA). These two technologies are built on top of Eclipse and I wanted a way to access Eclipse from a Python interpreter. Jython was not an option because I use a library, LXML, that is not compatible with Jython.

I created an Eclipse plug-in/feature/update site that embeds Py4J and that enables developers to access Eclipse. The update site will be released with Py4J 0.3, but early adopters can check out the relevant projects from the Subversion repository (look for projects starting with net.sf.py4j).

Once you include the net.sf.py4j plug-in in your dependencies, you can just create a GatewayServer instance as in the example on the front page.

Then, in Python, you can interact with Eclipse:

>>> gateway = JavaGateway()
>>> ResourcePlugin =
>>> workspaceRoot = ResourcePlugin.getWorkspace().getRoot()
>>> project1 = workspaceRoot.getProject('Project1')
>>> project1.isOpen()
Help on class ResourcesPlugin in package org.eclipse.core.resources:

ResourcesPlugin extends org.eclipse.core.runtime.Plugin {
|  Methods defined here:
|  start(BundleContext) : void


You do not need to add any other plug-in to your plug-in dependencies (e.g., org.eclipse.core.resources): Py4J can access any class defined in any plug-in loaded in Eclipse.

Instead of using the Py4J plug-in, you could just add the Py4J jar file to your Eclipse plug-in. If you use the jar file, you need to add the following property to your plug-in manifest file to make sure that Py4J can access the classes declared in other plug-ins:

Eclipse-BuddyPolicy: global

Indeed, in Eclipse, every plug-in has its own class loader so Py4J cannot load the classes of other plug-ins by default. Adding this property enables Py4J to load plug-in classes and you can even access plug-ins that are not in your plug-in’s dependencies.

Experimenting with protobuf

I just spent a couple of hours experimenting with Protobuf to see if it could replace the current text-based protocol used by Py4J. Protobuf is a library from Google that makes it easy to serialize a structure composed of native fields (e.g., boolean, integer, double, string in UTF-8) into a binary stream. The structure can then be serialized/deserialized by programs written in Java, C++, Python, and .NET. I’ve been considering moving to Protobuf since the first version of Py4J but I wanted to invest my effort in user-visible features first.

After looking at the documentation of Protobuf and trying it, I found that the Java API is well developed, but that the Python API still lags behind (e.g., there is no built-in way in Python to send and receive a message over a stream with the size of the message first, which is a required feature if messages are exchanged over sockets). This is not a show stopper, so I also ran a small performance test: I serialized and deserialized 1000*3 messages using Protobuf and using my custom text protocol, with a Java client and a Java server talking over a local socket. I repeated this little experiment 10 times.

To my surprise, there was no significant time difference between the two: the text protocol was 500 ms faster, but over 44 seconds, this does not matter much to me right now. It should be noted that I tried to serialize worst-case messages for the text protocol. For example, a large integer is represented as multiple characters in the text protocol (e.g., 2’000’000 would take 7 bytes) whereas it takes no more than 4 bytes with Protobuf.
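To make the size comparison concrete, here is a minimal Python sketch. It assumes that the text protocol writes an integer as its decimal digits (one byte per ASCII character) and compares that with a fixed 4-byte integer and with the base-128 varint encoding that Protobuf uses for its int32 fields. The helper name encode_varint is made up for this illustration; it is not part of Py4J or of the Protobuf API.

```python
import struct


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)  # high bit clear: last byte
            return bytes(out)


number = 2000000
as_text = str(number).encode("ascii")  # assumed text form: decimal digits
as_fixed = struct.pack(">i", number)   # fixed-width 32-bit integer
as_varint = encode_varint(number)      # variable-width encoding

print(len(as_text), len(as_fixed), len(as_varint))  # prints: 7 4 3
```

Small integers tell the opposite story: 7 is a single character in the text form and a single byte as a varint, which is one reason typical messages do not necessarily shrink when they move to a binary format.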

Before doing this performance test, I also tried to serialize typical messages that are sent with Py4J in my unit tests and they were always smaller in size when serialized with my text protocol.

It is quite possible that, in the long term, Protobuf will outperform my text protocol. But as long as the Python API does not improve, I don’t have the motivation to spend long hours converting my protocol for no obvious performance gain instead of developing more useful features (e.g., callbacks).
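For reference, the size-prefixed framing that I found missing on the Python side is small enough to write by hand. The following is only a sketch, assuming a 4-byte big-endian length header in front of each message; write_message, read_message, and read_exactly are hypothetical helper names, and the payload stands for whatever bytes the message serializer produces.

```python
import io
import struct

# Assumption for this sketch: each message is preceded by a 4-byte
# unsigned big-endian length.
HEADER = struct.Struct(">I")


def write_message(stream, payload):
    """Write the payload size, then the payload itself."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_exactly(stream, size):
    """Read exactly `size` bytes; a stream may return fewer bytes per read."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_message(stream):
    """Read one size-prefixed message and return its payload."""
    (size,) = HEADER.unpack(read_exactly(stream, HEADER.size))
    return read_exactly(stream, size)


# An in-memory stream stands in for the socket here.
buffer = io.BytesIO()
write_message(buffer, b"first")
write_message(buffer, b"")
write_message(buffer, b"second message")
buffer.seek(0)
messages = [read_message(buffer) for _ in range(3)]
print(messages)
```

With a real socket, the same functions should work on the file object returned by sock.makefile("rwb"), as long as the writer calls flush() after each message.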

Welcome to Py4J’s development blog

Welcome to Py4J’s development blog. We will post news and stories about Py4J here. Because Py4J is written in Java and Python, expect a fair amount of rambling about the differences between these two languages 🙂

If you want to comment on Py4J or discuss the direction the project is taking, do not hesitate to post a comment on this blog, write an email to the mailing list, or file a feature request.