Interacting with Eclipse through Py4J

One of the reasons I created Py4J is that I wanted to reuse two Java projects I developed in recent years: Semdiff and Partial Program Analysis (PPA). Both are built on top of Eclipse, so I needed a way to access Eclipse from a Python interpreter. Jython was not an option because I use a library, lxml, that is not compatible with Jython.

I created an Eclipse plug-in/feature/update site that embeds Py4J and enables developers to access Eclipse. The update site will be released with Py4J 0.3, but early adopters can check out the relevant projects from the Subversion repository (look for projects whose names start with net.sf.py4j).

Once you include the net.sf.py4j plug-in in your plug-in dependencies, you can create a GatewayServer instance as in the example on the front page.

Then, in Python, you can interact with Eclipse:

>>> from py4j.java_gateway import JavaGateway
>>> gateway = JavaGateway()
>>> ResourcesPlugin = gateway.jvm.org.eclipse.core.resources.ResourcesPlugin
>>> workspaceRoot = ResourcesPlugin.getWorkspace().getRoot()
>>> project1 = workspaceRoot.getProject('Project1')
>>> project1.isOpen()
True
>>> gateway.help(ResourcesPlugin)
Help on class ResourcesPlugin in package org.eclipse.core.resources:

ResourcesPlugin extends org.eclipse.core.runtime.Plugin {
|  
|  Methods defined here:
|  
|  start(BundleContext) : void
...

Note

You do not need to add any other plug-in to your plug-in dependencies (e.g., org.eclipse.core.resources): Py4J can access any class defined in any plug-in loaded in Eclipse.

Instead of using the Py4J plug-in, you could simply add the Py4J jar file to your Eclipse plug-in. If you use the jar file, you need to add the following property to your plug-in manifest file to make sure that Py4J can access the classes declared in other plug-ins:

Eclipse-BuddyPolicy: global

Indeed, in Eclipse, every plug-in has its own class loader, so by default Py4J cannot load the classes of other plug-ins. Adding this property lets Py4J load classes from other plug-ins, even plug-ins that are not in your plug-in’s dependencies.

Experimenting with Protobuf

I just spent a couple of hours experimenting with Protobuf to see if it could replace the current text-based protocol used by Py4J. Protobuf is a library from Google that makes it easy to serialize a structure composed of primitive fields (e.g., boolean, integer, double, UTF-8 string) into a binary stream. The structure can then be serialized and deserialized by programs written in Java, C++, Python, and .NET. I’ve been considering a move to Protobuf since the first version of Py4J, but I wanted to invest my effort in user-visible features first.

After reading the Protobuf documentation and trying it out, I found that the Java API is well developed but that the Python API still lags behind. For example, Python has no built-in way to send and receive a message over a stream with the size of the message written first, which is a required feature when messages are exchanged over sockets. This is not a showstopper on its own, so I also ran a small performance test: using a Java client and a Java server communicating over a local socket, I serialized and deserialized 1000 × 3 messages, once with Protobuf and once with my custom text protocol. I repeated this experiment 10 times.
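Length-prefixed framing is what the missing Python feature above refers to: each message is written as its size followed by its bytes, so the receiver knows where one message ends and the next begins on a socket. The sketch below is a minimal pure-Python version, assuming a base-128 varint size prefix (to my knowledge the same layout that Java's `writeDelimitedTo()`/`parseDelimitedFrom()` use). The helpers `write_delimited` and `read_delimited` are hypothetical names, not part of the protobuf Python API; the payload would typically be the bytes returned by a message's `SerializeToString()`.

```python
import io
from typing import BinaryIO


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("only non-negative sizes are supported")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte of the varint
            return bytes(out)


def read_varint(stream: BinaryIO) -> int:
    """Read one varint from a binary stream, one byte at a time."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a size prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


# Hypothetical helpers (not part of the protobuf Python API):
# write the payload size first, then the payload itself.
def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream: BinaryIO) -> bytes:
    size = read_varint(stream)
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a message")
    return payload


if __name__ == "__main__":
    # An in-memory stream stands in for a socket file here.
    buf = io.BytesIO()
    write_delimited(buf, b"hello")
    write_delimited(buf, b"x" * 300)  # a size of 300 needs a 2-byte prefix
    buf.seek(0)
    print(read_delimited(buf))
    print(len(read_delimited(buf)))
```

With a real socket, `stream` could be the object returned by `sock.makefile('rb')` (or `sock.makefile('wb')` for writing, followed by `flush()`). A buffered reader's `read(n)` blocks until it has `n` bytes or reaches end of file, which is why a short read is treated as a truncated message.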

To my surprise, there was no significant time difference between the two: the text protocol was 500 ms faster out of roughly 44 seconds, a difference that does not matter much to me right now. Note that I deliberately chose worst-case messages for the text protocol. For example, a large integer is written as multiple characters in the text protocol (2,000,000 takes 7 bytes), whereas Protobuf’s variable-length encoding needs only 3 bytes for it.
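To make the size comparison in the paragraph above concrete, the following sketch counts the bytes a non-negative integer takes as decimal text versus as a Protobuf base-128 varint (7 payload bits per byte). It is a simplification: it counts only the number itself, leaving out Protobuf’s field tag and any type marker or delimiter the text protocol adds, and it ignores negative values, which a Protobuf `int32` field encodes in 10 bytes unless `sint32` (zigzag encoding) is used.

```python
def varint_size(value: int) -> int:
    """Bytes needed for a non-negative int as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("negative values are not covered by this sketch")
    # Each varint byte carries 7 payload bits; zero still takes one byte.
    return max(1, (value.bit_length() + 6) // 7)


def text_size(value: int) -> int:
    """Bytes needed to write the int as ASCII decimal digits (no tag or delimiter)."""
    return len(str(value))


for value in (5, 127, 128, 2_000_000, 2**31 - 1):
    print(f"{value:>13,}  text: {text_size(value):2d} bytes  varint: {varint_size(value)} bytes")
```

For 2,000,000 this reports 7 bytes of text against 3 varint bytes, consistent with the example above.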

Before running this performance test, I also serialized typical messages that Py4J sends in my unit tests, and they were always smaller when serialized with my text protocol.

Protobuf may well outperform my text protocol in the long term. But until the Python API improves, I have little motivation to spend long hours converting my protocol for no obvious performance gain instead of developing more useful features (e.g., callbacks).

Welcome to Py4J’s development blog

Welcome to Py4J’s development blog. We will post news and stories about Py4J here. Because Py4J is written in Java and Python, expect a fair amount of rambling about the differences between these two languages 🙂

If you want to comment on Py4J or discuss the direction the project is taking, do not hesitate to post a comment on this blog, write an email to the mailing list, or file a feature request.