Py4J 0.10.5 released

Py4J 0.10.5 has just been released on PyPI, Maven Central, and bintray (p2 eclipse repository).

This is a backward-compatible release (unless you relied on rounded floats sent from Python 2.7 to Java).

  • Python side: added path for pip install --user
  • Python side: doubles are no longer truncated in Python 2.7
  • Python side: passing integers larger than Long.MAX_VALUE no longer stalls connection.
  • Java side: spurious server error signal no longer sent when GatewayServer is shutting down.
  • Java side: allow disabling automatic connection cleanup in CallbackClient.
  • Java side: return types of Python proxies are correctly converted to the expected type (e.g., a double can be converted to a float)
  • tickets closed for 0.10.5 release
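The effect of the double fix can be illustrated without a JVM. The sketch below assumes the rounding came from a 12-significant-digit text conversion, which is what str() applies to floats in Python 2.7. The helper names to_wire_truncated and to_wire_exact are hypothetical and are not Py4J functions.

```python
def to_wire_truncated(value):
    """Mimic Python 2.7's str(float): keep only 12 significant digits."""
    return "%.12g" % value


def to_wire_exact(value):
    """Use repr(), which yields the shortest string that round-trips."""
    return repr(value)


x = 0.1 + 0.2
print(to_wire_truncated(x))
print(to_wire_exact(x))
# Parsing the truncated text does not give back the original double,
# while parsing the repr() text does.
print(float(to_wire_truncated(x)) == x)
print(float(to_wire_exact(x)) == x)
```

Code that relied on the shorter, rounded text arriving on the Java side is the one case where this release may change behavior.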

As always, this release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

Py4J 0.10.4 released

Py4J 0.10.4 has just been released on PyPI, Maven Central, and bintray (p2 eclipse repository).

This is a backward-compatible release with two bugfixes:

  • Setting a value in a Java array now correctly converts the Python type to the Java type. For example, it is now possible to set a value in a Java float array. Before this change, the Python type and the Java array element type had to match exactly.
  • Java side: the serverError callback in the GatewayServerListener is no longer called when a “java.net.SocketException: Socket closed” exception is raised while the GatewayServer is shutting down. The server_connection_error signal on the Python side was already ignoring that spurious error.
  • Tickets closed for 0.10.4 release

I knew I was being too optimistic about my plans for 0.11 and I now have to face the fact that I won’t be able to release Py4J 1.0 before the end of the year. I’ve been busy with many other community activities (i.e., reviewing conference papers) and I just resumed my work on creating a new binary protocol. I’m still very happy with how things are shaping up so stay tuned!

As always, this release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

Py4J 0.10.3 released

Py4J 0.10.3 has just been released on PyPI, Maven Central, and bintray (p2 eclipse repository).

This is a backward-compatible release with a focus on small utilities on the Python side and preparation steps for the major performance work of 0.11.0.

  • Python side: Added java_path option in launch_gateway. If None, will detect whether JAVA_HOME is set and use “JAVA_HOME/bin/java” instead of “java” to launch the JVM.
  • Python side: added “create_new_process_group” in “launch_gateway”. If True, will launch the JVM in a new process group, which (1) prevents signals sent to the parent Python process from propagating to the child JVM process, and (2) does not kill the Java process if the Python process dies. This is a useful option if you want to interrupt a long-running Java method call from Python and you launched the JVM using launch_gateway. Such interruption has always been possible if you launched the JVM outside of Python.
  • Python side: introduced a small signals library. Users can now connect to signals emitted by the CallbackServer, which mirrors the events sent by GatewayServer on the Java side.
  • Python side: added “get_java_class” function which returns the java.lang.Class of a JavaClass. Equivalent to calling .class in Java, but from Python.
  • Python side: fixed the project root setup.py, which allows users to install Py4J with pip from the git repository. The root setup.py relied on compiled jars that are no longer provided. It now uses gradlew to build the required jars during the installation. Works on both Linux and Windows 🙂
  • Python side: fixed type conversion when passing a large negative integer.
  • Java side: added defensive programming to prevent concurrent modification of the listeners list (in case a listener removes itself after receiving an event).
  • Both sides: added more memory leak tests and fixed a potential memory leak related to listeners.
  • Both sides: added support for IPv6.
  • Created an official benchmark program to track Py4J speed. The results are available as a Google sheet and charts.
  • Eclipse: Replaced “Eclipse-BuddyPolicy: global” by “DynamicImport-Package: *” for greater compatibility with other OSGi frameworks. Thanks to @scottslewis and @jonahkichwacoders for their work on that.
  • GitHub 0.10.3 milestone
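Two of the items above, the signals library and the defensive handling of the listeners list, rest on the same idea: dispatch events over a snapshot of the receivers so that a receiver can remove itself while an event is being delivered. The following is a minimal sketch of that idea only. The Signal class, its methods, and the port value are illustrative assumptions and do not reproduce the Py4J signals API.

```python
class Signal(object):
    """Tiny observer: receivers are callables invoked with keyword args."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        if receiver not in self._receivers:
            self._receivers.append(receiver)

    def disconnect(self, receiver):
        if receiver in self._receivers:
            self._receivers.remove(receiver)

    def send(self, **params):
        # Iterate over a copy so that a receiver may disconnect itself
        # during dispatch without disturbing the iteration.
        return [receiver(**params) for receiver in list(self._receivers)]


server_started = Signal()  # hypothetical signal name
seen = []


def once(**params):
    seen.append(params["port"])
    server_started.disconnect(once)  # removes itself mid-dispatch


def always(**params):
    seen.append("always:%d" % params["port"])


server_started.connect(once)
server_started.connect(always)
server_started.send(port=25334)
server_started.send(port=25334)
print(seen)
```

Without the copy in send(), the first dispatch would skip the second receiver, because removing an element from a list while iterating over it shifts the remaining elements.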

In parallel to this release, I started testing various alternatives to the text protocol used by Py4J to support faster binary transfers. It is extremely easy to make small mistakes that double or triple the time to perform small operations, but I believe I found a new transport strategy that can significantly speed up large binary transfers while keeping the same performance for other operations.
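To give a rough sense of why a text protocol is costly for binary data, the sketch below compares the size of a newline-terminated base64 frame with a length-prefixed binary frame. It assumes that binary payloads must be base64-encoded to travel over a line-based text protocol. The framing shown is only an illustration, not the transport strategy mentioned above, and it measures size rather than time.

```python
import base64
import struct

payload = bytes(bytearray(range(256))) * 4096  # 1 MiB of arbitrary bytes

# Text framing: the bytes must be made newline-safe, here with base64.
text_frame = base64.b64encode(payload) + b"\n"

# Binary framing: a 4-byte big-endian length followed by the raw bytes.
binary_frame = struct.pack(">I", len(payload)) + payload

print(len(payload), len(text_frame), len(binary_frame))
```

The base64 frame is about a third larger than the payload, and both sides also pay for encoding and decoding it.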

0.11.0 will thus introduce a new transport mechanism and a non-backward compatible change that will allow Python classes implementing Java interfaces to easily implement equals/toString/hashCode methods. I hope to be able to make a release in two months, but considering the size of the task, this may take longer.

As always, this release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

Py4J 0.10.2 released

Py4J 0.10.2 has just been released on PyPI, Maven Central, and bintray (p2 eclipse repository).

This is a backward-compatible release with a focus on building a stronger test suite.

  • Both sides: added memory management options to disable garbage collection. This is useful if you create many short-lived py4j client/server pairs.
  • Both sides: fixed ClientServer to allow users to create multiple ClientServer instances. Thanks to @jonahkichwacoders for reporting the bug and helping diagnose the issue.
  • Both sides: it is now possible to specify a python entry point when creating a CallbackServer. The CallbackClient on the Java side can then access the python entry point and drive the conversation. See the Advanced Topics guide for more information.
  • Both sides: fixed memory leak issue with ClientServer and potential deadlock issue by creating a memory leak test suite.
  • Both sides: fixed retry logic by only retrying if an error occurs on write (send command). Thanks to @jonahkichwacoders for raising the issue.
  • Both sides: the assemble gradle task, the Java test suite and the Python test suite now run correctly on Windows.
  • Java side: added GatewayServerBuilder and ClientServerBuilder to ease the creation of these instances with many options. Thanks to @jonahkichwacoders.
  • A link to the contributing guide now appears when opening pull requests or issues.
  • GitHub 0.10.2 milestone
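The retry fix above can be sketched as follows: a command is resent only when the write fails, never when the read of the response fails, because at that point the other side may already have executed the command. FakeConnection and send_command are hypothetical names used for illustration. The real client would also deal with opening a fresh connection, which is left out here.

```python
class FakeConnection(object):
    """Stand-in for a socket connection with configurable failures."""

    def __init__(self, write_failures=0, read_fails=False):
        self.write_failures = write_failures
        self.read_fails = read_fails
        self.writes = 0

    def write(self, command):
        self.writes += 1
        if self.writes <= self.write_failures:
            raise IOError("broken pipe")

    def read(self):
        if self.read_fails:
            raise IOError("connection reset")
        return "ok"


def send_command(connection, command, max_retries=2):
    attempts = 0
    while True:
        try:
            connection.write(command)
        except IOError:
            # The write failed, so the command is assumed not to have
            # been executed and can be sent again.
            attempts += 1
            if attempts > max_retries:
                raise
            continue
        # The command was sent. A read error is not retried because the
        # command may already have run on the other side.
        return connection.read()


flaky = FakeConnection(write_failures=1)
print(send_command(flaky, "call"), flaky.writes)
```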

This was a difficult release to make because I had to track down and then work around many subtle variations between the various versions of Python, Java, Windows, Linux, and Mac OSX. I’m also trying to clean up the code as much as possible for a 1.0 release.

If you did not already see it, I published a blog post on the roadmap to 1.0.

The next release will focus on binary transfer (e.g., transferring a numpy array) and I’ll need your help in shaping the API. I’ll post on this mailing list when I’m ready 🙂

As always, this release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

The Py4J Roadmap to 1.0

Py4J, a bidirectional bridge between Python and Java, has come a long way since the first release in December 2009 and yet, almost 7 years later, it still hasn’t reached the mythical 1.0 release. Let’s make sure that by December 2016, we reach this important milestone!

I released Py4J 0.10.0 in April 2016 and this marked an important milestone from a project maintenance perspective: Py4J now has coding conventions and a reliable build process with automated code quality checks for both Python and Java. The Java API also relies more on interfaces, and the architecture of what Py4J 1.0 should look like is finally stabilizing. Here are the two final steps I am planning to go through to get to 1.0:

Py4J 0.11 – Planned release date: August 2016

In addition to small features and bug fixes, there are two main features I want to add in this release:

Efficient transfer of binary data

A contributor already added a feature that allows the Python side to read raw bytes sent by the Java side through the same socket used to exchange commands between Python and Java. The API still needs work, so I want to build on this feature, refine its API, and provide a similar feature in the reverse direction (Java reading raw bytes sent by Python). My hope is that other contributors can then extend this feature to implement things such as transferring large numpy arrays between Python and Java.
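As a rough illustration of what reading raw bytes from a socket involves, the sketch below sends a length-prefixed blob over a local socket pair and reads it back. The 8-byte length prefix and the send_blob, recv_exact, and recv_blob helpers are assumptions made for this example, not the format or API used by Py4J.

```python
import socket
import struct


def send_blob(sock, data):
    """Send an 8-byte big-endian length followed by the raw bytes."""
    sock.sendall(struct.pack(">Q", len(data)) + data)


def recv_exact(sock, size):
    """Read exactly `size` bytes; recv() may return fewer per call."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise EOFError("socket closed before the full payload arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_blob(sock):
    (size,) = struct.unpack(">Q", recv_exact(sock, 8))
    return recv_exact(sock, size)


left, right = socket.socketpair()
send_blob(left, b"\x00\x01raw bytes\xff")
received = recv_blob(right)
left.close()
right.close()
print(received)
```

The loop in recv_exact matters because a single recv() call is allowed to return only part of the requested data, which is an easy mistake to make with large payloads.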

Stress testing – Memory leaks, thread leaks, connectivity testing

A few bugs reported in the past releases (0.9.2, 0.10.0, and 0.10.1) made me realize (1) how diverse the use cases of Py4J were and (2) how easy it was to make programming mistakes that would create memory leaks or negatively impact connectivity and performance. For example, one user reported he was quickly creating and shutting down ClientServer instances and another reported that calling a method with a 10 MB parameter was a standard use case for him. I already started creating a small benchmark to track the progression or regression of performance between releases, but I need to push this idea further by creating test suites that measure whether a release introduced a leak. I currently run these tests manually by modifying the code and I need a way to automate this testing, outside of the regular test suite.
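One way such a leak test could be automated is to create and shut down many instances, then check through weak references that none of them is still alive. The sketch below uses stand-in Server classes and a module-level registry to simulate a leak. These names are hypothetical and the code does not exercise Py4J itself.

```python
import gc
import weakref

_registry = []  # module-level state, a classic source of leaks


class Server(object):
    def __init__(self):
        _registry.append(self)

    def shutdown(self):
        _registry.remove(self)


class LeakyServer(Server):
    def shutdown(self):
        pass  # bug: forgets to unregister itself


def count_leaked(server_class, iterations=100):
    """Create and shut down servers, then count those still reachable."""
    refs = []
    for _ in range(iterations):
        server = server_class()
        refs.append(weakref.ref(server))
        server.shutdown()
        del server
    gc.collect()
    return sum(1 for ref in refs if ref() is not None)


print(count_leaked(Server), count_leaked(LeakyServer))
```

A test suite can then assert that the count is zero for every class that is supposed to clean up after itself.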

I do not want to focus on performance too much for now, but having a robust benchmark and stress testing suite will be very helpful for a 1.0 release and will allow me or other contributors to work on performance once the API and architecture are stable.

Py4J 1.0 – Planned release date: December 2016

After 0.11.0 is released, I hope that users will try out the new binary/stream feature and I expect I’ll have to work on some of its kinks. There are three main areas I want to work on for 1.0:

Moving to the org.py4j namespace

Currently, the Java package is “py4j” and the Maven artifact is “net.sf.py4j” because artifacts are usually fully qualified and Py4J started its life on SourceForge. Between 0.8.2.1 and 0.9, I moved the Py4J project from SourceForge to py4j.org, so when I released Py4J as two OSGi bundles in 0.10.1, I selected a name that was in line with the new project namespace: org.py4j.java and org.py4j.python.

1.0 is thus a good time to change the main Java package to org.py4j and move from net.sf.py4j to org.py4j.java on Maven. This should help standardize the Py4J name and it is more in line with usual Java coding conventions. This will unfortunately break the code of anyone using Py4J, but I expect the change to be straightforward for most users (add org. in your import statements or just use the Optimize/Organize Import feature in your IDE).

Deprecating some of the current API

One comment I often hear from users is that it’s difficult to wrap their head around the Py4J process and memory model. When users want to contribute, they also struggle with the name of the main classes. What is a “Gateway”? Why is the Python side having a “callback server” when Java is initiating the calls and Python is making the callbacks to Java? I experimented with a few names and I believe that the terms “PythonClient” and “JavaServer” for the Java side and “PythonServer” and “JavaClient” for the Python side are easier to understand for users than “JavaGateway”, “GatewayServer”, “CallbackClient”, and “CallbackServer”. They are also highly representative of the Py4J model compared to Jython or JPype.

With 1.0, I want the API to be more approachable and I want the classes that everyone uses to have more meaningful names. BUT I do not want to put too much pressure on existing users and I’ll do my best to keep the existing classes and just deprecate them with pointers to the new ones.
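On the Python side, keeping an old class while steering users to a new name could look like the sketch below: the old name remains as a thin subclass that emits a DeprecationWarning. The classes here are stand-ins written for this example. They are not the Py4J classes, and the final names were still open for discussion.

```python
import warnings


class JavaClient(object):
    """Stand-in for a class carrying one of the proposed new names."""

    def __init__(self, port=25333):
        self.port = port


class JavaGateway(JavaClient):
    """Old name kept as a deprecated alias of the new class."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "JavaGateway is deprecated; use JavaClient instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super(JavaGateway, self).__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    gateway = JavaGateway(port=25333)

print(type(gateway).__name__, len(caught))
```

Existing code keeps working because the old class is still a full subclass of the new one, and the warning points users to the replacement.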

If you have ideas or opinions on how to name the various classes, do not hesitate to hop on the mailing list. I’ll also be posting the names I believe are the best and gather feedback before making a final decision. 

A new web site and new documentation

I believe that Py4J has relatively good documentation that strikes a balance between reference documentation (API doc, Javadoc), and a manual with how tos and examples. But it utterly fails in guiding new users, especially if they are new to Java or Python, in creating their first program with Py4J.

I want to focus on the “getting started” experience more so that new users can do cool things very quickly in a few lines of code. Most of the existing documentation will be reused and reorganized, but having clear walkthroughs for beginners and the smallest possible working code example on the front page will help grow the user base AND hopefully decrease the number of questions I get related to how to run javac!

And let’s face it, responsive web sites were just beginning in 2009, but they are now the norm and py4j.org is unusable on a mobile device while RTD has already solved this problem 🙂

What does 1.0 mean?

I want releases after 1.0 to maintain backward compatibility for the main classes so users do not have to adapt their code. Py4J has been mostly backward compatible throughout its history, but the latest releases broke interfaces, a change needed to make the codebase more extensible.

After 1.0, there are a few areas I want to work on, but it will depend on the needs and interests of the community:

Performance

This is a very large topic because the use cases vary a lot and optimizing one use case might penalize another one. I have a few ideas though: using small caches to reduce the number of times we use the Java reflection API, exploring the use of a binary protocol (protobuf performance has improved a lot in Python since I last tried it), taking a few public use cases of Py4J and profiling them to find what is slow, relying on efficient byte[] transfer to pass large integers, floats, etc.

Better support for JVM languages and Java 8

I sometimes get reports of people writing programs in Groovy or Scala and having a hard time using Py4J. I also get questions about “new” features of Java that do not have a clear mapping with Py4J. Having a few examples of programs in Groovy, Scala and programs using new Java features will go a long way toward increasing the feature set of Py4J.

The role of funding on Py4J

Let’s close this roadmap by mentioning the role of funding on Py4J. I work on Py4J in my spare time and with a young kid and a relatively new company, I don’t have much spare time! When a company like kichwacoders funds Py4J to see a feature implemented, it allows me to spend continuous hours at my work on Py4J and I can tackle difficult problems that cannot be divided into 30-minute or 1-hour buckets. It benefits all users because I often have to think about the overall architecture and then, when I resume my work on Py4J in my spare time, I believe my contributions look more focused and directed toward a structured goal instead of looking like a bunch of unrelated patches and quick fixes.

I am not asking for donations: keep those for people who really need them. But if your company or institution uses Py4J and you want a new feature or you want to make sure a feature on the roadmap is implemented, consider contracting Resulto, the company I work for. We are an incorporated business and we produce invoices, so it’s easy for a company to expense.

Comments? Questions on the roadmap?

If you have any comments to share about this roadmap or if you believe important features should be part of 1.0, do not hesitate to share your thoughts on the mailing list or privately at barthelemy at infobart dot com.

Thanks for your interest in Py4J: the community has grown a lot and I am trying my best to be a good project steward.

Py4J 0.10.1 released

Py4J 0.10.1 has just been released on PyPI and Maven Central.

This is a backward-compatible release with important bugfixes and features:

  • Major performance fix: the Python side is now using default buffering when reading responses from the Java side. This is particularly important if you transfer large parameters (large strings or byte arrays). A simple benchmark found that repeatedly sending 10 MB strings went from 99 seconds to 1 second. Thanks to @kaytwo for finding this bug and suggesting a fix.
  • Both the Java and the Python libraries are now available as OSGi bundles. Thanks to kichwacoders for funding the work.
  • The 0.10.0 jar uploaded to PyPI wrongly required Java 8. The Java compatibility has been restored to 1.6. Thanks to @agronholm for finding this bug.
  • Added the __version__ attribute in the py4j package to conform to PEP396. Thanks to @lessthanoptimal for reporting this bug.
  • Github 0.10.1 milestone
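The buffering fix in the first item can be illustrated with the standard io module. Reading a line from an unbuffered source takes roughly one low-level read per byte, while a buffered reader fetches large chunks. The sketch below counts those low-level reads. CountingRaw is a hypothetical stand-in for the socket and the response content is made up.

```python
import io


class CountingRaw(io.RawIOBase):
    """Unbuffered byte source that counts low-level read calls."""

    def __init__(self, data):
        super(CountingRaw, self).__init__()
        self._data = bytes(data)
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        self.calls += 1
        size = min(len(buffer), len(self._data) - self._pos)
        buffer[:size] = self._data[self._pos:self._pos + size]
        self._pos += size
        return size


response = b"y" + b"x" * 50000 + b"\n"  # made-up single-line response

unbuffered = CountingRaw(response)
line_a = unbuffered.readline()  # no buffer: reads one byte at a time

raw = CountingRaw(response)
line_b = io.BufferedReader(raw).readline()  # default buffer size

print(unbuffered.calls, raw.calls)
```

Both reads return the same line, but the buffered one needs far fewer low-level calls, which is where the time goes when each call is a system call on a socket.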

This release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

Py4J 0.10.0 Released

Py4J 0.10.0 has just been released on PyPI and Maven Central.

This is a mostly backward-compatible release with many new features:

  • Added a new threading model that is more efficient with indirect recursion between Java and Python and that enables users to control which thread will execute calls. See the advanced topics guide for more information. Thanks to kichwacoders for funding the implementation and providing the initial idea.
  • Added TLS support to encrypt the communication between both sides. Thanks to @njwhite.
  • Added initial byte stream support so Python can consume Java byte streams more efficiently. Support is still preliminary and subject to change in the future, but it provides a good base to build on. See this Python unit test and this Java example class for a small example. Thanks to @njwhite.
  • Java side: converted build script from ant to gradle. Introduced Java coding conventions and static code analysis. See the new Java Coding Conventions for more details. ant is still supported for now, but it will be removed when Py4J reaches 1.0.0.
  • Java side: it is now possible to build an OSGi bundle and an Eclipse update site from Py4J source. See the documentation section about using Py4J with Eclipse.
  • Github 0.10.0 milestone

This release is backward compatible for the user public API, but the internal protocol has changed, new interfaces have been extracted and a few Java interfaces (e.g., py4j.Command) used to extend Py4J have been modified.

This release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

In the coming weeks, I’ll release a public roadmap to Py4J 1.0, which I want to release before the end of 2016.

Py4J 0.9.2 Released

Py4J 0.9.2 has just been released on PyPI and Maven Central.

This is a backward-compatible release with a few bugfixes:

  • Python side: added a guard condition in object finalization to prevent exceptions when the program exits (long standing bug!).
  • Python side: The daemonize_redirect flag is now set to True by default to preserve backward compatibility prior to 0.9.
  • Java side: Py4J will use the current thread’s classloader instead of the root classloader to load a class from a fully qualified name. This behavior is configurable globally in py4j.reflection.ReflectionUtil. Thanks to @JoshRosen.
  • Documentation: made a simpler and easier to understand example of callback (Java calling Python).
  • Github 0.9.2 milestone

If you see any backward-incompatible changes, do not hesitate to file a bug report.

This release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

The next major release, 0.10, will bring two exciting features to Py4J: a new optional threading model that is more efficient with recursion and callbacks, and support for reading a byte stream from Java by a Python client.

Unfortunately, the 0.10 release will break backward compatibility if you extend Py4J (custom commands, GatewayServer listener, GatewayServer extension), but it will be easy to adapt your code. If you simply use Py4J by creating instances of GatewayServer and JavaGateway, your code will work without any changes.

Py4J 0.9.1 Released

Py4J 0.9.1 has just been released on PyPI and Maven Central.

This is a backward-compatible release with many important bugfixes:

  • Python side: it is now possible to retrieve the listening address and port of the CallbackServer. This is useful if CallbackServer is bound to port 0.
  • Python side: The daemonize_redirect flag is now set to True by default to preserve backward compatibility prior to 0.9.
  • Python side: JavaGateway.shutdown() no longer raises unnecessary NoneType exceptions.
  • Python side: if you attempt to access a nonexistent object on the Java side, you will receive a more meaningful exception.
  • Python side: the callback server was not correctly closing sockets and it was possible to leak sockets until no more were available. This has been fixed.
  • Java side: the finalization code telling the Python side that it can garbage collect a python proxy should no longer block (major bug fix).
  • Java side: After GatewayServer is launched, it is now possible to change the address:port where the CallbackClient connects.
  • Added a comment in an empty init file so 7zip does not report an error on Windows (go figure 🙂 )
  • We moved from Travis CI to Circle CI and the automated tests now reliably pass.
  • Tickets closed for 0.9.1 release
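The first item is useful because binding to port 0 asks the operating system to pick a free port, so the actual port is only known after binding. The sketch below shows the mechanism with a plain socket from the standard library. It does not use the Py4J API.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
server.listen(1)

# The address and port actually in use are available after bind().
address, port = server.getsockname()
print(address, port)

server.close()
```

A test suite that starts many servers can use this to avoid port collisions, as long as it has a way to read the chosen port back.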

If you see any backward-incompatible changes, do not hesitate to file a bug report.

This release has been made possible by the generous contributions from many users. Every bug report, patch, pull request, idea, and bit of help on the mailing list is greatly appreciated.

Professional Services for Py4J

In recent years, I have received several requests on GitHub or on my private email to implement difficult or niche features or to provide commercial support.

I’m happy to announce that the company that I work for, Resulto, has agreed to provide professional services for Py4J. This means that if you want (1) a feature to be implemented quickly, (2) a special license that comes with support, or (3) custom integration with your code, you can hire Resulto to do the job.

I, Barthelemy, will not stop developing Py4J for free, quite the opposite. It’s just that I don’t have the time to invest in large or niche features that require uninterrupted hours of R&D/thinking. In the end, this should greatly benefit Py4J users because new features will be introduced in the codebase and bugs will inevitably be fixed along the way.

If you are interested in professional services for Py4J, please get in touch with us at py4j@resulto.ca

Short FAQ about this offer:

1. Will you stop supporting Py4J or implementing features for free?

No. I’ll continue with exactly the same schedule: intense bursts of open source activity followed by answering bug reports and merging pull requests only. If you want a complex feature to be implemented and I can’t find the time for it in my personal time, you are welcome to implement it and make a pull request.

2. If I hire Resulto to implement a feature, how do copyright and licensing work?

Unless the feature integrates with a proprietary part of your system, we prefer to make the work open source following Py4J’s implicit contributor agreement.

3. Who is Resulto?

We are a small software development company that started more than a year ago. Our main product is a personalized marketing/customer retention platform, LoyalAction. We also offer professional services to build custom platforms and infrastructure. If you hire Resulto to work on Py4J, I’ll likely lead the effort, but other developers may also participate.

As a bonus, we are located in Canada, so our currency is extremely cheap these days 😀

Do not hesitate to contact me (personal: barthelemy@infobart.com, work: py4j@resulto.ca) if you have more questions or if you are interested in the services Resulto can provide.