Py4J 0.8.2.1 Released

Hi,

this is a minor bugfix release.

This release includes the following bugfixes:

  • Fixed constructors not being able to pass proxy (python classes implementing Java interfaces).
  • Java 6 compatibility was restored in compiled jar file.
  • Fixed unit tests for JDK 8.
  • Added a few extra paths to find_jar_path.

The details of the specific issues for Py4J 0.8.1 are available on GitHub.

Installing Py4J is one pip away: pip install py4j

Again, feel free to contact me or to write a feature request on GitHub!

For more information about Py4J:

Cheers,
Barthélémy

Py4J 0.8.1 Released

Hi,

this is a minor release featuring the long awaited availability of Py4J on maven central repository.

This release includes the following new features:

  • Fixed a bug in type inference when interface hierarchy is deeper than abstract class hierarchy.
  • Added a utility method is_instance_of in py4j.java_gateway to determine if a JavaObject is an instance of a type.
  • Released Py4J in central Maven repository. I’m still waiting for a final confirmation from sonatype before they enable the automatic syncing of Py4J maven releases.

The details of the specific issues for Py4J 0.8.1 are available on GitHub.

Installing Py4J is one pip away: pip install py4j

Again, feel free to contact me or to write a feature request on GitHub!

For more information about Py4J:

Cheers,
Barthélémy

Py4J 0.8 Released

After two years, Py4J 0.8 has just been released!

Although I merged pull requests and fixed bugs during these two years, my Ph.D., wedding, and new job made it difficult to find the time to make a proper release. All happy events that made me busy life busier 🙂

This release includes the following new features:

  • Major fix to the Java byte[] support. Thanks to @agronholm for spotting this subtle but major issue and thanks to @fdinto from The Atlantic for providing a patch!
  • Ability to fail early if the py4j.java_gateway.JavaGateway cannot connect to the JVM.
  • Added support for long primitives, BigDecimal, enum types, and inner classes on the Java side.
  • Set saner log levels
  • Many small bug fixes and API enhancements (backward compatible).
  • Wrote a section in the FAQ about security concerns and precautions with Py4J.
  • Added support of Travis-CI and cleaned up the test suite to remove hardcoded paths.

The specific issues are discussed on GitHub.

Installing Py4J is one pip away: pip install py4j

Again, feel free to contact me or to write a feature request on GitHub!

For more information about Py4J:

Cheers,
Barthélémy

 

Py4J Backlog, Bytes, and Open Source

Since my Ph.D. thesis is being printed right now, I thought I could give a status update on Py4J.

One Py4J contributor/user reported a problem with how Py4J handles byte arrays almost a year ago. Because Py4J was treating byte arrays as any other arrays (i.e., a reference), access to individual cells in the arrays were costly (one roundtrip per access). Byte arrays are special beasts because when you go down to the level of bytes, you usually want the raw power and the hanging rope that come with it: you certainly don’t want the programming language or a particular library to stand in your way. Because Py4J uses a String protocol (e.g., newlines are used as separators), transferring raw bytes would require a lot of modifications and would introduce a special case that would need more code than the usual case.

I thus implemented a naive solution that just shifted the byte by 8 bit, to make sure that I could still use my dear newlines. The same person came back at me a few months later though, and introduced me to the concept of UTF-16 surrogates and how Java did not like these special pairs of characters, even in UTF-8, the default encoding for Py4J.

I boosted the priority of this issue, but because I had started a new job and I was trying to finish my thesis during the weekends (advice: this is the fastest way to end up in an asylum), I did not have the time nor the strength to find a solution. Fortunately, a contributor from The Atlantic made a nice Christmas present to Py4J users: he implemented a fix using Base64 and opened a pull request. I merged the pull request in January, but I’m still fighting with some test glitches caused by the difference between Python 2 and Python 3. The Open Source community has been very kind to me and I have been fortunate to receive significant contributions from Py4J users in the past (Python 3 support anyone?). Because I am working for a company that is sympathetic to open source contributions, I will make sure in the near future that the effort behind the various Py4J patches were not in vain.

There are currently 5 open issues that I need to close before releasing 0.8, but all issues have some work in progress so I am confident that I will go through this backlog soon. After that, I will try to come back to a regular release cycle.

Py4J 0.7 Released!

Py4J 0.7 has just been released!

This release includes the following new features:

  • Major refactoring to support Python 3. Thanks to Alex Grönholm for his patch.
  • The build and setup files have been totally changed. Py4J no longer requires Paver to build and everything is done through ant. The setup.py file only uses distutils.
  • Added support for Java byte[]: byte array are passed by value and converted to bytearray or bytes.
  • Py4J package name changed from Py4J to py4j.
  • Bug fixes in the Python callback server and unicode support.

The specific issues are discussed on GitHub.

Installing Py4J is one pip away: pip install py4j

Although I’m still using Py4J everyday, I do not need many more features so future development will be mostly driven by feature requests and bug reports. Feel free to contact me or to write a feature request on GitHub!

Cheers,
Barthélémy

 

P.S.

I shall blog soon about a small and hidden feature that I introduced in the Py4J eclipse default server this morning and that made it into 0.7…

Py4J 0.7 soon to be released!

Just a quick post to announce that 0.7 will be released as soon as pypi renames the py4j package from Py4J to py4j (note the uppercases in the first name 🙁 ). The release will introduce compatibility with Python 3.0, thanks to the huge effort of a generous contributor! There are also a few fixes to bugs reported by users in this version.

Stay tuned!

Py4J 0.6 Released!

Py4J 0.6 has just been released.

This release includes the following new (and great) features:

  • New exception, Py4JJavaError, that enables Python client programs to access instance of Java exception thrown in the Java client code.
  • Improved Py4J setup: warnings are no longer displayed when installing Py4J.
  • Bug fixes and API additions.

In case you did not notice, Py4J moved to github so contributing is now easier than ever!

I plan to do at least another release (0.7) before Py4J leaves beta and moves to 1.0.

About Py4J
Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine. Methods are called as if the Java objects resided in the Python interpreter and Java collections can be accessed through standard Python collection methods. Py4J also enables Java programs to call back Python objects. Py4J is distributed under the BSD license.

Py4J and Exceptions

Py4J 0.6 is almost ready to be released, thanks to Jakub L. Gustak who submitted important bug reports, feature requests, and patches. I have been trying to polish Py4J in the latest releases to make the API more consistent and predictable and the biggest “feature” of 0.6 will no doubt be how Py4J treats Exceptions.

Currently, exceptions can be raised in four places: (1) in the Py4J Python code, (2) in the Py4J Java code, (3) in the Java client code, and (4) in the network stack. An exception might be raised in the Py4J code if the client code is not correct, for example, if the client tries to call from Python a Java method that does not exist. Before 0.6, Py4J raised a Py4JError in cases 1,2,3 and a Py4JNetworkError (a subtype of Py4JError) in case 4. Moreover, if the Java exception was raised on the Java side, the Java stack trace was copied, as a string, in the Py4JError.

There are two issues with this approach. First, the client does not have access to the exception instance on the Java side, and this exception may have some important fields and methods that can help the error recovery. Second, it is very difficult for the client to determine at runtime the source of the error.

Starting from 0.6, Py4J will raise three types of exceptions: Py4JNetworkError in case #4, Py4JJavaError in case #3, Py4JError in cases #1 and #2. Py4JNetworkError and Py4JJavaError will be a subtype of Py4JError (so a client can implement a catch all). Py4JJavaError will also have a method that will return the instance of the Java exception and Py4JError will still display the Java stack trace for case #2.

Stay tuned for 0.6!

 

Py4J code moves to GitHub

It is now official. Py4J code has been moved to GitHub. The issues have been transferred there too thanks to a handy trac-to-github python script. I will be shutting down the trac instance on SourceForge. The web site, mailing list, and alternate download site will still remain on SourceForge (Py4J has always been available on pypi as well).

The decision to move to a Distributed Version Control System (DVCS) was a no brainer. I tried mercurial for another open source project I work on, Qualyzer, and DVCS are clearly superior when it comes to merging and enabling collaboration. The speed boost is also quite welcome (thank you local commit).

The main question was thus: mercurial or git? Actually, the real question was, bitbucket or github? This was not an easy decision, but in the end, the numerous outages and bugs of bitbucket and the higher number of potential collaborators on github convinced me to go with github. In addition, github seems speedier and they developed a tool, hg-git, that allows mercurial users to work with github repositories.

I must say that I was disappointed with the issue tracker at github which is slightly inferior to the bitbucket tracker (only tags, no milestones, no automatic reference between commit and issues except when closing issues, no complex queries, etc.) and I just learned that their wiki engine was only recently moved to an implicit git repository: I believe bitbucket wikis have always been backed by a mercurial repository… Anyway, I just wanted to say this because I have read many posts bashing bitbucket over github and I did not want to sound like one of them…

 

 

Py4J 0.5 Released!

Oh Yeah! It’s another release of Py4J and there are many interesting features that made their way in:

  • The ability to import packages (e.g., java_import(gateway.jvm, 'java.io.*'))
  • Support for pattern filtering in JavaGateway.help() (e.g., gateway.help(obj,'get*Foo*Bar'))
  • Support for automatic conversion of Python collections (list, set, dictionary) to Java collections.
  • Two Eclipse features: one embeds the Py4J Java library. The other provides a default GatewayServer that is started when Eclipse starts. Both features are available on the new Py4J Eclipse update site: http://py4j.sourceforge.net/py4j_eclipse
  • New module decomposition: there are no more mandatory circular dependencies among modules.
  • Have a look at the Trac 0.5 milestone for a detailed description of the issues fixed in this release!

This feature marks the return to a predictable and regular schedule: one release every two months. I am particularly proud that I could make this release given my crazy and stressful schedule these last days 😉

I should restate that I use Py4J every day since version 0.4 and most of the features that made it painful to use are now completed. The next release will focus on source code cleanup (be more pythonic), a few corner case features that are difficult to implement but that will make Py4J a killer application, and better unit test organisation. As usual, comments are welcome and if there is any feature you would like to see in the next release, now would be a good time to make your voice heard!