> PyPy core dev here. PyPy will always remain free and open source. No worries there. The question we and many other open source projects are trying to deal with is how to fund our developers under that constraint. We felt that it was time to explore other alternatives.
I've had great success using pypy in production, but found that it was sometimes tricky to figure out how to fully leverage the performance wins on the table, especially if the regular way you might do a thing in cpython depended on a C extension that was either unsupported on pypy, or supported but slow. I could imagine real demand for people with the specialized expertise to do this for commercial customers, so I wonder if the plan here is for pypy to be funded by this kind of consulting, and maybe to exist under an entity that can make that happen (i.e., one that's allowed to do for-profit consulting work)?
So I have no direct knowledge of this case, but this seems to be an issue over fiscal sponsorship. Like a lot of smaller and younger open source projects, PyPy had delegated financial and accounting responsibility to the Software Freedom Conservancy, which acts as its fiscal sponsor. SFC is a registered non-profit with professional accountants; it takes donations on behalf of lots of projects, including PyPy, and distributes them to each project at the request of the project's leaders. This is a common model in open source; similar organizations that do this include Software in the Public Interest, NumFOCUS, and the Open Source Collective.
One point of friction is that fiscal sponsors take a cut of all revenue/donations raised by the project in exchange for their services, which is 10% for SFC [1]. Fiscal sponsors also have policies around what they will reimburse, which can become as stringent as corporate travel policies [2]. I can totally understand a project getting frustrated with its fiscal sponsor and wanting either to strike out on its own and do it all itself, or to find another sponsor.
The admin fees you refer to are typically assessed in the context of large, institutional grants. These are grants from private organisations like the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, the Chan Zuckerberg Initiative, or government agencies like the US National Science Foundation. These grants have significant financial accounting requirements. There are also many other legal or operational costs associated with these grants.
When projects run these grants through parent institutions like universities, the typical admin fee is >40%. In some cases, it can be as high as 60%. Many projects are eager to enter fiscal sponsorship agreements with organisations like NumFOCUS, because so much more of their grant money goes to funding the work.
In the case of NumFOCUS, admin fees do not adequately cover the staff requirements to manage these grants. NumFOCUS "loses money" when servicing the administrative needs of these grants & it takes this responsibility onto itself solely for the betterment of the projects.
Rather than assess administrative fees similar to universities, NumFOCUS uses its other fundraising—corporate donations, event (i.e., PyData) sponsorship, individual giving—to finance its operations.
source:
- I serve on the NumFOCUS board of directors as its co-chair.
- I presented on this topic last year at the NumFOCUS Annual Summit to an audience of core developers from projects like Julia, Jupyter, Pandas, NumPy, AstroPy, &c.
- NumFOCUS budgets are public, and all of the above information can be corroborated from materials published on https://numfocus.org/
I did. Inherited a legacy web app that did stupid things in Python in memory (basically search and aggregation).
I realized a rewrite was the best course of action, but in the meantime the old thing had to stay up and running, and as the volume of data increased, it started to run into HTTP timeouts, as requests often took longer than 2 minutes.
I moved the thing to PyPy, and got about a 30% speedup from that. Only one lib had to be replaced with a pure python alternative, as it was using a C extension.
It bought me enough time to finish the new implementation (duplicate the data in Elasticsearch, hey presto from over a minute to about a second to get results).
I parse big XML and similarly structured files, convert them into RDF, puff them up into a (still RDF but with a lot of blank nodes) hypergraph so I can load the content into a single database and be able to trace that these two facts are related and come from this part of document A and that part of document B.
I have document parsing and SPARQL queries that can take a few minutes that I'd like to run frequently so I can keep all parts of the system up to date.
I've only benchmarked it a bit, but I found I got approximately the five times speed-up that PyPy promised. This is with PyPy based on Python 3.6. I think PyPy is switching to cffi as the way to connect to C code so most native code "just works" now.
I had to backport my code from Python 3.8; Python 3.6 lacks contextvars, but there is a polyfill for that, otherwise there was no problem.
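For the unfamiliar, this is the 3.7+ stdlib feature in question; a minimal sketch (the variable and function names here are made up for illustration):

```python
import contextvars

# ContextVar behaves like thread-local storage, but is also
# async-task-aware; backport packages exist for Python 3.6.
request_id = contextvars.ContextVar("request_id", default="none")

def handler():
    # Reads the value bound in the current context.
    return request_id.get()

token = request_id.set("req-42")
print(handler())  # req-42
request_id.reset(token)
print(handler())  # none
```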
I stayed away from PyPy for a long time because it was tied to Python 3.5, which was busted in various ways. One of those was that filesystem path objects were only half-implemented: you should have been able to pass them into anything in the stdlib that expected a string path, but at the time you couldn't. Little accidents like that can slow down the adoption of a technology like PyPy.
> I think PyPy is switching to cffi as the way to connect to C code so most native code "just works" now.
As far as I know extensions need to be written for cffi specifically.
cffi is a newer way of writing C extensions, developed by the PyPy project. It was designed with a smaller, cleaner interface for calling C code from Python.
Here's Armin Rigo talking about it at EuroPython: https://www.youtube.com/watch?v=ejUzVcvTLgI
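To give a feel for how small that interface is, here's an ABI-mode cffi sketch calling libc's atoi (this assumes a POSIX platform, where `ffi.dlopen(None)` loads the C standard library):

```python
from cffi import FFI

ffi = FFI()
# Declare the signature of the C function we want to call.
ffi.cdef("int atoi(const char *s);")
# dlopen(None) gives a handle to libc on Linux/macOS.
lib = ffi.dlopen(None)

print(lib.atoi(b"42"))  # 42
```

No PyObject structs, no reference counting: you declare a C signature and call it.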
The CPython way of writing extensions is documented here:
https://docs.python.org/3/extending/extending.html
It seems to require you to deal with the internals of the CPython interpreter (PyObject structs, reference counting, etc.).
I know PyPy has some support for CPython extensions, but it has to emulate some internals and it's slower as a result.
Don't remember the details of the legacy app, but I don't recall seeing that. I think it just used dicts for the data and stored that in a blist.sortedlist https://pypi.org/project/blist/
For algorithmic code PyPy can provide substantial speedups over CPython. I've used PyPy in code fingerprinting large bioinformatics files and seen big speedups. I've also tried porting a webapp processing JSON from CPython and seen no perceptible speedup.
It's been a long time since I looked, but when I profiled my code, not much time was spent parsing/serializing JSON. Most of the time was spent manipulating dicts/lists in Python, which CPython is already pretty good at, since the whole language seems to basically be implemented in terms of dicts. I don't think PyPy has the hidden-class optimizations of JS engines, which are able to find speedups in these kinds of cases.
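For concreteness, this is the kind of dict-heavy aggregation I mean (a made-up sketch, not the original app's code):

```python
import random
import timeit

random.seed(0)
# 100k rows, each a plain dict -- typical webapp-style data.
rows = [{"group": random.randrange(100), "value": random.random()}
        for _ in range(100_000)]

def aggregate(rows):
    # Group-by-and-sum done purely with Python dicts: the sort of
    # workload CPython already handles reasonably well.
    totals = {}
    for row in rows:
        totals[row["group"]] = totals.get(row["group"], 0.0) + row["value"]
    return totals

print(timeit.timeit(lambda: aggregate(rows), number=10))
```

Every `row["group"]` lookup is a real dict hash probe in both interpreters, so there's less headroom for a JIT than in code with stable object shapes.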
For algorithmic/numerical code, especially if you have to deal with numpy-related data, Numba has a much lower barrier to entry, plus you stay on cpython while speeding up computation-intensive code by a few orders of magnitude.
Looks like numba has cffi support now so it would be an option. If I can dig out the code (it was about 5 years ago) I'll probably try adapting it to numba to benchmark it against pypy.
I wrote a daily-used utility (probably still in use) that made good use of PyPy. It was pretty slow, and after some quick profiling I found that type-check functions (in PyMySQL) were being called a LOT. Literally changing the runtime from python3 to pypy gave something like an 8x overall speedup.
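A sketch of how that kind of hotspot shows up under the stdlib profiler (`check_type` here is a hypothetical stand-in, not PyMySQL's actual internals):

```python
import cProfile
import io
import pstats

def check_type(x):
    # Stand-in for the per-value type checks that dominated the profile.
    return isinstance(x, (int, float, str))

def workload():
    # One cheap check per value, called a LOT of times.
    return sum(1 for i in range(200_000) if check_type(i))

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("ncalls").print_stats(5)
print(out.getvalue())  # check_type dominates by call count
```

Per-call overhead like this is exactly what a JIT amortizes away, which is consistent with a large speedup from just switching runtimes.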
We have a grid compute infrastructure for a specialized runtime environment with business-logic rules for scheduling priorities and partitioning of the compute cluster.
The control plane was implemented in Python and Twisted (event driven I/O framework for the unfamiliar), which was fit for purpose at the original scale running CPython (few thousand compute nodes).
As the number of compute nodes scaled up, we developed hotspots in ser/des of control messages, which ultimately started to affect overall cluster efficiency.
Switching to PyPy gave us an immediate substantial performance boost without really having to redo any code at all (just some FFI stuff that was probably wrongly implemented in the first place).
Eventually we realized we were going to out-scale even that (at the hundreds-of-thousands of compute node level) and ended up with a Scala/Akka reimplementation, but moving to PyPy from CPython got us a lot of free breathing room.
It is quite similar to the Arabic word بوس (Baus). I did some wiki searching and found it originates from the Latin word beso [1]. Languages are so awesome!
They likely mean that there will always be a project called "PyPy" that will be free software, but they will move to an open core model and introduce "PyPyPro", where all the cutting-edge development will take place; PyPy itself may or may not be crippled.
I think funding of ambitious open-source projects such as PyPy will continue to be a problem as long as these projects use pushover (a.k.a. permissive) licenses such as the MIT license. It's time for these developers to take a stand for what is fair, by relicensing to a copyleft license such as Parity [1] and selling proprietary licenses to companies that can and should pay.
It might be free software or open source (i.e. in spirit), but I don't have the patience to read potentially-dubious licenses (that's what the FSF and OSI are for!)
Though I think the Parity License is a clearer license than FSF-endorsed copyleft licenses such as the GPL, my point still stands if you substitute one of those licenses.
I'm far from an expert on this, so other commenters please correct me if I'm wrong:
The GPL requires that any project that uses a GPL-licensed project as a part of it must also be free and open source (if it's distributed).
It looks like Parity requires that you pay a licensing fee to the original creator of the Parity licensed project if you're using it for non-open-source reasons. So, it can be used for private/for-profit projects, but you have to pay for it in that case, whereas open source projects can use the code for free.
PyPy will remain a free and open source project, but the community's structure and organizational underpinnings will be changing and the PyPy community will be exploring options outside of the charitable realm for its next phase of growth ("charitable" in the legal sense -- PyPy will remain a community project).
In other words, volunteers can contribute but others get to monetize?
People like you helping people like us help ourselves - Processed World.
The idea of competent volunteers contributing is wonderful as a hypothetical, but if such things existed in reasonable numbers then this funding issue would likely be moot in the first place.
The reality is that the PyPy folk (all single-digit-number of them) have fought tooth and nail to keep the project going for well over a decade. I can't begin to imagine how much highly skilled labour has been poured in by such a small concentration of people, all for little more than praise and repute on a handful of IT forums.
In essence, these projects live and die by funding. Donations just aren't enough to pay the bills for full-time developers, and there isn't any real alternative.
I wish there was more corporate giving to foundations that could handle this sort of thing but we never built that culture in software unfortunately.
I don't think it's fair to frame this negatively at all; it really misses the nuance of these situations.
I do think he captured it quite well -- that they are leaving it as a community project but only directly monetizing it for some people feels, well, wrong? It might be more neutral if they allocate funds for bounties and let anyone claim them, with the core developers obviously being able to address most bounties the fastest.
Alternatively, the FOSS generation could pay for their tools instead of expecting free beer everywhere; then such projects wouldn't need these kinds of gymnastics.
There are a few people who basically run PyPy development. They can do as they please. It's open source, so if you're so against it, you can make a "nobody profits" fork. Most outside contributions to open source projects are made by people who wanted to scratch an itch and then let the existing maintainers maintain that improvement. Their reward is the great software. This is still there so long as PyPy commits to remaining freely available.
To add some substance: I used to have PyPy commit access. I also have contributed almost no code to PyPy. This isn't for lack of wanting; my project has produced several interesting RPython modules which could plausibly be shared with other folks. It's because PyPy's core contributors, the dozen or so post-academic compiler engineers, are incredibly prolific and skilled compared to the rest of the contributor base. They outproduce me. Compare: one person implemented PyPy's massive-subset-of-Python typechecker, one person produced Nuitka's broken typechecker, and a small community team produced MyPy's conservative typechecker. The PyPy version is by far the best, including translation to C, a JIT generator, and allowing nearly any sort of codegen to a high-level GC'd Java-like data model.
The tragedy is that the Python ecosystem broadly doesn't use PyPy and doesn't contribute much to it, neither code nor cash. Our compiler engineers are just as good as the folks working on CPython (and there's some overlap), but don't enjoy the powerful deep-pocketed corporate support.
I'm surprised that they didn't mention numba.jit, which solves the same basic problem as pypy (faster numpy calculations) but in a different way that is easier to mix with existing python frameworks.
For example, TensorFlow with numba preprocessing is easy, just install both packages and it'll work. TensorFlow with pypy requires a 5 hour compile and 40 GB of temporary storage. Plus some source code fiddling inside TF, if I remember correctly.
Even as an open source project, pypy should honestly consider who they're competing with for users and funding.
PyPy is targeted at general purpose workloads, NumPy support is totally an afterthought, so the basic problem it set out to solve was/is definitely not faster numpy calculations. Numba on the other hand is way more targeted at numeric workloads.
Besides, I don't see why they should mention any other project in a post announcing their departure from the Conservancy. The only surprising thing is no mention of the funding model they're moving to, other than a rather vague hint, "exploring options outside of the charitable realm".
Based on your comment, I would guess that you never tried out numba. Of course, it can also do general python and loop optimizations. And in my experience, numba worked for every case where I couldn't get pypy to work.
And I stand by my opinion that that is something that the pypy developers should consider: is this actually usable as a solution to practical problems? Or is there something else that people use instead? If so, why? Analyzing your competition is usually a good way to learn about your own strengths and weaknesses.
> Based on your comment, I would guess that you never tried out numba.
Well, you guessed wrong.
> it can also do general python and loop optimizations.
Yes, it can be used in general purpose workloads, with varying degrees of success. But its main purpose is made abundantly clear:
Accelerate Python Functions
Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled *numerical algorithms* in Python can approach the speeds of C or FORTRAN.
> ... Analyzing your competition is usually a good way to learn about your own strengths and weaknesses.
Except this is an announcement on their funding situation, so strengths and weaknesses are completely irrelevant, unless Numba has a particularly interesting funding model. (The funding model is government grants and corporate sponsorship, so, not particularly interesting.)
I always liken PyPy to HotSpot, in that to this day the numerical performance of the latter isn't spectacular and nobody really cares: it's built to handle the harder job of making vast tangled codebases of non-numerical application code run fast, not just tight math loops, which are already handled perfectly well by other, more specialized tools.
I don't think pypy is a numba.jit "competitor". Most people I know who use it use it for pure Python things, typically web servers, and nothing to do with machine learning or data science.
> @intgr the wind-down with the SFC hasn't been smooth and this is the politically-neutral, agreed-by-both-parties post. PyPy remains the same free and open-source project. Essentially we just switched to a different money-handler. We're announcing it in the next blog post.