Why Aren’t More Users More Happy With Our VMs? Part 1 (2018) (tratt.net)
145 points by agronaut on April 12, 2020 | hide | past | favorite | 88 comments


It would be better if VM comparisons included JSC rather than, or in addition to, V8. JSC tends to outperform V8 so if you find a pathology in V8 it’s just not so surprising. It would be more interesting if you found a pathology in JSC.

I think that the use of small benchmarks obscures what’s going on. The VM is trying to win in the average. It’s like a professional gambler. Observing that the VM did something dumb for a program is like observing that a professional gambler lost a bet. That’s not interesting. In a game of chance, even a really great strategy will have its outliers.

I think that to understand the quality of a VM you have to throw millions of lines of code at it and see if the optimizing JIT can consistently produce speedups, or at least produce speedups more often than not using some aggregate metric. As someone who studies the behavior of JSC on million-line code bases, I can tell you that a pretty good outcome is if only a small number of functions experience an “upside down” effect from optimization and end up running slower over time.

Finally, the whole search for a methodology to pinpoint warmup is broken. It’s pure brain damage. VMs need to be fast even for small programs that don’t have a chance to warm up. Startup time is absolutely important. So it’s a methodological antipattern to even try to find the warmup.

The questions worth asking are:

- for some program, how long does it take to run that program. Start to finish. No ignoring warmup.

- how long it takes to run some very long program, or the running time of a small program averaged over many iterations

- some percentile of behavior, like the 99th, to get an average of the janky behavior.

Ideally you measure all of those things and include both short running and long running programs.

This tells you how good a VM is.
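As a toy illustration (not JSC’s actual tooling), the three aggregates above can be computed from a list of per-run wall-clock timings:

```javascript
// Toy sketch (not any VM's real harness): given per-run wall-clock timings
// in milliseconds, compute the three aggregates suggested above.
function summarize(times) {
  const total = times.reduce((a, b) => a + b, 0);  // start to finish, warmup included
  const mean = total / times.length;               // average over many iterations
  const sorted = [...times].sort((a, b) => a - b);
  // ~99th percentile: a measure of the janky tail behavior
  const p99 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.99))];
  return { total, mean, p99 };
}
```

Measuring both short and long runs with something like this keeps warmup inside the numbers rather than excluded from them.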

If you’re doing math or methodology to identify the warmup point then you’re effectively biasing your experiment to forgive VMs for bad behavior so long as that bad behavior happens early. Nothing could be sillier. Users care about the perf of their VMs at startup not just in steady state.

Anyway, that’s the way I like to do optimizations in JSC.


> - for some program, how long does it take to run that program. Start to finish. No ignoring warmup.

This methodology likely comes from Java, which has long-running server applications. "How long does it run" is often "until someone hits ^C". Here, startup cost can be slow as long as the peak performance is fine. It's accepted that the first minute or two of the server are slow, but that's small compared to the month or so that the server will be running for.

> This tells you how good a VM is.

I think papers like this approach it from the wrong angle. I don't care about the VM's theoretical peak performance. I care about being able to measure and track performance in a reliable way. Put simply, I'm fine with bad codegen as long as I can measure it consistently. Feel free to improve it, but changing things so that I sometimes get good codegen, unreliably, is much more frustrating than reliably bad codegen. But this seems to be the way the VMs are going, with things like probabilistic profiling.

If I refactor my code and replace for(let i = 0; i < L.length; i++) with for(const i of L), what's the cost? Will performance go up or down? We don't have tools or metrics to handle that right now. How can I ensure my codegen is good and won't regress?
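One crude way to probe that question today is a node.js micro-benchmark of the two loop forms; a sketch (timings vary wildly by engine and release, so only the relative comparison on one build means anything):

```javascript
// Rough micro-benchmark of the two loop forms from the comment above.
// Illustrative only: results vary by engine, release, and warmup state.
const L = Array.from({ length: 1e6 }, (_, i) => i);

function indexLoop(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
}

function ofLoop(arr) {
  let sum = 0;
  for (const x of arr) sum += x;
  return sum;
}

for (const fn of [indexLoop, ofLoop]) {
  const t0 = process.hrtime.bigint();
  const result = fn(L);
  const t1 = process.hrtime.bigint();
  console.log(fn.name, result, Number(t1 - t0) / 1e6, 'ms');
}
```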

I work on a particularly demanding website in my free time ( https://noclip.website/#smg/AstroGalaxy , unfortunately won't run in WebKit due to missing WebGL 2 ), and performance varies drastically from Chrome release to release, and I do extensive testing with node.js to make sure that I'm getting good codegen.


I know that the warmup skipping comes from Java. It was a mistake there. Saying that it’s because Java is for servers is a lame excuse and may be getting it backwards - maybe Java only succeeded on servers because all the tuning ignored warmup.

I hear ya that having tools would be great - but the best speedups do come about from probabilistic methods so it would be weird to rely on whatever a profiler told you.


For those who aren't familiar, JSC is JavaScriptCore, the built-in JavaScript VM in WebKit, Apple's browser engine.


Thank you, I was able to find more information about JSC on WebKit's site: https://trac.webkit.org/wiki/JSC

Also a standalone build (kinda old) for various platforms: https://github.com/Lichtso/JSC-Standalone


Not measuring VM startup time has a long tradition in papers. It was the Original Sin (TM).


This blog post, and your reply, both touch on how difficult it is to measure performance. But then in the same comment where you point out that there are many different valid ways you could measure performance, you also make the broad claim that "JSC tends to outperform V8". What are you basing that on?


These days we use JetStream 2 (our design) and Speedometer 2 (collaborative design between WK and Chromium folks) as the main big benchmarks but it’s not the only thing we measure and tune.

V8 used to have their own JS benchmark, Octane, but they retired it at about the same time as we beat them on it. So JSC is fast enough to make other people retire their benchmarks.

And by the way if you are interested in what we think of as good methodology you should read about JetStream 2: https://webkit.org/blog/8685/introducing-the-jetstream-2-ben...


The reality is... it's been decades and while JVM languages can be pretty fast, I have yet to see many non-contrived examples where the VM based language consistently outperforms competently written but not heavily optimized C++. Even then, extensive tuning is done to the VM. Heck, with the advent of Go you now have another great higher level language that consistently outperforms Java/Scala, has top notch garbage collection, and doesn't make you deal with a bloated VM.

JIT is very nice in theory. It's great in certain applications (e.g., in very tightly scoped domains like accelerating linear algebra). Its proponents always talk about how it allows for optimizations that would be too costly or difficult when doing AOT compilation. But the operational complexity to get it to actually perform at that level on a production language VM (e.g., Oracle's JVM) is often its undoing.


According to The Computer Language Benchmarks Game, Go performance is in the same ballpark as Java, and sometimes several times slower:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


Honest question: isn't the benchmarks game unrepresentative because it encourages submission of heavily optimized non-idiomatic programs?

Edit: looked at the source, which is easily accessible. The Java examples are reasonable (if slightly performance-minded, but nothing shocking).


> top notch garbage collection

No it doesn't. The Go GC is intentionally very simple and optimised for one specific metric, whereas the set of JVM garbage collectors allow you to optimise for the metric that matters to you and are tuneable for the requirements of your application. The JVM has state of the art garbage collection; Go has My First GC Algorithm.


> The Go GC is intentionally very simple.

This is simply not true. The Go GC is an example of a sophisticated non-moving concurrent collector.

> optimised for one specific metric

This is only partly true. Low pause time is definitely the highest priority metric, but throughput still matters. The Go GC probably has the lowest pause times of any production GC these days, outside of perhaps nonpublic custom solutions, e.g. successors to IBM's Metronome sold to specific customers.


> Heck, with the advent of Go you now have another great higher level language that consistently outperforms Java/Scala

Citation needed? Java may not beat C++ in most use cases, but if you need a garbage collected language, it's likely the fastest you're going to find. But you get high memory usage in return.


> Go ... has top notch garbage collection

Is ballast[1] still required for certain use cases?

[1]: https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...


I think if you implicitly define "top-notch garbage collection" as "never requires any tweaking ever in any performance profile" you've limited your "top-notch garbage collection" down to the empty set. There aren't even any "manual" memory management techniques that can stand up to that definition, honestly. Even in "manual memory management" languages there are situations where you end up having to say "just give me a big slab of memory and walk away, please" (arena management, etc.).

(Not a defense of Go's GC, necessarily. I'm not sure I'd call it "top-notch". A lot of its advantage over Java was that it got to look at Java and make different decisions, and by having a lot more values in the language with fewer references, part of the reason it tends to do a lot better than Java in terms of memory usage is just that it gave itself an easier memory management problem in the first place. Java GCs are frightfully good, yes, but to some extent they are that good because they have to be. Java at the very, very beginning was not designed for the sort of usage it has today (it was very originally a set-top box language, not a Big Iron language), so the language does some things that stress its GC. Go's doesn't have to be that intricate to be still quite good, so it isn't. And note difference between "quite good" and "top notch".)


Ballast isn't gc tuning in the sense of changing some configuration, it's doing something nonsensical in the application code itself to alter gc behavior.

In java you would simply set -Xms to increase the min heap size and then that additional memory would also be available to the application and not wasted as in the Go case.


It's not "wasted" in the Go case. On modern systems, the ballast is only a few numbers in the virtual memory table as long as you don't touch it, which isn't that hard. The original blog post confirms it: https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i... search for "Now onto 2. Won’t this use up 10Gib of my precious RAM?"

It may be "nonsensical" but on the grand scale of "things done for memory management's sake" I find it unimpressive. (If 10GB were actually allocated and unavailable, I would find it impressive.)


Point is, with a mature GC, you don't need to change your code to change runtime GC behaviour - in fact, in JVM land, any usage of `System.gc` or similar in code is very much a code smell.

To give a broad example, and to repeat the parent comment, I can avoid "ballast" by setting -Xms. But it goes further, if I have an existing JVM code base, and want to run it using a GC that exhibits similar low GC pause time to the Go GC, assuming I'm compatible with Java 12+ I can use the command line option -XX:+UseShenandoahGC without changing my code.

If Shenandoah isn't producing the behaviour I want, I can change to another algorithm with -XX:+UseG1GC for example.

I totally understand that Go's trying to avoid such complexity, but to quote Python's zen, complex is better than complicated. And I consider writing code to influence underlying runtime behaviour to be needlessly complicated.


It occurs to me that, in practical terms, the "steady state" of performance with the increasingly large blobs of JavaScript we find littered throughout the web tends to impact the user more than other JITs. As the user navigates from page to page, the VM is reset, a fresh set of minified blobs is downloaded and JITed, and a core or two is pinned to do so. That translates pretty directly to a hotter phone, less battery life, and a frustrated user. Sure, when your JVM startup is 0.1% of the runtime of your program, it's not as big of a deal, but when it's more like 20% of the time a user spends using the program (their web browser), it's a lot worse, and has slim potential for improvement.


It has been really frustrating watching the frontend-web evolve. Take Jira for example, the platform is written in Java yet all my time is spent waiting for all the widgets to jiggle their way into existence. No matter how fast your server is, if you have to do 10-20 network requests to hydrate your frontend architecture the network time alone is going to ruin your perceived performance. Your server could deliver sub 5ms response times, with 50ms Time to First Byte on the initial request, and it would still feel like wobbly molasses. The JVM is the least of their worries.


JS VMs try to be really good and smart about startup time like when you visit a page for the first time.

The main trick is to interpret the short running code.

That said, some JITing makes page load faster. But only if the JIT only kicks in for functions that run more than some amount of time and the JIT is very cheap to run.
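A toy version of that heuristic (the threshold is hypothetical; this is not any engine's real tiering policy):

```javascript
// Toy tiering heuristic: interpret by default, and only consider a function
// worth JIT time once its call count crosses a hotness threshold.
const JIT_THRESHOLD = 1000;   // hypothetical cutoff
const counts = new Map();

function maybeTierUp(fnName) {
  const n = (counts.get(fnName) || 0) + 1;
  counts.set(fnName, n);
  return n >= JIT_THRESHOLD;  // true => hot enough to spend JIT time on
}
```

The trade-off the comment describes is exactly this: a threshold too low wastes page-load time compiling code that runs once; too high and hot functions stay interpreted.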


Haven't we learned that VMs aren't worth it? The amount of engineering resources it takes to design a half-way decent one is just absolutely staggering. We all know that theoretically they can be faster than native code, but how many people have actually experienced that? The costs are too high: the warm-up costs, the engineering costs, the complexity costs. If every single language just generated LLVM IR and compiled it, we'd be in a lot better place and we'd probably have better, or at least more predictable, performance.

Like more resources have probably been put into the JVM than any other VM or compiler on earth and what has it given us, exactly? Performance is still worse than C and with homogenized operating systems, as well as the move towards the web, the portability guarantees don't feel very important. If I'm missing something, please tell me!


The same can also be said about GC. Every year somebody is promoting a new Incremental Realtime Generational garbage collector that is almost acceptable for serious work, if you ignore enough real-world overheads. People writing code that has to manage resources besides memory have found that, given the right core language facilities, managing memory too is no bother, but managing other resources while fighting GC is intolerable. Of course you need destructors.

Meanwhile, C performance has not been a worthy goal in decades. Getting "as fast as C", if you could get there, would still leave you firmly in second or third place. The computers are not getting much faster anymore, but the problems are getting much bigger and the networks much faster, so performance matters more each year than the last.


> The computers are not getting much faster anymore

True for each core on a CPU (broadly) but not true for the computer as a whole unit, we've seen an explosion of core counts on x86 desktops/laptops over the last 5 years (helped by the resurgent AMD forcing Intel to stop putting out incremental upgrades to 2C/4T on laptop and 4C/8T on desktop every year).

> People writing code that has to manage resources besides memory have found that, given the right core language facilities, managing memory too is no bother,

If we assume that it is 'no bother' as stated that doesn't address the 8C/16T elephant in the room in that for maximum throughput on a modern machine we need to orchestrate running across multiple cores properly and that's a tougher problem to crack in a general purpose popular language without an explosion in complexity to the programmer.


That's a No True Scotsman fallacy. I guess all those millions upon uncountable millions of lines of Java, Erlang, C#, etc. code in production is not serious? Because it's running in a VM?


What percentage of those lines run with an incremental realtime GC?

The only such Java GC I know of is part of the Azul Zing VM.


I believe that's why the poster referenced the "No True Scotsman" fallacy. It entirely depends on your definitions and you can keep endlessly narrowing your definitions. All of them GC, right? All of them are doing some form or another of incremental, right? (Some in all generations, others only in some generations, based on balancing and tuning needs.) Do all of them have realtime characteristics? Yes and no. None of them are allowed to stop the world, certainly, but does that mean that all of them or any of them can be called "realtime"? It's a semantic hedge forest.


The difference is that GC provides so much more value than a VM. For example, without GC, functional programming is practically impossible, unless you consider linear types the solution (I'm not aware of a FP language that uses them instead of GC, so I can't really comment on their viability).


For most people, I think, performance is a matter of good-enough performance most of the time, with both "good enough" and "most of the time" varying wildly depending on context.

Regarding the move towards the web, I think portability still matters, because a lot of web developers write and test code on Windows but deploy on Linux for production.


Except it's not good enough. Every JVM project I've worked on had to work around the unpredictability of the VM: doing things like, on a new deploy, hitting all your endpoints 100 times in order to warm things up. Or the curious cases when a trivial change in one part of your program somehow affects the performance characteristics of another. VMs don't provide enough benefits to justify the large burden of actually using them effectively.


Ugh at risk of being the grumpy old programmer: jit code is just never going to touch actual handwritten native code in performance. It's been like 50 years. If we care about performance we need languages for it.


I mostly agree, but compilers intentionally don't explore every optimization path to save compilation time, and a JIT is able to trace the program for real data and find places that could benefit from additional attention.

I don't think JITs are the only way to address that trade-off in compilation time vs runtime performance metrics, and whether their benefits are worth their other costs is an interesting, program-dependent question, but they have strictly more information available than a naive compiler and can potentially use that to make better decisions.


Do any languages explore both? Compile to native and run with a JIT which profiles and optimizes further based on runtime behavior?


It's called profile-guided optimization, although they don’t run with an actual JIT. The big 3 C compilers all have it.


It is clearly less powerful: it analyses one sample and extrapolates from it.

Every user would need their own PGO, regularly refreshed, to be competitive with the advantage of JIT profiling.


This is a completely legitimate comment (which was 'dead' when I saw it). PGO really is less powerful than JIT.

A compiled language with a "micro-JIT" (e.g. for v-tables) seems like an interesting idea to me.


PGO as done in C family compilers of today can't make speculative optimizations and fall back to recompilation, so it's much weaker.


Just curious what the big 3 you're considering are?

- gcc
- clang
- msvc
- intel


Yes in that order. Maybe you could say big 4, but I find Intel is only popular in smaller niches.


No fundamental reason why you cannot do this. Maybe Graal can even do this for Java since it can both compile to native and do profile guided optimization (but I don't know if they ever combine the two).

Here's a reason why this might not be a super great thing to do, if you're thinking of it from a language design standpoint:

- Profiling and rejitting costs you memory and makes all code pay a tax. The biggest tax is the safepoint/OSR tax. It costs significant memory and some small amount of time (maybe the time cost you add from comprehensive OSR support is like 1%-5% overall, but I'm not sure, because it's hard to isolate this cost and measure it). So you don't want to build a system that supports JITing unless it's going to give you big wins. That implies languages with lots of dynamic typing.

- The languages that most benefit from JITs (the win they get consistently overcomes the overhead of JITing) are the ones that have so much dynamic typing features that compiling them to native is a super hard problem.

Java is one of the few languages that is both dynamic enough to benefit from JITs but static enough to be possible to compile to native. And even for Java the native compilation is a lot of effort to get right.

Note that "possible to compile to native" to me means: compiling to native produces something that performs well so there exists some benefit to actually doing it. Like, I would expect compiling JavaScript to native to produce something that doesn't perform well at all.

So, basically, the issue is that most languages are either in the "benefit of JIT is smaller than cost of JIT" category because they have adequately static typing or in the "must have JIT and cannot compile statically" category because they don't have static typing. Not a lot of languages are in the sweet spot where doing both would help, but such languages exist (Java) and there's no reason why they can't do what you suggest and they may already do it (seems like a natural thing for Graal to do).


You can get a bunch of the benefit by applying FDO:

https://llvm.org/devmtg/2013-04/novillo-slides.pdf


i think there are some common lisp implementations that do this


Totally true. JITs are just about making the dynamic languages perform better than if they were interpreted. If you want native perf you need a statically typed and ahead of time compiled language.


I agree that we need to design languages that take into account advanced optimizations, but we're pretty far from the limit of compiler optimizations - partly because many languages have inconvenient semantics for it. We haven't even begun to leverage data layout optimizations and there's a lot of things about feedback directed partial application style transformations that haven't made it out of academia to production compilers.

In some sense we can always fully specialize a program to its current input and make a lot of radical changes to data layout and representation. But in languages like C/C++ where programs can "see" struct/array layout, member order, pointers as integer values that have fixed order, spacing, grouping based on the program specified data types, this is prohibitively complicated.


It seems that the authors are not measuring what they think they are, or have explained it poorly. Most transitions from interpreter to JIT show speedups of x10 to x100, e.g. luajit or V8. How is it possible that the variation of V8 (as an example), according to their numbers, is showing improvements of only a few percent, when it should be orders of magnitude faster after transition? My conclusion: they are measuring variations after warmup.

All of the warmup, and transitions from interpreter, to JIT, to optimised JIT, happen inside the first few micro- or milliseconds of EVERY one of their thousands of process iterations. Their measurements are ALL of the system variation of the VM after warm-up has taken place. The VM is optimizing within the first 1-1000 inner loops occurring at the start of EACH process iteration. For most working programmers, a variation of a few percent on a running system AFTER warm-up in "steady-state peak performance", and before any I/O takes place (because language benchmarks avoid I/O), would not be an issue. If it is an issue, then the article perhaps demonstrates that a compiled language would offer less variation.

The benchmarks listed range from a shortest of around 0.4s for fannkuch/hotspot/linux, up to 1.8s for n-body, pypy, linux. This 'long-running' benchmark code (of .4 to 1.8s ), by definition, has to include multiple inner loops/hot code, which is quickly optimized, otherwise benchmark code would have to be millions of lines long, in order to have a sufficient runtime length. Tests need to run for at least tenths of a second, for cross language comparisons, since JITted languages take some iterations to warm-up.
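The effect is easy to see by timing each in-process iteration yourself; a node.js sketch (exact numbers are machine- and engine-dependent, but typically only the first few samples stand out):

```javascript
// Sketch of the point above: the interpreter-to-JIT transition happens within
// the first handful of iterations, so later samples mostly measure
// post-warmup system variation. Timing each iteration makes that visible.
function work(n) { let s = 0; for (let i = 0; i < n; i++) s += i % 7; return s; }

const samples = [];
for (let iter = 0; iter < 50; iter++) {
  const t0 = process.hrtime.bigint();
  work(100000);
  samples.push(Number(process.hrtime.bigint() - t0)); // nanoseconds per iteration
}
// Usually the earliest samples are the slowest; the rest cluster tightly.
```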


Their first iteration is an entire run of the underlying benchmark. Subsequent iterations are reusing the same VM. They run each plot multiple times, and reboot between plots.

They’re trying to show that “warmed up steady state” isn’t something that reliably exists.


Yes, I know. But the tone of the whole article is as if they've found deep flaws across many VMs. They call something "warmup" which I think has little or nothing to do with the JIT, but is unaccounted-for variation in the whole running system.

The final graph shows a binary trees program in C with a 6% variation between "in process executions" and no steady state; it seems logical that most VMs will show the same or worse variation.

The "warmed-up steady state" does exist, but not if they define it so narrowly. All of their iterations and timings are running at x30 to x100 interpreted speed, the only 'cold' interpreted code is in a few microseconds of the first loops of an execution.


Stupid question for people who know more about these things. Why can’t we fight the warmup time by running an already warmed up snapshot of the program? Or say dumping some data structure when it hits steady state to give hints to the JIT the next time it runs?


Android does this with ART. They use JIT and profiling to generate AOT binaries dynamically. I believe this also lets Android update the runtime in a way that invalidates the AOT binaries. They are simply regenerated as needed.

https://source.android.com/devices/tech/dalvik/jit-compiler


Azul's ReadyNow! for their commercial Zing JVM does this: https://www.azul.com/products/zing/readynow-technology-for-z...


If you look for "AOT" or "Ahead-of-time" you'll find examples in both .NET and Java, but as far as I know they're either largely experimental or limited to newer code (not backwards-compatible with all code). But I haven't looked too deeply into it. Dumping data structures to give hints for next time reminds me of something I read recently on the topic, but drat, I can't remember it right now.


Isn't that just compiling, then?


Sure, but something like Java already has to be compiled. Throw a few more minutes in there, maybe run the test suite a couple thousand times.


But many languages with VMs are intended to be portable across processor architectures, and a snapshot would be architecture-dependent.

Also, the characteristics of your test suite may be very different than how it is run in production.


I would say those are reasonably easy to solve, you just need to offer platform specific trained binaries and add a realistic set of stress tests with which you can train the VM.


That would be great but it’s hard because the trick, at least in JS VMs, is to have the JIT specialize based on the heap.

So you’d need a heap snapshot or some way to link the generated code to a different heap.

Maybe not impossible, just hard enough that it’s not widespread.


IIRC V8 can now provide deterministic JIT+heap images in addition to preparsed bytecode images.

EDIT: worth noting that other JS VMs like JSC have these APIs for bytecode too, but i can’t remember off hand whether they can do generated code too.


It does seem like you could do JIT with a persistent cache which stores the JIT output along with a key that's a hash of all the relevant system parameters like CPU model and VM parameters like heap size. This would mean that the typical case of re-running a program in the same environment would be pre-warmed.


It’s much harder than that because the JIT is speculating on what lots of objects in the heap are doing, including watchpointing them to constant fold properties. It’s not clear what the key should be in that case.

Still not impossible but I want to be clear on what exactly makes this hard. CPU model for example is not what makes it hard.
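A toy illustration of the kind of heap speculation meant here (illustrative only; real engines do this in generated machine code, not in JavaScript):

```javascript
// Toy model of watchpoint-guarded constant folding: "compiled" code bakes in
// the current value of config.limit, and a watchpoint invalidates it if the
// property ever changes, falling back to re-reading the heap.
const config = { limit: 10 };
let codeValid = true;  // the "watchpoint"

function setLimit(v) { config.limit = v; codeValid = false; }  // write fires watchpoint

function fastPath(x) {
  if (!codeValid) return x < config.limit;  // deoptimized: consult the heap
  return x < 10;                            // speculated constant-folded value
}
```

Persisting `fastPath` across runs only works if the next heap has `config.limit === 10` too, which is why the cache key is so hard to define.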


> including watchpointing them

I assume this isn't literally using hardware watchpoints?


Yeah


IBM OpenJ9 JVM does that with the AOT feature. It's good for startup time and the AOT code can be still further optimized by the JIT. https://www.eclipse.org/openj9/docs/aot/



Heap layout changes between runs due to OS address-space layout randomization (ASLR).


I would bet that a VM isn’t going to be deterministic enough to produce the same heap twice even without aslr. Just building a deterministic JS engine seems like a super hard problem.


The V8 snapshot feature is designed for this. https://v8.dev/blog/custom-startup-snapshots


We can and there are many implementations of it, but the relatively rare application of this approach suggests it's not a big enough win to justify the cost.


I believe this is what Deno tries to do.

https://deno.land/


dotnet does this, caches compiled assemblies then also tunes them. https://www.geeksforgeeks.org/what-is-just-in-time-jit-compi...



I love intellij idea, but it's pretty damn slow for a lot of things (though never quite as bad as eclipse). I have always wondered if the JVM is optimizing for tasks that happen at startup, at the expense of performance during use. It would be a shame if the slow performance simply came down to the JVM profiling at startup and determining that the core function of the entire program was to index the code in your workspace.


In my experience, IntelliJ is as fast as VS Code. It'll never be as fast as a terminal editor because it has actual features as opposed to just being a text editor.

Outside of startup (which has gotten so much better in the last years I don't even see the splash screen anymore) and initial indexing upon project creation or library downloading, it's perfectly fast enough.


In my own experience IntelliJ is slower than VS Code to use... It will depend on what you're doing as VS Code has an interesting plugin module to not hang up the UI/Editor itself...

Not all the plugins respond in time while typing, but I can at least keep typing and it continues to function with basic editing (general worst case).


Slow compared to? VSCode? Visual Studio? Vim?


From experience, slower than VSCode by quite a bit, and both VSCode and Jetbrains are of course slower than Vim by an order of magnitude.


There are a couple of talks associated with this post and paper.

Virtual Machine Warmup Blows Hot and Cold

https://www.youtube.com/watch?v=LgCHAU8ZB00

Why Aren't More Users More Happy With Our VMs?

https://www.youtube.com/watch?v=cmrzOkEM9fc


People are complaining about VM performance, while at the same time I'm guessing a fair number of the same people happily write production code in Python. It's hard to find a slower language than Python.

People are not using VM-based languages because they want to beat the numeric performance of hand-optimised Fortran. They use it because of the memory safety, compatibility and performance which is on par with or even better than other languages (at least when it comes to the JVM) for the tasks that they want to perform, which is for most programmers not going to be numeric simulation.


I have nothing against lean VMs/JIT compilation like LuaJit or the way Racket compiles on the fly, but I do avoid Java VMs for various reasons:

- Dependency hell and deployment problems: It's hard to make correct assumptions about which VM version is available on which platform. Pre-installed versions interfere with side-installed versions, and there is a ton of software that requires older Java VMs to work properly. It's a huge mess.

- Potential for losing future OS support: Apple, Microsoft, and others may at any time decide to block Java VM or no longer support it on their platform. That means you have to bundle your software with a Java VM, e.g. Crashplan has done this, making installation and deployment even more difficult.

- A thousand past problems on Linux: Various versions of OpenJDK and Oracle's java in combination with user software written in Java have caused massive problems on my Linux machines during the past 15 years, from causing extreme slowdowns to freezing the desktop until you hard reset.

Nothing else has given me as much trouble on Linux as Java, not even proprietary graphics card drivers and kernel extensions. Whatever the Java VM does, if it can freeze your whole system just because you run desktop software like Jabref, then there is something wrong with it.


t. person who has not used java since Java 1.5


My post wasn't intended to be taken as a statement about Java, the programming language, it's about VMs/Java implementations. Unfortunately I have software to run on the Java VM.


I suspect that part of the issue is that when devs are writing and testing their code, they rarely keep the VM running for long enough to reach peak performance. So the performance still feels slow to developers, and performance issues still block the develop/reload/test cycle.


This is what overnight "regression" & "aging" (where you run the test suite over and over, to try and capture those rarely-seen corner cases) tests are supposed to capture.

I'd be surprised if the VM developers didn't run these.

More likely, the environment the VMs are run in has changed in the decades since their development: amount of RAM, cache size, latency of one subsystem over another....


Some VMs are a lot better than others at startup.

Some programs don’t “run long enough” in the way the VM needs even when users run those programs.


The title doesn't do the surprising conclusion justice.

"When we set out to look at how long VMs take to warm up, we didn’t expect to discover that they often don’t warm up. But, alas, the evidence that they frequently don’t warm up is hard to argue with."

By not warming up they refer to instances when early performance is higher than later performance or when the performance doesn't settle.
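That definition suggests a simple check over a series of in-process iteration timings (a toy sketch, far cruder than the paper's actual statistical analysis):

```javascript
// Crude check for the "no warmup" cases described above: did the late
// in-process iterations get no slower than the early ones?
function warmedUp(times, frac = 0.1) {
  const k = Math.max(1, Math.floor(times.length * frac));
  const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
  return mean(times.slice(-k)) <= mean(times.slice(0, k)); // late <= early
}
```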


> By not warming up they refer to instances when early performance is higher than later performance or when the performance doesn't settle.

which are both cases where you would be disappointed were you to use the VM as a server.

I think their point is that VMs are more unpredictable than many realize and also intrinsically unpredictable in some cases.


It's psychological. Most users don't really care how fast something runs once the JIT warms up: instead, they care about the latency of their starting the program to the time they can use it and use this latency as a psychological measure of the performance of the program as a whole. Stupid? Yes. But that's what actually happens.

The JVM has always had a slow startup path. Much slower-overall systems like Python don't. That's why people don't complain about Python performance and do complain about Java performance.


These tests could be on a website of their own. I'd love to see how the results change with time and how competing implementations compare in terms of behavior (not necessarily performance), V8 especially has changed a lot but it wouldn't surprise me if it still ran into issues.



