Accelerating JavaScript (In the Browser)


JavaScript used to be made fun of all the time because it was the slowest kid in gym class. And it could only have one thread. But this only motivated JS even more! This is that story…

📜 A brief(er) history of Javascript

Perhaps it is what used to be the biggest impediment to JavaScript bettering itself that has uniquely positioned it to excel. In those fires of incompatibility, the seeds of its performance were forged.

While other languages do have other competing implementations, JavaScript is one of the few languages whose performance is driven by consumers (rather than producers). Sure, as a consumer of R, a scientist wants it to be fast. But improvements to the language often benefit everyone and commercialization instead comes in the form of professional services and products.

For JavaScript, being so essential to the web browsing experience, its performance is in a sense the product: slight differences can make a browser feel more responsive, and that feeling can be sold.

⚔️🛡️ The Great Browser Wars

It was in the Second Browser War of the early aughts that JavaScript’s fate would be irrevocably changed…1

While likely unbeknownst to all at the start (except for maybe a few industry insiders), by the end of the decade the writing for a high(er) performance JavaScript was on the proverbial wall.


CUT TO:


And by the end of the following decade (which brings us to now), JavaScript would become one of the most optimized and performant dynamic/scripting languages out there.

🏎️💨 Vroom Vroom

So now that we are all caught up on the historical context of how JavaScript got here, let’s get into our options for squeezing as much performance out of it as we can. What follows are the various options for leveraging browser-native APIs/technologies to accelerate JavaScript execution.

This post does NOT cover optimizations for JavaScript runtimes/VMs/implementations (i.e. how you could make a faster V8).

The techniques at your disposal can be (roughly) grouped into the following types of optimizations:

  • Parallel execution (with web workers usually)
  • GPU acceleration
  • Efficient Arrays and linear algebra routines
  • Source compilation + optimization (asm.js, WebAssembly)

And each of these is suited particularly well to overcome a specific performance bottleneck (or use case).

💉 Picking your Performance Poison

The first step to improving performance is understanding what type of bottleneck you have. Again, the flowchart is your quickest path to choosing the right technique for your particular problem, but that raises the question: how do I know which node to stop at?

Sometimes it might be somewhat intuitive/obvious what the bottleneck is, like if you are trying to do a word count across a 1TB file in the browser for some reason. But often applications are complex enough that it is hard to know a priori where the slowdowns are happening (or there may be multiple compounding bottlenecks).

Enter profiling (the good kind). Profiling in a web browser is slightly different than profiling a standalone program, since there are many externalities that affect how your code actually executes in a browser.

To help untangle this web2 of performance, browsers often have fairly advanced developer tools built in. They are your friends.

Thankfully, the general principles and process of profiling are similar to debugging any (non-browser) language. When I am trying to pinpoint a performance bottleneck, I like to imagine I am playing twenty questions with my browser.

  1. Is it you or is it me (and my code)?
  2. Are my calculations taking too long?
  3. Do I have enough memory?
  4. etc.
  5. etc.
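Before reaching for the full profiler, a quick first pass on question #2 is often just timing suspect sections yourself. A minimal sketch (the `timeIt` helper here is my own invention, not a browser API), using `performance.now()`, which is available in browsers and modern Node.js:

```javascript
// Wrap a suspect section with high-resolution timestamps to get a
// rough, first-pass measurement before firing up the dev tools profiler.
function timeIt(label, fn) {
  const start = performance.now();
  const result = fn();
  const elapsed = performance.now() - start;
  console.log(`${label}: ${elapsed.toFixed(1)} ms`);
  return result;
}

// Example: is this loop the thing making my page feel sluggish?
const total = timeIt("sum", () => {
  let s = 0;
  for (let i = 0; i < 1e6; i++) s += i;
  return s;
});
```

Crude, but it answers "are my calculations taking too long?" for one candidate at a time, and it works identically inside a Web Worker.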

For this post we are not going to dive into the nuances of performance profiling, since it is a fairly complex process that can vary a lot depending on environment, OS, browser, the network, other applications, etc. And often, it is a lot of trial and error. But the resources have lots of good advice from people much wiser than me. For this post, we will proceed in the hypothetical:

  • What can I do if I am running out of memory?
  • Can WebGL help me when I have a lot of data to process?
  • etc.
  • etc.

Which brings us to our first potential solution.

🏄 Streaming the Web

If you are running into latency or memory problems, streaming the data can often help. On the JavaScript side, this looks like an incremental algorithm that updates a shared data structure representing the result of some calculation. And if the calculation is aggregating something, the output result is often much smaller than the raw data input.

If the amount of memory needed to process a single datum is less than the total available memory, a properly architected streaming solution may be all you need.

Most of the techniques applicable to processing data streams in general can help us too. The main difference however is that we have to work within the constraints of the browser environment.

For streaming I/O in the browser, there are a variety of options/protocols. And depending on the application/processing constraints, one approach may be more appropriate than the others.
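One such option is the Streams API: the body of a `fetch` response is a `ReadableStream`, so you can aggregate as chunks arrive instead of buffering the whole file. A sketch of an incremental line count (the `/big-file.txt` URL is hypothetical):

```javascript
// Incrementally count newlines in a stream of byte chunks.
// Only a small running aggregate (the count) is ever held in memory.
async function countLines(stream) {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let lines = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } handles multi-byte characters split across chunks
    for (const ch of decoder.decode(value, { stream: true })) {
      if (ch === "\n") lines++;
    }
  }
  return lines;
}

// In the browser this would typically be fed by fetch:
//   const res = await fetch("/big-file.txt"); // hypothetical URL
//   const total = await countLines(res.body);
```

Note that the peak memory use is one chunk plus the aggregate, regardless of how large the file is, which is exactly the property the previous paragraph asks for.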

🧑‍🏭 Web Workers

As you may or may not know, JavaScript is a single-threaded language with an asynchronous, event-loop-based concurrency model. This is not actually a limitation of the language design, but rather a property of the runtime environment (i.e. the browser, Node.js, etc.). And while certain runtimes (often JVM-based) do in fact support real multi-threading, even implementations marketed as “multi-threaded” are usually just syntactic sugar hiding workers.

Web workers at their core are simply the browser’s answer to threads3. And more importantly, they allow web applications to execute code in parallel, rather than just concurrently.

A big difference between programming with threads and Web Workers however is that a Web Worker executes in its own isolated context (without access to shared memory or the DOM). To communicate with the main program/web page (or any other workers), you must pass messages back and forth.
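That message-passing model looks roughly like the sketch below. This is browser-only and `worker.js` is a hypothetical file name; the heavy function itself is pure, so it is shown inline here as well:

```javascript
// A pure, CPU-heavy function — the kind of work we want off the main thread.
function sumOfSquares(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i * i;
  return total;
}

// Browser-only: "worker.js" would contain sumOfSquares plus a handler like
//   onmessage = (event) => postMessage(sumOfSquares(event.data));
if (typeof Worker !== "undefined") {
  const worker = new Worker("worker.js"); // hypothetical worker script
  worker.onmessage = (event) => console.log("result:", event.data);
  worker.postMessage(10_000_000); // runs in parallel; the UI stays responsive
}
```

Messages are copied between contexts via structured cloning, which is what enforces the "no shared memory" isolation described above.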

An advantage of the worker model however, is that race conditions are impossible if you have no shared resources (like files or a database). And much less likely even with shared resources, since access to these is usually more obvious than simply accessing shared memory (like with threads). Additionally, in contrast to multithreading, the single thread in the main program coordinating the communication between the workers cannot be preempted4. A downside to this (as you have likely been the victim of) is that a runaway JavaScript function in the main program can block the entire event loop and freeze the browser window.

🧮 GPUnit

The GPU is often the darling of performance junkies… I mean how can you not get starry eyed at a 🤩5120 CUDA core V100 🤩 And those 5120 cores mean that you can run massively parallel operations. But what they won’t tell you is that those 5120 cores can only run a very specific type of instruction.

When considering if GPU-based processing can help, you first must have a computational bottleneck. A GPU will not make sending data over the network any faster, nor will it give you any more memory. And second, you must be able to formulate your computation as a shader. Even when using a library for “general purpose” GPU (GPGPU) programming, you are still constrained by the need to formulate your program as a shader due to supported web standards (but this may change with the WebGPU API).

Another limitation is that to run computations on the GPU, you actually need to transfer the data to the GPU’s memory (which is often much smaller than RAM). So if you need to shuttle data between RAM and the GPU often, the transfer overhead/latency might outweigh the actual computational speedup.

Because of these constraints, statistical/mathematical computing and machine learning applications are often the most amenable to GPU speedup (high gain-to-pain ratio).

In the browser, you can access the GPU using the WebGL JavaScript API. You write shaders in GLSL (the OpenGL Shading Language) and use the JavaScript API to execute them on your data. But writing shaders means working at a pretty low level (especially for non-graphics GPGPU stuff), so if you do want to go down this route I recommend using a higher-level library.
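As one example of such a library, a sketch of element-wise vector addition with gpu.js (an assumption on my part, not the only option; the gpu.js portion is shown commented out since it needs a WebGL context to run, and the CPU reference shows what the kernel computes):

```javascript
// CPU reference: the computation the GPU kernel below mirrors.
function addCPU(a, b) {
  return a.map((x, i) => x + b[i]);
}

// Hedged sketch with the gpu.js library (assumes it is installed and a
// WebGL/GPU context is available):
//
//   const { GPU } = require("gpu.js");
//   const gpu = new GPU();
//   const addGPU = gpu
//     .createKernel(function (a, b) {
//       // this.thread.x is the output index this GPU "core" is computing
//       return a[this.thread.x] + b[this.thread.x];
//     })
//     .setOutput([4]);
//   addGPU([1, 2, 3, 4], [10, 20, 30, 40]);
```

The kernel body is restricted to a shader-friendly subset of JavaScript, which is the library papering over exactly the "must be formulated as a shader" constraint described above.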

🛠️🧰 WebAssembly

And so, we are now in the final act… The latest golden boy of the web performance world is WebAssembly. And for good reason.

WebAssembly is exciting for what it brings to the browser environment:

  1. Safe code execution in a sandbox
  2. Near native speeds
  3. Ability to run code written in non-JavaScript languages in a browser
  4. Uses open web standards (and no proprietary plugins)

I will not get into the specific mechanics of how the WebAssembly VM works, nor the semantics of the WebAssembly text format. To learn more about those I highly recommend starting with Lin Clark’s Cartoon intro to WebAssembly and then checking out some of the other resources.

For this post, I mainly want to provide enough context on WebAssembly so you hopefully can evaluate where and when it might be beneficial.

WebAssembly ironically does not perform the same role as native assembly, nor can you actually do any web programming in it (it doesn’t have access to the DOM). I like to think of WebAssembly as something of an intermediate representation for programs that execute in a web browser. Because it is an intermediate representation (IR) it is not meant to be programmed in directly (much like you should not write a program in LLVM-IR), but because it is an IR it is somewhat more readable/understandable than machine instructions.

WebAssembly can usually execute faster than the equivalent JavaScript for many programs, but it doesn’t always run faster. And to situate WebAssembly in our modern toolkit, I would say it is most accurate to think of it as playing the role C/C++/Rust play in native toolkits (which, not coincidentally, are the languages with the best support for targeting wasm). JavaScript still does best what it was optimized for (and will continue to do so): dynamic manipulation of the DOM and orchestrating interaction with the browser.

A few quick notes to clear some things up:

  1. JavaScript DOES NOT compile to wasm*
  2. WebAssembly is ahead-of-time (AOT) compiled
  3. WebAssembly DOES NOT have access to the DOM (or the Web APIs)
  4. WebAssembly uses a sequential linear memory

Because of these points, WebAssembly exists as something of a dual to WebGL in the realm of performance. WebAssembly really excels at speeding up computationally intensive processes with random memory access and complex control flow, exactly the workloads that don’t map well onto the GPU’s data-parallel model.

On the practical side of things, one of the most common ways of writing applications targeting WebAssembly is to use Emscripten, a C/C++-to-wasm compiler toolchain. And as WebAssembly becomes more widespread, some languages (like Rust, AssemblyScript, and Go) have wasm as a direct compilation target.
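Whatever toolchain produces the `.wasm` binary, JavaScript loads and calls it through the `WebAssembly` API. A minimal sketch using the classic two-integer add module (hand-assembled bytes here just so the example is self-contained; a real app would fetch a compiled `.wasm` file):

```javascript
// Bytes for a tiny module exporting add(a, b) = a + b, equivalent to the
// WebAssembly text format:
//   (module (func (export "add") (param i32 i32) (result i32)
//     local.get 0 local.get 1 i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number: "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version: 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code body
]);

// Synchronous compile + instantiate is fine for tiny modules; for real
// .wasm files prefer WebAssembly.instantiateStreaming(fetch(...)).
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
console.log(instance.exports.add(2, 3)); // 5
```

Note how the wasm function is exposed to JavaScript as an ordinary export: JavaScript stays the orchestrator, and wasm handles the hot computation, which is the division of labor described above.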

Resources

Profiling

Benchmarks

WebAssembly

WebGL

comments/questions/concerns happen over on ✌️ GitHub
