Monday, August 28, 2017

A brief history of the Future

Lessons learned from API design under pressure

It was August of 2009, and the WebOS project was in a bit of trouble. The decision had been made to change the underlying structure of the OS from using a mixture of JavaScript for applications, and Java for the system services, to using JavaScript for both the UI and the system services. This decision was made for a variety of reasons, primarily in a bid to simplify and unify the programming models used for application development and services development. It was hoped that a more-familiar service development environment would be helpful in getting more developers on board with the platform. It was also hoped that by having only one virtual machine running, we'd save on memory.

Initially, this was all built on top of a customized standalone V8 JavaScript interpreter, with a few hooks to system services. Eventually, we migrated over to Node.js, when Node was far enough along that it looked like an obvious win, and after working with the Node developers to improve performance on our memory-limited platform.

The problem with going from Java to JavaScript

As you probably already know, despite the similarity in the names, Java and JavaScript are very different languages. In fact, the superficial similarities in syntax were only making things harder for the application and system services authors trying to translate things from Java to JavaScript.

In particular, the Java developers were used to a multi-threaded environment, where they could spin off threads to do background tasks, and have them call blocking APIs in a straightforward fashion. Transitioning from that to JavaScript's single-threaded, events and callbacks model was proving to be quite a challenge. Our code was rapidly starting to look like a bunch of "callback hell" spaghetti.
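To make "callback hell" concrete, here's a small illustrative sketch. It is not code from WebOS: the helper bodies are made up, and the (err, value) callback convention is an assumption. It shows how three sequential asynchronous steps end up nested, with the same error check repeated at every level.

```javascript
// Hypothetical callback-style helpers; each one "finishes" on a later tick.
function downloadContacts(cb) {
  setTimeout(() => cb(null, ["ann", "bob"]), 0);
}
function mergeContacts(contacts, cb) {
  setTimeout(() => cb(null, contacts.concat("cat")), 0);
}
function writeNewContacts(contacts, cb) {
  setTimeout(() => cb(null, contacts.length), 0);
}

// Three sequential steps: every step adds a nesting level and an error check.
function syncContacts(done) {
  downloadContacts((err, contacts) => {
    if (err) return done(err);
    mergeContacts(contacts, (err, merged) => {
      if (err) return done(err);
      writeNewContacts(merged, (err, count) => {
        if (err) return done(err);
        done(null, count);
      });
    });
  });
}

syncContacts((err, count) => {
  console.log(err ? "sync failed" : "wrote " + count + " contacts");
  // prints "wrote 3 contacts"
});
```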

The proposed solution

As one of the most-recent additions to the architecture group, I was asked to look into this problem, and see if there was something we could do to make it easier for the application and service developers to write readable and maintainable code. I went away and did some research, and came back with an idea, which we called a Future. The Future was a construct based on the idea of a value that would be available "in the future". You could write your code in a more-or-less straight-line fashion, and as soon as the data was available, it'd flow right through the queued operations.

If you're an experienced JavaScript developer, you might be thinking at this point "this sounds a lot like a Promise", and you'd be right. So, why didn't we use Promises? At this point in history, the Promises/A spec was still in active discussion amongst the CommonJS folks, and it was not at all obvious that it'd become a popular standard (and in fact, it took Promises/A+ for that to happen). The Node.js core had in fact just removed their own Promises API in favor of a callback-based API (this would have been around Node.js v0.1, I think).

The design of the Future

Future was based on ideas from Smalltalk (promise), Java (future/promise), Dojo.js (deferred), and a number of other places. The primary design goals were:
  • Make it easy to read through a piece of asynchronous code, and understand how it was supposed to flow, in the "happy path" case
  • Simplify error handling - in particular, make it easy to bail out of an operation if errors occur along the way
  • To the extent possible, use Future for all asynchronous control flow
You can see the code for Future, because it got released along with the rest of the WebOS Foundations library as open source for the Open WebOS project.

My design brief looked something like this:
A Future is an object with these properties and methods:
.result: The current value of the Future. If the future does not yet have a value, accessing the result property raises an exception. Setting the result of the Future causes it to execute the next "then" function in the Future's pipeline.
.then(next, error): Adds a stage to the Future's pipeline of steps. The Future is passed as a parameter to the function "next". The "next" function is invoked when a value is assigned to the future's result, and the (optional) "error" function is invoked if the previous stage threw an exception. If the "next" function throws an exception, the exception is stored in the Future, and will be re-thrown if the result of the Future is accessed.
This is more-or-less what we ended up implementing, but the API did get more-complicated along the way. Some of this was an attempt to simplify common cases that didn't match the initial design well. Some of it was to make it easier to weld Futures into callback-based code, which was ultimately a waste of time, in that Future pretty much wiped out all competing approaches to flow control. And one particular set of changes was thrown in at the last minute to satisfy a request that should just have been denied (see What went wrong, below).
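To make the design brief above concrete, here's a minimal sketch of a Future that follows only the two entries in the brief. It is not the Foundations implementation: the internal field names, the synchronous dispatch, and the rule that an exception skips stages with no "error" function are my assumptions for illustration.

```javascript
// Minimal sketch of the design brief; NOT the actual Foundations code.
class Future {
  constructor() {
    this._stages = [];      // queued { next, error } pairs (assumed layout)
    this._state = "empty";  // "empty" | "value" | "exception"
    this._value = undefined;
    this._pending = false;  // true when a new outcome awaits the next stage
  }

  // Reading the result throws if there is no value yet,
  // or re-throws an exception stored by an earlier stage.
  get result() {
    if (this._state === "empty") throw new Error("Future has no result yet");
    if (this._state === "exception") throw this._value;
    return this._value;
  }

  // Setting the result triggers the next stage in the pipeline.
  set result(value) {
    this._state = "value";
    this._value = value;
    this._pending = true;
    this._advance();
  }

  // Adds a stage; the Future itself is passed to "next" or "error".
  then(next, error) {
    this._stages.push({ next, error });
    this._advance();
    return this;
  }

  _advance() {
    while (this._pending && this._stages.length > 0) {
      const stage = this._stages.shift();
      const handler = this._state === "exception" ? stage.error : stage.next;
      if (!handler) continue; // assumption: no handler means the stage is skipped
      this._pending = false;
      try {
        handler(this);
      } catch (e) {
        // A throwing stage stores its exception for the following stage.
        this._state = "exception";
        this._value = e;
        this._pending = true;
      }
    }
  }
}

// Usage: the error raised in stage two skips stage three and lands in the
// only "error" function, at the end of the pipeline.
const log = [];
const future = new Future();
future
  .then((f) => { f.result = f.result + 1; })
  .then((f) => { throw new Error("boom"); })
  .then((f) => { log.push("skipped"); })
  .then(
    (f) => { log.push("ok"); },
    (f) => { try { f.result; } catch (e) { log.push("error: " + e.message); } }
  );
future.result = 1; // kicks off the pipeline
console.log(log);  // log now holds one entry: "error: boom"
```

Note that each stage has to assign f.result to pass a value along. A stage that never sets the result simply stalls the pipeline in this sketch.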

What went right

We shipped a "minimal viable product" extremely quickly

Working from the initial API design document, Tim got an initial version of Future out to the development team in just a couple of days, which had all of the basics working. We continued to iterate for quite a while afterwards, but we were able to start the process of bringing people up to speed quickly.

We did, in fact, eliminate "callback hell" from our code base

After the predictable learning curve, the former Java developers really took to the new asynchronous programming model. We went from "it sometimes kind of works", to "it mostly works" in an impressively-short time. Generally speaking, the Future-based code was shorter, clearer, and much easier to read. We did suffer a bit in ease of debugging, but that was as much due to the primitive debugging tools on Node as it was to the new asynchronous model.

We doubled-down on our one big abstraction

Somewhat surprisingly to me, the application teams also embraced Futures. They actually re-wrote significant parts of their code to switch over to Future-based APIs at a deeper level, and to allow much more code sharing between the front end and back end of the Mail application, for example. This code re-use was on the "potential benefits" list, but it was much more of a win than anyone originally expected.

We wrote a bunch of additional libraries on top of Future, for all sorts of asynchronous tasks - for file I/O, database access, network and telecoms, and the system bus (dbus) interface. Basically anything that you might have wanted to access on the platform was available as a Future-based API.

The Future-based code was very easy to reason about in the "happy path" case

One of the best things about all this is that with persistent use of Futures everywhere, you could write code that looked like this:
downloadContacts().then(mergeContacts).then(writeNewContacts).then(success, error)
Most cases were a bit more-complicated than that (often using inline functions), but the pattern of only handling the success case, and just letting errors propagate, was very common. And in fact, the "error" case was, as often as not, logging a message and rescheduling the task for later.
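The one-liner above has the same shape with today's standard Promises, so here's a runnable sketch of the "handle success, let errors propagate" pattern using Promises as a stand-in (not the WebOS Future). The function bodies are hypothetical. The merge step is made to fail on purpose to show that the write step is skipped and only the final error handler runs.

```javascript
// Stand-in using standard Promises; all function bodies are hypothetical.
const events = [];

const downloadContacts = () => Promise.resolve(["ann", "bob"]);
const mergeContacts = (contacts) => {
  throw new Error("merge failed"); // simulated failure in the middle
};
const writeNewContacts = (contacts) => {
  events.push("wrote " + contacts.length); // never reached in this run
};
const success = () => events.push("success");
// Typical error stage: log a message and reschedule the task for later.
const error = (e) => events.push("retry later: " + e.message);

const done = downloadContacts()
  .then(mergeContacts)
  .then(writeNewContacts)
  .then(success, error);

done.then(() => console.log(events)); // one entry: "retry later: merge failed"
```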

The all-or-nothing error propagation technique fit (most of) our use cases really well

The initial use case of the Future was for a WebOS feature called "Synergy". This was a framework for combining data from multiple sources into a single uniform format for the applications. So, for example, you could combine your contacts from Facebook, Google, and Yahoo into a single address book list, and WebOS would automatically de-duplicate and link related contacts, and sync changes made on the phone to the proper remote service that the data originally came from. Similarly, all of your incoming e-mail went into the same "Mail" database on-device.

In a multi-stage synchronization process like this, there are all sorts of ways that the operation can fail - the remote server might be down, or the network might be flaky, or the user might decide to put the phone into airplane mode in the middle of a sync operation. In the vast majority of cases, we didn't actually care what the error was, just that an error had occurred. When an error happened, the usual response was to leave the on-phone data the way it was, and try again later. In those cases where "fuck it, I give up" was not the right error handling strategy, the rough edges of the error-handling model were a bit easier to see.

What went wrong

The API could have been cleaner/simpler

It didn't take long before we were adding convenience features to make some of the use cases simpler. Hence, the "whilst" function on Future, which was intended to make it easier to iterate over a function that returned Futures. There were a couple of other additions that also got a very small amount of use, and could have easily been replaced by documentation of the "right" way to do things.

Future had more-complex internal state than was strictly needed

If you look at Promises, they've really only got the minimal amount of state, and you chain functions together by returning a Promise from each stage. Instead of having lots and lots of Futures linked together to make a pipeline of operations, Future was the pipeline. I think that at some level this both decreased heap churn by not creating a bunch of small objects, and it probably made it somewhat easier to debug broken pipelines (since all of the stages were visible). Obviously, if we'd known that Promises were going to become a big thing in JavaScript, we would have stayed a lot closer to the Promises/A spec.
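The structural difference is easy to see in a few lines. In this sketch, the Promise half uses the standard API, where every then() returns a new object. The Pipeline class is a hypothetical, stripped-down illustration of the "Future was the pipeline" idea (it only records stages and never runs them), not the real Future.

```javascript
// Standard Promises: each then() call returns a brand-new Promise object.
const first = Promise.resolve(1);
const second = first.then((n) => n + 1);
const third = second.then((n) => n * 10);
console.log(first === second, second === third); // false false

// Hypothetical pipeline-style object: then() records the stage and
// returns the same object, so one object holds every stage.
class Pipeline {
  constructor() {
    this.stages = [];
  }
  then(fn) {
    this.stages.push(fn);
    return this;
  }
}

const pipeline = new Pipeline();
const same = pipeline.then((n) => n + 1).then((n) => n * 10);
console.log(same === pipeline, pipeline.stages.length); // true 2
```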

Error handling was still a bit touchy, for non-transactional cases

If you had to write code that actually cared about handling errors, then the "error" function was actually located in a pretty terrible place: you'd have all these happy-path "then" functions, and one error handler in the middle. Using named functions instead of anonymous inline functions helped a bit with this, but I would still occasionally get called in to help debug a thrown exception that the developer couldn't find the source for.

It would have been really nice to have a complete stack trace for the exception that was getting re-thrown, but we unfortunately didn't have stack traces available in both the application context and the service context. In the end, "thou shalt not throw an exception unless it's uniquely identifiable" was almost sufficient to resolve this.

I caved on a change to the API that I should have rejected

Fairly late in the process, someone came to me and said "I don't like the 'magic' in the way the result property works. People don't expect that accessing a property will throw an exception, so you should provide an API to access the state of the Future via function calls, rather than property access". At this point, we had dozens of people successfully using the .result API, and very little in the way of complaints about that part of the design.

I agreed to make the addition, so we could "try it out" and see whether the functional API was really easier or clearer to use. Nobody seemed to think so, except for the person who asked for it. Since they were using it, it ended up having to stay in the implementation, and since it was in the implementation, it got documented, which just confused later users (especially third parties), who didn't understand why there were two different ways to accomplish the same tasks.


How do I feel about this, 8 years later?

Pretty good, actually. Absent a way to see into the future, I think we made a pretty reasonable decision with the information we had available. The Bedlam team did an amazing bit of work, and WebOS got rapidly better after the big re-architecturing. In the end, it was never quite enough to displace any of the major mobile OSes, but I still miss some features of Synergy, even today. After all the work Apple has done over the years to improve contact sync, it's still not quite as good (and not nearly as open to third parties) as our solution was.
