Monday, August 21, 2017

What your Internet Of Things startup can learn from LockState

The company LockState has been in the news recently for sending an over-the-air update to one of their smart lock products which "bricked" over 500 of these locks. This is a pretty spectacular failure on their part, and it's the kind of thing that ought to be impossible in any kind of well-run software development organization, so I think it's worthwhile to go through a couple of the common-sense processes that you can use to avoid being the next company in the news for something like this.

The first couple of these are specific to the problem of shipping the wrong firmware to a particular model, but the others apply equally well to an update that's for the right target, but is fundamentally broken, which is probably the more-likely scenario for most folks.

Mark your updates with the product they go to
The root cause of this incident was apparently that LockState had produced an update intended for one model of their smart locks, and somehow managed to send that to a bunch of locks that were a different model. Once the update was installed, those locks were unable to connect to the Internet (one presumes they don't even boot), and so there was no way for them to update again to replace the botched update.

It's trivially-easy to avoid this issue, using a variety of different techniques. Something as simple as using a different file name for firmware for different devices would suffice. If not that, you can have a "magic number" at a known offset in the file, or a digital signature that uses a key unique to the device model. Digitally-signed firmware updates are a good idea for a variety of other reasons, especially for a security product, so I'm not sure how they managed to screw this up.

Have an automated build & deployment process
Even if you've got a good system for marking updates as being for a particular device, that doesn't help if there are manual steps that require someone to explicitly set them. You should have a "one button" build process which allows you to say "I want to build a firmware update for *this model* of our device, and at the end you get a build that was compiled for the right device, and is marked as being for that device.

Have a staging environment
Every remote firmware update process should have the ability to be tested internally via the same process end-users would use, but from a staging environment. Ideally, this staging environment would be as similar as possible to what customers use, but available in-company only. Installing the bad update on a single lock in-house before shipping it to customers would have helped LockState avoid bricking any customer devices. And, again - this process should be automated.

Do customer rollouts incrementally
LockState might have actually done this, since they say only 10% of their locks were affected by this problem. Or they possibly just got lucky, and their update process is just naturally slow. Or maybe this model doesn't make up much of the installed base. In any case, rolling out updates to a small fraction of the installed base, then gradually ramping it up over time, is a great way to ensure that you don't inconvenience a huge slice of your user base all at once.

Have good telemetry built into your product
Tying into the previous point, wouldn't it be great if you could measure the percentage of systems that were successfully updating, and automatically throttle the update process based on that feedback? This eliminates another potential human in-the-loop situation, and could have reduced the damage in this case by detecting automatically that the updated systems were not coming back up properly.

Have an easy way to revert firmware updates
Not everybody has the storage budget for this, but these days, it seems like practically every embedded system is running Linux off of a massive Flash storage device. If you can, have two operating system partitions, one for the "current" firmware, and one for the "previous" firmware. At startup, have a key combination that swaps the active install. That way, if there is a catastrophic failure, you can get customers back up and running without having them disassemble their products and send them in to you, which is apparently how LockState is handling this.

If your software/hardware allows for it, you can even potentially automate this entirely - have a reset watchdog timer that gets disabled at the end of boot, and if the system reboots more than once without checking in with the watchdog, switch back to the previous firmware.

Don't update unnecessarily
No matter how careful you are, there are always going to be some cases where a firmware update goes bad. This can happen for reasons entirely out of your control, like defective hardware that just happens to work with version A of the software, but crashes hard on version B.

And of course the easiest way to avoid having to ship lots of updates is sufficient testing (so you have fewer critical product defects to fix), and reducing the attack surface of your product (so you have fewer critical security issues that yo need to address on a short deadline.


Sunday, August 13, 2017

Why I hate the MacBook Pro Touchbar

Why I hate the MacBook Pro Touchbar

The Touchbar that Apple added to the MacBook Pro is one of those relatively-rare instances in which they have apparently struck the wrong balance between form and function. The line between “elegant design” and “design for its own sake” is one that they frequently dance on the edge of, and occasionally fall over. But they get it right often enough that it’s worth sitting with things for a while to see if the initial gut reaction is really accurate.

I hated the Touchbar pretty much right away, and I generally haven’t warmed up to it at all over the last 6 months. Even though I’ve been living with it for a while, I have only recently really figured out why I don’t like it.

What does the Touchbar do?

One of the functions of the Touchbar, of course, is serving as a replacement for the mechanical function keys at the top of the keyboard. It can also do other things, like acting as a slider control for brightness, or quickly allowing you to quickly select elements from a list. Interestingly, it’s the “replacement for function keys” part of the functionality that gets most of the ire, and I think this is useful for figuring out where the design fails.

What is a “button” on a screen, anyway?

Back in the dark ages of the 1980s, when the world was just getting used to the idea of the Graphical User Interface, the industry gradually settled on a series of interactions, largely based on the conventions of the Macintosh UI. Among other things, this means “push buttons” that highlight when you click the mouse button on them, but don’t actually take an action until you release the mouse button. If you’ve used a GUI that’s takes actions immediately on mouse-down, you might have noticed that they feel a bit “jumpy”, and one reason the Mac, and Windows, and iOS (mostly)  perform actions on release is exactly because action on mouse-down feels “bad”.

Why action on release is good:

Feedback — When you mouse-down, or press with your finger, you can see what control is being activated. This is really important to give your brain context for what happens next. If there’s any kind of delay before the action completes, you will see that the button is “pressed”, and know that your input was accepted. This reduces both user anxiety, and double-presses.

Cancelability — In a mouse-and-pointer GUI, you can press a button, change your mind, and move the mouse off before releasing to avoid the action. Similar functionality exits on iOS, by moving your finger before lifting it. Even if you hardly ever use this gesture, it’s there, and it promotes a feeling of being in control.

Both of these interaction choices were made to make the on-screen “buttons” feel and act more like real buttons in the physical world. In the case of physical buttons or keyswitches, the feedback and the cancelability are mostly provided by the mechanical motion of the switch. You can rest your finger on a switch, look at which switch it is, and then decide that you’d rather do something else and take your finger off, with no harm done. The interaction with a GUI button isn’t exactly the same, but it provides for “breathing space” in your interaction with the machine, which is the important thing.

The Touchbar is (mostly) action on finger-down

With a very few exceptions [maybe worth exploring those more?], the Touchbar is designed to launch actions on finger-down. This is inconsistent with the rest of the user experience, and it puts a very high price on having your finger slip off of a key at the top of the keyboard. This is exacerbated by bad decisions made by third-party developers like Microsoft, who ridiculously put the “send” function in Outlook on the Touchbar, because if there was ever anything I wanted to make easier, it’s sending an email before I’m quite finished with it.

How did it end up working that way?

I’m not sure why the designers at Apple decided to make things work that way, given their previous experience with GUI design on both the Mac and iOS. If I had to take a guess, the logic might have gone something like this:

The Touchbar is, essentially, a replacement for the top row of keys on the keyboard. Given that the majority of computer users are touch-typists, then it makes sense to have the Touchbar buttons take effect immediately, in the same way that the physical keyswitches do. Since the user won’t be looking at the Touchbar anyway, there’s no need to provide the kind of selection feedback and cancelability that the main UI does.

There are a couple of places where this logic goes horribly wrong. First off, a whole lot of people are not touch typists, so that’s not necessarily the the right angle to come at this from. Even if they were, part of the whole selling point of the Touchbar is that it can change, in response to the requirements of the app. So you’re going to have to look at it some of the time, unless you’ve decide to lock it into “function key only” mode. In which case, it’s a strictly-worse replacement for the keys that used to be there, and you’re not getting the benefits of the reconfigurability.

Even if you were going to use the Touchbar strictly as an F-key replacement, it still doesn’t have the key edges to let you know whether you’re about to hit one key or two, so you’ll want to look at what you’re doing anyway. I know there are people out there who use the function keys without looking at them, but the functions there are rarely-enough used that I suspect the vast majority of users end up having to look at the keyboard anyway, in order to figure out which one is the “volume up” key, or at least the keyboard-brightness controls.

How can Apple fix this?

First, and primarily, make it less-likely for users to accidentally activate functions on the Touchbar. I think that some amount of vertical relief could make this failure mode less-likely, though I’m not sure if the Touchbar should be higher or lower than it is now. I have even considered trying to fabricate a thin divider to stick to the keyboard to keep my finger away from accidentally activating the “escape” key, which is my personal pet-peeve with the touch bar.

A better solution to that problem is probably to include some amount of pressure-sensitivity and haptic feedback. The haptic home button on the iPhone 7 does a really good job of providing satisfying feedback without any visuals, so we know thiscan work well. Failing that, at least some way to bail out of hitting the Touchbar icons would be worth pursuing - possibly making them less senstive to grazing contact, though that would increase the cases where you didn’t activate a button while actually trying to.

Another option would be bringing back the physical function keys. This doesn’t necessarily mean killing the Touchbar, but maybe just moving it away from the rest of the keyboard a bit. This pretty much kills the “you can use it without taking your eyes off the screen or your hands off the home row” argument, but I’m really not convinced that’s at all realistic, unless you only ever use one application.

So, is Apple going to do anything to address these issues?

Honestly? You can never tell with them. Apple’s history of pushing customers forward against their will (see above) argues that they’ll pursue this for a good while, even if it’s clearly a bad idea. On the other hand, the pressure-sensitivity option seems like the kind of thing they might easily decide to add all by themselves. In the meantime, I’ll be working out my Stick On Escape Key Guard…