Monday, November 20, 2017

Mergers: The Good

(intro blog post here)

How to help your acquired employees succeed

Out of the 6 acquisitions I've been involved with, two really stand out as positive experiences, both for the acquired and the parent company. Here's what was different about those two mergers, as opposed to the ones that didn't go so well.

Integrate the new team quickly (Apple/NeXT)

In Apple's case, they acquired NeXT both in order to get new technology to base their next-generation OS on, and to get a fully-functional engineering organization. You can't really understand just how screwed up Apple was in 1996 unless you were there, but in the last quarter of the year, Apple lost over a billion dollars. They had, at that point, had 2 or 3 (depending on how you count) different "next generation" OS projects crash and burn, and the latest one, Copland, was on the verge of disaster – I've seen credible evidence that it wouldn't have shipped for another 5 years, if ever. Into all this swirling chaos, we bring the NeXT team, understandably freaked out to be called on to "save" this huge, dysfunctional company from itself.

But one thing that was hugely encouraging, and helped us to all settle in, was how quickly we were integrated into the Apple organization as a whole. Within a month after the acquisition, we were meeting with our counterparts in Cupertino, we had email addresses, our systems were on the Apple network, and we'd had an army of Apple HR folks over to the NeXT offices to get us transferred over to Apple's payroll and benefits.

It was still a very hard slog, and there was a LOT of anger from folks at Apple who had their friends laid off right after the acquisition, but feeling like we were legitimately part of the team, and not just a bunch of outsiders, helped us to fight the battles we had to fight.

Put the full support of the larger organization behind the newcomers (LG/WebOS)

After the debacle that was HP's acquisition of Palm (see the "Ugly" segment, coming soon), the folks remaining on the WebOS team were pretty nervous when we were told that we were being sold off to LG. "Oh, great, another absentee owner who will tell us we're important, but then never do anything".

And then we had our first meetings with LG's upper management. And we were told that we would be building the user interface for all of LG's high-end smart TVs, that we were going to ship in less than a year, and that we were expected to deliver something BETTER than the existing NetCast software, which they had been shipping for a few years. "Oh, crap," I thought, "none of us know anything about smart TVs, or TVs in general." But then they told us: "The CEO has expressed his full support of this project, and you'll have as much support as you need."

I really didn't believe that we were going to get "as much support as you need", but sure enough, within a short time period after the acquisition, truckloads of current-generation TVs and prototype logic boards for the next generation started flooding into the office. And in the months after that, truckloads of engineers from Korea, who knew the hardware and the existing NetCast software intimately. Anything we asked for, we got – score one for top-down, authoritarian management style, I guess.

And we did it - a small group of developers, working their asses off, managed to build something in less than a year which was immensely better than the existing product, which had been shipping for several years. The next-generation smart TVs, with a new version of WebOS, were even better. This was definitely a high point for the "acquire a smaller company to bring innovation to the larger company" strategy. And it succeeded because the project had a powerful advocate within the larger company, and a VERY clear vision of what they wanted to accomplish.

Next week

What not to do to make your new employees feel welcome, and how to tell (as an employee) when things are likely to go sour quickly.

Monday, November 13, 2017

Mergers: The Good, The Bad, and The Ugly

You've been acquired how many times?

In my career, I've been fortunate enough to have worked for a number of small software/hardware companies, several of which were absorbed by much larger companies. I thought it'd be interesting to compare and contrast some of the ways the various mergers went well or badly, and what acquiring companies might be able to learn from my experience.

Here's the timeline so far:

  1. I started working for NeXT Software in 1994; they were acquired by Apple in 1996.
  2. I left Apple in 1999 to work for Silicon Spice. They were acquired by Broadcom in 2000.
  3. Broadcom laid me off, and I went back to Apple for a while.
  4. I left Apple in 2005 to work at Zing Systems, which was acquired by Dell in 2007.
  5. I left Dell to go work at Palm in 2009. In 2010, Palm was acquired by Hewlett-Packard.
  6. Hewlett-Packard eventually sold the entire WebOS group to LG.
  7. I left LG to go work for Citrix on GoToMeeting. After 2 1/2 years, the GoToMeeting business was spun off and merged with LogMeIn, Inc.
So I've been part of 6 different merger/acquisition processes at this point, and I feel like I'm getting a feel for how you can tell when an acquisition is going to go well, as opposed to going poorly.

Why do big companies buy smaller companies?

When a big company acquires a smaller company, it can be for a variety of reasons. Sometimes it's to acquire a potential competitor, before they can get large enough to threaten the larger company. It can be an "acqui-hire", where they buy the smaller company strictly for its human resources, and have no real interest in the technology or products the smaller company has developed (this happens with social media companies frequently, because skilled developers are hard to find). Or, it can be a case of acquiring a new technology, and a team of experts in that technology, in order to either kick-start a new product, or to kick new life into an existing product. That last reason was the primary reason for all of the acquisitions I've been involved in.

What's the most common mistake acquiring companies make?

Understandably, big companies often look to smaller companies as an engine to drive innovation. There's a perception that small companies can move faster and be more nimble than larger companies. So there's often a desire to let the new acquisition run itself, as a sort of independent entity inside the larger company. Being hands-off seems like the obviously-right thing to do if increased agility is what you wanted in the first place, but it's generally not as good an idea as it seems at first blush.

Yes, you absolutely don't want to break up the functional team you just acquired, and spread them out willy-nilly throughout your company. You don't want to drag them into the bureaucracy and infighting that has marred all of your internal attempts at innovation. But guess what? If you don't make an effort to get them at least nominally integrated with the rest of the company, you will, at best, end up with an isolated group, who continue to do their one thing, but don't meaningfully contribute to your larger organization's bottom line. And the smaller group will also get none of the benefits of scale of being part of the larger group. It's lose-lose.

Examples of the Good, the Bad, and the Ugly

Tune in next Monday (and the Monday after that) for real-life tales of integrations gone well, gone poorly, and gone horribly wrong.

Monday, November 06, 2017

That delicate line between security and convenience

A key problem, maybe the key problem in software security is how to properly balance user convenience with security. Adding additional security to a system often demands extra work, more time, or other compromises from the end-user. And reasonable people can disagree about where the line is for the appropriate trade-off.

That iPhone camera permissions "flaw"
There was a brief flurry of articles in the news recently, talking about a "flaw" in iOS permissions which would allow applications to take your picture without you being aware, typically presented with click-bait headlines.

The blog post of the actual security researcher who raised this issue (Felix Krause) is substantially less sensational.

It's good that this issue is getting some attention, but it's important to understand where we're coming from, what the actual issue is, and possible ways to mitigate it. As a quick aside, I find it annoying that the articles say "Google engineer". Yes, Krause works for Google, but this work is not coming out of his "day job", but rather his own work in security research. Also, Android has exactly this same problem, but it doesn't merit a blog post or worldwide news coverage, because apparently nobody expects even minimal privacy from Android devices.

How camera permissions work on iOS today
The current version of iOS asks the user for permission to use the camera the first time that an application tries to access it. After that, if the application is running in the foreground, it can access the camera whenever it wants to, without any additional interaction. And typically, this is actually what the user wants.
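That permission flow can be modeled as a simple bit of state. Here's a toy Python sketch of the policy as described above - my own simplification, not Apple's implementation, and all of the names are mine:

```python
# Toy model of the iOS camera-permission flow: prompt once, then allow
# silent capture for any foreground app that was granted permission.
class CameraPermissionPolicy:
    def __init__(self):
        self.decision = None  # None until the user has been asked once

    def can_capture(self, app_in_foreground, ask_user):
        # First access ever: prompt the user and remember the answer.
        if self.decision is None:
            self.decision = ask_user()
        # Afterwards: a foreground app with permission captures silently.
        return self.decision and app_in_foreground


policy = CameraPermissionPolicy()
print(policy.can_capture(app_in_foreground=True, ask_user=lambda: True))   # True
# Subsequent captures never prompt again - the convenience/privacy trade-off.
print(policy.can_capture(app_in_foreground=True, ask_user=lambda: False))  # True
```

The second call never invokes `ask_user` at all, which is exactly the behavior the privacy complaint is about.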

It's convenient and fun to be able to use the built-in camera support in Facebook without having to answer "yes I do want to use the camera" each time that you choose to share a photo on social media. And replacements for the built-in camera app, like Instagram, Snapchat, and Halide, would be pretty much unusable if you had to answer a prompt Every. Single. Time. you wanted to take a photo.

How it used to work
Previous versions of iOS actually required applications to use the built-in camera interface to take pictures. You still only had to give permission once, but it was really obvious when the app was taking your picture, because the camera preview was right there in your face, taking over your screen. This design was widely criticized by app developers, because it made for a really jarring break in their carefully-crafted user experience to have the built-in camera appear, and they couldn't provide a preview that actually showed what was going to be captured (with the rise of photo filters, this is especially problematic).

At some point, Apple added the capability to capture photos and video while presenting the app's own interface. This makes for a more-cohesive experience for the end-user, and makes it possible for apps to preview what they're actually going to produce, filters, silly hats, and all. This is clearly a win for the app developers, and I'd argue it is also a win for the end-user, as they get a better experience with the whole picture-taking process.

What's the actual privacy issue here?
I use Facebook to post photos and videos, sometimes. But I don't really want Facebook taking pictures of my face when I'm not using the camera features, and analyzing that data to better serve me content, including advertisements.

If I'm scrolling through my news feed, and Facebook is silently analyzing the images coming through the back camera, so that they can discover my location and serve me ads for whatever business I'm standing in front of, that's intrusive and creepy. If they're reading my facial expression to try to determine how I feel about the items in my news feed, that's even worse.

How Apple can better-inform users
I don't think anybody wants to go back to using the UIImagePicker interface, and I don't think anybody (except possibly security researchers) wants to have to affirmatively give permission every time an application wants to take a picture or video. One alternative that I like (and Krause mentions this in his initial blog) is some kind of persistent system UI element that indicates that the camera is on. Apple already does something similar with a persistent banner on the top of the screen when applications in the background are capturing audio (for VoIP communications). A little dot on the status area would go a long way, here.

It'd also be really nice to have a toggle in Preferences (or better, in Control Center) to disable the camera system-wide, so if you know you're heading somewhere that you shouldn't be taking pictures, you can temporarily disable the camera.

What users can do to better protect themselves
Obviously, just don't grant camera permission to applications that don't actually need it. I think most social network software falls into this category. Twitter and Facebook don't actually need to access my camera, so I have it disabled for both of them. If you actually DO use Facebook and Twitter to take pictures, then I guess you'll just need to be more aware of the tradeoffs.

If you "have to" enable camera access for certain apps, but you don't fully trust them, there are honest-to-goodness lens caps you can buy which will cover your iPhone camera when you're not using it. Or a piece of tape works. There are even specially-made tape tabs for just this purpose.

Tuesday, October 17, 2017

"Responsible Encryption" - what does that mean?

This weekend I read this excellent article by Alex Gaynor responding to Deputy Attorney General Rod Rosenstein's remarks on encryption to two different audiences last week. Please do go and read it when you get a chance, as it delves into the sadly common tactic of pointing to a bunch of scary criminal incidents, then saying "unbreakable encryption enables criminals and terrorists", without presenting any evidence that those crimes were enabled by encryption technology, or that law enforcement officers were actually hampered in their investigations by encryption.

In fact, in the case of the FBI, Apple, and the San Bernardino shooter, Deputy AG Rosenstein repeats all of the same false narrative that we've been presented with before - that the shooter's phone possibly contained vital information, that Apple "could" decrypt the information, and that they fought the FBI's legal attempts to force them to do so. Read my previous blog post (linked above) for background on that line of argument, and how the FBI willfully twists the facts of the case to try to get something much more far-reaching than what they claim to want.

One thing not addressed directly in Alex's article is the frustration that the FBI and other law enforcement officials have expressed over the inability to execute a legal search warrant when they're faced with a locked phone, or a communications system that provides end-to-end encryption.

From Rosenstein's remarks to the Global Security Conference
We use the term “responsible encryption” to describe platforms that allow police to access data when a judge determines that compelling law enforcement concerns outweigh the privacy interests of a particular user.  In contrast, warrant-proof encryption places zero value on law enforcement.  Evidence remains unavailable to the police, no matter how great the harm to victims.
First, what a bunch of emotionally-charged words. And again we see the disconnect between what the FBI and other agencies say that they want (a way to unlock individual phones), and what they seem to keep asking for (a key to unlock any phone they can get their hands on).

But the man does have a point - there is some value to society in the FBI being able to execute a valid search warrant against someone's phone, or to "tap" the communications between known criminals. And I think he's also right that that sort of access is not going to be provided if the free market is allowed to set the rules. It'll never be in Apple's or any individual customer's interest to make it easier to access a locked phone. So, it'll come down to a matter of legislation, and I think it's worth the tech folks having this conversation before Congress sits down with a bill authored by the FBI and the NSA to try to force something on us.

The encryption-in-flight question is very complicated (and crypto protocols are hard to get right - see the recent KRACK security vulnerabilities), so I'll leave that for a future post. I do believe that there are reasonable ways for tech companies to design data-at-rest encryption that is accessible via a court order, but maintains reasonably-good security for customers. Here's a sketch of how one such idea might be implemented:

On-device Key Escrow

Key escrow 
The basic idea of key escrow is that there can be two keys for a particular piece of encrypted data - one key that the user keeps, and one that is kept "in escrow" so another authorized agent can access the data, if necessary. The ill-fated Clipper Chip was an example of such a system. The fatal flaw of Clipper (well, one of them) is that it envisioned every single protected device would have its secondary key held securely by the government to be used in case of a search warrant being issued. If Clipper had ever seen broad adoption, the value of that centralized key store would have been enormous, both economically and militarily. We're talking a significant fraction of the US GDP, probably trillions of dollars. That would have made it the #1 target of thieves and spies across the world.

Eliminating central key storage
But the FBI really doesn't need the ability to decrypt every phone out there. They need the ability to decrypt specific phones, in response to a valid search warrant. So, how about storing the second key on the device itself? Every current on-device encryption solution that I know of provides for the option of multiple keys. And in fact, briefly getting back to the San Bernardino shooter's phone, if the owners of that phone (San Bernardino County) had had a competent IT department, they would have set up a second key that they could then have handed over to the FBI, neatly avoiding that whole mess with suing Apple.

You could imagine Apple generating a separate "law enforcement" key for every phone, and storing that somewhere, but that has all the same problems as the Clipper central key repository, just on a slightly smaller scale. So those keys need to be stored separately. How about storing them on the device itself?
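To make the two-key idea concrete, here's a minimal Python sketch of envelope encryption with an escrowed second key. The SHA-256-counter "cipher" here is purely illustrative - real systems would use AES in the secure hardware - and every name in it is mine, not from any real product:

```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    """Illustrative-only stream cipher: SHA-256 in counter mode. NOT real crypto."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(data: bytes, key: bytes) -> bytes:
    # XOR with a deterministic keystream; applying it twice restores the input.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# One random "data key" actually encrypts the user's data...
data_key = os.urandom(32)
ciphertext = xor(b"my private data", data_key)

# ...and that data key is stored twice: once wrapped under the user's key,
# and once under an escrow key recoverable only with physical access.
user_key, escrow_key = os.urandom(32), os.urandom(32)
wrapped_for_user = xor(data_key, user_key)
wrapped_for_escrow = xor(data_key, escrow_key)

# Either key holder can unwrap the data key and decrypt the same ciphertext.
assert xor(ciphertext, xor(wrapped_for_user, user_key)) == b"my private data"
assert xor(ciphertext, xor(wrapped_for_escrow, escrow_key)) == b"my private data"
```

Note that there's no central repository anywhere in this scheme: the wrapped escrow copy lives on the device, so compromising one phone gains you nothing about any other phone.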

Use secure storage
Not every phone has a "secure enclave" processor like the iPhone, but it's a feature that you'll increasingly see on newer phones, as Apple and other manufacturers try to compete on the basis of providing better privacy protection to their customers. The important feature of these processors is that they don't allow software running on the phone to extract the stored keys. This is what keeps the user's data secure from hackers. So, if the key is stored in there, but the phone software can't get it out, how will the FBI get the key?

Require physical access
My preferred solution would be for the secure enclave to have a physically-disconnected set of pins that can be used just for extracting the second key. In order to extract the key, you'd need to have physical access to the device, disassemble it, and solder some wires on it. This is, I think, sufficiently annoying that nobody would try to do it without getting a warrant first.

It also means that nobody can search your phone without taking it out of your possession for a good long while. This seems like a reasonable trade-off to me. If someone executes a search warrant on your house, you'll certainly know about it. There's such a thing as "sneak and peek" warrants, or delayed-notice warrants, where police sneak in and search your home while you're not there, but I'm not particularly interested in solving that problem for them.

Is this a perfect solution? Of course not. But I think something like this is a reasonable place to start when discussing law enforcement access to personal electronics. And I think we want to have this conversation sooner, rather than later. What do you think?

Monday, October 02, 2017

The "Just Smart Enough" House

Less Architectural Digest, more "This is our home"

We've been doing some remodeling on our house, and the overarching theme of the renovations has been "make this house convenient for real humans to live in". When we bought the house, it was "perfect" in one sense - the house is broken up into two sections, with a central courtyard between, and we were looking for a place where my Father-in-law could come live with us, and still have some space to himself and some privacy.

In many other respects, it was a wildly-impractical house. There's a sad story there, of a couple who fall into and out of love during a remodel, of a mother who overruled the architect in a few critical ways, of a home that was left unfinished when the couple living there split up, and of a house split (illegally) into two units to try to keep it, by supplementing income via renting out the back. 

The end result was a house that certainly looks "fancy", in that it's got a Great Room with a wall entirely filled up by windows and sliding doors, a big fireplace faced in Travertine, and a ridiculous number of doors to the outside, for that "indoor/outdoor living" feeling. Seriously, there are 11 doors to the outside, not including the garage door. Other than being slightly unfinished, it could totally have been a house featured in Architectural Digest.

But when you're living there, you start to notice some of the compromises. I don't think I've ever lived in a house that didn't have a coat closet before. Or a broom closet. Or a linen closet.  Hence the remodel, the first part of which was just turning the illegal 2nd unit into a more-reasonable bedroom suite for Bob, and adding some damn storage.

We added a bunch more storage into the Great Room, and that meant adding new electrical circuits for new under-cabinet and in-cabinet lighting. And because I'm a total nerd, that meant researching Smart Switches to control all of the new lighting (and ideally move some of the more-inconvenient switches to a better location).

Who do you trust?

I pretty quickly settled on getting my smart switches from an electrical equipment manufacturer, rather than some startup "home automation" company. I really, really don't want my house to burn down, and while I have no reason to think that the quality of the zillions of Wi-Fi enabled switches out there is anything but excellent, I felt more-comfortable going with a company that has a hundred years or so of experience with not burning people's houses down.

Lutron vs Leviton

(that really sounds like a super-hero movie, doesn't it?)

Lutron and Leviton are two of the largest electrical fixture manufacturers, and choosing between one or the other when buying a regular switch or receptacle is mostly just a matter of which brand your local hardware store carries, and whether or not you want to save $0.25 by buying the store brand.

In the "Home Automation" arena, they each have a variety of solutions, ranging from giant panel-based systems that you're expected to put in a closet somewhere and have installed by a "trained integrator", to simpler systems which are aimed at the DIY market.

You can go all-in, or you can just put a toe in

It didn't take long for me to decide that the fancier integrated systems were not really what I wanted. First off, they're fairly expensive, though the expense looks a little less extreme once you start comparing the per-switch cost of the smart switches vs the centralized version. But ultimately, I didn't really want to deal with a "system integrator" setting the thing up (though apparently it's very easy to get certified by Lutron if you're a licensed electrician, which I'm not). Also, nobody had anything good to say about the phone apps that were available for these systems. And finally, the high-end systems are all about providing a touch pad interface, to give your home that proper Jetsons look. I have no interest in having touch screens mounted on the wall in every room, so that was more of a downside for me than an attraction. The stand-alone switches from either vendor look more-or-less like standard Decora-style dimmers.

In the consumer-focused lines, there are some interesting differences between the two companies. Leviton's consumer products are mostly compatible with the Z-Wave standard, which means they work with third-party smart home hubs. The online reviews for the SmartThings and Wink hubs weren't particularly encouraging to me, so that was a bit of a bummer.

The Lutron stuff uses a proprietary wireless protocol, and they sell their own hub. The Caseta hub (Lutron's hub) seemed to actually get pretty good reviews. It isn't as capable as the SmartThings hub, but - and this was pretty critical for me - it does connect to HomeKit, Apple's home automation system (it also works with Amazon's Alexa and the Google Home device). So, we went with the Lutron Caseta stuff, because it's easy to use, looks reasonable in our house, and is available at both Home Depot and Lowe's, as well as the local electrical supply store.

Hardware from the hardware store, software from a software company

The connection to HomeKit means that even though the Caseta hub isn't as full-featured as some of the other smart home hubs, I don't really need to care. We're pretty much an all-Apple shop here at Casa de Bessey, so knowing that I could control all of the things attached to the Caseta hub from my phone, using Apple's Home app, is a pretty big draw for me. 

I know it's the 21st century, and everybody needs to have an App, but that doesn't mean every application is equally well-made. If there's a feature that I really "need", and it's not available in the standard software that comes with the Caseta, I could (at least in theory) set up an Apple TV or an iPad as an always-on HomeKit hub, and write my own damn software to run on it.

HomeKit will likely continue to gain new features over the years, so I may never need to do anything custom. But if I do, it's nice to know that I can work with familiar tools and environment, rather than struggling with some obscure system provided by the switch manufacturer.

The Caseta Wireless experience

We're a couple of months into using the Caseta hardware, and here's how it's been going so far.

The Good

Dimmers everywhere
One thing I hadn't really thought about before doing this work is that the dimmer-switch version of the Caseta switches is almost the same price as the plain switch version. We were in the process of gradually replacing our CFL bulbs with LED bulbs anyway, so we've gone with dimmer switches basically everywhere. The added flexibility of being able to set the brightness of any particular light is a nice upgrade.

The basics are all there
All of the fancy features in the world wouldn't be helpful if the basic features weren't there. The switches feel nice, they look nice, and they're easy to install. The software makes it easy to set up "scenes" where you can hit a single button, and set the brightness level of any sub-set of lights in the house.
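Under the hood, a "scene" is really just a saved mapping from lights to brightness levels. A hypothetical sketch of the concept (the names here are mine, not Lutron's API):

```python
# Hypothetical model of lighting "scenes": each scene is a named map of
# light -> brightness level (0-100). Activating a scene applies every entry.
house = {"kitchen": 0, "great room": 0, "porch": 0}  # current brightness

scenes = {
    "movie night": {"great room": 10, "kitchen": 0},
    "dinner": {"great room": 60, "kitchen": 100},
}


def activate(scene_name: str) -> None:
    """One 'button press': set every light the scene mentions; leave the rest."""
    for light, level in scenes[scene_name].items():
        house[light] = level


activate("dinner")
print(house)  # {'kitchen': 100, 'great room': 60, 'porch': 0}
```

Lights a scene doesn't mention are left alone, which is why the porch light stays off in the example above.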

HomeKit/Siri integration
It just works. There really is something magical about being able to say "Siri, turn out all the lights", and have the entire house go dark. Or indeed saying "Siri, turn out the light in Jeremy's Room" to my watch, and having that work on the first try.

Easy to setup and use
You basically plug in the hub, press a button to pair it with the app on your phone, and then start adding devices. The switches are direct replacements for your existing switches, so installing them is basically:
  1. Turn off the power
  2. Remove the old switch
  3. Wire the new switch/dimmer in
  4. Turn the power back on
The only slightly-complex case is when you're replacing a three-way switch. The Caseta solution for 3-way (or more) situations is to install the switch at one end, then just install battery-powered remotes at any other location you need to control that light from. When you take out the 3-way, you do need to connect the "traveller" wires together, but they provide instructions online to show you how to do that.

You do have to add each individual switch to the app one at a time, which could get tedious in a large installation. It sure made things easy for the electricians, though - they just had to wire things up, without keeping track of which switch went in which room, since I would set all that up later after they left. From talking to them, I got the impression that the usual install of the higher-end stuff does involve writing down a bunch of "this circuit is on switch #12345" notes, then going back and fixing things later when setting up the controller.

Other than when the Wi-Fi in the house is down, I haven't had any problems connecting to the hub, either from the Lutron app (when adding new hardware) or from Apple's Home app. Because the individual switches all have controls on them, even in the case of catastrophic failure, you can still walk around and turn off everything "by hand". That's another point in favor of the non-centralized system, I guess.

Supports "enough" devices for my house
One of the big differences between the Caseta stuff and Lutron's next higher tier (Radio RA2), is the number of "devices" they support. Every switch, every dimmer, and every remote control is a "device" for these counts. Caseta only supports 50 devices. I haven't come anywhere close to the limit yet, but we haven't replaced every last switch in the house yet, either. I think we'll be over 40 once all of the switches I care about have been replaced. Our house is close to 2,000 square feet, so if your house is smaller than that, I doubt the limit will ever matter much. And here's where the connection to HomeKit also helps - if we ever do hit the device limit, I can buy another Caseta hub for $75, and have another 50 devices.

The Bad

Range and range extenders
The Caseta documentation says that every controlled device needs to be within 30 feet of the hub. In practice, the maximum reach is just a bit longer than that in our house, but not very much farther. You can extend the range of the system by using a plug-in dimmer as a repeater. You can have exactly one repeater, which is another limitation compared to the higher-end systems, which support multiple repeaters. But again - if I ever did run into this in practice, I'd probably just get another hub, and have one for each end of the house, since the hubs really aren't all that expensive.

Pricing structure
Honestly, the way that Lutron prices this stuff makes almost no sense at all. You can buy various "kits" with a hub, a dimmer and a remote, or a hub and a few dimmers and remotes, or a hub and some plug-in dimmers. The individual kit components cost more separately, which is no surprise, but some of the prices are weirdly inverted - it costs more to buy just a dimmer than it does to buy the dimmer, a remote, and all of the trim hardware. I assume anybody who makes extensive use of this product line eventually ends up with a box full of unused remotes, but that's just slightly wasteful, not an actual problem.

Trigger configuration is very basic
The "smart" hub isn't very smart. You can bind particular remotes to particular switches, set up scenes, and do some very basic automation. A recent software update improved some of this so that you can now do some more scheduling.

But take, for example, the "arriving home" automation. I can set up a scene to activate when I arrive home. That's nice, but I can't actually set up a scene to activate when I'm the first one home, or the last to leave. HomeKit supports this, so that might be the thing that gets me to finally set up an Apple TV as a HomeKit hub. Or maybe I'll wait for the HomePod...

The Unknown

I haven't done a basic security audit on the Caseta hub, yet. That'll make a fun weekend project. The online component of the hub is protected by a user name and password, at least. And if I do get totally paranoid, I can always disconnect the hub from the internet, and route everything through an iOS HomeKit hub, which is likely to be more-secure.

What happens if Lutron decides to end-of-life the Caseta line? Will I still be able to get replacement parts, or a new hub if the old one breaks? For that matter, what if Apple stops supporting HomeKit, or removes the Lutron app from the App Store?

This is the problem with going with the proprietary solution. I am somewhat dependent on both Lutron and Apple staying in this business, and getting along with each other. The hub is basically unusable without the app, so that's definitely a concern. I suspect if Lutron found themselves in a situation where they could no longer provide the iOS app, they'd be motivated to provide another solution, or at the very least, a migration strategy to one of their other control hubs.

At the absolute worst-case scenario, the Caseta switches and the remote controls can be set up and paired to operate completely independently of the hub. I'd lose all of the "smart" features, but at least I'd still have working light switches.


Overall, this was a really great way to get my feet wet with "smartening up" my home. The increased control over the lights in the house is convenient, and actually helps make the house more livable. The potential downsides are limited by the design of the Caseta system, which gracefully falls back to "no worse than just having old light switches", something which is not necessarily true of other connected home devices, like thermostats, which can have terrible failure modes.

If you're interested in adding some smarts to your home, I can definitely recommend the Caseta products. They're easy to set up and use, and have been very reliable for us so far.

Monday, September 25, 2017

Follow up: LockState security failures

I wrote a blog post last month on what your IoT startup can learn from the LockState debacle. In the intervening weeks, not much new information has come to light about the specifics of the update failure. It seems from their public statements that LockState thinks it's better if they don't do any kind of public postmortem on their process failures, which is too bad for the rest of us, and for the Internet of Things industry in general - if you can't learn from others' mistakes, you (and your customers) will have to learn from your own.

However, I did see a couple of interesting articles in the news related to LockState. The first one is from a business-oriented site, and it takes a bit more of a business-focused look at things, as you might expect. Rather than looking at the technical failures that allowed the incident to happen, they take LockState to task for their response after the fact. There's good stuff there about how it's important to help your customers understand possible failure modes, how you should put the software update process under their control, and how to properly respond to an incident via social media.

And on The Parallax, a security news site, I found this article, which tells us about another major failure on the part of LockState - they apparently have a default 4-digit unlock code set for all of their locks from the factory, and also a default 8-digit "programming code", which gives you total control over the lock - you can change the entry codes, reset the lock, disable it, and disconnect it from WiFi, among other things.

Okay, I guess I really shouldn't be surprised by this at this point - these LockState guys are obviously "flying by the seat of their pants" in terms of security practice, but seriously? Every single lock comes pre-programmed with the same unlock code and the same master programming code?

Maybe I'm expecting too much, but if a $2.99 cheap combination lock from the hardware store comes with a slip of paper in the package with its combination printed on it, maybe the $600 internet-connected smart lock can do the same? Or hell, use a laser to mark the master combination on the inside of the case, so it's not easily lost, and anyone with the key and physical access can use the code to reset the lock, in the (rare) case that that's necessary.

Or, for that matter - if you must have a default security code for your device (because your manufacturing line isn't set up for per-unit customization, maybe?), then make it part of the setup process to change the damn code, and don't let your users get into a state where they think your product is set up, but they haven't changed the code.

It's easy to fall into the trap of saying that the user should be more aware of these things, and that they should know to change the default code. But your customers are not typically security experts, while you (or at least some of your employees) should be. You need to be looking out for them, because they aren't going to be doing a threat analysis while installing their new IoT bauble.

Monday, September 18, 2017

A short rant on XML - the Good, the Bad, and the Ugly

[editor's note: This blog post has been in "Drafts" for 11 years. In the spirit of just getting stuff out there, I'm publishing it basically as-is. Look for a follow-up blog post next week with some additional observations on structured data transfer from the 21st century]

So, let's see if I can keep myself to less than ten pages of text this time...

XML is the eXtensible Markup Language. It's closely related to both HTML, the markup language used to make the World Wide Web, and SGML, a document format that you've probably never dealt with unless you're either a government contractor, or you used the Internet back in the days before the Web. For the pedants out there, I do know that HTML is actually an SGML "application" and that XML is a proper subset of SGML. Let's not get caught up in the petty details at this point.

XML is used for a variety of different tasks these days, but the most common by far is as a kind of "neutral" format for exchanging structured data between different applications. To keep this short and simple, I'm going to look at XML strictly from the perspective of a data storage and interchange format.

The good

Unicode support

XML documents can be encoded using Unicode encodings like UTF-8, which means that nearly any written character in any language can be easily represented in an XML document.

Uniform hierarchical structure

XML defines a simple tree structure for all the elements in a file - there's one root element, it has zero or more children, which each have zero or more children, ad infinitum. All elements must have an open and close tag, and elements can't overlap. This simple structure makes it relatively easy to parse XML documents.

Human-readable (more or less)

XML is a text format, so it's possible to read and edit an XML document "by hand" in a text editor. This is often useful when you're learning the format of an XML document in order to write a program to read or translate it. Actually writing or modifying XML documents in a text editor can be incredibly tedious, though a syntax-coloring editor makes it easier.

Widely supported

Modern languages like C# and Java have XML support built into their standard libraries. Most other languages have well-supported free libraries for working with XML. Chances are, whatever messed up environment you have to work in, there's an XML reader/writer library available.

The bad

Legacy encoding support

XML documents can also be encoded in whatever wacky character set your nasty legacy system uses. You can put a simple encoding="Ancient-Elbonian-EBCDIC" attribute in the XML declaration element, and you can write well-formed XML documents in your favorite character encoding. You probably shouldn't expect that anyone else will actually be able to read it, though.

Strictly hierarchical format

Not every data set you might want to interchange between two systems is structured hierarchically. In particular, representing a relational database or an in-memory graph of objects is problematic in XML. A number of approaches are used to get around this issue, but they're all outside the scope of standardized XML (obviously), and different systems tend to solve this problem in different ways, neatly turning the "standardized interchange format for data" into yet another proprietary format, which is only readable by the software that created it.

XML is verbose

A typical XML document can be 30% markup, sometimes more. This makes it larger than desired in many cases. There have been several attempts to define a "binary XML" format (most recently by the W3C group), but they really haven't caught on yet. For most applications where size or transmission speed is an issue, you probably ought to look into compressing the XML document using a standard compression algorithm (gzip, or zlib, or whatever), then decompressing it on the other end. You'll save quite a bit more that way than by trying to make the XML itself less wordy.

Some XML processing libraries are extremely memory-intensive

There are two basic approaches to reading an XML document. The first is to read the whole thing into memory, re-constructing the structure of the file as a tree of nodes; the application can then use standard pointer manipulation to scan through the tree, looking for whatever information it needs, or further transforming the tree into the program's native data structures. This can use a surprising amount of memory - one XML processing library I've used loaded the whole file into memory at once, then created a second copy of all the data in the tags, so it could end up using the size of the file plus twice the combined size of all the tags.

Alternatively, the reader can take a more stream-oriented approach, scanning through the file from beginning to end, and calling into the application code whenever an element starts or ends. This can be implemented with a callback to your code for every tag start/end, which gives you a simple interface, and doesn't require holding large amounts of data in memory during the parsing.
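As an illustration of the second approach, here's a toy stream-style scanner in JavaScript. It is emphatically not a real XML parser (it ignores attributes, entities, CDATA, comments, and malformed input) - just a sketch of the callback-per-tag interface, which never holds more than the current tag in memory:

```javascript
// Toy stream-style scanner: walks the input once, firing callbacks for
// each start tag, end tag, and text run. Illustration only -- it ignores
// attributes, entities, CDATA, comments, and malformed input.
function scanXml(input, { onStart, onEnd, onText }) {
  const tag = /<(\/?)([A-Za-z][\w.-]*)[^>]*?(\/?)>/g;
  let last = 0, m;
  while ((m = tag.exec(input)) !== null) {
    const text = input.slice(last, m.index).trim();
    if (text && onText) onText(text);
    const [, closing, name, selfClosing] = m;
    if (closing) {
      if (onEnd) onEnd(name);
    } else {
      if (onStart) onStart(name);
      if (selfClosing && onEnd) onEnd(name); // <tag/> opens and closes at once
    }
    last = tag.lastIndex;
  }
}

const events = [];
scanXml('<wallet><bill currency="USD"/>cash</wallet>', {
  onStart: n => events.push("start:" + n),
  onEnd:   n => events.push("end:" + n),
  onText:  t => events.push("text:" + t),
});
console.log(events);
// -> [ 'start:wallet', 'start:bill', 'end:bill', 'text:cash', 'end:wallet' ]
```

The application sees a flat series of events and maintains only as much state as it actually cares about, which is why this style scales to documents far larger than memory.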

No random access

This is just fallout from the strict hierarchy, but it's extremely labor-intensive to do any kind of data extraction from a large XML document. If you only want a subset of nodes from a couple of levels down in the hierarchy, you've still got to step your way down there, and keep scanning through the rest of the file to figure out when you've gone up a level.

The ugly

By far, the biggest problems with XML don't have anything to do with the technology itself, but with the often perverse ways in which it's misapplied to the wrong problems. Here are a couple of examples from my own experience.

Archiving an object graph, and the UUID curse

XML is a fairly reasonable format for transferring "documents", as humans understand them. That is, a primarily linear bunch of text, with some attributes that apply to certain sections of the text.

These days, a lot of data interchange between computer programs is in the form of relational data (databases), or complex graphs of objects, where you'll frequently need to make references back to previous parts of the document, or forward to parts that haven't come across yet.

The obvious way to solve this problem is to have a unique ID that you can use to reference one entity from another. Unfortunately, the "obvious" way to ensure that a key is unique is to generate a globally-unique key, and so you end up with a bunch of 64-bit or 128-bit GUIDs stuck in your XML, which makes it really difficult to follow the links, and basically impossible to visually "diff" the files.

One way to avoid UUID proliferation is to use "natural" unique IDs, if your data has some attribute that needs to be unique anyway.
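For example, if your contact data already requires unique email addresses, those make perfectly serviceable keys. Both snippets here are invented for illustration:

```xml
<!-- GUID keys: impossible to follow by eye -->
<contact id="f81d4fae-7dec-11d0-a765-00a0c91e6bf6">
  <manager ref="c2b8e7a0-9c4d-4e2a-8f1b-3d5a6e7f8a9b"/>
</contact>

<!-- Natural keys: the reference is self-explanatory -->
<contact id="alice@example.com">
  <manager ref="bob@example.com"/>
</contact>
```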

What's the worst possible way to represent a tree?

I doubt anybody's ever actually asked this question, but I have seen some XML structures that make a pretty good case that that's how they were created. XML, by its hierarchical nature, is actually a really good fit for hierarchical data. Here is one way to store a tree in XML:

<pants color="blue" material="denim">
  <pocket location="back-right">
    <wallet color="brown" material="leather">
      <bill currency="USD" value="10"></bill>
      <bill currency="EURO" value="5"></bill>
    </wallet>
  </pocket>
</pants>

And here's another:

<tree>
  <node id="1" kind="pants" color="blue" material="denim"/>
  <node id="2" kind="pocket" location="back-right" parent="1"/>
  <node id="3" kind="wallet" color="brown" material="leather" parent="2"/>
  <node id="4" kind="bill" currency="USD" value="10" parent="3"/>
  <node id="5" kind="bill" currency="EURO" value="5" parent="3"/>
  <node id="6" kind="bill" currency="EURO" value="5" parent="3"/>
</tree>

So, which one of those is easier to read? And did you notice that I added another 5 Euro to my wallet, while translating the structure? Key point here: try to have the structure of your XML follow the structure of your data.

Monday, September 04, 2017

Post-trip review: Telestial “International Travel SIM”

For our recent trip to Europe, Yvette and I tried the seasoned-traveler technique of swapping out the SIM cards in our phones, rather than paying AT&T’s fairly extortionate international roaming fees. It was an interesting experience, and we learned a few things along the way, which I’ll share here.

We used Telestial, which is apparently Jersey Telecom. Not New Jersey: JT is headquartered on the island of Jersey, in the Channel Islands off the coast of Britain. JT/Telestial's claim to fame is really their wide roaming capability. Their standard “I’m traveling to Europe” SIM is good pretty much across all of Europe. It definitely worked just fine in Germany, Denmark, Sweden, Austria, Estonia, Russia, and Finland. They claim a few dozen more countries, and I don’t doubt it works equally well in those.

Why not just get a local SIM in the country you’re visiting? Isn’t that cheaper?

People who frequently travel overseas will often just pop into a phone retailer, or buy a SIM from a kiosk in the airport. Based on the comparison shopping I did while we were traveling, this is definitely cheaper than the Telestial solution. However, it’s not at all clear in many cases how well an Austrian SIM is going to work in Finland (for example), or just how much you’ll be paying for international roaming.

So, I think if you’re traveling to just one country (especially one with really cheap mobile phone service costs), buying a local SIM is definitely the way to go. For a multi-country trip, though, I didn’t really want to keep updating people with new contact numbers every other day as we switched countries. I might look into one of the “virtual phone number” solutions, like Google Voice, for the next multi-country trip. Being able to give people one number, and still roam internationally, seems like it’d be useful, but I don’t know what the restrictions are.

What does setup look like?

First of all, you need a compatible phone. Not all American mobile phones will work in Europe. You can check the technical specs for the particular phone model you have, to see which radio frequencies it supports. Alternatively, you can buy any iPhone more recent than the iPhone 4s, all of which are “world phones”, as far as I know. Verizon and Sprint still sell some phones that are CDMA-only, which means they can’t work anywhere but the USA, but most CDMA smartphones also have a GSM SIM slot, so it’s worth taking a look, even if you’re on Verizon.

Secondly, your phone needs to not be “activation locked” to a particular carrier. Most phones sold on contract in the US are set up this way, so you can’t just default on the contract and get a phone you can use on another network. Ideally, your phone would get unlocked automatically at the end of the contract, but US law doesn’t require this, so you’ll need to request an unlock from your carrier. AT&T has made this process a lot easier since the last time I tried to do it, which is great, because I forgot to check that Yvette’s phone was unlocked before we left. I did manage to make the unlock request from the other phone while we were in a taxi on the freeway in Austria, which is a testament to how easy this stuff is these days, I guess.

Assuming you have a compatible phone, the process is basically: power off the phone, pop out the SIM tray with a paper clip, swap the SIMs, turn the phone back on, and wait. For the Telestial SIM, you probably want to activate it and pre-pay for some amount of credit before your trip, which is easy to do on their website.

What kind of plan did you get?

We had a pre-paid fixed allowance for calls and text, and 2GB of data for each phone. Calls were $0.35 a minute, and texts were $0.35 each. Pre-loading $30 on the phone was enough for about an hour and a half of phone calls, or a fairly large number of texts. When we had data coverage, we could use iMessage or WhatsApp for basically-free text messages. I don't know whether Voice over LTE actually worked, or whether it avoided the per-minute charge, since we just didn't call that much.

Did you actually save any money?

Compared to what it cost to pay AT&T for an international roaming plan while Yvette was in the UK for a month, we definitely did save a substantial amount of money. This is true even with the crazy cruise ship issue (see below). Without that, it would have been massively less expensive. And compared to AT&T’s “no plan” international rates (which I got to try out in Israel), there’s absolutely no comparison.

What happened on the cruise ship?

Most of the time, the cruise ship did not have cell service. Which was pretty much fine - we had good coverage when we were in port, and there was WiFi on the ship, if we wanted to pay for it (we did not). On two occasions, though, a weird thing happened where our phones managed to connect to a shipboard cell network (maybe on another passing ship?), and we were charged truly outrageous roaming data rates - several dollars a megabyte, which obviously burned through the $30 in prepaid credit really fast. On the other hand, prepaid means that we didn't lose more than $30 (twice, so $60 total). I still don't know exactly what happened there, but if I do this again sometime, I'm going to keep data turned off on the phone when not in port.

The good:

  1. Pre-paid, which limits crazy bills
  2. Easy setup
  3. Easy to recharge, either over the phone, or using the app
  4. Per-minute and per-text rates not too terrible
  5. Works pretty much anywhere in Europe

The bad:

  1. Cruise ship roaming will use up your data allowance right quick
  2. Fixed recharge sizes, and monthly expiration
  3. Forwarding doesn’t work for texts
  4. Some weirdness with “from” numbers on texts (apparently Austria-only)?
  5. No tethering
  6. Email support non-responsive

Conclusion: would we do it again?

Overall, the process was fairly painless, other than the cruise ship issue. If there’s a simple way to fix that, I’d have no problem doing this again. Otherwise, I’d recommend turning cell data off when you’re not in port, to avoid accidentally costing yourself a bunch of money.

Monday, August 28, 2017

A brief history of the Future

Lessons learned from API design under pressure

It was August of 2009, and the WebOS project was in a bit of trouble. The decision had been made to change the underlying structure of the OS from using a mixture of JavaScript for applications, and Java for the system services, to using JavaScript for both the UI and the system services. This decision was made for a variety of reasons, primarily in a bid to simplify and unify the programming models used for application development and services development. It was hoped that a more-familiar service development environment would be helpful in getting more developers on board with the platform. It was also hoped that by having only one virtual machine running, we'd save on memory.

Initially, this was all built on top of a customized standalone V8 JavaScript engine, with a few hooks to system services. Eventually, we migrated over to Node.js, once Node was far enough along that it looked like an obvious win, and after working with the Node developers to improve performance on our memory-limited platform.

The problem with going from Java to JavaScript

As you probably already know, despite the similarity in the names, Java and JavaScript are very different languages. In fact, the superficial similarities in syntax were only making things harder for the application and system services authors trying to translate things from Java to JavaScript.

In particular, the Java developers were used to a multi-threaded environment, where they could spin off threads to do background tasks, and have them call blocking APIs in a straightforward fashion. Transitioning from that to JavaScript's single-threaded, events and callbacks model was proving to be quite a challenge. Our code was rapidly starting to look like a bunch of "callback hell" spaghetti.

The proposed solution

As one of the most recent additions to the architecture group, I was asked to look into this problem, and see if there was something we could do to make it easier for the application and service developers to write readable and maintainable code. I went away and did some research, and came back with an idea, which we called a Future. The Future was a construct based on the idea of a value that would be available "in the future". You could write your code in a more-or-less straight-line fashion, and as soon as the data was available, it'd flow right through the queued operations.

If you're an experienced JavaScript developer, you might be thinking at this point "this sounds a lot like a Promise", and you'd be right. So, why didn't we use Promises? At this point in history, the Promises/A spec was still under active discussion amongst the CommonJS folks, and it was not at all obvious that it'd become a popular standard (in fact, it took Promises/A+ for that to happen). The Node.js core had just removed its own Promises API in favor of a callback-based API (this would have been around Node.js v0.1, I think).

The design of the Future

Future was based on ideas from Smalltalk (promises), Java (Future), Dojo (Deferred), and a number of other places. The primary design goals were:
  • Make it easy to read through a piece of asynchronous code, and understand how it was supposed to flow, in the "happy path" case
  • Simplify error handling - in particular, make it easy to bail out of an operation if errors occur along the way
  • To the extent possible, use Future for all asynchronous control flow
You can still see the code for Future, because it was released along with the rest of the WebOS Foundations library as open source, as part of the Open WebOS project.

My design brief looked something like this:
A Future is an object with these properties and methods:
.result The current value of the Future. If the future does not yet have a value, accessing the result property raises an exception. Setting the result of the Future causes it to execute the next "then" function in the Future's pipeline.  
.then(next, error) Adds a stage to the Future's pipeline of steps. The Future is passed as a parameter to the function "next". The "next" function is invoked when a value is assigned to the future's result, and the (optional) "error" function is invoked if the previous stage threw an exception. If the "next" function throws an exception, the exception is stored in the Future, and will be re-thrown if the result of the Future is accessed.
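For the curious, here's a toy JavaScript reading of that brief - very much a sketch, not the actual WebOS Foundations code. It only handles synchronous stages (each stage's return value becomes the new result), whereas the design brief implies a stage could also complete asynchronously by assigning .result later:

```javascript
// Minimal sketch of the design brief above -- NOT the WebOS implementation.
// Each stage function receives the Future; its return value becomes the
// Future's next result. A thrown exception is stored and re-thrown when
// .result is accessed, unless a later stage supplies an error handler.
class Future {
  constructor() {
    this._stages = [];
    this._state = "pending"; // "pending" | "value" | "error"
  }

  get result() {
    if (this._state === "pending") throw new Error("Future has no result yet");
    if (this._state === "error") throw this._payload; // re-throw stored exception
    return this._payload;
  }

  set result(value) {
    this._payload = value;
    this._state = "value";
    this._advance();
  }

  then(next, error) {
    this._stages.push({ next, error });
    if (this._state !== "pending") this._advance();
    return this; // the Future *is* the pipeline
  }

  _advance() {
    while (this._stages.length > 0) {
      const stage = this._stages.shift();
      const handler = this._state === "error" ? stage.error : stage.next;
      if (!handler) continue; // no error handler here: keep propagating
      try {
        this._payload = handler(this);
        this._state = "value";
      } catch (e) {
        this._payload = e;
        this._state = "error";
      }
    }
  }
}

// Happy path: straight-line flow through the queued stages.
const f = new Future();
f.then(fut => fut.result + 1).then(fut => fut.result * 2);
f.result = 5;
console.log(f.result); // 12

// Error path: the exception skips past stages with no error handler.
const g = new Future();
g.then(fut => { throw new Error("boom"); })
 .then(fut => fut.result + 1)          // skipped
 .then(fut => fut.result, err => 99);  // error handler recovers
g.result = 1;
console.log(g.result); // 99
```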
The brief above is more-or-less what we ended up implementing, but the API did get more complicated along the way. Some of this was an attempt to simplify common cases that didn't match the initial design well. Some of it was to make it easier to weld Futures into callback-based code, which was ultimately a waste of time, in that Future pretty much wiped out all competing approaches to flow control. And one particular set of changes was thrown in at the last minute to satisfy a request that should just have been denied (see "What went wrong", below).

What went right

We shipped a "minimum viable product" extremely quickly

Working from the initial API design document, Tim got an initial version of Future out to the development team in just a couple of days, with all of the basics working. We continued to iterate for quite a while afterwards, but we were able to start the process of bringing people up to speed quickly.

We did, in fact, eliminate "callback hell" from our code base

After the predictable learning curve, the former Java developers really took to the new asynchronous programming model. We went from "it sometimes kind of works" to "it mostly works" in an impressively short time. Generally speaking, the Future-based code was shorter, clearer, and much easier to read. We did suffer a bit in ease of debugging, but that was as much due to the primitive debugging tools on Node as it was to the new asynchronous model.

We doubled down on our one big abstraction

Somewhat surprisingly to me, the application teams also embraced Futures. They actually re-wrote significant parts of their code to switch over to Future-based APIs at a deeper level, and to allow much more code sharing between the front end and back end of the Mail application, for example. This code re-use was on the "potential benefits" list, but it was much more of a win than anyone originally expected.

We wrote a bunch of additional libraries on top of Future, for all sorts of asynchronous tasks - for file I/O, database access, network and telecoms, for the system bus (dbus) interface, basically anything that you might have wanted to access on the platform, was available as a Future-based API.

The Future-based code was very easy to reason about in the "happy path" case

One of the best things about all this is that with persistent use of Futures everywhere, you could write code that looked like this:
downloadContacts().then(mergeContacts).then(writeNewContacts).then(success, error)
Most cases were a bit more complicated than that (often using inline functions), but the pattern of only handling the success case, and just letting errors propagate, was very common. And in fact, the "error" case was, as often as not, logging a message and rescheduling the task for later.

The all-or-nothing error propagation technique fit (most of) our use cases really well

The initial use case for the Future was a WebOS feature called "Synergy". This was a framework for combining data from multiple sources into a single uniform format for the applications. So, for example, you could combine your contacts from Facebook, Google, and Yahoo into a single address book list, and WebOS would automatically de-duplicate and link related contacts, and sync changes made on the phone back to the remote service that the data originally came from. Similarly, all of your incoming e-mail went into the same "Mail" database on-device.

In a multi-stage synchronization process like this, there are all sorts of ways that the operation can fail - the remote server might be down, or the network might be flaky, or the user might decide to put the phone into airplane mode in the middle of a sync operation. In the vast majority of cases, we didn't actually care what the error was, just that an error had occurred. When an error happened, the usual response was to leave the on-phone data the way it was, and try again later. In those cases where "fuck it, I give up" was not the right error handling strategy, the rough edges of the error-handling model were a bit easier to see.

What went wrong

The API could have been cleaner/simpler

It didn't take long before we were adding convenience features to make some of the use cases simpler. Hence, the "whilst" function on Future, which was intended to make it easier to iterate over a function that returned Futures. There were a couple of other additions that also got a very small amount of use, and could have easily been replaced by documentation of the "right" way to do things.

Future had more-complex internal state than was strictly needed

If you look at Promises, they've really only got the minimal amount of state, and you chain functions together by returning a new Promise from each stage. Instead of having lots and lots of Futures linked together to make a pipeline of operations, the Future was the pipeline. I think this decreased heap churn by not creating a bunch of small objects, and it probably made it somewhat easier to debug broken pipelines (since all of the stages were visible). Obviously, if we'd known that Promises were going to become a big thing in JavaScript, we would have stayed a lot closer to the Promises/A spec.

Error handling was still a bit touchy, for non-transactional cases

If you had to write code that actually cared about handling errors, the "error" function was located in a pretty awkward place: you'd have all these happy-path "then" functions, and one error handler in the middle. Using named functions instead of anonymous inline functions helped a bit with this, but I would still occasionally get called in to help debug a thrown exception that the developer couldn't find the source of.

It would have been really nice to have a complete stack trace for the exception that was getting re-thrown, but we unfortunately didn't have stack traces available in both the application context and the service context. In the end, "thou shalt not throw an exception unless it's uniquely identifiable" was almost sufficient to resolve this.

I caved on a change to the API that I should have rejected

Fairly late in the process, someone came to me and said "I don't like the 'magic' in the way the result property works. People don't expect that accessing a property will throw an exception, so you should provide an API to access the state of the Future via function calls, rather than property access".  At this point, we had dozens of people successfully using the .result API, and very little in the way of complaints about that part of the design.

I agreed to make the addition, so we could "try it out" and see whether the functional API was really easier or clearer to use. Nobody seemed to think so, except for the person who asked for it. Since they were using it, it ended up having to stay in the implementation. And since it was in the implementation, it got documented, which just confused later users (especially third parties), who didn't understand why there were two different ways to accomplish the same tasks.

How do I feel about this, 8 years later?

Pretty good, actually. Absent a way to see into the future, I think we made a pretty reasonable decision with the information we had available. The Bedlam team did an amazing bit of work, and WebOS got rapidly better after the big re-architecting. In the end, it was never quite enough to displace any of the major mobile OSes, but I still miss some features of Synergy, even today. After all the work Apple has done over the years to improve contact sync, it's still not quite as good (and not nearly as open to third parties) as our solution was.

Monday, August 21, 2017

What your Internet Of Things startup can learn from LockState

The company LockState has been in the news recently for sending an over-the-air update to one of their smart lock products which "bricked" over 500 of these locks. This is a pretty spectacular failure on their part, and it's the kind of thing that ought to be impossible in any kind of well-run software development organization, so I think it's worthwhile to go through a couple of the common-sense processes that you can use to avoid being the next company in the news for something like this.

The first couple of these are specific to the problem of shipping the wrong firmware to a particular model, but the others apply equally well to an update that's for the right target, but is fundamentally broken, which is probably the more-likely scenario for most folks.

Mark your updates with the product they go to
The root cause of this incident was apparently that LockState had produced an update intended for one model of their smart locks, and somehow managed to send that to a bunch of locks that were a different model. Once the update was installed, those locks were unable to connect to the Internet (one presumes they don't even boot), and so there was no way for them to update again to replace the botched update.

It's trivially easy to avoid this issue, using a variety of different techniques. Something as simple as using a different file name for the firmware for each device model would suffice. Failing that, you can have a "magic number" at a known offset in the file, or a digital signature that uses a key unique to the device model. Digitally-signed firmware updates are a good idea for a variety of other reasons, especially for a security product, so I'm not sure how they managed to screw this up.
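To make the magic-number idea concrete, here's a sketch in JavaScript. The header layout, magic string, and model IDs are all invented for illustration - not anything LockState actually uses:

```javascript
// Hypothetical firmware header: a 4-byte magic string, then a 2-byte
// big-endian model ID. Layout and model numbers are made up for illustration.
const MAGIC = Buffer.from("FWIM");
const MY_MODEL_ID = 0x0203; // baked into the device at manufacture

function isUpdateForThisDevice(image) {
  if (image.length < 6) return false;
  if (!image.subarray(0, 4).equals(MAGIC)) return false; // not firmware at all
  return image.readUInt16BE(4) === MY_MODEL_ID;          // wrong model? refuse
}

// A valid image for this model, and one built for a different model:
const good = Buffer.concat([MAGIC, Buffer.from([0x02, 0x03]), Buffer.from("...payload...")]);
const wrong = Buffer.concat([MAGIC, Buffer.from([0x02, 0x04]), Buffer.from("...payload...")]);

console.log(isUpdateForThisDevice(good));  // true
console.log(isUpdateForThisDevice(wrong)); // false
```

A check like this belongs in the on-device updater, so that even if the server sends the wrong file, the device refuses to flash it.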

Have an automated build & deployment process
Even if you've got a good system for marking updates as being for a particular device, that doesn't help if there are manual steps that require someone to explicitly set them. You should have a "one button" build process which allows you to say "I want to build a firmware update for *this model* of our device", and at the end you get a build that was compiled for the right device, and is marked as being for that device.
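The key property is that the model is the *only* input, and everything derived from it comes from one table, so the compiler flags, the file name, and the firmware marking can never disagree. A hypothetical sketch (the model names and fields are made up for illustration):

```python
# Single source of truth: everything model-specific lives in one table.
MODELS = {
    "ls-500i": {"defines": ["MODEL_LS500I"], "model_id": 1},
    "ls-6i":   {"defines": ["MODEL_LS6I"],   "model_id": 2},
}

def build_firmware(model: str) -> dict:
    """The 'one button': model in, fully-marked artifact out.
    An unknown model is a hard failure, never a guess."""
    cfg = MODELS[model]
    return {
        "filename": f"firmware-{model}.bin",  # model baked into the name...
        "model_id": cfg["model_id"],          # ...and into the header marking
        "defines": cfg["defines"],            # ...and into the compile flags
    }
```

In a real pipeline the body would invoke the toolchain and the header-marking step from the previous section, but the shape is the same: no human ever types the model ID twice.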

Have a staging environment
Every remote firmware update process should have the ability to be tested internally via the same process end-users would use, but from a staging environment. Ideally, this staging environment would be as similar as possible to what customers use, but available in-company only. Installing the bad update on a single lock in-house before shipping it to customers would have helped LockState avoid bricking any customer devices. And, again - this process should be automated.

Do customer rollouts incrementally
LockState might have actually done this, since they say only 10% of their locks were affected by this problem. Or they possibly just got lucky, and their update process is just naturally slow. Or maybe this model doesn't make up much of the installed base. In any case, rolling out updates to a small fraction of the installed base, then gradually ramping it up over time, is a great way to ensure that you don't inconvenience a huge slice of your user base all at once.
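One simple way to implement incremental rollout is to hash each device ID into a fixed bucket, then compare the bucket against the current rollout percentage. Because the bucket is deterministic, raising the percentage only ever *adds* devices; nobody flaps in and out of the rollout. A sketch (the function name is mine, not from any particular update service):

```python
import hashlib

def in_rollout(device_id: str, percent: int) -> bool:
    """Place each device in a stable 0-99 bucket; a device is in the
    rollout when its bucket is below the current percentage."""
    bucket = int(hashlib.sha256(device_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```

Start at 1-5%, watch your telemetry, and ramp `percent` up over days rather than hours.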

Have good telemetry built into your product
Tying into the previous point, wouldn't it be great if you could measure the percentage of systems that were successfully updating, and automatically throttle the update process based on that feedback? This eliminates another potential human-in-the-loop situation, and could have reduced the damage in this case by detecting automatically that the updated systems were not coming back up properly.
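Closing that loop can be as simple as one function the rollout service runs periodically: compare how many updated devices have checked back in against how many got the update, and either widen the rollout or slam it shut. A hypothetical sketch, with made-up thresholds:

```python
def next_rollout_percent(current: int, updated: int, checked_in: int,
                         min_success: float = 0.95, step: int = 10) -> int:
    """Widen the rollout only while updated devices keep checking in;
    halt it entirely if they're not coming back up."""
    if updated == 0:
        return current  # nothing shipped yet, nothing to learn from
    success = checked_in / updated
    if success < min_success:
        return 0  # stop the rollout and page a human
    return min(current + step, 100)
```

With a check like this running against the 10% LockState shipped to, the rollout would have frozen itself as soon as the first bricked locks went silent.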

Have an easy way to revert firmware updates
Not everybody has the storage budget for this, but these days, it seems like practically every embedded system is running Linux off of a massive Flash storage device. If you can, have two operating system partitions, one for the "current" firmware, and one for the "previous" firmware. At startup, have a key combination that swaps the active install. That way, if there is a catastrophic failure, you can get customers back up and running without having them disassemble their products and send them in to you, which is apparently how LockState is handling this.

If your software/hardware allows for it, you can even potentially automate this entirely - have a reset watchdog timer that gets disabled at the end of boot, and if the system reboots more than once without checking in with the watchdog, switch back to the previous firmware.
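The boot-selection logic for that automatic fallback is small enough to sketch in full. Here it's modeled in Python with a dict standing in for the little block of persistent state (a real bootloader would keep this in battery-backed RAM or a dedicated flash sector); the field names are my own invention:

```python
def select_partition(state: dict) -> dict:
    """Run by the bootloader on every boot: pick a partition, and revert
    to the previous firmware after repeated failed boot attempts."""
    if state["boot_attempts"] >= 2 and not state["last_boot_ok"]:
        # New firmware never checked in with the watchdog: swap back.
        state["active"], state["fallback"] = state["fallback"], state["active"]
        state["boot_attempts"] = 0
    state["boot_attempts"] += 1
    state["last_boot_ok"] = False  # stays false until userspace checks in
    return state

def mark_boot_successful(state: dict) -> dict:
    """Run at the end of a clean boot to disarm the fallback."""
    state["boot_attempts"] = 0
    state["last_boot_ok"] = True
    return state
```

U-Boot's boot-count limit feature works along these same lines, if you'd rather not roll your own.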

Don't update unnecessarily
No matter how careful you are, there are always going to be some cases where a firmware update goes bad. This can happen for reasons entirely out of your control, like defective hardware that just happens to work with version A of the software, but crashes hard on version B.

And of course the easiest way to avoid having to ship lots of updates is sufficient testing (so you have fewer critical product defects to fix), and reducing the attack surface of your product (so you have fewer critical security issues that you need to address on a short deadline).

Sunday, August 13, 2017

Why I hate the MacBook Pro Touchbar

The Touchbar that Apple added to the MacBook Pro is one of those relatively rare instances in which they have apparently struck the wrong balance between form and function. The line between “elegant design” and “design for its own sake” is one that they frequently dance on the edge of, and occasionally fall over. But they get it right often enough that it’s worth sitting with things for a while to see if the initial gut reaction is really accurate.

I hated the Touchbar pretty much right away, and I generally haven’t warmed up to it at all over the last 6 months. Even though I’ve been living with it for a while, I have only recently really figured out why I don’t like it.

What does the Touchbar do?

One of the functions of the Touchbar, of course, is serving as a replacement for the mechanical function keys at the top of the keyboard. It can also do other things, like acting as a slider control for brightness, or letting you quickly select elements from a list. Interestingly, it’s the “replacement for function keys” part of the functionality that gets most of the ire, and I think this is useful for figuring out where the design fails.

What is a “button” on a screen, anyway?

Back in the dark ages of the 1980s, when the world was just getting used to the idea of the Graphical User Interface, the industry gradually settled on a series of interactions, largely based on the conventions of the Macintosh UI. Among other things, this means “push buttons” that highlight when you click the mouse button on them, but don’t actually take an action until you release the mouse button. If you’ve used a GUI that takes actions immediately on mouse-down, you might have noticed that it feels a bit “jumpy”, and one reason the Mac, and Windows, and iOS (mostly) perform actions on release is exactly because action on mouse-down feels “bad”.

Why action on release is good:

Feedback — When you mouse-down, or press with your finger, you can see what control is being activated. This is really important to give your brain context for what happens next. If there’s any kind of delay before the action completes, you will see that the button is “pressed”, and know that your input was accepted. This reduces both user anxiety, and double-presses.

Cancelability — In a mouse-and-pointer GUI, you can press a button, change your mind, and move the mouse off before releasing to avoid the action. Similar functionality exists on iOS, by moving your finger before lifting it. Even if you hardly ever use this gesture, it’s there, and it promotes a feeling of being in control.

Both of these interaction choices were made to make the on-screen “buttons” feel and act more like real buttons in the physical world. In the case of physical buttons or keyswitches, the feedback and the cancelability are mostly provided by the mechanical motion of the switch. You can rest your finger on a switch, look at which switch it is, and then decide that you’d rather do something else and take your finger off, with no harm done. The interaction with a GUI button isn’t exactly the same, but it provides for “breathing space” in your interaction with the machine, which is the important thing.

The Touchbar is (mostly) action on finger-down

With a very few exceptions, the Touchbar is designed to launch actions on finger-down. This is inconsistent with the rest of the user experience, and it puts a very high price on having your finger slip off of a key at the top of the keyboard. This is exacerbated by bad decisions made by third-party developers like Microsoft, who ridiculously put the “send” function in Outlook on the Touchbar, because if there was ever anything I wanted to make easier, it’s sending an email before I’m quite finished with it.

How did it end up working that way?

I’m not sure why the designers at Apple decided to make things work that way, given their previous experience with GUI design on both the Mac and iOS. If I had to take a guess, the logic might have gone something like this:

The Touchbar is, essentially, a replacement for the top row of keys on the keyboard. Given that the majority of computer users are touch-typists, then it makes sense to have the Touchbar buttons take effect immediately, in the same way that the physical keyswitches do. Since the user won’t be looking at the Touchbar anyway, there’s no need to provide the kind of selection feedback and cancelability that the main UI does.

There are a couple of places where this logic goes horribly wrong. First off, a whole lot of people are not touch typists, so that’s not necessarily the right angle to come at this from. Even if they were, part of the whole selling point of the Touchbar is that it can change, in response to the requirements of the app. So you’re going to have to look at it some of the time, unless you’ve decided to lock it into “function key only” mode. In which case, it’s a strictly-worse replacement for the keys that used to be there, and you’re not getting the benefits of the reconfigurability.

Even if you were going to use the Touchbar strictly as an F-key replacement, it still doesn’t have the key edges to let you know whether you’re about to hit one key or two, so you’ll want to look at what you’re doing anyway. I know there are people out there who use the function keys without looking at them, but the functions there are rarely-enough used that I suspect the vast majority of users end up having to look at the keyboard anyway, in order to figure out which one is the “volume up” key, or at least the keyboard-brightness controls.

How can Apple fix this?

First, and primarily, make it less-likely for users to accidentally activate functions on the Touchbar. I think that some amount of vertical relief could make this failure mode less-likely, though I’m not sure if the Touchbar should be higher or lower than it is now. I have even considered trying to fabricate a thin divider to stick to the keyboard to keep my finger away from accidentally activating the “escape” key, which is my personal pet-peeve with the Touchbar.

A better solution to that problem is probably to include some amount of pressure-sensitivity and haptic feedback. The haptic home button on the iPhone 7 does a really good job of providing satisfying feedback without any visuals, so we know this can work well. Failing that, at least some way to bail out of hitting the Touchbar icons would be worth pursuing - possibly making them less sensitive to grazing contact, though that would increase the cases where you didn’t activate a button while actually trying to.

Another option would be bringing back the physical function keys. This doesn’t necessarily mean killing the Touchbar, but maybe just moving it away from the rest of the keyboard a bit. This pretty much kills the “you can use it without taking your eyes off the screen or your hands off the home row” argument, but I’m really not convinced that’s at all realistic, unless you only ever use one application.

So, is Apple going to do anything to address these issues?

Honestly? You can never tell with them. Apple’s history of pushing customers forward against their will (see above) argues that they’ll pursue this for a good while, even if it’s clearly a bad idea. On the other hand, the pressure-sensitivity option seems like the kind of thing they might easily decide to add all by themselves. In the meantime, I’ll be working out my Stick On Escape Key Guard…