Tuesday, October 17, 2017

"Responsible Encryption" - what does that mean?

This weekend I read this excellent article by Alex Gaynor responding to Deputy Attorney General Rod Rosenstein's remarks on encryption to two different audiences last week. Please do go and read it when you get a chance, as it delves into the sadly common tactic of pointing to a bunch of scary criminal incidents, then saying "unbreakable encryption enables criminals and terrorists", without presenting any evidence that those crimes were enabled by encryption technology, or that law enforcement officers were actually hampered in their investigations by encryption.

In fact, in the case of the FBI, Apple, and the San Bernardino shooter, AG Rosenstein repeats all of the same false narrative that we've been presented with before - that the shooter's phone possibly contained vital information, that Apple "could" decrypt the information, and that they fought the FBI's legal attempts to force them to do so. Read my previous blog post (linked above) for background on that line of argument, and how the FBI willfully twists the facts of the case, to try to get something much more far-reaching than what they claim to want.

One thing not addressed directly in Alex's article is the frustration that the FBI and other law enforcement  officials have expressed over the inability to execute a legal search warrant, when they're faced with a locked phone, or a communications system that provides end-to-end encryption.

From Rosenstein's remarks to the Global Security Conference
We use the term “responsible encryption” to describe platforms that allow police to access data when a judge determines that compelling law enforcement concerns outweigh the privacy interests of a particular user.  In contrast, warrant-proof encryption places zero value on law enforcement.  Evidence remains unavailable to the police, no matter how great the harm to victims.
First, what a bunch of emotionally-charged words. And again we see the disconnect between what the FBI and other agencies say that they want (a way to unlock individual phones), and what they seem to keep asking for (a key to unlock any phone they can get their hands on).

But the man does have a point - there is some value to society in the FBI being able to execute a valid search warrant against someone's phone, or to "tap" the communications between known criminals. And I think he's also right that that sort of access is not going to be provided if the free market is allowed to set the rules. It'll never be in Apple's or any individual customer's interest to make it easier to access a locked phone. So, it'll come down to a matter of legislation, and I think it's worth the tech folks having this conversation before Congress sits down with a bill authored by the FBI and the NSA to try to force something on us.

The encryption-in-flight question is very complicated (and crypto protocols are hard to get right - see the recent KRACK security vulnerabilities), so I'll leave that for a future post. I do believe that there are reasonable ways for tech companies to design data-at-rest encryption that is accessible via a court order, but maintains reasonably-good security for customers. Here's a sketch of how one such idea might be implemented:

On-device Key Escrow

Key escrow 
The basic idea of key escrow is that there can be two keys for a particular piece of encrypted data - one key that the user keeps, and one that is kept "in escrow" so another authorized agent can access the data, if necessary. The ill-fated Clipper Chip was an example of such a system. The fatal flaw of Clipper (well, one of them) is that it envisioned every single protected device would have its secondary key  held securely by the government to be used in case of a search warrant being issued. If Clipper had ever seen broad adoption, the value of that centralized key store would have been enormous, both economically and militarily. We're talking a significant fraction of the US GDP, probably trillions of dollars. That would have made it the #1 target of thieves and spies across the world.

Eliminating central key storage
But the FBI really doesn't need the ability to decrypt every phone out there. They need the ability to decrypt specific phones, in response to a valid search warrant. So, how about storing the second key on the device itself? Every current on-device encryption solution that I know of provides for the option of multiple keys. And in fact, briefly getting back to the San Bernardino shooter's phone, if the owners of that phone (San Bernardino County) had had a competent IT department, they would have set up a second key that they could then have handed over to the FBI, neatly avoiding that whole mess with suing Apple.

You could imagine Apple generating a separate "law enforcement" key for every phone, and storing that somewhere, but that has all the same problems as the Clipper central key repository, just on a slightly smaller scale. So those keys need to stored separately. How about storing them on the device itself?

Use secure storage
Not every phone has a "secure enclave" processor like the iPhone, but it's a feature that you'll increasingly see on newer phones, as Apple and other manufacturers try to compete on the basis of providing better privacy protection to their customers. The important feature of these processors is that they don't allow software running on the phone to extract the stored keys. This is what keeps the user's data secure from hackers. So, if the key is stored in there, but the phone software can't get it out, how will the FBI get the key?

Require physical access
My preferred solution would be for the secure enclave to have a physically-disconnected set of pins that can be used just for extracting the second key. In order to extract the key, you'd need to have physical access to the device, disassemble it, and solder some wires on it. This is, I think, sufficiently annoying that nobody would try to do it without getting a warrant first.

It also means that nobody can search your phone without taking it out of your possession for a good long while. This seems like a reasonable trade-off to me. If someone executes a search warrant on your house, you'll certainly know about it. There's such a thing as "sneak and peek" warrants, or delayed-notice warrants, where police sneak in and search your home while you're not there, but I'm not particularly interested in solving that problem for them.

Is this a perfect solution? Of course not. But I think something like this is a reasonable place to start when discussing law enforcement access to personal electronics. And I think we want to have this conversation sooner, rather than later. What do you think?

Monday, October 02, 2017

The "Just Smart Enough" House

Less Architectural Digest, more "This is our home"

We've been doing some remodeling on our house, and the overarching theme of the renovations has been "make this house convenient for real humans to live in". When we bought the house, it was "perfect" in one sense - the house is broken up into two sections, with a central courtyard between, and we were looking for a place where my Father-in-law could come live with us, and still have some space to himself and some privacy.

In many other respects, it was a wildly-impractical house. There's a sad story there, of a couple who fall into and out of love during a remodel, of a mother who overruled the architect in a few critical ways, of a home that was left unfinished when the couple living there split up, and of a house split (illegally) into two units to try to keep it, by supplementing income via renting out the back. 

The end result was a house that certainly looks "fancy", in that it's got a Great Room with a wall entirely filled up by windows and sliding doors, a big fireplace faced in Travertine, and a ridiculous number of doors to the outside, for that "indoor/outdoor living" feeling. Seriously, there are 11 doors to the outside, not including the garage door. Other than being slightly unfinished, it could totally have been a house featured in Architectural Digest.

But when you're living there, you start to notice some of the compromises. I don't think I've ever lived in a house that didn't have a coat closet before. Or a broom closet. Or a linen closet.  Hence the remodel, the first part of which was just turning the illegal 2nd unit into a more-reasonable bedroom suite for Bob, and adding some damn storage.

We added a bunch more storage into the Great Room, and that meant adding new electrical circuits for new under-cabinet and in-cabinet lighting. And because I'm a total nerd, that meant researching Smart Switches to control all of the new lighting (and ideally move some of the more-inconvenient switches to a better location).

Who do you trust?

I pretty quickly settled on getting my smart switches from an electrical equipment manufacturer, rather than some startup "home automation" company. I really, really don't want my house to burn down, and while I have no reason to think that the quality of the zillions of Wi-Fi enabled switches on Amazon.com is anything but excellent, I felt more-comfortable going with a company that has a hundred years or so of experience with not burning people's houses down.

Lutron vs Leviton

(that really sounds like a super-hero movie, doesn't it?)

Lutron and Leviton are two of the largest electrical fixture manufacturers, and choosing between one or the other when buying a regular switch or receptacle is mostly just a matter of which brand your local hardware store carries, and whether or not you want to save $0.25 by buying the store brand.

In the "Home Automation" arena, they each have a variety of solutions, ranging from giant panel-based systems that you're expected to put in a closet somewhere and have installed by a "trained integrator", to simpler systems which are aimed at the DIY market.

You can go all-in, or you can just put a toe in

It didn't take long for me to decide that the fancier integrated systems were not really what I wanted. First off, they're fairly expensive, though the expense looks a little less extreme once you start comparing the per-switch cost of the smart switches vs the centralized version. But ultimately, I didn't really want to deal with a "system integrator" setting the thing up (though apparently it's very easy to get certified by Lutron if you're a licensed electrician, which I'm not). Also, nobody had anything good to say about the phone apps that were available for these systems. And finally, the high-end systems are all about providing a touch pad interface, to give your home that proper Jetsons look. I have no interest in having touch screens mounted on the wall in every room, so that was more of a downside for me, than an attraction. The stand-alone switches from either vendor look more-or-less like standard Decora-style dimmers.

In the consumer-focused lines, there are some interesting differences between the two companies. Leviton's consumer products are mostly compatible with the Z-Wave standard, which means they work with third-party smart home hubs. The reviews online for the Smart Things and Wink hubs weren't particularly encouraging to me, so that was a bit of a bummer.

The Lutron stuff uses a proprietary wireless protocol, and they sell their own hub. The Caseta hub (Lutron's hub) seemed to actually get pretty good reviews. It isn't as capable as the Smart Things hub but, and this was pretty critical for me - it does connect to HomeKit, Apple's home automation system (it also works with Amazon's Alexa and the Google Home device). So, we went with the Lutron Caseta stuff, because it's easy to use, looks reasonable in our house and is available at both Home Depot and Lowes, as well as the local electrical supply store.

Hardware from the hardware store, software from a software company

The connection to HomeKit means that even though the Caseta hub isn't as full-featured as some of the other smart home hubs, I don't really need to care. We're pretty much an all-Apple shop here at Casa de Bessey, so knowing that I could control all of the things attached to the Caseta hub from my phone, using Apple's Home app, is a pretty big draw for me. 

I know it's the 21st century, and everybody needs to have an App, but that doesn't mean every application is equally well-made. If there's a feature that I really "need", and it's not available in the standard software that comes with the Caseta, I could (at least in theory) set up an Apple TV or an iPad as an always-on HomeKit hub, and write my own damn software to run on it.

HomeKit will likely continue to gain new features over the years, so I may never need to do anything custom. But if I do, it's nice to know that I can work with familiar tools and environment, rather than struggling with some obscure system provided by the switch manufacturer.

The Caseta Wireless experience

We're a couple of months into using the Caseta hardware, and here's how it's been going so far.

The Good

Dimmers everywhere
One thing I hadn't really thought about before doing this work is that the dimmer-switch version of the Caseta switches is almost the same price as the plain switch version. We were in the process of gradually replacing our CFL bulbs with LED bulbs anyway, so we've gone with dimmer switches basically everywhere. The added flexibility of being able to set the brightness of any particular light is a nice upgrade.

The basics are all there
All of the fancy features in the world wouldn't be helpful if the basic features weren't there. The switches feel nice, they look nice, and they're easy to install. The software makes it easy to set up "scenes" where you can hit a single button, and set the brightness level of any sub-set of lights in the house.

HomeKit/Siri integration
It just works. There really is something magical about being able to say "Siri, turn out all the lights", and have the entire house go dark. Or indeed saying "Siri, turn out the light in Jeremy's Room" to my watch, and having that work on the first try.

Easy to setup and use
You basically plug in the hub, press a button to pair it with the app on your phone, and then start adding devices. The switches are direct replacements for your existing switches, so installing them is basically:
  1. Turn off the power
  2. Remove the old switch
  3. Wire the new switch/dimmer in
  4. Turn the power back on
The only slightly-complex cases are when you're replacing a three-way switch. The Caseta solution for 3-way (or more) situations is to install the switch at one end, then just install battery-powered remotes at any other location you need to control that light from. When you take out the 3-way, you do need to connect the "traveller" wires together, but they provide instructions online to show you how to do that.

You do have to add each individual switch to the app one at a time, which could get tedious in a large installation. It sure made things easy for the electricians, though - they just had to wire things up, without keeping track of which switch went in which room, since I would set all that up later after they left. From talking to them, I got the impression that the usual install of the higher-end stuff does involve writing down a bunch of "this circuit is on switch #12345" notes, then going back and fixing things later when setting up the controller.

Unless the WiFi in the house is down, I haven't had any problems connecting to the hub, either from the Lutron app (when adding new hardware) or from Apple's Home App. Because the individual switches all have controls on them, even in the case of catastrophic failure, you can still walk around and turn off everything "by hand". That's another point in favor of the non-centralized system, I guess.

Supports "enough" devices for my house
One of the big differences between the Caseta stuff and Lutron's next higher tier (Radio RA2), is the number of "devices" they support. Every switch, every dimmer, and every remote control is a "device" for these counts. Caseta only supports 50 devices. I haven't come anywhere close to the limit yet, but we haven't replaced every last switch in the house yet, either. I think we'll be over 40 once all of the switches I care about have been replaced. Our house is close to 2,000 square feet, so if your house is smaller than that, I doubt the limit will ever matter much. And here's where the connection to HomeKit also helps - if we ever do hit the device limit, I can buy another Caseta hub for $75, and have another 50 devices.

The Bad

Range and range extenders
The Caseta documentation says that every controlled device needs to be within 30 feet of the hub. In practice, the maximum reach is just a bit longer than that in our house, but not very much farther. You can extend the range of the system, by using a plug-in dimmer as a repeater. You can have exactly one repeater, which is another limitation compared to the higher-end systems, which support multiple repeaters. But again - if I ever did run into this in practice, I'd probably just get another hub, and have one for each end of the house, since the hubs really aren't all that expensive.

Pricing structure
Honestly, the way that Lutron prices this stuff makes almost no sense at all. You can buy various "kits" with a hub, a dimmer and a remote, or a hub and a few dimmers and remotes, or a hub and some plug-in dimmers. The individual kit components cost more separately, which is no surprise, but some of the prices are weirdly inverted - it costs more to buy just a dimmer than it does to buy the dimmer, a remote, and all of the trim hardware. I assume anybody who makes extensive use of this product line eventually ends up with a box full of unused remotes, but that's just slightly wasteful, not an actual problem.

Trigger configuration is very basic
The "smart" hub isn't very smart. You can bind particular remotes to particular switches, set up scenes, and do some very basic automation. A recent software update improved some of this so that you can now do some more scheduling.

But take, for example, the "arriving home" automation. I can set up a scene to activate when I arrive home. That's nice, but I can't actually set up a scene to activate when I'm the first one home, or the last to leave. HomeKit supports this, so that might be the thing that gets me to finally set up an Apple TV as a HomeKit hub. Or maybe I'll wait for the HomePod...

The Unknown

I haven't done a basic security audit on the Caseta hub, yet. That'll make a fun weekend project. The online component of the hub is protected by a user name and password, at least. And if I do get totally paranoid, I can always disconnect the hub from the internet, and route everything through an iOS HomeKit hub, which is likely to be more-secure.

What happens if Lutron decides to end-of-life the Caseta line? Will I still be able to get replacement parts, or a new hub if the old one breaks? For that matter, what if Apple stops supporting HomeKit, or removes the Lutron app from the App Store?

This is the problem with going with the proprietary solution. I am somewhat dependent on both Lutron and Apple staying in this business, and getting along with each other. The hub is basically unusable without the app, so that's definitely a concern. I suspect if Lutron found themselves in a situation where they could no longer provide the iOS app, they'd be motivated to provide another solution, or at the very least, a migration strategy to one of their other control hubs.

At the absolute worst-case scenario, the Caseta switches and the remote controls can be set up and paired to operate completely independently of the hub. I'd lose all of the "smart" features, but at least I'd still have working light switches.


Overall, this was a really great way to get my feet wet with "smartening up" my home. The increased control over the lights in the house is convenient, and actually helps make the house more livable. The potential downsides are limited by the design of the Caseta system, which gracefully falls back to  "no worse than just having old light switches", something which is not necessarily true of other connected home devices, like thermostats, which can have terrible failure modes.

If you're interested in adding some smarts to your home, I can definitely recommend the Caseta products. They're easy to set up and use, and have been very reliable for us so far.

Monday, September 25, 2017

Follow up: LockState security failures

I wrote a blog post last month on what your IoT startup can learn from the LockState debacle. In the intervening weeks, not much new information has come to light about the specifics of the update failure, and it seems from their public statements that LockState thinks it's better if they don't do any kind of public postmortem on their process failures, which is too bad for the rest of us, and for the Internet of Things industry, in general - if you can't learn from others' mistakes, you (and your customers) might have to learn your own mistakes.

However, I did see a couple of interesting articles in the news related to LockState. The first one is from a site called CTOVision.com, and it takes a bit more of a business-focused look at things, as you might have expected from the site name. Rather than looking at the technical failures that allowed the incident to happen, they take LockState to task for their response after the fact. There's good stuff there, about how it's important to help your customers understand possible failure modes, how you should put the software update process under their control, and how to properly respond to an incident via social media.

And on The Parallax, a security news site, I found this article, which tells us about another major failure on the part of LockState - they apparently have a default 4-digit unlock code set for all of their locks from the factory, and also an 8-digit "programming code", which gives you total control over the lock - you can change the entry codes, rest the lock, disable it, and disconnect it from WiFi, among other things.

Okay, I really shouldn't be surprised by this at this point, I guess - these LockState guys are obviously a bit "flying by the seat of your pants" in terms of security practice, but seriously? Every single lock comes pre-programmed with the same unlock code and the same master programming code?

Maybe I'm expecting too much, but if a $2.99 cheap combination lock from the hardware store comes with a slip of paper in the package with its combination printed on it, maybe the $600 internet-connected smart lock can do the same? Or hell, use a laser to mark the master combination on the inside of the case, so it's not easily lost, and anyone with the key and physical access can use the code to reset the lock, in the (rare) case that that's necessary.

Or, for that matter - if you must have a default security code for your device (because your manufacturing line isn't set up for per-unit customization, maybe?), then make it part of the setup process to change the damn code, and don't let your users get into a state where they think your product is set up, but they haven't changed the code.

It's easy to fall into the trap of saying that the user should be more-aware of these things, and they should know that they need to change the default code. But your customers are not typically security experts, and you (or at least some of your employees) should be security experts. You need to be looking out for them, because they aren't going to be doing a threat analysis while installing their new IoT bauble.

Monday, September 18, 2017

A short rant on XML - the Good, the Bad, and the Ugly

[editor's note: This blog post has been in "Drafts" for 11 years. In the spirit of just getting stuff out there, I'm publishing it basically as-is. Look for a follow-up blog post next week with some additional observations on structured data transfer from the 21st century]

So, let's see if I can keep myself to less than ten pages of text this time...

XML is the eXtensible Markup Language. It's closely related to both HTML, the markup language used to make the World Wide Web, and SGML, a document format that you've probably never dealt with unless you're either a government contractor, or you used the Internet back in the days before the Web. For the pedants out there, I do know that HTML is actually an SGML "application" and that XML is a proper subset of SGML. Let's not get caught up in the petty details at this point.

XML is used for a variety of different tasks these days, but the most common by far is as a kind of "neutral" format for exchanging structured data between different applications. To keep this short and simple, I'm going to look at XML strictly from the perspective of a data storage and interchange format.

The good

Unicode support

XML Documents can be encoded using the Unicode character encoding, which means that nearly any written character in any language can be easily represented in an XML document.

Uniform hierarchical structure

XML defines a simple tree structure for all the elements in a file - there's one root element, it has zero or more children, which each have zero or more children, ad infinitum. All elements must have an open and close tag, and elements can't overlap. This simple structure makes it relatively easy to parse XML documents.

Human-readable (more or less)

XML is a text format, so it's possible to read and edit an XML document "by hand" in a text editor. This is often useful when you're learning the format of an XML document in order to write a program to read or translate it. Actually writing or modifying XML documents in a text editor can be incredibly tedious, though a syntax-coloring editor makes it easier.

Widely supported

Modern languages like C# and Java have XML support "built in" in their standard libraries. Most other languages have well-supported free libraries for working with XML. Chances are, whatever messed up environment you have to work in, there's an XML reader/writer library available.

The bad

Legacy encoding support

XML Documents can also be encoded in whatever wacky character set your nasty legacy system uses. You can put a simple encoding="Ancient-Elbonian-EBCDIC" attribute in the XML declaration element, and you can write well-formed XML documents in your favorite character encoding. You probably shouldn't expect that anyone else will actually be able to read it, though.

Strictly hierarchical format

Not every data set you might want to interchange between two systems is structured hierarchically. In particular, representing a relational database or an in-memory graph of objects is problematic in XML. A number of approaches are used to get around this issue, but they're all outside the scope of standardized XML (obviously), and different systems tend to solve this problem in different ways, neatly turning the "standardized interchange format for data" into yet another proprietary format, which is only readable by the software that created it.

XML is verbose

A typical XML document can be 30% markup, sometimes more. This makes it larger than desired in many cases. There have been several attempts to define a "binary XML" format (most recently by the W3C group), but they really haven't caught on yet. For most applications where size or transmission speed is an issue, you probably ought to look into compressing the XML document using a standard compression algorithm (gzip, or zlib, or whatever), then decompressing it on the other end. You'll save quite a bit more that way than by trying to make the XML itself less wordy.

Some XML processing libraries are extremely memory-intensive

There are two basic approaches to reading an XML document. You can read the whole thing into memory and re-construct the structure of the file into a tree of nodes in memory, and then the application can use standard pointer manipulation to scan through the tree of nodes, looking for whatever information it needs, or further-transforming the tree into the program's native data structures. One XML processing library I've used loaded the whole file into memory all at once, then created a second copy of all the data in the tags. Actually, it could end up using up to the size of the file, plus twice the combined size of all the tags.

Alternatively, the reader can take a more stream-oriented approach, scanning through the file from beginning to end, and calling into the application code whenever an element starts or ends. This can be implemented with a callback to your code for every tag start/end, which gives you a simple interface, and doesn't require holding large amounts of data in memory during the parsing.

No random access

This is just fallout from the strict hierarchy, but it's extremely labor intensive to do any kind of data extraction from a large XML document. If you only want a subset of nodes from a couple levels down in the hierarchy, you've still got to step your way down there, and keep scanning throught the rest of the file to figure out when you've gone up a level.

The ugly

By far, the biggest problems with XML don't have anything to do with the technology itself, but with the often perverse ways in which it's misapplied to the wrong problems. Here are a couple of examples from my own experience.

Archiving an object graph, and the UUID curse

XML is a fairly reasonable format for transferring "documents", as humans understand them. That is, a primarily linear bunch of text, with some attributes that apply to certain sections of the text.

These days, a lot of data interchange between computer programs is in the form of relational data (databases), or complex graphs of objects, where you'll frequently need to make references back to previous parts of the document, or forward to parts that haven't come across yet.

The obvious way to solve this problem is by having a unique ID that you can reference to find one entity from another. Unfortunately, the "obvious" way to ensure that a key is unique is to generate a globally-unique key, and so you end up with a bunch of 64-bit or 128-bit GUIDs stuck in your XML, which makes it really difficult to follow the links, and basically impossible to "diff' the files, visually.

One way to avoid UUID proliferation is to use "natural unique IDs, if your data has some attribute that needs to be unique anyway.

What's the worst possible way to represent a tree?

I doubt anybody's ever actually asked this question, but I have seen some XML structures that make a pretty good case that that's how they were created. XML, by its heirarchical nature, is actually a really good fit for hierarchical data. Here is one way to store a tree in XML:

<pants color="blue" material="denim">
  <pocket location="back-right">
    <wallet color="brown" material="leather">  
      <bill currency="USD" value="10"></bill>  
      <bill currency="EURO" value="5"></bill>  

And here's another:


So, which one of those is easier to read? And did you notice that I added another 5 Euro to my wallet, while translating the structure? Key point here: try to have the structure of your XML follow the structure of your data.

Monday, September 04, 2017

Post-trip review: Telestial “International Travel SIM”

For our recent trip to Europe, Yvette and I tried the seasoned-traveler technique of swapping out the SIM cards in our phones, rather than paying AT&T’s fairly extortionate international roaming fees. It was an interesting experience, and we learned a few things along the way, which I’ll share here.

We used Telestial, which is apparently Jersey Telecom. Not New Jersey: JT is headquartered in the Jersey Isles, off the coast of Britain. JT/Telestial's claim to fame is really their wide roaming capability. Their standard “I’m traveling to Europe” SIM is good pretty much across all of Europe. It definitely worked just fine in Germany, Denmark, Sweden, Austria, Estonia, Russia, and Finland. They claim a few dozen more, and I don’t doubt it works equally-well in those countries.

Why not just get a local SIM in the country you’re visiting? Isn’t that cheaper?

People who frequently travel overseas will often just pop into a phone retailer, or buy a SIM from a kiosk in the airport. Based on the comparison shopping I did while we were traveling, this is definitely cheaper tan the Telestial solution. However, it’s not at all clear in many cases how well an Austrian SIM is going to work in Finland (for example), and just how much you’ll be paying for international roaming.

So, I think if you’re traveling to just one country (especially one with really cheap mobile phone service costs), buying a local SIM is definitely the way to go. I didn’t really want to keep updating people with new contact numbers every other day as we switched countries. I might look into one of the “virtual phone number” solutions, like Google Voice, for the next multi-country trip. Being able to give people one number, and still roam internationally, seems like it’d be useful, but I don’t know what the restrictions are.

What does setup look like?

First of all, you need a compatible phone. Not all American mobile phones will wrk in Europe. You can check the technical specs for the particular phone mode you have, to see which radio frequencies it supports. Alternatively, you can buy any iPhone more-recent than the iPhone 4s, all of which are “world phones”, as far as I know. Verizon and Sprint still some phones that are CDMA-only, which means they can’t work anywhere but the USA, but most CDMA smartphones also have a GSM SIM slot, so it’s worth taking a look to see, even if you’re on Verizon.

Secondly, your phone needs to not be “activation locked” to a particular carrier. Most phones sold on contract in the US are set up this way, so you can’t just default on the contract and get a phone you can use on another network. Ideally, your phone would get unlocked automatically at the end of the contract, but US law doesn’t require this, so you’ll need to request an unlock from your carrier. AT&T has made this process a lot easier since the last time I tried to do it, which is great, because I forgot to check that Yvette’s phone was unlocked before we left. I did manage to make the unlock request from the other phone while we were in a taxi on the freeway in Austria, which is a testament to how easy this stuff is these days, I guess.

Assuming you have a compatible phone, then process is basically power off phone, pop out the SIM tray with a paper clip, swap the SIMS, turn on the phone, and wait. For the Telestial SIM, you probably really want to activate it and pre-pay for some amount of credit before your trip, which is easy to do on their website.

What kind of plan did you get?

We had a pre-paid fixed allowance for calls and text, and 2GB of data for each phone. Calls were $0.35 a minute, and texts were $0.35 each. Pre-loading $30 on the phone was enough for and hour and a half of phone calls, or a fairly large number of texts. When we had data coverage, we could use iMessage or WhatsApp fro basically free text messages. I don't know whether Voice Over LTE actually worked, and if it avoided the per-minute charge, since we just didn't call that much.

Did you actually save any money?

Compared to what it cost to pay AT&T for an international roaming plan while Yvette was in the UK for a month, we definitely did save a substantial amount of money. This is true even with the crazy cruise ship issue (see below). Without that, it would have been massively less-expensive. And compared to AT&T’s “no plan” international rates (which I got to try out in Israel), there’s absolutely no comparison.

What happened on the cruise ship?

Most of the time, the cruise ship did not have cell service. Which was pretty much fine - we had good coverage when we were in port, and there was WiFi on the ship, if we wanted to pay for it (we did not). We had, on two occasions, a weird thing were our phones managed to connect to a shipboard cell network (maybe on another passing ship?), where we were charged truly outrageous roaming data rates - several dollars a megabyte, which obviously burned through the $30 in prepaid credit really fast. On the other hand, prepaid means that we didn't lose more than $30 (twice, so $60 total). I still don't know exactly what happened there, but if I do this again sometime, I'm going to keep data turned off on the phone when not in port.

The good:

  1. Pre-paid, which limits crazy bills
  2. Easy setup
  3. Easy to recharge, either over the phone, or using the app
  4. Per-minute and per-text rates not too terrible
  5. Works pretty much anywhere in Europe

The bad:
  1. Cruise ship roaming will use up your data allowance right quick
  2. Fixed recharge sizes, and monthly expiration
  3. Forwarding doesn’t work for texts
  4. Some weirdness with “from” numbers on texts (apparently Austria-only)?
  5. No tethering
  6. Email support non-responsive

Conclusion: would we do it again?

Overall, the process was fairly-painless, other than the cruise ship issue. If there’s a simple way to fix that, I’d have no problem doing this again. Otherwise, I’d have to recommend during cell data off when you’re not in port, to avoid accidentally costing yourself a bunch of money.

Monday, August 28, 2017

A brief history of the Future

A brief history of the Future

Lessons learned from API design under pressure

It was August of 2009, and the WebOS project was in a bit of trouble. The decision had been made to change the underlying structure of the OS from using a mixture of JavaScript for applications, and Java for the system services, to using JavaScript for both the UI and the system services. This decision was made for a variety of reasons, primarily in a bid to simplify and unify the programming models used for application development and services development. It was hoped that a more-familiar service development environment would be helpful in getting more developers on board with the platform. It was also hoped that by having only one virtual machine running, we'd save on memory.

Initially, this was all built on top of a customized standalone V8 JavaScript interpreter, with a few hooks to system services. Eventually, we migrated over to Node.js, when Node was far enough along that it looked like an obvious win, and after working with the Node developers to improve performance on our memory-limited platform.

The problem with going from Java to JavaScript

As you probably already know, despite the similarity in the names, Java and JavaScript are very different languages. In fact, the superficial similarities in syntax were only making things harder for the application and system services authors trying to translate things from Java to JavaScript.

In particular, the Java developers were used to a multi-threaded environment, where they could spin off threads to do background tasks, and have them call blocking APIs in a straightforward fashion. Transitioning from that to JavaScript's single-threaded, events and callbacks model was proving to be quite a challenge. Our code was rapidly starting to look like a bunch of "callback hell" spaghetti.

The proposed solution

As one of the most-recent additions to the architecture group, I was asked to look into this problem, and see if there was something we could do to make it easier for the application and service developers to write readable and maintainable code. I went away and did some research, and came back with an idea, which we called a FutureThe Future was a construct based on the idea of a value that would be available "in the future". You could write your code in a more-or-less straight-line fashion, and as soon as the data was available, it'd flow right through the queued operations.

If you're an experienced JavaScript developer, you might be thinking at this point "this sounds a lot like a Promise", and you'd be right. So, why didn't we use Promises? At this point in history, the Proamises/A spec was still in active discussion amongst the CommonJS folks, and it was not at all obvious that it'd become a popular standard (and in fact, it took Promises/A+ for that to happen). The Node.js core had in fact just removed their own Promises API in favor of a callback-based API (this would have been around Node.js v0.1, I think).

The design of the Future

Future was based on ideas from SmallTalk(promise), Java(future/promise), Dojo.js(deferred), and a number of other places. The primary design goals were:
  • Make it easy to read through a piece of asynchronous code, and understand how it was supposed to flow, in the "happy path" case
  • Simplify error handling - in particular, make it easy to bail out of an operation if errors occur along the way
  • To the extent possible, use Future for all asynchronous control flow
You can see the code for Future, because it got released along with the rest of the WebOS Foundations library as open source for the Open WebOS project.

My design brief looked something like this:
A Future is an object with these properties and methods:
.result The current value of the Future. If the future does not yet have a value, accessing the result property raises an exception. Setting the result of the Future causes it to execute the next "then" function in the Future's pipeline.  
.then(next, error) Adds a stage to the Future's pipeline of steps. The Future is passed as a parameter to the function "next". The "next" function is invoked when a value is assigned to the future's result, and the (optional) "error" function is invoked if the previous stage threw an exception. If the "next" function throws an exception, the exception is stored in the Future, and will be re-thrown if the result of the Future is accessed.
This is more-or-less what we ended up implementing, but the API did get more-complicated along the way. Some of this was an attempt to simplify common cases that didn't match the initial design well. Some of it was to make it easier to weld Futures into callback-based code, which was ultimately a waste of time, in that Future pretty much wiped out all competing approaches to flow control. And one particular set of changes was thrown in at the last minute to satisfy a request that should just have been denied (see What went wrong, below).

What went right

We shipped a "minimal viable product" extremely quickly

Working from the initial API design document, Tim got an initial version of Future out to the development team in just a couple of days, which had all of the basics working. We continued to iterate for quite a while afterwards, but we were able to start the process of bring people up to speed quickly.

We did, in fact, eliminate "callback hell" from our code base

After the predictable learning curve, the former Java developers really took to the new asynchronous programming model. We went from "it sometimes kind of works", to "it mostly works" in an impressively-short time. Generally speaking, the Future-based code was shorter, clearer, and much easier to read. We did suffer a bit in ease of debugging, but that was as much due to the primitive debugging tools on Node as it was to the new asynchronous model.

We doubled-down on our one big abstraction

Somewhat surprisingly to me, the application teams also embraced Futures. They actually re-wrote significant parts of their code to switch over to Future-based APIs at a deeper level, and to allow much more code sharing between the front end and back end of the Mail application, for example. This code re-use was on the "potential benefits" list, but it was much more of a win than anyone originally expected.

We wrote a bunch of additional libraries on top of Future, for all sorts of asynchronous tasks - for file I/O, database access, network and telecoms, for the system bus (dbus) interface, basically anything that you might have wanted to access on the platform, was available as a Future-based API.

The Future-based code was very easy to reason about in the "happy path" case

One of the best things about all this, is that with persistent use of Futures everywhere, you could write code that looked like this:
downloadContacts().then(mergeContacts).then(writeNewContacts).then(success, error)
Most cases were a bit more-complicated than that (often using inline functions), but the pattern of  only handling the success case, and just letting errors propagate, was very common. And in fact, the "error" case was, as often as not, logging a message and rescheduling the task for later.

The all-or-nothing error propagation technique fit (most of) our use cases really well

The initial use case of the Future was for a WebOS feature called "Synergy". This was a framework for combining data from multiple sources into a single uniform format for the applications. So, for example, you could combine your contacts from Facebook, Google, and Yahoo into a single address book list, and WebOS would automatically de-dubplicate and link related contacts, and sync changes made on the phone to the proper remote service that the data originally came from. Similarly, all of your incoming e-mail went into the same "Mail" database on-device.

In a multi-stage synchronization process like this, there are all sorts of ways that the operation can fail - the remote server might be down, or the network might be flaky, or the user might decide to put the phone into airplane mode in the middle of a sync operation. In the vast majority of cases, we didn't actually care what the error was, just that an error had occurred. When an error happened, the usual response was to leave the on-phone data the way it was, and try again later. In those cases where "fuck it, I give up" was not the right error handling strategy, the rough edges of the error-handling model were a bit easier to see.

What went wrong

The API could have been cleaner/simpler

It didn't take long before we were adding convenience features to make some of the use cases simpler. Hence, the "whilst" function on Future, which was intended to make it easier to iterate over a function that returned Futures. There were a couple of other additions that also got a very small amount of use, and could have easily been replaced by documentation of the "right" way to do things.

Future had more-complex internal state than was strictly needed

If you look at Promises, they've really only got the minimal amount of state, and you chain functions together by returning a Promise from each stage. Instead of having lots and lots of Futures linked together to make a pipeline of operations, Future was the pipeline. I think that at some level this both decreased heap churn by not creating a bunch of small objects, and it probably made it somewhat easier to debug broken pipelines (since all of the stages were visible). Obviously, if we'd known that Promises were going to become a big thing in JavaScript, we would have stayed a lot closer to the Promises/A spec.

Error handling was still a bit touchy, for non-transactional cases

If you had to write code that actually cared about handling errors, then the "error" function was actually located in a pretty terrible place, you'd have all these happy-path "then" functions, and one error handler in the middle. Using named functions instead of anonymous inline functions helped a bit with this, but I would still occasionally get called in to help debug a thrown exception that the developer couldn't find the source for.

It would have been really nice to have a complete stack trace for the exception that was getting re-thrown, but we unfortunately didn't have stack traces available in both the application context and the service context. In the end, "thou shalt not throw an exception unless it's uniquely identifiable" was almost sufficient to resolve this.

I caved on a change to the API that I should have rejected

Fairly late in the process, someone came to me and said "I don't like the 'magic' in the way the result property works. People don't expect that accessing a property will throw an exception, so you should provide an API to access the state of the Future via function calls, rather than property access".  At this point, we had dozens of people successfully using the .result API, and very little in the way of complaints about that part of the design.

I agreed to make the addition, so we could "try it out" and see whether the functional API was really easier or clearer to use. Nobody seemed to think so, except for the person who asked for it. Since they were using it, it ended up having to stay in the implementation. and since it was in the implementation, it got documented, which just confused later users (especially third parties), who didn't understand why there were two different ways to accomplish the same tasks.

How do I feel about this, 8 years later?

Pretty good, actually. Absent a way to see into the future, I think we made a pretty reasonable decision with the information we had available. The Bedlam team did an amazing bit of work, and WebOS got rapidly better after the big re-architecturing. In the end, it was never quite enough to displace any of the major mobile OSes, but I still miss some features of Synergy, even today. After all the work Apple has done over the years to improve contact sync, it's still not quite as good (and not nearly as open to third parties) as our solution was.

Monday, August 21, 2017

What your Internet Of Things startup can learn from LockState

The company LockState has been in the news recently for sending an over-the-air update to one of their smart lock products which "bricked" over 500 of these locks. This is a pretty spectacular failure on their part, and it's the kind of thing that ought to be impossible in any kind of well-run software development organization, so I think it's worthwhile to go through a couple of the common-sense processes that you can use to avoid being the next company in the news for something like this.

The first couple of these are specific to the problem of shipping the wrong firmware to a particular model, but the others apply equally well to an update that's for the right target, but is fundamentally broken, which is probably the more-likely scenario for most folks.

Mark your updates with the product they go to
The root cause of this incident was apparently that LockState had produced an update intended for one model of their smart locks, and somehow managed to send that to a bunch of locks that were a different model. Once the update was installed, those locks were unable to connect to the Internet (one presumes they don't even boot), and so there was no way for them to update again to replace the botched update.

It's trivially-easy to avoid this issue, using a variety of different techniques. Something as simple as using a different file name for firmware for different devices would suffice. If not that, you can have a "magic number" at a known offset in the file, or a digital signature that uses a key unique to the device model. Digitally-signed firmware updates are a good idea for a variety of other reasons, especially for a security product, so I'm not sure how they managed to screw this up.

Have an automated build & deployment process
Even if you've got a good system for marking updates as being for a particular device, that doesn't help if there are manual steps that require someone to explicitly set them. You should have a "one button" build process which allows you to say "I want to build a firmware update for *this model* of our device, and at the end you get a build that was compiled for the right device, and is marked as being for that device.

Have a staging environment
Every remote firmware update process should have the ability to be tested internally via the same process end-users would use, but from a staging environment. Ideally, this staging environment would be as similar as possible to what customers use, but available in-company only. Installing the bad update on a single lock in-house before shipping it to customers would have helped LockState avoid bricking any customer devices. And, again - this process should be automated.

Do customer rollouts incrementally
LockState might have actually done this, since they say only 10% of their locks were affected by this problem. Or they possibly just got lucky, and their update process is just naturally slow. Or maybe this model doesn't make up much of the installed base. In any case, rolling out updates to a small fraction of the installed base, then gradually ramping it up over time, is a great way to ensure that you don't inconvenience a huge slice of your user base all at once.

Have good telemetry built into your product
Tying into the previous point, wouldn't it be great if you could measure the percentage of systems that were successfully updating, and automatically throttle the update process based on that feedback? This eliminates another potential human in-the-loop situation, and could have reduced the damage in this case by detecting automatically that the updated systems were not coming back up properly.

Have an easy way to revert firmware updates
Not everybody has the storage budget for this, but these days, it seems like practically every embedded system is running Linux off of a massive Flash storage device. If you can, have two operating system partitions, one for the "current" firmware, and one for the "previous" firmware. At startup, have a key combination that swaps the active install. That way, if there is a catastrophic failure, you can get customers back up and running without having them disassemble their products and send them in to you, which is apparently how LockState is handling this.

If your software/hardware allows for it, you can even potentially automate this entirely - have a reset watchdog timer that gets disabled at the end of boot, and if the system reboots more than once without checking in with the watchdog, switch back to the previous firmware.

Don't update unnecessarily
No matter how careful you are, there are always going to be some cases where a firmware update goes bad. This can happen for reasons entirely out of your control, like defective hardware that just happens to work with version A of the software, but crashes hard on version B.

And of course the easiest way to avoid having to ship lots of updates is sufficient testing (so you have fewer critical product defects to fix), and reducing the attack surface of your product (so you have fewer critical security issues that yo need to address on a short deadline.