Monday, November 13, 2017

Mergers: The Good, The Bad, and The Ugly

You've been acquired how many times?
In my career, I've been fortunate enough to have worked for a number of small software/hardware companies, several of which were absorbed by much larger companies. I though tit'd be interesting to compare and contrast some of the ways the various mergers went good and bad, and what acquiring companies might be able to learn from my experience.

Here's the timeline so far:

  1. I started working for NeXT Software in 1994, they were acquired by Apple in 1996.
  2. I left Apple in 1999 to work for Silicon Spice. They were acquired by Broadcom in 2000.
  3. Broadcom laid me off, and I went back to Apple for a while.
  4. I left Apple in 2005 to work at Zing Systems, which was acquired by Dell in 2007.
  5. I left Dell to go work at Palm in 2009. In 2010, Palm was acquired by Hewlett-Packard.
  6. Hewlett-Packard eventually sold the entire WebOS group to LG.
  7. I left LG to go work for Citrix on GoToMeeting. After 2 1/2 years, the GoToMeeting business was spun off and merged with LogMeIn, Inc.
So I've been part of 6 different merger/acquisition processes at this point, and I feel like I'm getting a feel for how you can tell when an acquisition is going to go well, as opposed to going poorly.

Why do big companies buy smaller companies?
When a big company acquires a smaller company, it can be for a variety of reasons. Sometimes it's to acquire a potential competitor, before they can get large enough to threaten the larger company. It can be an "acqui-hire", where they buy the smaller company strictly for its human resources, and have no real interest in the technology or products the smaller company has developed (this happens with social media companies frequently, because skilled developers are hard to find). Or, it can be a case of acquiring a new technology, and a team of experts in that technology, in order to either kick-start a new product, or to kick new life into an existing product. That last reason was the primary reason for all of the acquisitions I've been involved in.

What's the most-comon mistake acquiring companies make?
Understandably, big companies often look to smaller companies as an engine to drive innovation. There's a perception that small companies can move faster and be more nimble than larger companies. So there's often a desire to let the new acquisition run itself, as a sort of independent entity inside the larger company. Being hands-off seems like the obviously-right thing to do if you wanted increased agility to start with, but this is generally not as good of an idea as it'd seem at first blush.

Yes, you absolutely don't want to break up the functional team you just acquired, and spread them out willy-nilly throughout your company. You don't want to drag them into the bureaucracy and infighting that has marred all of your internal attempts at innovation. But guess what? If you don't make an effort to get them at least nominally integrated with the rest of the company, you will, at best, end up with an isolated group, who continue to do their one thing, but don't meaningfully contribute to your larger organization's bottom line. And the smaller group will also get none of the benefits of scale of being part of the larger group. It's lose-lose.

Examples of the Good, the Bad, and the Ugly
Tune in next Monday (and the Monday after that) for real-life tales of integrations gone well, gone poorly, and gone horribly wrong.

Monday, November 06, 2017

That delicate line between security and convenience

A key problem, maybe the key problem in software security is how to properly balance user convenience with security. Adding additional security to a system often includes extra work, more time, or other compromises from the end-user. And reasonable people can disagree about where the line is for the appropriate trade-off.

That iPhone camera permissions "flaw"
There was a brief flurry of articles in the news recently, talking about a "flaw" in iOS permissions which would allow applications to take your picture without you being aware. Typically, these were presented with click-bait headlines like:


or 


The blog post of the actual security researcher who raised this issue (Felix Krause) is substantially less-sensational:


It's good that this issue is getting some attention, but it's important to understand where we're coming from, what the actual issue is, and possible ways to mitigate it. As a quick aside, I find it annoying that the articles say "Google engineer". Yes, Krause works for Google, but this work is not coming out of his "day job", but rather his own work in security research. Also, Android has exactly this same problem, but it doesn't merit a blog post or worldwide news coverage, because apparently nobody expects even minimal privacy from Android devices.

How camera permissions work on iOS today
The current version of iOS asks the user for permission to use the camera the first time that an application tries to access it. After that, fi the application is running in the foreground, it can access the camera whenever it wants to, without any additional interaction. And typically, this is actually what the user wants.

It's convenient and fun to be able to use the built-in camera support in Facebook without having to answer "yes I do want to use the camera" each time that you choose to share a photo on social media. And replacements for the built-in camera app, like Instagram, Snapchat, and Halide, would be pretty much unusable if you had to answer a prompt Every. Single. Time. you wanted to take a photo.

How it used to work
Previous versions of iOS actually required applications to use the built-in camera interface to take pictures. You still only had to give permission once, but it was really obvious when the app was taking you picture, because the camera preview was right there in your face, taking over your screen. This design was widely criticized by app developers, because it made for a really jarring break in their carefully-crafted use experience to have the built-in camera appear, and they couldn't provide a preview that actually showed what was going to be captured (with the rise of photo filters, this is especially problematic).

At some point, Apple added the capability to capture photos and video, while presenting the app's  own interface. This makes for a more-cohesive experience for the end-user, and makes it possible for apps to preview what they're actually going to produce, filters, silly hats, and all. This is clearly a win for the app developers, and I'd argue it is also a win for the end-user, as they get a better experience with the whole picture taking process.

What's the actual privacy issue here?
I use Facebook to post photos and videos, sometimes. But I don't really want Facebook taking pictures of my face when I'm not using the camera features, and analyzing that data to better serve me content, including advertisements.

If I'm scrolling through my news feed, and Facebook is silently analyzing the images coming through the back camera, so that they can discover my location and serve me ads for whatever business I'm standing in front of, that's intrusive and creepy. If they're reading my facial expression to try to determine how I feel about the items in my news feed, that's even worse.

How Apple can better-inform users
I don't think anybody wants to go back to using the UIImagePicker interface, and I don't think anybody (except possibly security researchers) wants to have to affirmatively give permission every time an application wants to take a picture or video. One alternative that I like (and Krause mentions this in his initial blog) is some kind of persistent system UI element that indicates that the camera is on. Apple already does something similar with a persistent banner on the top of the screen when applications in the background are capturing audio (for VoIP communications). A little dot on the status area would go a long way, here.

It'd also be really nice to have a toggle in Preferences (or better, in Control Center) to disable the camera system-wide, so if you know you're heading somewhere that you shouldn't be taking pictures, you can temporarily disable the camera.

What users can do to better protect themselves
Obviously, just don't grant camera permission to applications that don't actually need them.  think most social network software falls into this category. Twitter and Facebook don't actually need to access my camera, so I have it disabled for both of them. If you actually DO use Facebook and Twitter to take pictures, then I guess you'll just need to be more aware of the tradeoffs.

If you "have to" enable camera access to certain apps, but you don't fully-trust them, there are honest-to-goodnes lens caps you can buy which will cover your iPhone camera when you're not using it. Or a piece of tape works. There's even specially-made tape tabs for just this purpose.


Tuesday, October 17, 2017

"Responsible Encryption" - what does that mean?

This weekend I read this excellent article by Alex Gaynor responding to Deputy Attorney General Rod Rosenstein's remarks on encryption to two different audiences last week. Please do go and read it when you get a chance, as it delves into the sadly common tactic of pointing to a bunch of scary criminal incidents, then saying "unbreakable encryption enables criminals and terrorists", without presenting any evidence that those crimes were enabled by encryption technology, or that law enforcement officers were actually hampered in their investigations by encryption.

In fact, in the case of the FBI, Apple, and the San Bernardino shooter, AG Rosenstein repeats all of the same false narrative that we've been presented with before - that the shooter's phone possibly contained vital information, that Apple "could" decrypt the information, and that they fought the FBI's legal attempts to force them to do so. Read my previous blog post (linked above) for background on that line of argument, and how the FBI willfully twists the facts of the case, to try to get something much more far-reaching than what they claim to want.

One thing not addressed directly in Alex's article is the frustration that the FBI and other law enforcement  officials have expressed over the inability to execute a legal search warrant, when they're faced with a locked phone, or a communications system that provides end-to-end encryption.

From Rosenstein's remarks to the Global Security Conference
We use the term “responsible encryption” to describe platforms that allow police to access data when a judge determines that compelling law enforcement concerns outweigh the privacy interests of a particular user.  In contrast, warrant-proof encryption places zero value on law enforcement.  Evidence remains unavailable to the police, no matter how great the harm to victims.
First, what a bunch of emotionally-charged words. And again we see the disconnect between what the FBI and other agencies say that they want (a way to unlock individual phones), and what they seem to keep asking for (a key to unlock any phone they can get their hands on).

But the man does have a point - there is some value to society in the FBI being able to execute a valid search warrant against someone's phone, or to "tap" the communications between known criminals. And I think he's also right that that sort of access is not going to be provided if the free market is allowed to set the rules. It'll never be in Apple's or any individual customer's interest to make it easier to access a locked phone. So, it'll come down to a matter of legislation, and I think it's worth the tech folks having this conversation before Congress sits down with a bill authored by the FBI and the NSA to try to force something on us.

The encryption-in-flight question is very complicated (and crypto protocols are hard to get right - see the recent KRACK security vulnerabilities), so I'll leave that for a future post. I do believe that there are reasonable ways for tech companies to design data-at-rest encryption that is accessible via a court order, but maintains reasonably-good security for customers. Here's a sketch of how one such idea might be implemented:

On-device Key Escrow


Key escrow 
The basic idea of key escrow is that there can be two keys for a particular piece of encrypted data - one key that the user keeps, and one that is kept "in escrow" so another authorized agent can access the data, if necessary. The ill-fated Clipper Chip was an example of such a system. The fatal flaw of Clipper (well, one of them) is that it envisioned every single protected device would have its secondary key  held securely by the government to be used in case of a search warrant being issued. If Clipper had ever seen broad adoption, the value of that centralized key store would have been enormous, both economically and militarily. We're talking a significant fraction of the US GDP, probably trillions of dollars. That would have made it the #1 target of thieves and spies across the world.

Eliminating central key storage
But the FBI really doesn't need the ability to decrypt every phone out there. They need the ability to decrypt specific phones, in response to a valid search warrant. So, how about storing the second key on the device itself? Every current on-device encryption solution that I know of provides for the option of multiple keys. And in fact, briefly getting back to the San Bernardino shooter's phone, if the owners of that phone (San Bernardino County) had had a competent IT department, they would have set up a second key that they could then have handed over to the FBI, neatly avoiding that whole mess with suing Apple.

You could imagine Apple generating a separate "law enforcement" key for every phone, and storing that somewhere, but that has all the same problems as the Clipper central key repository, just on a slightly smaller scale. So those keys need to stored separately. How about storing them on the device itself?

Use secure storage
Not every phone has a "secure enclave" processor like the iPhone, but it's a feature that you'll increasingly see on newer phones, as Apple and other manufacturers try to compete on the basis of providing better privacy protection to their customers. The important feature of these processors is that they don't allow software running on the phone to extract the stored keys. This is what keeps the user's data secure from hackers. So, if the key is stored in there, but the phone software can't get it out, how will the FBI get the key?

Require physical access
My preferred solution would be for the secure enclave to have a physically-disconnected set of pins that can be used just for extracting the second key. In order to extract the key, you'd need to have physical access to the device, disassemble it, and solder some wires on it. This is, I think, sufficiently annoying that nobody would try to do it without getting a warrant first.

It also means that nobody can search your phone without taking it out of your possession for a good long while. This seems like a reasonable trade-off to me. If someone executes a search warrant on your house, you'll certainly know about it. There's such a thing as "sneak and peek" warrants, or delayed-notice warrants, where police sneak in and search your home while you're not there, but I'm not particularly interested in solving that problem for them.

Conclusion
Is this a perfect solution? Of course not. But I think something like this is a reasonable place to start when discussing law enforcement access to personal electronics. And I think we want to have this conversation sooner, rather than later. What do you think?

Monday, October 02, 2017

The "Just Smart Enough" House

Less Architectural Digest, more "This is our home"

We've been doing some remodeling on our house, and the overarching theme of the renovations has been "make this house convenient for real humans to live in". When we bought the house, it was "perfect" in one sense - the house is broken up into two sections, with a central courtyard between, and we were looking for a place where my Father-in-law could come live with us, and still have some space to himself and some privacy.

In many other respects, it was a wildly-impractical house. There's a sad story there, of a couple who fall into and out of love during a remodel, of a mother who overruled the architect in a few critical ways, of a home that was left unfinished when the couple living there split up, and of a house split (illegally) into two units to try to keep it, by supplementing income via renting out the back. 

The end result was a house that certainly looks "fancy", in that it's got a Great Room with a wall entirely filled up by windows and sliding doors, a big fireplace faced in Travertine, and a ridiculous number of doors to the outside, for that "indoor/outdoor living" feeling. Seriously, there are 11 doors to the outside, not including the garage door. Other than being slightly unfinished, it could totally have been a house featured in Architectural Digest.

But when you're living there, you start to notice some of the compromises. I don't think I've ever lived in a house that didn't have a coat closet before. Or a broom closet. Or a linen closet.  Hence the remodel, the first part of which was just turning the illegal 2nd unit into a more-reasonable bedroom suite for Bob, and adding some damn storage.

We added a bunch more storage into the Great Room, and that meant adding new electrical circuits for new under-cabinet and in-cabinet lighting. And because I'm a total nerd, that meant researching Smart Switches to control all of the new lighting (and ideally move some of the more-inconvenient switches to a better location).

Who do you trust?

I pretty quickly settled on getting my smart switches from an electrical equipment manufacturer, rather than some startup "home automation" company. I really, really don't want my house to burn down, and while I have no reason to think that the quality of the zillions of Wi-Fi enabled switches on Amazon.com is anything but excellent, I felt more-comfortable going with a company that has a hundred years or so of experience with not burning people's houses down.

Lutron vs Leviton

(that really sounds like a super-hero movie, doesn't it?)

Lutron and Leviton are two of the largest electrical fixture manufacturers, and choosing between one or the other when buying a regular switch or receptacle is mostly just a matter of which brand your local hardware store carries, and whether or not you want to save $0.25 by buying the store brand.

In the "Home Automation" arena, they each have a variety of solutions, ranging from giant panel-based systems that you're expected to put in a closet somewhere and have installed by a "trained integrator", to simpler systems which are aimed at the DIY market.

You can go all-in, or you can just put a toe in

It didn't take long for me to decide that the fancier integrated systems were not really what I wanted. First off, they're fairly expensive, though the expense looks a little less extreme once you start comparing the per-switch cost of the smart switches vs the centralized version. But ultimately, I didn't really want to deal with a "system integrator" setting the thing up (though apparently it's very easy to get certified by Lutron if you're a licensed electrician, which I'm not). Also, nobody had anything good to say about the phone apps that were available for these systems. And finally, the high-end systems are all about providing a touch pad interface, to give your home that proper Jetsons look. I have no interest in having touch screens mounted on the wall in every room, so that was more of a downside for me, than an attraction. The stand-alone switches from either vendor look more-or-less like standard Decora-style dimmers.

In the consumer-focused lines, there are some interesting differences between the two companies. Leviton's consumer products are mostly compatible with the Z-Wave standard, which means they work with third-party smart home hubs. The reviews online for the Smart Things and Wink hubs weren't particularly encouraging to me, so that was a bit of a bummer.

The Lutron stuff uses a proprietary wireless protocol, and they sell their own hub. The Caseta hub (Lutron's hub) seemed to actually get pretty good reviews. It isn't as capable as the Smart Things hub but, and this was pretty critical for me - it does connect to HomeKit, Apple's home automation system (it also works with Amazon's Alexa and the Google Home device). So, we went with the Lutron Caseta stuff, because it's easy to use, looks reasonable in our house and is available at both Home Depot and Lowes, as well as the local electrical supply store.

Hardware from the hardware store, software from a software company

The connection to HomeKit means that even though the Caseta hub isn't as full-featured as some of the other smart home hubs, I don't really need to care. We're pretty much an all-Apple shop here at Casa de Bessey, so knowing that I could control all of the things attached to the Caseta hub from my phone, using Apple's Home app, is a pretty big draw for me. 

I know it's the 21st century, and everybody needs to have an App, but that doesn't mean every application is equally well-made. If there's a feature that I really "need", and it's not available in the standard software that comes with the Caseta, I could (at least in theory) set up an Apple TV or an iPad as an always-on HomeKit hub, and write my own damn software to run on it.

HomeKit will likely continue to gain new features over the years, so I may never need to do anything custom. But if I do, it's nice to know that I can work with familiar tools and environment, rather than struggling with some obscure system provided by the switch manufacturer.

The Caseta Wireless experience

We're a couple of months into using the Caseta hardware, and here's how it's been going so far.

The Good

Dimmers everywhere
One thing I hadn't really thought about before doing this work is that the dimmer-switch version of the Caseta switches is almost the same price as the plain switch version. We were in the process of gradually replacing our CFL bulbs with LED bulbs anyway, so we've gone with dimmer switches basically everywhere. The added flexibility of being able to set the brightness of any particular light is a nice upgrade.

The basics are all there
All of the fancy features in the world wouldn't be helpful if the basic features weren't there. The switches feel nice, they look nice, and they're easy to install. The software makes it easy to set up "scenes" where you can hit a single button, and set the brightness level of any sub-set of lights in the house.

HomeKit/Siri integration
It just works. There really is something magical about being able to say "Siri, turn out all the lights", and have the entire house go dark. Or indeed saying "Siri, turn out the light in Jeremy's Room" to my watch, and having that work on the first try.

Easy to setup and use
You basically plug in the hub, press a button to pair it with the app on your phone, and then start adding devices. The switches are direct replacements for your existing switches, so installing them is basically:
  1. Turn off the power
  2. Remove the old switch
  3. Wire the new switch/dimmer in
  4. Turn the power back on
The only slightly-complex cases are when you're replacing a three-way switch. The Caseta solution for 3-way (or more) situations is to install the switch at one end, then just install battery-powered remotes at any other location you need to control that light from. When you take out the 3-way, you do need to connect the "traveller" wires together, but they provide instructions online to show you how to do that.

You do have to add each individual switch to the app one at a time, which could get tedious in a large installation. It sure made things easy for the electricians, though - they just had to wire things up, without keeping track of which switch went in which room, since I would set all that up later after they left. From talking to them, I got the impression that the usual install of the higher-end stuff does involve writing down a bunch of "this circuit is on switch #12345" notes, then going back and fixing things later when setting up the controller.

Reliable
Unless the WiFi in the house is down, I haven't had any problems connecting to the hub, either from the Lutron app (when adding new hardware) or from Apple's Home App. Because the individual switches all have controls on them, even in the case of catastrophic failure, you can still walk around and turn off everything "by hand". That's another point in favor of the non-centralized system, I guess.

Supports "enough" devices for my house
One of the big differences between the Caseta stuff and Lutron's next higher tier (Radio RA2), is the number of "devices" they support. Every switch, every dimmer, and every remote control is a "device" for these counts. Caseta only supports 50 devices. I haven't come anywhere close to the limit yet, but we haven't replaced every last switch in the house yet, either. I think we'll be over 40 once all of the switches I care about have been replaced. Our house is close to 2,000 square feet, so if your house is smaller than that, I doubt the limit will ever matter much. And here's where the connection to HomeKit also helps - if we ever do hit the device limit, I can buy another Caseta hub for $75, and have another 50 devices.

The Bad

Range and range extenders
The Caseta documentation says that every controlled device needs to be within 30 feet of the hub. In practice, the maximum reach is just a bit longer than that in our house, but not very much farther. You can extend the range of the system, by using a plug-in dimmer as a repeater. You can have exactly one repeater, which is another limitation compared to the higher-end systems, which support multiple repeaters. But again - if I ever did run into this in practice, I'd probably just get another hub, and have one for each end of the house, since the hubs really aren't all that expensive.

Pricing structure
Honestly, the way that Lutron prices this stuff makes almost no sense at all. You can buy various "kits" with a hub, a dimmer and a remote, or a hub and a few dimmers and remotes, or a hub and some plug-in dimmers. The individual kit components cost more separately, which is no surprise, but some of the prices are weirdly inverted - it costs more to buy just a dimmer than it does to buy the dimmer, a remote, and all of the trim hardware. I assume anybody who makes extensive use of this product line eventually ends up with a box full of unused remotes, but that's just slightly wasteful, not an actual problem.

Trigger configuration is very basic
The "smart" hub isn't very smart. You can bind particular remotes to particular switches, set up scenes, and do some very basic automation. A recent software update improved some of this so that you can now do some more scheduling.

But take, for example, the "arriving home" automation. I can set up a scene to activate when I arrive home. That's nice, but I can't actually set up a scene to activate when I'm the first one home, or the last to leave. HomeKit supports this, so that might be the thing that gets me to finally set up an Apple TV as a HomeKit hub. Or maybe I'll wait for the HomePod...

The Unknown

Security
I haven't done a basic security audit on the Caseta hub, yet. That'll make a fun weekend project. The online component of the hub is protected by a user name and password, at least. And if I do get totally paranoid, I can always disconnect the hub from the internet, and route everything through an iOS HomeKit hub, which is likely to be more-secure.

Longevity
What happens if Lutron decides to end-of-life the Caseta line? Will I still be able to get replacement parts, or a new hub if the old one breaks? For that matter, what if Apple stops supporting HomeKit, or removes the Lutron app from the App Store?

This is the problem with going with the proprietary solution. I am somewhat dependent on both Lutron and Apple staying in this business, and getting along with each other. The hub is basically unusable without the app, so that's definitely a concern. I suspect if Lutron found themselves in a situation where they could no longer provide the iOS app, they'd be motivated to provide another solution, or at the very least, a migration strategy to one of their other control hubs.

At the absolute worst-case scenario, the Caseta switches and the remote controls can be set up and paired to operate completely independently of the hub. I'd lose all of the "smart" features, but at least I'd still have working light switches.

Conclusions

Overall, this was a really great way to get my feet wet with "smartening up" my home. The increased control over the lights in the house is convenient, and actually helps make the house more livable. The potential downsides are limited by the design of the Caseta system, which gracefully falls back to  "no worse than just having old light switches", something which is not necessarily true of other connected home devices, like thermostats, which can have terrible failure modes.

If you're interested in adding some smarts to your home, I can definitely recommend the Caseta products. They're easy to set up and use, and have been very reliable for us so far.

Monday, September 25, 2017

Follow up: LockState security failures

I wrote a blog post last month on what your IoT startup can learn from the LockState debacle. In the intervening weeks, not much new information has come to light about the specifics of the update failure, and it seems from their public statements that LockState thinks it's better if they don't do any kind of public postmortem on their process failures, which is too bad for the rest of us, and for the Internet of Things industry, in general - if you can't learn from others' mistakes, you (and your customers) might have to learn your own mistakes.

However, I did see a couple of interesting articles in the news related to LockState. The first one is from a site called CTOVision.com, and it takes a bit more of a business-focused look at things, as you might have expected from the site name. Rather than looking at the technical failures that allowed the incident to happen, they take LockState to task for their response after the fact. There's good stuff there, about how it's important to help your customers understand possible failure modes, how you should put the software update process under their control, and how to properly respond to an incident via social media.

And on The Parallax, a security news site, I found this article, which tells us about another major failure on the part of LockState - they apparently have a default 4-digit unlock code set for all of their locks from the factory, and also an 8-digit "programming code", which gives you total control over the lock - you can change the entry codes, rest the lock, disable it, and disconnect it from WiFi, among other things.

Okay, I really shouldn't be surprised by this at this point, I guess - these LockState guys are obviously a bit "flying by the seat of your pants" in terms of security practice, but seriously? Every single lock comes pre-programmed with the same unlock code and the same master programming code?

Maybe I'm expecting too much, but if a $2.99 cheap combination lock from the hardware store comes with a slip of paper in the package with its combination printed on it, maybe the $600 internet-connected smart lock can do the same? Or hell, use a laser to mark the master combination on the inside of the case, so it's not easily lost, and anyone with the key and physical access can use the code to reset the lock, in the (rare) case that that's necessary.

Or, for that matter - if you must have a default security code for your device (because your manufacturing line isn't set up for per-unit customization, maybe?), then make it part of the setup process to change the damn code, and don't let your users get into a state where they think your product is set up, but they haven't changed the code.

It's easy to fall into the trap of saying that the user should be more-aware of these things, and they should know that they need to change the default code. But your customers are not typically security experts, and you (or at least some of your employees) should be security experts. You need to be looking out for them, because they aren't going to be doing a threat analysis while installing their new IoT bauble.

Monday, September 18, 2017

A short rant on XML - the Good, the Bad, and the Ugly

[editor's note: This blog post has been in "Drafts" for 11 years. In the spirit of just getting stuff out there, I'm publishing it basically as-is. Look for a follow-up blog post next week with some additional observations on structured data transfer from the 21st century]

So, let's see if I can keep myself to less than ten pages of text this time...

XML is the eXtensible Markup Language. It's closely related to both HTML, the markup language used to make the World Wide Web, and SGML, a document format that you've probably never dealt with unless you're either a government contractor, or you used the Internet back in the days before the Web. For the pedants out there, I do know that HTML is actually an SGML "application" and that XML is a proper subset of SGML. Let's not get caught up in the petty details at this point.

XML is used for a variety of different tasks these days, but the most common by far is as a kind of "neutral" format for exchanging structured data between different applications. To keep this short and simple, I'm going to look at XML strictly from the perspective of a data storage and interchange format.


The good

Unicode support

XML Documents can be encoded using the Unicode character encoding, which means that nearly any written character in any language can be easily represented in an XML document.

Uniform hierarchical structure

XML defines a simple tree structure for all the elements in a file - there's one root element, it has zero or more children, which each have zero or more children, ad infinitum. All elements must have an open and close tag, and elements can't overlap. This simple structure makes it relatively easy to parse XML documents.

Human-readable (more or less)

XML is a text format, so it's possible to read and edit an XML document "by hand" in a text editor. This is often useful when you're learning the format of an XML document in order to write a program to read or translate it. Actually writing or modifying XML documents in a text editor can be incredibly tedious, though a syntax-coloring editor makes it easier.

Widely supported

Modern languages like C# and Java have XML support "built in" in their standard libraries. Most other languages have well-supported free libraries for working with XML. Chances are, whatever messed up environment you have to work in, there's an XML reader/writer library available.


The bad

Legacy encoding support

XML Documents can also be encoded in whatever wacky character set your nasty legacy system uses. You can put a simple encoding="Ancient-Elbonian-EBCDIC" attribute in the XML declaration element, and you can write well-formed XML documents in your favorite character encoding. You probably shouldn't expect that anyone else will actually be able to read it, though.

Strictly hierarchical format

Not every data set you might want to interchange between two systems is structured hierarchically. In particular, representing a relational database or an in-memory graph of objects is problematic in XML. A number of approaches are used to get around this issue, but they're all outside the scope of standardized XML (obviously), and different systems tend to solve this problem in different ways, neatly turning the "standardized interchange format for data" into yet another proprietary format, which is only readable by the software that created it.

XML is verbose

A typical XML document can be 30% markup, sometimes more. This makes it larger than desired in many cases. There have been several attempts to define a "binary XML" format (most recently by the W3C group), but they really haven't caught on yet. For most applications where size or transmission speed is an issue, you probably ought to look into compressing the XML document using a standard compression algorithm (gzip, or zlib, or whatever), then decompressing it on the other end. You'll save quite a bit more that way than by trying to make the XML itself less wordy.

Some XML processing libraries are extremely memory-intensive

There are two basic approaches to reading an XML document. You can read the whole thing into memory and re-construct the structure of the file into a tree of nodes in memory, and then the application can use standard pointer manipulation to scan through the tree of nodes, looking for whatever information it needs, or further-transforming the tree into the program's native data structures. One XML processing library I've used loaded the whole file into memory all at once, then created a second copy of all the data in the tags. Actually, it could end up using up to the size of the file, plus twice the combined size of all the tags.

Alternatively, the reader can take a more stream-oriented approach, scanning through the file from beginning to end, and calling into the application code whenever an element starts or ends. This can be implemented with a callback to your code for every tag start/end, which gives you a simple interface, and doesn't require holding large amounts of data in memory during the parsing.

No random access

This is just fallout from the strict hierarchy, but it's extremely labor intensive to do any kind of data extraction from a large XML document. If you only want a subset of nodes from a couple levels down in the hierarchy, you've still got to step your way down there, and keep scanning throught the rest of the file to figure out when you've gone up a level.


The ugly

By far, the biggest problems with XML don't have anything to do with the technology itself, but with the often perverse ways in which it's misapplied to the wrong problems. Here are a couple of examples from my own experience.

Archiving an object graph, and the UUID curse

XML is a fairly reasonable format for transferring "documents", as humans understand them. That is, a primarily linear bunch of text, with some attributes that apply to certain sections of the text.

These days, a lot of data interchange between computer programs is in the form of relational data (databases), or complex graphs of objects, where you'll frequently need to make references back to previous parts of the document, or forward to parts that haven't come across yet.

The obvious way to solve this problem is by having a unique ID that you can reference to find one entity from another. Unfortunately, the "obvious" way to ensure that a key is unique is to generate a globally-unique key, and so you end up with a bunch of 64-bit or 128-bit GUIDs stuck in your XML, which makes it really difficult to follow the links, and basically impossible to "diff' the files, visually.

One way to avoid UUID proliferation is to use "natural unique IDs, if your data has some attribute that needs to be unique anyway.

What's the worst possible way to represent a tree?

I doubt anybody's ever actually asked this question, but I have seen some XML structures that make a pretty good case that that's how they were created. XML, by its heirarchical nature, is actually a really good fit for hierarchical data. Here is one way to store a tree in XML:

<pants color="blue" material="denim">
  <pocket location="back-right">
    <wallet color="brown" material="leather">  
      <bill currency="USD" value="10"></bill>  
      <bill currency="EURO" value="5"></bill>  
    </wallet>
  </pocket>  
</pants>  

And here's another:

<object>
  <id>
    20D06E38-60C1-433C-8D37-2FDBA090E197
  </id>
  <class>
    pants
  </class>
  <color>
    blue
  </color>
  <material>
     denim
  </material>
</object>
<object>
  <id>
    1C728378-904D-43D8-8441-FF93497B10AC
  </id>
  <parent>
    20D06E38-60C1-433C-8D37-2FDBA090E197
  </parent>
  <class>
    pocket
  </class>
  <location>
    right-back
  </location>
</object>
<object>
  <id>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </id>
  <parent>
    1C728378-904D-43D8-8441-FF93497B10AC
  </parent>
  <class>
    wallet
  </class>
  <color>
    brown
  </color>
  <material>
    leather
  </material>
</object>
<object>
  <id>
    E197AA8D-842D-4434-AAC9-A57DF4543E43
  </id>
  <parent>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </parent>
  <class>
    bill
  </class>
  <currency>
    USD
  </currency>
  <denomination>
    10
  </denomination>
</object>
<object>
  <id>
    AF723BDD-80A1-4DAB-AD16-5B37133941D0
  </id>
  <parent>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </parent>
  <class>
    bill
  </class>
  <currency>
    EURO
  </currency>
  <denomination>
    10
  </denomination>
</object>

So, which one of those is easier to read? And did you notice that I added another 5 Euro to my wallet, while translating the structure? Key point here: try to have the structure of your XML follow the structure of your data.


Monday, September 04, 2017

Post-trip review: Telestial “International Travel SIM”

For our recent trip to Europe, Yvette and I tried the seasoned-traveler technique of swapping out the SIM cards in our phones, rather than paying AT&T’s fairly extortionate international roaming fees. It was an interesting experience, and we learned a few things along the way, which I’ll share here.


We used Telestial, which is apparently Jersey Telecom. Not New Jersey: JT is headquartered in the Jersey Isles, off the coast of Britain. JT/Telestial's claim to fame is really their wide roaming capability. Their standard “I’m traveling to Europe” SIM is good pretty much across all of Europe. It definitely worked just fine in Germany, Denmark, Sweden, Austria, Estonia, Russia, and Finland. They claim a few dozen more, and I don’t doubt it works equally-well in those countries.


Why not just get a local SIM in the country you’re visiting? Isn’t that cheaper?

People who frequently travel overseas will often just pop into a phone retailer, or buy a SIM from a kiosk in the airport. Based on the comparison shopping I did while we were traveling, this is definitely cheaper tan the Telestial solution. However, it’s not at all clear in many cases how well an Austrian SIM is going to work in Finland (for example), and just how much you’ll be paying for international roaming.


So, I think if you’re traveling to just one country (especially one with really cheap mobile phone service costs), buying a local SIM is definitely the way to go. I didn’t really want to keep updating people with new contact numbers every other day as we switched countries. I might look into one of the “virtual phone number” solutions, like Google Voice, for the next multi-country trip. Being able to give people one number, and still roam internationally, seems like it’d be useful, but I don’t know what the restrictions are.


What does setup look like?

First of all, you need a compatible phone. Not all American mobile phones will wrk in Europe. You can check the technical specs for the particular phone mode you have, to see which radio frequencies it supports. Alternatively, you can buy any iPhone more-recent than the iPhone 4s, all of which are “world phones”, as far as I know. Verizon and Sprint still some phones that are CDMA-only, which means they can’t work anywhere but the USA, but most CDMA smartphones also have a GSM SIM slot, so it’s worth taking a look to see, even if you’re on Verizon.


Secondly, your phone needs to not be “activation locked” to a particular carrier. Most phones sold on contract in the US are set up this way, so you can’t just default on the contract and get a phone you can use on another network. Ideally, your phone would get unlocked automatically at the end of the contract, but US law doesn’t require this, so you’ll need to request an unlock from your carrier. AT&T has made this process a lot easier since the last time I tried to do it, which is great, because I forgot to check that Yvette’s phone was unlocked before we left. I did manage to make the unlock request from the other phone while we were in a taxi on the freeway in Austria, which is a testament to how easy this stuff is these days, I guess.


Assuming you have a compatible phone, then process is basically power off phone, pop out the SIM tray with a paper clip, swap the SIMS, turn on the phone, and wait. For the Telestial SIM, you probably really want to activate it and pre-pay for some amount of credit before your trip, which is easy to do on their website.


What kind of plan did you get?

We had a pre-paid fixed allowance for calls and text, and 2GB of data for each phone. Calls were $0.35 a minute, and texts were $0.35 each. Pre-loading $30 on the phone was enough for and hour and a half of phone calls, or a fairly large number of texts. When we had data coverage, we could use iMessage or WhatsApp fro basically free text messages. I don't know whether Voice Over LTE actually worked, and if it avoided the per-minute charge, since we just didn't call that much.

Did you actually save any money?

Compared to what it cost to pay AT&T for an international roaming plan while Yvette was in the UK for a month, we definitely did save a substantial amount of money. This is true even with the crazy cruise ship issue (see below). Without that, it would have been massively less-expensive. And compared to AT&T’s “no plan” international rates (which I got to try out in Israel), there’s absolutely no comparison.

What happened on the cruise ship?

Most of the time, the cruise ship did not have cell service. Which was pretty much fine - we had good coverage when we were in port, and there was WiFi on the ship, if we wanted to pay for it (we did not). We had, on two occasions, a weird thing were our phones managed to connect to a shipboard cell network (maybe on another passing ship?), where we were charged truly outrageous roaming data rates - several dollars a megabyte, which obviously burned through the $30 in prepaid credit really fast. On the other hand, prepaid means that we didn't lose more than $30 (twice, so $60 total). I still don't know exactly what happened there, but if I do this again sometime, I'm going to keep data turned off on the phone when not in port.

The good:


  1. Pre-paid, which limits crazy bills
  2. Easy setup
  3. Easy to recharge, either over the phone, or using the app
  4. Per-minute and per-text rates not too terrible
  5. Works pretty much anywhere in Europe


The bad:
  1. Cruise ship roaming will use up your data allowance right quick
  2. Fixed recharge sizes, and monthly expiration
  3. Forwarding doesn’t work for texts
  4. Some weirdness with “from” numbers on texts (apparently Austria-only)?
  5. No tethering
  6. Email support non-responsive

Conclusion: would we do it again?

Overall, the process was fairly-painless, other than the cruise ship issue. If there’s a simple way to fix that, I’d have no problem doing this again. Otherwise, I’d have to recommend during cell data off when you’re not in port, to avoid accidentally costing yourself a bunch of money.