Another Day In The Code Mines: wtf

Showing posts with label wtf. Show all posts

Monday, September 25, 2017

Follow up: LockState security failures

I wrote a blog post last month on what your IoT startup can learn from the LockState debacle. In the intervening weeks, not much new information has come to light about the specifics of the update failure, and it seems from their public statements that LockState thinks it's better if they don't do any kind of public postmortem on their process failures, which is too bad for the rest of us, and for the Internet of Things industry, in general - if you can't learn from others' mistakes, you (and your customers) might have to learn your own mistakes.

However, I did see a couple of interesting articles in the news related to LockState. The first one is from a site called CTOVision.com, and it takes a bit more of a business-focused look at things, as you might have expected from the site name. Rather than looking at the technical failures that allowed the incident to happen, they take LockState to task for their response after the fact. There's good stuff there, about how it's important to help your customers understand possible failure modes, how you should put the software update process under their control, and how to properly respond to an incident via social media.

And on The Parallax, a security news site, I found this article, which tells us about another major failure on the part of LockState - they apparently have a default 4-digit unlock code set for all of their locks from the factory, and also an 8-digit "programming code", which gives you total control over the lock - you can change the entry codes, rest the lock, disable it, and disconnect it from WiFi, among other things.

Okay, I really shouldn't be surprised by this at this point, I guess - these LockState guys are obviously a bit "flying by the seat of your pants" in terms of security practice, but seriously? Every single lock comes pre-programmed with the same unlock code and the same master programming code?

Maybe I'm expecting too much, but if a $2.99 cheap combination lock from the hardware store comes with a slip of paper in the package with its combination printed on it, maybe the $600 internet-connected smart lock can do the same? Or hell, use a laser to mark the master combination on the inside of the case, so it's not easily lost, and anyone with the key and physical access can use the code to reset the lock, in the (rare) case that that's necessary.

Or, for that matter - if you must have a default security code for your device (because your manufacturing line isn't set up for per-unit customization, maybe?), then make it part of the setup process to change the damn code, and don't let your users get into a state where they think your product is set up, but they haven't changed the code.

It's easy to fall into the trap of saying that the user should be more-aware of these things, and they should know that they need to change the default code. But your customers are not typically security experts, and you (or at least some of your employees) should be security experts. You need to be looking out for them, because they aren't going to be doing a threat analysis while installing their new IoT bauble.

Monday, September 18, 2017

A short rant on XML - the Good, the Bad, and the Ugly

[editor's note: This blog post has been in "Drafts" for 11 years. In the spirit of just getting stuff out there, I'm publishing it basically as-is. Look for a follow-up blog post next week with some additional observations on structured data transfer from the 21st century]

So, let's see if I can keep myself to less than ten pages of text this time...

XML is the eXtensible Markup Language. It's closely related to both HTML, the markup language used to make the World Wide Web, and SGML, a document format that you've probably never dealt with unless you're either a government contractor, or you used the Internet back in the days before the Web. For the pedants out there, I do know that HTML is actually an SGML "application" and that XML is a proper subset of SGML. Let's not get caught up in the petty details at this point.

XML is used for a variety of different tasks these days, but the most common by far is as a kind of "neutral" format for exchanging structured data between different applications. To keep this short and simple, I'm going to look at XML strictly from the perspective of a data storage and interchange format.

The good

Unicode support

XML Documents can be encoded using the Unicode character encoding, which means that nearly any written character in any language can be easily represented in an XML document.

Uniform hierarchical structure

XML defines a simple tree structure for all the elements in a file - there's one root element, it has zero or more children, which each have zero or more children, ad infinitum. All elements must have an open and close tag, and elements can't overlap. This simple structure makes it relatively easy to parse XML documents.

Human-readable (more or less)

XML is a text format, so it's possible to read and edit an XML document "by hand" in a text editor. This is often useful when you're learning the format of an XML document in order to write a program to read or translate it. Actually writing or modifying XML documents in a text editor can be incredibly tedious, though a syntax-coloring editor makes it easier.

Widely supported

Modern languages like C# and Java have XML support "built in" in their standard libraries. Most other languages have well-supported free libraries for working with XML. Chances are, whatever messed up environment you have to work in, there's an XML reader/writer library available.

The bad

Legacy encoding support

XML Documents can also be encoded in whatever wacky character set your nasty legacy system uses. You can put a simple encoding="Ancient-Elbonian-EBCDIC" attribute in the XML declaration element, and you can write well-formed XML documents in your favorite character encoding. You probably shouldn't expect that anyone else will actually be able to read it, though.

Strictly hierarchical format

Not every data set you might want to interchange between two systems is structured hierarchically. In particular, representing a relational database or an in-memory graph of objects is problematic in XML. A number of approaches are used to get around this issue, but they're all outside the scope of standardized XML (obviously), and different systems tend to solve this problem in different ways, neatly turning the "standardized interchange format for data" into yet another proprietary format, which is only readable by the software that created it.

XML is verbose

A typical XML document can be 30% markup, sometimes more. This makes it larger than desired in many cases. There have been several attempts to define a "binary XML" format (most recently by the W3C group), but they really haven't caught on yet. For most applications where size or transmission speed is an issue, you probably ought to look into compressing the XML document using a standard compression algorithm (gzip, or zlib, or whatever), then decompressing it on the other end. You'll save quite a bit more that way than by trying to make the XML itself less wordy.

Some XML processing libraries are extremely memory-intensive

There are two basic approaches to reading an XML document. You can read the whole thing into memory and re-construct the structure of the file into a tree of nodes in memory, and then the application can use standard pointer manipulation to scan through the tree of nodes, looking for whatever information it needs, or further-transforming the tree into the program's native data structures. One XML processing library I've used loaded the whole file into memory all at once, then created a second copy of all the data in the tags. Actually, it could end up using up to the size of the file, plus twice the combined size of all the tags.

Alternatively, the reader can take a more stream-oriented approach, scanning through the file from beginning to end, and calling into the application code whenever an element starts or ends. This can be implemented with a callback to your code for every tag start/end, which gives you a simple interface, and doesn't require holding large amounts of data in memory during the parsing.

No random access

This is just fallout from the strict hierarchy, but it's extremely labor intensive to do any kind of data extraction from a large XML document. If you only want a subset of nodes from a couple levels down in the hierarchy, you've still got to step your way down there, and keep scanning throught the rest of the file to figure out when you've gone up a level.

The ugly

By far, the biggest problems with XML don't have anything to do with the technology itself, but with the often perverse ways in which it's misapplied to the wrong problems. Here are a couple of examples from my own experience.

Archiving an object graph, and the UUID curse

XML is a fairly reasonable format for transferring "documents", as humans understand them. That is, a primarily linear bunch of text, with some attributes that apply to certain sections of the text.

These days, a lot of data interchange between computer programs is in the form of relational data (databases), or complex graphs of objects, where you'll frequently need to make references back to previous parts of the document, or forward to parts that haven't come across yet.

The obvious way to solve this problem is by having a unique ID that you can reference to find one entity from another. Unfortunately, the "obvious" way to ensure that a key is unique is to generate a globally-unique key, and so you end up with a bunch of 64-bit or 128-bit GUIDs stuck in your XML, which makes it really difficult to follow the links, and basically impossible to "diff' the files, visually.

One way to avoid UUID proliferation is to use "natural unique IDs, if your data has some attribute that needs to be unique anyway.

What's the worst possible way to represent a tree?

I doubt anybody's ever actually asked this question, but I have seen some XML structures that make a pretty good case that that's how they were created. XML, by its heirarchical nature, is actually a really good fit for hierarchical data. Here is one way to store a tree in XML:

<pants color="blue" material="denim">
  <pocket location="back-right">
    <wallet color="brown" material="leather">  
      <bill currency="USD" value="10"></bill>  
      <bill currency="EURO" value="5"></bill>  
    </wallet>
  </pocket>  
</pants>

And here's another:

<object>
  <id>
    20D06E38-60C1-433C-8D37-2FDBA090E197
  </id>
  <class>
    pants
  </class>
  <color>
    blue
  </color>
  <material>
     denim
  </material>
</object>
<object>
  <id>
    1C728378-904D-43D8-8441-FF93497B10AC
  </id>
  <parent>
    20D06E38-60C1-433C-8D37-2FDBA090E197
  </parent>
  <class>
    pocket
  </class>
  <location>
    right-back
  </location>
</object>
<object>
  <id>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </id>
  <parent>
    1C728378-904D-43D8-8441-FF93497B10AC
  </parent>
  <class>
    wallet
  </class>
  <color>
    brown
  </color>
  <material>
    leather
  </material>
</object>
<object>
  <id>
    E197AA8D-842D-4434-AAC9-A57DF4543E43
  </id>
  <parent>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </parent>
  <class>
    bill
  </class>
  <currency>
    USD
  </currency>
  <denomination>
    10
  </denomination>
</object>
<object>
  <id>
    AF723BDD-80A1-4DAB-AD16-5B37133941D0
  </id>
  <parent>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </parent>
  <class>
    bill
  </class>
  <currency>
    EURO
  </currency>
  <denomination>
    10
  </denomination>
</object>

So, which one of those is easier to read? And did you notice that I added another 5 Euro to my wallet, while translating the structure? Key point here: try to have the structure of your XML follow the structure of your data.

Wednesday, May 02, 2007

They're running a contest over at Worse Than Failure

Worse Than Failure (formerly The Daily WTF) is having a programming contest, with either a MacBook Pro or Sony VAIO laptop as first prize.

For those not familiar with the site, their theme is Curious Perversions in Information Technology. People submit particularly awful pieces of code, or database schemas, or business practices, and the readers of the site make various insightful, witty, outraged, or just plain misguided, comments on them.

The contest: Implement a four-function calculator program (for Windows, or UNIX/GTK). Your program must pass the specified test cases, have a GUI driveable using a mouse, and should cause people who read the code to shake their heads sadly in disgust.

There's much more detail on the contest website, but I thought the idea of the contest was interesting enough to mention. The contest discussion forum is hilarious, as well. Check out this comment:

Mine's too slow right now. It's taking about 45 minutes to add 9876 and 1234, when I was hoping for about three seconds. I knew it was O(n^2), but I was expecting the constant to be a bit smaller. I may need to replace the Mersenne twister with a faster random-number generator.

I have what I think is a pretty good set of ideas for a submission, I'm just not sure that I'll be able to finish in the time remaining (12 days left). I'll probably submit whatever I have by then, even though I almost certainly won't win. I'm looking forward to seeing the other entries, though.