
Saturday, June 13, 2020

ARM Macs are coming, and faster than you think

ARM Macs and transition timeframes

(note: This is a lightly-edited version of a post originally published on June 13th, 2020)


We all knew this was coming. In fact, some of us have been expecting it for years. Various rumor outlets are saying that Apple will announce at WWDC that they're transitioning the Macintosh line from using Intel's processors to Apple's own processors, which are based on the ARM architecture.


A bunch of people have written extensively on this rumor, but I have a few thoughts that I haven't seen others concentrate on.


One thing you see a lot of disagreement about online is how long it'll take for Apple to convert its whole lineup of Macs to use its own processors, or if it even will. I've seen people say that they think they'll announce a single model of ARM Mac, then over the course of 2-3 years, move all of the product line over to ARM. I've even seen people predict that they'll keep the "Pro" line of computers on x86 for the foreseeable future, and only convert the portable line to ARM.


A case can be made for those positions, but here's what I think: If Apple announces a transition to ARM at WWDC, it'll happen surprisingly quickly. I wouldn't be at all surprised if the first ARM Macs ship before the end of 2020, and the whole line is switched over before the end of 2021. That seems like a pretty far out-there prediction, compared to the "consensus" view, so let's take a look at the previous transitions, and how this one is different.

We've been here before, but then again, this is very different

This will be the third major processor transition for the Macintosh (and the 5th major software transition overall). Originally, the Mac used the Motorola m68k processor family. After 10 years, the m68k family was failing to make regular improvements in performance, and Apple started to look at other options, finally settling on the PowerPC. They moved the Mac products from m68k to PPC over the course of about 18 months. Years later, they transitioned from PowerPC to Intel, over the course of about 12 months. And now, we're apparently on the cusp of another processor transition. How will this one turn out? And most importantly: WHY NOW?


Transition 1: The m68k to PowerPC transition

"Any sufficiently-advanced technology is indistinguishable from magic"

– Arthur C. Clarke


This transition was very difficult, both for Apple and for third parties. At the time that Apple announced the change, they were still using what we now call the Classic MacOS. Large parts of the operating system, and the applications that ran on it, were written in Assembly, with an intimate understanding of the hardware they ran on.


Consequently, Apple developed a processor emulator, which would allow existing m68k code to run on the PowerPC without any changes. You could even have an application load a plugin written for the other architecture. The new PPC version of MacOS maintained a shadow copy of all its internal state in a place where 68k applications could see (and modify) it - that was the level of compatibility required to get anything to work. A heroic effort, and it paid off - most software worked out of the box, and performance was "good enough" with emulated code, because the PPC chips were much faster than the m68k chips they were replacing.


The downside of that sort of transition is that it takes many years to complete. There was relatively little pressure on the third parties to update their applications, because they ran just fine on the new models. Even the MacOS itself wasn't completely translated to native code until several years later. 


Transition 2: The MacOS X transition

“If I need to make that many changes, I might as well drop the Mac, and go to Windows”

– Some Mac developer, a Halloween party in 1999


A few years after the PPC transition, Apple announced MacOS X, and software developers were faced with another transition. Originally, OS X was intended to be a clean break with Classic MacOS, with an all-new underlying operating system, and a brand new API, Cocoa (based on the OPENSTEP software which came in with the NeXT acquisition).


Major developers were (understandably) not enthusiastic about the prospect of rewriting the majority of their existing applications. Eventually, Apple caved to the pressure, and provided Carbon, a "modern" API that kept much of the same structure, but removed some of the more egregious aspects of Classic MacOS programming. Apple made it clear that they considered Carbon a transitional technology, and they encouraged developers to use Cocoa. The reaction from the larger developers was pretty much "meh." Quite a few smaller long-time MacOS developers enthusiastically embraced the new APIs though, appreciating the productivity boost they provided.


A footnote to this chapter of the saga is that the "Developer Preview" versions of Rhapsody, the first Mac OS X releases, actually had support for running the OS on Intel-based PC hardware. That didn't survive the re-alignment which gave us Carbon, and MacOS X 10.0 shipped with support for PowerPC Macs only. 


Things were pretty quiet on the Macintosh front for a few years. New versions of OS X came out on a regular schedule, and Apple kept coming out with faster and better PowerBooks, PowerMacs, iBooks, and iMacs. And then, suddenly, the PowerPC processor line had a few unexpected hiccups in the delivery pipeline.


Transition 3: The Intel transition

“Wait, you were serious about that?”

– Carbon-using developers, overheard at WWDC 2005


The PowerPC processors were looking less and less competitive with Intel processors as time went by, which was an embarrassment for Apple, who had famously built the PowerPC advertising around how much faster their chips were than Intel's. The "G5" processor, which was much-hyped to close the gap with Intel, ran years late. It did eventually ship, in a form that required liquid cooling to effectively compete with mainstream Intel desktop PCs. The Mac laptop range particularly suffered, because the low-power laptop chips from Motorola just...never actually appeared.


And so, Apple announced that they were transitioning to Intel processors at WWDC 2005. I was working in the Xcode labs that year, helping third-party developers to get their code up and running on Intel systems. I worked a lot of "extra shifts", but it was amazing to see developers go from utterly freaked out, to mostly reassured by the end of the week.


For any individual developer, the amount of “pain” involved in the transition was variable. If they’d “kept up with” Apple’s developer tools strategy in the years since the introduction of Mac OS X, no problem! For the smaller indie developers who had embraced Xcode, Cocoa, and all of Apple's other newer framework technology, it actually was a trivial process (with one exception). They came into the lab, clicked a button in Xcode, fixed a bunch of compiler warnings and errors, and walked away with a working application, often in just an hour or so. 


For the developers with large Carbon-based applications built using the Metrowerks compiler, it was a real slog. Because of CodeWarrior-specific compiler extensions they'd used, different project structures, etc, etc,  it was hard to even get their programs to build in Xcode.


The exception to the "it just works" result for the up-to-date projects is any kind of external I/O. Code that read or wrote to binary files, or communicated over a network, would often need extensive changes to flip the "endianness" of various memory structures. Endianness is something you generally don’t need to think about as a developer in a high-level language, especially if you're only developing for one platform, which also just happens to use the same endianness as the Internet does. Luckily, these changes tended to be localized.
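
To make that concrete, here's a minimal sketch of the sort of fix involved (written in present-day Swift rather than the code we were actually porting, and with a made-up file format for illustration): a 4-byte value stored in big-endian order has to be reassembled explicitly instead of just being copied into a struct.

import Foundation

// Hypothetical file format: a 4-byte length field stored in big-endian
// ("network") byte order, as PowerPC-era formats typically were.
func readBigEndianUInt32(_ data: Data, at offset: Int) -> UInt32 {
    let start = data.startIndex + offset
    // Assemble the value most-significant byte first, so the result is
    // correct regardless of the endianness of the host CPU.
    return data[start..<start + 4].reduce(0) { ($0 << 8) | UInt32($1) }
}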


Transition 4: The 64-bit transition

“You can't say we didn't tell you this was coming..."

– Some Apple Developer Tools representative (probably Matthew), at WWDC 2018


The first Intel-based Macs used processors that could only run in 32-bit mode. This is what I consider one of Apple's biggest technology mistakes ever. They should have gone directly to 64-bit Intel from the get-go, though that would have required waiting for the Core 2 Duo processors from Intel, or using AMD chips, or doing the iMac first, and the notebooks last.


Regardless, after the first year, all Macs were built with 64-bit capable processors, and MacOS started supporting 64-bit applications soon after. Technically, the previous versions of Mac OS X supported 64-bit applications on the "G5" processors, but that was only available in the Power Mac G5, and very few applications (other than ports from workstation hardware) bothered to support 64-bit mode.


Unfortunately for the folks hoping to see a glorious 64-bit future, there was again very little incentive for developers to adopt 64-bit code on MacOS. One of the advantages of Intel-based Macs over the PowerPC versions was that you could reuse code libraries that had been written for Windows PCs. But, of course - almost all Windows applications were written for 32-bit mode, so any code shared between the Windows and Mac versions of an application needed to be 32-bit. You also can't mix and match 32-bit and 64-bit code in the same process on MacOS. So most MacOS applications remained 32-bit for years after there were no longer any 32-bit processor Macs being sold. Even when OS X 10.7 dropped support for 32-bit processors entirely, most applications stayed 32-bit.


Apple told developers at every WWDC from probably 2006 on that they should really convert to 64-bit. They'd talk about faster performance, lower memory overhead for the system software, and various other supposed advantages. And every year, there just didn’t seem to be any great need to do so, so mostly, all Mac software remained 32-bit. A few new APIs were added to MacOS which only worked in 64-bit applications, which just had the unfortunate effect that those features never saw wide adoption.


Eventually, Apple's tactics on this issue evolved from promises to threats and shaming. Developers were told at WWDC that 32-bit applications would not be supported "without compromises" in High Sierra. Then, when High Sierra shipped, we found that Apple had added a warning message that 32-bit applications were "not optimized" for the new OS. That got the end users to start asking developers about when they were going to “optimize” for the new operating system. For the better part of a year, many developers scrambled to get their apps converted before MacOS Mojave shipped, because they made the reasonable assumption that the warning message was implying that Mojave wouldn’t support 32-bit applications. But then Mojave shipped, and 32-bit apps ran the same as they ever had, with the same warning that was displayed in High Sierra. And then, in MacOS Catalina, they finally stopped allowing 32-bit software to run at all.


Converting an existing 32-bit Cocoa application to 64-bit is not particularly difficult, but it is...tedious. You end up having to make lots of small changes all over your code. In one project that I helped convert, there were changes needed in hundreds of source code files. We got there, but nobody thought it was fun, and it seemed so pointless. Why inflict this pain on users and developers, for what seemed like no gain?


You couldn't say that we weren’t warned that this was coming, since Apple had been telling developers for literally a decade to convert to 64-bit. But third-party developers were still pretty confused about the timing. Why "suddenly" deprecate 32-bit apps for Catalina? Just to incrementally reduce the amount of maintenance work they needed to do on MacOS? Or to reduce the amount of system overhead by a handful of megabytes on an 8GB Mac? It didn’t make sense. And why did they strongly imply it was coming in Mojave, then suddenly give us a reprieve to Catalina?


Transition 5: The ARM Mac

“The good news is, you've already done the hard part”

– Apple, WWDC 2020


With all of this in mind, I think that the sudden hard push for 64-bit in High Sierra and beyond was a stealth effort to get the MacOS third-party developers ready for the coming ARM transition. When High Sierra shipped, almost all MacOS software was 32-bit. Now that Catalina is out, almost all the major applications have already transitioned to 64-bit. Perhaps the reason the “deadline” was moved from Mojave to Catalina was that not enough of the “top ten” applications had been converted yet?


Prior to finally getting all third-party developers to adopt 64-bit, the transition story for converting to ARM would have been complicated, because the applications were all 32-bit, and the Mac ARM chips would be 64-bit (the iOS platform having had its 64-bit conversion a few years back). Apple would have been telling developers: "First, you need to convert to 64-bit. Then, you can make any changes needed to get your code running on ARM".


Now, it's going to be very simple: "If your application currently builds for Catalina, with the current SDK, you can simply flip a switch in Xcode, and it'll be able to run on the new ARM Macs, as well". That's not going to be literally true for many applications, for various reasons (binary dependencies on some other third-party SDK, Intel-specific intrinsics in performance-critical code, etc). 
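
As a hedged illustration of the intrinsics point (the function here is invented, and Accelerate is just one way out), architecture-specific fast paths generally end up behind compile-time checks like this:

import Accelerate

// Hypothetical hot-path routine. vDSP is available on both Intel and
// Apple's ARM chips, so a hand-written SSE version can often be replaced
// outright rather than ported to NEON.
func sumOfSquares(_ values: [Float]) -> Float {
    #if arch(x86_64) || arch(arm64)
    return vDSP.sumOfSquares(values)
    #else
    // Portable fallback for any other architecture.
    return values.reduce(0) { $0 + $1 * $1 }
    #endif
}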


But this time, there is no endianness issue, no apps that will need to change their toolchain, none of the previous issues will be relevant. I also think it's quite likely that there will be very few arbitrary API deprecations in MacOS 10.16, specifically to make this transition as painless as possible for as many developers as possible.


What’s it all mean, then?

"All of these products are available TODAY"

– Steve Jobs's famous tagline, possibly heard again this year?


So then - on to timing. For the Intel transition, there were about 6 months between announcement and availability of the first Intel Mac, and another 8 months before the transition was complete. This time, it's likely that Apple will shrink the time between announcement and availability, because there's comparatively little work that needs to be done to get most applications up and running on the new hardware.


It's possible that they'll announce ARM Macs, shipping that very day. If Adobe and Microsoft are already ready to go on day one, it might even be plausible. I think they'll want to give developers some time to get ready, though. So, I predict 3 months, meaning ARM Macs in September, 2020. And I think they'll move aggressively to put as many Macs on their own processors as they can, because it's all a win for them - lower costs, better battery life, etc, etc.


"But what about the Mac Pro?", you'll hear from some experts. "Nobody's ever produced a Xeon-equivalent performance ARM chip. It'll take years to make such a design, if it's even possible at all".


The obvious comeback here is: Nobody knows what Apple has running in a lab somewhere, except the people who are working on the project. Maybe they already have an 80 Watt ARM powerhouse chip running in a Mac Pro chassis, right now. But even if they don't, I think it's reasonable to look at this from the "Why Now?" perspective, again.


The previous processor transitions were mainly driven by a need to stay competitive in performance with Intel. That is not the case this time, since the desktop/laptop competition is almost exclusively on the same Intel processors that Apple is using. The success, or rather the lack thereof, of other ARM-architecture computers & operating systems (Windows for ARM, ChromeBooks) doesn't make a compelling case for a switch to “keep up” with Microsoft or Google, either. So there's no hurry.


Given that there's no external pressure to switch, Apple must think that they have a compelling story for why they're switching. And that has to include the entire product line, since the Mac isn't their most-important product, and they surely aren't going to support two different architectures on it, just to keep existing Mac Pro users happy. They either have a prototype of this Mac Pro class processor ready to go, or they're very sure that they can produce it, and they have a believable roadmap to deliver that product. Otherwise, they’d just wait until they did.

Which Macs are switching first?

“I bet you didn’t see that coming, skeptics!”

– Me, maybe?


Everybody is expecting to see a new MacBook, possibly bringing back the 12-inch, fanless form factor, and taking maximum advantage of the power-saving and cooler operation of Apple’s own chips. Some folks are expecting a couple of different models of laptops.


What I would really love to see (but don’t much expect) is for Tim Cook to walk out on stage, announce the first ARM-based Mac, and have it not be a super-small, low-power consumer laptop product. I want it to be something high-end that decisively outperforms the current Mac Pro, and establishes that this is going to be a transition of the whole line. I think that'd be a bold statement, if they could swing it.


Monday, September 18, 2017

A short rant on XML - the Good, the Bad, and the Ugly

[editor's note: This blog post has been in "Drafts" for 11 years. In the spirit of just getting stuff out there, I'm publishing it basically as-is. Look for a follow-up blog post next week with some additional observations on structured data transfer from the 21st century]

So, let's see if I can keep myself to less than ten pages of text this time...

XML is the eXtensible Markup Language. It's closely related to both HTML, the markup language used to make the World Wide Web, and SGML, a document format that you've probably never dealt with unless you're either a government contractor, or you used the Internet back in the days before the Web. For the pedants out there, I do know that HTML is actually an SGML "application" and that XML is a proper subset of SGML. Let's not get caught up in the petty details at this point.

XML is used for a variety of different tasks these days, but the most common by far is as a kind of "neutral" format for exchanging structured data between different applications. To keep this short and simple, I'm going to look at XML strictly from the perspective of a data storage and interchange format.


The good

Unicode support

XML documents can be encoded using Unicode (typically UTF-8), which means that nearly any written character in any language can be easily represented in an XML document.

Uniform hierarchical structure

XML defines a simple tree structure for all the elements in a file - there's one root element, it has zero or more children, which each have zero or more children, ad infinitum. All elements must have an open and close tag, and elements can't overlap. This simple structure makes it relatively easy to parse XML documents.

Human-readable (more or less)

XML is a text format, so it's possible to read and edit an XML document "by hand" in a text editor. This is often useful when you're learning the format of an XML document in order to write a program to read or translate it. Actually writing or modifying XML documents in a text editor can be incredibly tedious, though a syntax-coloring editor makes it easier.

Widely supported

Modern languages like C# and Java have XML support built into their standard libraries. Most other languages have well-supported free libraries for working with XML. Chances are, whatever messed-up environment you have to work in, there's an XML reader/writer library available.


The bad

Legacy encoding support

XML Documents can also be encoded in whatever wacky character set your nasty legacy system uses. You can put a simple encoding="Ancient-Elbonian-EBCDIC" attribute in the XML declaration element, and you can write well-formed XML documents in your favorite character encoding. You probably shouldn't expect that anyone else will actually be able to read it, though.

Strictly hierarchical format

Not every data set you might want to interchange between two systems is structured hierarchically. In particular, representing a relational database or an in-memory graph of objects is problematic in XML. A number of approaches are used to get around this issue, but they're all outside the scope of standardized XML (obviously), and different systems tend to solve this problem in different ways, neatly turning the "standardized interchange format for data" into yet another proprietary format, which is only readable by the software that created it.

XML is verbose

A typical XML document can be 30% markup, sometimes more. This makes it larger than desired in many cases. There have been several attempts to define a "binary XML" format (most recently by the W3C group), but they really haven't caught on yet. For most applications where size or transmission speed is an issue, you probably ought to look into compressing the XML document using a standard compression algorithm (gzip, or zlib, or whatever), then decompressing it on the other end. You'll save quite a bit more that way than by trying to make the XML itself less wordy.
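
As a rough sketch of that approach (assuming a reasonably recent Apple platform; NSData's compressed(using:) needs macOS 10.15 or later, and any other compression library would do just as well):

import Foundation

// Compress the XML text with zlib before storing or sending it; the
// repeated tag names in markup-heavy documents compress extremely well.
func compressedXML(_ xml: String) throws -> Data {
    return try (Data(xml.utf8) as NSData).compressed(using: .zlib) as Data
}

// And reverse the process on the other end.
func decompressedXML(_ data: Data) throws -> String {
    let raw = try (data as NSData).decompressed(using: .zlib) as Data
    return String(decoding: raw, as: UTF8.self)
}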

Some XML processing libraries are extremely memory-intensive

There are two basic approaches to reading an XML document. The first is to read the whole thing into memory, re-constructing the structure of the file as a tree of nodes; the application can then use standard pointer manipulation to scan through that tree, looking for whatever information it needs, or transform it further into the program's native data structures. One XML processing library I've used loaded the whole file into memory at once, then created a second copy of all the data in the tags - in the worst case, it could use up to the size of the file, plus twice the combined size of all the tags.
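
For example, here's a minimal tree-style read using Foundation's XMLDocument (a present-day, macOS-only sketch, not the library I'm complaining about above), borrowing the "pants" document from later in this post. The whole document is parsed into memory before you can look at any of it:

import Foundation

// Parse the entire document into a tree of nodes up front, then walk it.
func listPocketLocations(in xml: String) throws {
    let doc = try XMLDocument(data: Data(xml.utf8), options: [])
    guard let root = doc.rootElement() else { return }
    for pocket in root.elements(forName: "pocket") {
        print(pocket.attribute(forName: "location")?.stringValue ?? "unknown")
    }
}

try? listPocketLocations(in: #"<pants color="blue"><pocket location="back-right"/></pants>"#)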

Alternatively, the reader can take a more stream-oriented approach, scanning through the file from beginning to end and calling back into the application code whenever an element starts or ends. A callback for every tag start/end gives you a simple interface, and doesn't require holding large amounts of data in memory during the parsing.
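
Here's the same tiny document read in that event-driven style, with Foundation's XMLParser (again, just a present-day sketch): the delegate gets a callback for each start tag, and nothing but the running count is kept in memory.

import Foundation

// Stream through the document, counting <pocket> elements as they go by.
final class PocketCounter: NSObject, XMLParserDelegate {
    private(set) var pocketCount = 0

    // Called once for every opening tag the parser encounters.
    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "pocket" { pocketCount += 1 }
    }
}

let data = Data(#"<pants color="blue"><pocket location="back-right"/></pants>"#.utf8)
let counter = PocketCounter()
let parser = XMLParser(data: data)
parser.delegate = counter
if parser.parse() {
    print(counter.pocketCount)   // prints 1
}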

No random access

This is just fallout from the strict hierarchy, but it's extremely labor-intensive to do any kind of data extraction from a large XML document. If you only want a subset of nodes from a couple of levels down in the hierarchy, you've still got to step your way down there, and keep scanning through the rest of the file to figure out when you've gone up a level.


The ugly

By far, the biggest problems with XML don't have anything to do with the technology itself, but with the often perverse ways in which it's misapplied to the wrong problems. Here are a couple of examples from my own experience.

Archiving an object graph, and the UUID curse

XML is a fairly reasonable format for transferring "documents", as humans understand them. That is, a primarily linear bunch of text, with some attributes that apply to certain sections of the text.

These days, a lot of data interchange between computer programs is in the form of relational data (databases), or complex graphs of objects, where you'll frequently need to make references back to previous parts of the document, or forward to parts that haven't come across yet.

The obvious way to solve this problem is by having a unique ID that you can reference to find one entity from another. Unfortunately, the "obvious" way to ensure that a key is unique is to generate a globally-unique key, and so you end up with a bunch of 64-bit or 128-bit unique IDs stuck in your XML, which makes it really difficult to follow the links, and basically impossible to visually "diff" the files.

One way to avoid UUID proliferation is to use "natural" unique IDs, if your data has some attribute that needs to be unique anyway.

What's the worst possible way to represent a tree?

I doubt anybody's ever actually asked this question, but I have seen some XML structures that make a pretty good case that that's how they were created. XML, by its hierarchical nature, is actually a really good fit for hierarchical data. Here is one way to store a tree in XML:

<pants color="blue" material="denim">
  <pocket location="back-right">
    <wallet color="brown" material="leather">  
      <bill currency="USD" value="10"></bill>  
      <bill currency="EURO" value="5"></bill>  
    </wallet>
  </pocket>  
</pants>  

And here's another:

<object>
  <id>
    20D06E38-60C1-433C-8D37-2FDBA090E197
  </id>
  <class>
    pants
  </class>
  <color>
    blue
  </color>
  <material>
     denim
  </material>
</object>
<object>
  <id>
    1C728378-904D-43D8-8441-FF93497B10AC
  </id>
  <parent>
    20D06E38-60C1-433C-8D37-2FDBA090E197
  </parent>
  <class>
    pocket
  </class>
  <location>
    right-back
  </location>
</object>
<object>
  <id>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </id>
  <parent>
    1C728378-904D-43D8-8441-FF93497B10AC
  </parent>
  <class>
    wallet
  </class>
  <color>
    brown
  </color>
  <material>
    leather
  </material>
</object>
<object>
  <id>
    E197AA8D-842D-4434-AAC9-A57DF4543E43
  </id>
  <parent>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </parent>
  <class>
    bill
  </class>
  <currency>
    USD
  </currency>
  <denomination>
    10
  </denomination>
</object>
<object>
  <id>
    AF723BDD-80A1-4DAB-AD16-5B37133941D0
  </id>
  <parent>
    AFBD4915-212F-4B47-B6B8-A2663025E350
  </parent>
  <class>
    bill
  </class>
  <currency>
    EURO
  </currency>
  <denomination>
    10
  </denomination>
</object>

So, which one of those is easier to read? And did you notice that I added another 5 Euro to my wallet, while translating the structure? Key point here: try to have the structure of your XML follow the structure of your data.


Saturday, January 24, 2009

This week's iPhone SDK sob story

I have ranted about this before, I know, but I'm a little irritated. Every single time I update the iPhone tools, I run into some crazy issue building code that worked just fine on a previous version.

This week, after digging my office out from under all the mess from moving to a new house, I revisited one of my older projects (yes, Pictems is finally getting an update!). And I ran into not one, but two of these issues. That's not counting the usual Code Signing errors, which I don't even pay attention to - I just click randomly on the Code Signing options until they go away.

(For my friends on the XCode team: Yes, I will file bugs on these issues, once I figure out what's going on. This is not a bug report)

Issue #1: During some early experimentation, I had set the "Navigation Bar Hidden" property on one of my Nib files. It didn't seem to do what I wanted, but I didn't bother to change it back. At some point, a change was made such that it now works. Great, but apparently the change was actually made in one of the iPhone tools, so even if I build my old project, with the SDK set to 2.0, I still get the new behavior. Easy to fix, but it's weird to have to change my "archived" version of my source so it builds correctly with the current version of XCode. If I build my old project against the old SDK, I'd expect to get the old behavior.

Issue #2: One of my resource files has a $ character in the name. One of the XCode copy scripts apparently changed such that it's not escaping the filename correctly, so now the resource doesn't get copied. Amusingly, no error message results - the file just ain't there. Yes, it's dubious to name a file with a $ in the name. But, again, it used to work just fine.

Oh, well. In the bigger scheme of things, I still prefer XCode/iPhone to Eclipse/Android...