Beware of the Swift String

April 10, 2015


I have to tell you, for a company that prides itself on delivering polished, high-quality products, Apple really dropped the ball with Swift 1.0. Before the recently released Swift 1.2, it was a nightmare platform to develop in. My main issues with it were:

  1. Compile times were so slow that every time I compiled I died a small death. Before 1.2 there were no incremental builds, so every time I changed a file I had to wait about 5 minutes for the entire project to build. It literally meant that every feature I wanted to develop took me 5 times longer than it should have.
  2. Xcode support for it was crappy, lacking basic features such as rename refactoring.
  3. Compiling with optimizations (the -O flag) was extremely buggy, sometimes producing compiled code that was just wrong, especially code with lots of closures. But compiling without optimizations meant your code was extremely slow.
  4. Even with optimizations, Swift 1.0 had really, really crappy performance. The dictionary data structure, in particular, was so slow that I simply had to avoid it everywhere, using NSMutableDictionary or even arrays where I could. I haven’t tested it on Swift 1.2, but I pray it is much faster now.
  5. No Exceptions. Write once, crash everywhere.
  6. Unlike most other programming languages these days, Swift is not open source. Want to look at that Dictionary code? See how you could use it better? Well, you can’t, and you won’t, ever.

But the issue I want to talk about today is the Swift String type. The language designers took a very interesting approach here: the characters in a string are not constant in size. In Java or C#, a String is basically an array of two-byte UTF-16 characters, so in the string “I love emojis ☺️” the smiley would be represented by two separate characters. Not so in Swift, where the String type was designed to handle Unicode surrogates (and other multi-code-unit sequences) more elegantly, and the emoji is considered just one character.
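
A small sketch of the difference, written for current Swift (5.x) rather than the 1.x release this post describes, where the counting APIs were spelled differently. `utf16.count` is roughly what Java's `length()` or C#'s `Length` would report, while `count` counts user-perceived characters (extended grapheme clusters). As an assumption for illustration, the emoji is U+1F600 written as an escape, because it lies outside the Basic Multilingual Plane and so needs a genuine surrogate pair.

```swift
// "I love emojis " followed by U+1F600 GRINNING FACE, which needs a
// UTF-16 surrogate pair (two code units) to be represented.
let text = "I love emojis \u{1F600}"

// What Java's length() / C#'s Length would report: UTF-16 code units.
let utf16Length = text.utf16.count

// What Swift reports: user-perceived characters (grapheme clusters).
let characterCount = text.count

print("UTF-16 code units: \(utf16Length), Characters: \(characterCount)")
```

The 14 ASCII characters count once in both views. The emoji counts as one `Character` but two UTF-16 code units, giving 15 versus 16.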

In an ideal world, this would be a Good Thing. But in a world where every millisecond counts, it presents a performance issue. How do you access the 5th character in a Swift string? Well, you can’t use an integer indexer, because characters vary in length. You have to iterate over the characters until you reach the 5th one, and that’s pretty slow. How do you get the string length? Same problem: it’s an O(n) operation. So although the Swift string is more ‘correct’ than the Java/C# implementation, if you don’t pay attention, and if you do a lot of string manipulation like we do, it can kill the performance of your app. And who knows? Maybe the code you’re writing will never have to handle emojis or surrogate characters (because you’re using only English dictionary words, say). In that case you’re writing slower code for no reason at all.
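
Here is a minimal sketch of what "the 5th character" costs, again in current Swift syntax rather than 1.x. The helper `character(at:in:)` is hypothetical, not a standard library API. Because a `String.Index` cannot be built from an `Int`, it has to walk forward from `startIndex` with `index(_:offsetBy:limitedBy:)`, which is linear in the offset.

```swift
// Hypothetical helper: returns the Character at a zero-based offset,
// or nil if the offset is out of range. index(_:offsetBy:limitedBy:)
// walks the grapheme clusters from the start, so this is O(n), not O(1).
func character(at offset: Int, in s: String) -> Character? {
    guard offset >= 0,
          let i = s.index(s.startIndex, offsetBy: offset, limitedBy: s.endIndex),
          i != s.endIndex else { return nil }
    return s[i]
}

let word = "I love emojis \u{1F600}"

// The 5th character (offset 4) is the "v" in "love".
let fifth = character(at: 4, in: word)

// count is O(n) as well: Swift has to segment the whole string.
let length = word.count

print(fifth ?? "-", length)
```

Calling a helper like this inside a loop over offsets 0..<n turns an O(n) scan into O(n²), which is where the slowdown described above bites hardest.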

Fortunately, there’s a workaround. You can use the String’s utf16 property to treat it as a UTF-16 string. So str.utf16[5] quickly gets you the UTF-16 code unit at index 5. Even iterating over str.utf16 is much faster (at least in Swift 1.0) than iterating over the String’s characters themselves. This suggests that the underlying storage is still an array of UTF-16 code units, but I can’t say for sure since Swift is not open source. There are some caveats, though:
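
Note that an `Int` subscript like `str.utf16[5]` is Swift 1.x syntax. In current Swift the UTF-16 view is indexed by `String.Index`, and since Swift 5 native strings are stored as UTF-8, so the storage guess above no longer holds. A minimal sketch of one way to get cheap integer indexing today, assuming you can afford one up-front copy: convert the code units to an `[UInt16]` once (O(n)), then index that array in O(1), much like a Java `char[]`.

```swift
let str = "I love emojis \u{1F600}"

// One O(n) copy of the UTF-16 code units into a plain array...
let units: [UInt16] = Array(str.utf16)

// ...after which random access is O(1).
let sixthUnit = units[5]        // the "e" in "love"
let unitCount = units.count     // same as Java's length()

// The emoji occupies the last two slots as a surrogate pair.
let high = units[14]
let low = units[15]

print(String(sixthUnit, radix: 16), unitCount,
      String(high, radix: 16), String(low, radix: 16))
```

The copy only pays off if you index into the same string many times. For a single lookup, walking the view directly is just as cheap.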

  1. str.utf16[5] has type UInt16. So if you want to check whether it is a dot (‘.’), you have to use something like “.”.utf16[0], which is fugly.
  2. If you want to convert str.utf16[5] back to a string, you can use “\(UnicodeScalar(myUInt16))”, where myUInt16 holds that value. However, this code will CRASH your app if the value is the first half of an emoji’s surrogate pair, or in other words, not a valid Unicode code point on its own. And as I said, there are no exceptions in Swift, so you can’t try-catch it. You have to roll your own IsValidUnicodeCodePoint method, based on the Unicode definition of a valid code point. Or just copy Java’s Character.isSurrogate method, which is what I did.
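
The two caveats above can be sketched together in current Swift. The helpers `isSurrogate(_:)` and `string(fromCodeUnit:)` are hypothetical names chosen for illustration. `isSurrogate(_:)` mirrors the range check in Java’s `Character.isSurrogate` (U+D800 through U+DFFF). In current Swift, `Unicode.Scalar.init?(_: UInt16)` is failable, so a lone surrogate yields `nil` rather than the crash described above for Swift 1.x. The explicit check is kept so that the code still reads like the workaround.

```swift
// Hypothetical port of Java's Character.isSurrogate: true for any UTF-16
// code unit in U+D800...U+DFFF, i.e. one half of a surrogate pair.
func isSurrogate(_ unit: UInt16) -> Bool {
    return unit >= 0xD800 && unit <= 0xDFFF
}

// Hypothetical helper: turn one UTF-16 code unit back into a String,
// returning nil for a lone surrogate instead of crashing.
func string(fromCodeUnit unit: UInt16) -> String? {
    guard !isSurrogate(unit), let scalar = Unicode.Scalar(unit) else {
        return nil
    }
    return String(Character(scalar))
}

// A less ugly way to get the code unit for '.': UInt8(ascii:) gives the
// ASCII value, which is then widened to UInt16.
let dot = UInt16(UInt8(ascii: "."))

let units = Array("api.v2.\u{1F600}".utf16)
let dotCount = units.filter { $0 == dot }.count

// Each half of the emoji's surrogate pair becomes "?".
let pieces = units.map { string(fromCodeUnit: $0) ?? "?" }
print(dotCount, pieces.joined())
```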

To conclude, I find this design choice of the Swift string interesting, but weird. They are optimizing for 1% of the cases and hurting performance for the other 99%. There is a workaround, but it is not pretty. But hey, the language is young, so maybe in 1.3?
