Sunday 21 February 2021

Interop NSData compressed / decompressed on iOS with Inflater / Deflater on Android

With iOS 13 Apple added some convenience methods to NSData to compress and decompress data using a set of compression algorithms. Zlib is recommended as the cross-platform option for interoperating with non-Apple platforms. I recently had need of compressing data while supporting iOS & Android, I did not find much in the way of documented resources to achieve this, so this post is to document what I found.

Compressing Data

The zlib standard supports different levels of compression (an integer between 0 and 9) that tradeoff the speed verses the output size. On iOS the level is fixed to 5 and can not be changed. While with Android the developer can specify any of the levels on the constructor of the Deflater class. Additionally Deflater constructor takes a boolean flag called nowrap, by default Deflater generates a header and ADLER-32 checksum. By setting nowrap to true Deflater will not generate the header or checksum, this make it generate the same bytes as the iOS compressed method.

While the compressed / decompressed methods are available on NSData, unfortunately Apple has not exposed on the swift Data type, so casting to NSData is required.

Swift

Kotlin


Decompressing Data

Decompressing the data is similar with the decompressed method on iOS and the Inflater class on Android.

Swift

Kotlin

Conclusion

This article has looked at using the Android Inflater / Deflater to generate the same values achieved by iOS's compressed / decompressed methods. This has been achieved by not generating the header and checksum on Android. For some use cases it may be desired to go the opposite way and have iOS generate the header& checksum, for that have a look at https://github.com/mw99/DataCompression.

Monday 22 July 2019

Wrapping C Function Callbacks to Swift (Part 2)

In part 1 I discussed bridging a Swift reference through a C void pointer. This was for the purpose of wrapping C libraries but maintaining an object oriented approach that ties the callback to a specific instance.  However if you are not fortunate enough to have a C API that offers a void pointer to pass a context, then this approach will work as long as the callback function is called synchronously. That is to say the callback function is invoked before the C function returns and on the same thread as the call to the C function. While this is less common it can crop up from time to time. I found an occurrence of this when I wanted a B-Tree implementation and tried calling the DBM module of BSD.

Since the callback will be made synchronously on the calling thread. We can make use of a capability of dispatch queues, that lets us set a specific key/value pair on a dispatch queue and fetch it within the callback function. The name specific is likely taken from pthreads for thread specific data. To do this you need to create an instance of DispatchSpecificKey and then set the context you wish to pass to the callback function prior to calling the C function.



Sunday 18 November 2018

JSON-Patch: Applying deltas to JSON Documents in Swift

I recently started looking at saving bandwidth between a client and server that communicates with JSON. Some of these messages send large JSON documents that do not change very frequently and when there are changes, they are small relative to the whole document. My search lead me to JSON-Patch  a IETF proposed standard (RFC 6902).  JSON-Patch is a patch format which itself is specified in JSON, it can perform operations on a target JSON document to transform it to the intended state. The standard has the following operations: -

  • add - Insert / Replace a value at the given location with a new value.
  • remove - Remove a value at the given location (value must exist).
  • replace - Replace a value at the given location with a new value (value must exist).
  • move - Move a value at the given location to a new location. Logically equivalent to a remove then add. (value must exist)
  • copy - Copies a value at the given location to a new location (value must exist).
I looked for implantations of JSON-Patch in Swift, and found one but it had dependences on SwiftlyJSON another swift module. Which is great if you're already using it in your project, but I wanted to work with the JSONSerialization that comes in Foundation directly. So I wrote my own JSONPatch swift module. This initial version can only apply patches, it does not currently generate patches based on two JSON documents. Project available here on GitHub.

Usage

JSONPatch can work with the Data representations of JSON.
However if you are already parsing the JSON with JSONSerialization within your project then JSONPatch can work with the parsed objects you already have. This can be especially useful if you want to apply a patch relative to sub-element of the whole JSON document.

Wednesday 5 September 2018

Wrapping C Function Callbacks to Swift (Part 1)

While Swift is a fantastic language, sometimes it can make more sense to reuse an existing C library rather than re-write new code from the ground up with Swift. A lot of open source libraries are written in C because it is one of the most portable languages ever created. Fortunately, Swift has excellent support for calling C code. However, the direct mapping of a libraries C API may not be the most intuitive and so a wrapper is often used to provide a more swifty API on top of the underlying C implementation.
Often C functions have callbacks, these usually take the form of function pointers that are passed by the caller to a C utility function that can invoke the callback function through a pointer. This is a precursor to the higher-level functions and closures within Swift, the ability to pass functions to functions.
I’ll cover this topic in 3 parts, starting with the easiest and going to the most complex.
  1. Wrapping callbacks with a void pointer context parameter
  2. Wrapping Synchronous callbacks without a void pointer
  3. Wrapping Asynchronous callbacks without a void pointer

Fortunately, the easiest scenario is also the most common. This is when the C function in addition to taking a function pointer, also allows the caller to pass a context parameter that will be passed to the callback function. The exact name of the context parameter may vary between different C libraries.
Ever since the introduction of object-oriented languages like C++ and Objective-C, the challenge of mapping older C procedural code into an object model has existed. Ideally, we’d like to map these flat function calls into method invocations on an object instance. However, a method invocation requires at least two pieces of information (not including any other parameters), a reference to the object instance and a reference to the method to be invoked. Passing this instance reference from the caller and having access to it in the callback is the key to wrapping the code in an object-oriented API.
The C language obviously does not have a concept of a swift object. Therefore, a reference to the object must be reduced to a void pointer, these are pointers in C that specify no type information about the type the pointer references. These are mapped to the UnsafeMutableRawPointer type in Swift. We can reduce a reference to an object to a pointer via the Unmanaged type that has methods to convert Swift references to and from raw pointers.
and can be converted back via: -

In Swift you can supply a closure to a C function that takes a function pointer. The type of this closure will be marked with the @convention(c) attribute. However, there are some big limitations with the closures that can be used. That closure cannot capture any context and therefore it can only work with the parameters passed to it. Therefore, you cannot use this as a mechanism to pass a reference to an object instance. If you attempt to you will receive an “error: a C function pointer cannot be formed from a closure that captures context”.


For a real example I will use libxml2, as the example since I had previous written a swift wrapper around the SAX based HTML parser within this C library (see Github for this project). The API of this library takes a struct of C function pointers that are invoked based on the SAX events during the parsing process. The following represents an example of assigning one of the callbacks.
We create a parsing context before invoking the parsing function and and pass the UnsafeMutableRawPointer version of this context to the parsing function.

Sunday 6 May 2018

Improving unit tests with equivalence instead of equality for URLs

Like any good developer I try to ensure that my code is not just covered with unit tests, but that they are meaningful. However I often find that my tests would break because they made too many assumptions about the implementation of the code it was testing. In this post I'm going to be using the example of testing URLs, but this approach can be taken more generally. When working with web services I often find myself writing code that takes an model object and produces a URL request. So naturally I'll write unit tests for this conversion and typically I would have written something like below, were for a given input I check the output URL is equal to what I expect.


For simple cases this works fine, but as things get more complex I would find my tests would become very fragile because the order of the query items could change. This might be because I had used a Dictionary or a Set were the order of the elements is not guaranteed. Indeed even in the simple case above my test makes implicit assumptions about the implementation. The is no reason for firstname to be the first item in the query string, the URL would still be perfectly valid if the order of the items were rearranged or indeed if any of the characters in the URL were percent encoded. My unit test is highly coupled to the specific implementation of my function. Now in this trivial example that may not matter too much, however as things get more complex this coupling leads to unit tests that are brittle. At the root of this brittleness is the equality assertion. When an object (such as a URL) can have multiple alternative but equivalent forms then using equality leads to implementation coupling.


Escaping the tyranny of exact equality

To break this coupling, I've used the concept of Equivalence that will test if two objects are alternative forms of each other. I've created a protocol Equivalent that is modelled after the Equatable protocol in the standard library. Using the triple-bar operator (≡) as the equivalent-to operator. With this we can have types conform to the Equivalent protocol were they have alternative forms.



Equivalence for URLs

Now that we have the Equivalent protocol we apply it to URL. In the below code I use the URLComponents class to do much of the heavy lifting. This implementation will account for the following alternative forms:-

  • Query item order independence.
  • Percent encoding alternative forms.
  • Default port numbers.
  • Relative paths.


Writing equivalent unit tests

For consistency with the rest of the unit tests I wrote a function XCTAssertEquivalent similar to XCTAssertEqual, as it's name implies it uses equivalence rather than equality (see the attached playground for the code). The original unit test changes to the following.

Not much has change with the code of the test, however now the test is far more flexible on testing the output of the function and therefore less coupled to the implementation.

Playground file available here.

Wednesday 21 March 2018

Tracking down bugs using git bisect with Xcode unit tests

Overview

When working on software projects, fixing bugs is an inevitable fact of life. Sometimes you're lucky and the nature of the issue is obvious, however other times tracking down the source of the issue can be a time consuming and laborious task. If your project is using git for source control, then git bisect can be used to track down the commit that introduced the bug. Hopefully by narrowing the focus of investigation to the changes within a single commit, finding the root cause is significantly easier. This is especially true if commits within your project are small and focused.

Git bisect works on the basis of giving a known bad commit (were the bug is present, such as the current head) and a earlier good commit (were the bug is not present). The command works by switching into a special bisect mode were git will checkout a commit and the user will label that commit as good or bad. Git will then checkout a new commit and the process continues until git has determined the first bad commit. For those interested git uses a binary search of the commits in between the initial good and bad commit. So not every commit in-between the initial starting points will be checked. We enter bisect mode by running the following from the terminal: -

$ git bisect start [bad commit] [good commit]

Once in bisect mode, git will checkout a commit for us that we need to label as either: -
  • good – The bug is not present.
  • bad – The bug is present.
  • skip – It could not be determined if the bug was present, such as the project failed to compile.

$ git bisect good | bad | skip

After labelling the commit git bisect will checkout a new commit and the process continues until the first bad commit is found.

Automating the process

This works, however this requires user interaction to determine if the bug is present within a given commit. Instead let’s consider writing a unit test that will determine that for us. That way we could set the whole process in motion and git bisect could determine the first commit in the range that causes our unit test to fail. Git bisect has a command to run a shell command to automatically label a commit.

$ git bisect run [cmd]

The exit code of the shell command is used to determine the label for the commit.
  • Exit Code 0 - good
  • Exit Codes 1 through 127, excluding 125 - bad
  • Exit Code 125 - skip
Apple has documented the process of running unit tests from the command in Automating The Test Process using the xcodebuild command. For those interested Apple's Building from the Command Line with Xcode FAQ is also a good source of information.

The last piece of the puzzle is how to run a unit test that was not part of the project in the past and therefore when git performs a checkout the test will not be present. The simplest method I found was to create a stash of the changes to add the unit test into the project. That way we can apply the stash after each checkout, but before running the tests.

To make running the unit test easier we'll write a shell script to build and run the test target.


You'll notice that the script separates the steps of building the test target from running the unit test. In an ideal world each commit in your project would compile and run. However the reality is that some commits may break the project and fail to build. In these situations we exit the script with status code 125 which git bisect will interpret as the skip command. This will get us to carry on our search, at the cost of potentially making the end result less focused. Additionally the script uses the -only-testing argument to xcodebuild to only run a specific unit test rather than all of them.

Example

If like me you find a worked example easier to digest. Let's use a small, if somewhat contrived example. You'll need to suspend your sense of disbelief here as the source of the bug will be some what obvious. However the aim of the exercise is working with git bisect not fixing this particular issue. 

Git commit graph showing the main develop branch and 3 feature branches that are split off and later merged back to develop
This demo project can be found on GitHub here. The project is an iOS app, that has a simple slider control for numbers between 0 and 10 and a label that will report the current value as well as the sum and average of all integers between 0 and the current value.

To makes things a little more realistic within the repository I have created a main develop branch and 3 feature branches that are created from develop before being reintegrated.

At some point in the development a bug was introduced such that if the slider is at position zero the app will terminate. This did not occur initially so let's use the technique above to determine the commit that introduced the bug.























We start by writing a unit test that can be run to determine if the bug is present.



Next we stash this change so that it can be applied for each checkout.

$ git stash save "unit test"

We can then start the git bisect session with the given bad and good commits. Then issue the command for git bisect to run the script automatically.

$ git bisect start 08a7cbf 664b6a1
$ git bisect run ./bisect-unittests/BisectDemo/unittest.sh

The process will take off and after awhile the following printed out: -

fb4b22d is the first bad commit
commit fb4b22d
Author: Raymond McCrae
Date:   Tue Mar 20 20:06:39 2018 +0000

    Refactor Utilities.average



bisect run success


Looking at the commit the following method

was changed to

During the refactor a bug was introduced that an empty array will cause a division by zero causing the crash. It is worth pointing out that despite giving the initial range of commits from the develop branch, the commit git bisect identified is on feature/3 branch. Clearly git bisect will follow all paths in between the initial commits not just a simple straight path along the current branch.

Finally we can leave bisect mode by running: -

$ git bisect reset

Since we have a unit test we can include it with any fix that we make to the code to ensure that this type of bug is not made again.

Conclusion

Git bisect is a powerful tool, especially when combined with automation. It is however more suited to finding regressions were we had a previous good state and now things are broken. I have used unit tests in this post, however it would also possible to you UI automation tests instead.

Monday 12 February 2018

Automatic summarization of articles in Swift

One of the features of Mac OS X that has intrigued me was the Summarize service that has the ability to generate a summary of a larger piece of text. In more recent versions of the OS this needs to be switched on in the System Preferences. This is an example of a technique called Automatic Summarization. I thought it might be fun to try and perform a similar task in Swift. This is not an attempt to perfectly recreate the exact logic Apple used, but instead how similar results can be achieved with some basic NLP processes.


While there are many ways to achieve this goal, to keep it simple I'll use an approach called document summarization. That builds a summary from the k most important sentences within the text without modifying them, where k is an input variable of the user's choosing for controlling the length of the summary. This works on the principal that a few key sentences can get an overview of the text as a whole. Ok, so how do we know which are the most important sentences? Well based on the observation that the most important topics of an article are often the most repeated, we can build a dictionary of the word frequencies (a count of how many times that word appeared within the article text) and from that we can compute a ranking score for each sentence by adding up the frequencies of the words the sentence contains. Finally we can string together the k highest scoring sentences to form our summary.

Let’s take this one step at a time using this article on recent US policy changes to the ISS from the Washington Post as an example. The first step is computing the word frequencies. To do this we need to break up the text into the individual words.  Fortunately, there is a Foundation class to do just that, NSLinguisticTagger which will enumerate all the tokens in a string. Tokens can be words, punctuation or whitespace. However for our purposes we are only interested in the words, so we specify the options omitWhitespace and omitPunctuation to only deal with the words. The joinName option is used to combine names that have multiple words in to a single token, such as treating New York as a single token rather than two separate tokens.


The following table lists the 10 most frequent words found in the article.

Word Frequency
the 66
to 33
of 24
and 18
station 15
that 15
a 13
in 12
said 11
it 10

There is a problem here. The table is dominated by words that are unimportant to the meaning of the the article. Languages like English often have many words like "the" and "a" which are important for the grammar of a sentence, but are not important to picking out the main themes of an article. If left in the word frequency values then they would dominate the rankings, skewing the score of sentences. Such words are called stop words and we can filter them out. There is no standard list of stop words, so I will use Google's stop word list for English. Let’s update the approach to filter out the stop words from being counted in the word frequencies. We store the stop words as a Set of Strings and check for a given word that it is not contained in the set before incrementing the word frequency.


The following table lists the 10 most frequent non-stop words in the article.

Word Frequency
station 15
nasa 10
commercial 8
space 7
iss 6
private 6
will 5
international 4
document 4
transition 4

This looks much better with more of the words representing important terms and themes in the article. Next we need to identify the sentences, here again NSLinguisticTagger comes to the rescue. In addition to receiving the token range, the closure passed to enumerateTags also gets passed the range of the containing sentence. We can track the sentences as we enumerate the tokens.


Then we can compute the ranking score for each sentence by summing the word frequencies of each word in the sentence.

The following table lists the 3 highest ranked sentences for the article.

Sentence Ranking Index
Last month, as reports circulated about NASA pulling the plug on the station, Mark Mulqueen, Boeing’s space station program manager, said “walking away from the International Space Station now would be a mistake, threatening American leadership and hurting the commercial market as well as the scientific community.” 80 20
The transition of the station would mark another bold step for NASA in turning over access to what’s known as low Earth orbit to the private sector so that the space agency could focus its resources on exploring deep space. 72 23
In its budget request, to be released Monday, the administration would request $150 million in fiscal year 2019, with more in additional years “to enable the development and maturation of commercial entities and capabilities which will ensure that commercial successors to the ISS — potentially including elements of the ISS — are operational when they are needed.” 72 7

Having found our most important sentences we string the k most important sentences together in the order they appeared within the original text. It is important to keep the original order of the sentences when reforming the summary. Otherwise the result can look disjointed.

For our example article and for a value of k equal to 3, the following summary is generated.
In its budget request, to be released Monday, the administration would request $150 million in fiscal year 2019, with more in additional years “to enable the development and maturation of commercial entities and capabilities which will ensure that commercial successors to the ISS — potentially including elements of the ISS — are operational when they are needed.” Last month, as reports circulated about NASA pulling the plug on the station, Mark Mulqueen, Boeing’s space station program manager, said “walking away from the International Space Station now would be a mistake, threatening American leadership and hurting the commercial market as well as the scientific community.” The transition of the station would mark another bold step for NASA in turning over access to what’s known as low Earth orbit to the private sector so that the space agency could focus its resources on exploring deep space.
The full source for this post can be found on Github as a swift playground. There are many methods of performing automatic summarization, this approach was chosen primarily on simplicity, however there are many other approaches that can give more optimal results.