Apple introduced Core ML back in June during the WWDC 2017 conference as a means to integrate machine learning into applications we love and use. In this article I hope to shed some light on what Core ML is and how to use it.

What is Machine Learning?

We are surrounded by data from which we can derive meaning and classification. Machine Learning is the process of deriving meaning from data: it gives machines the ability to distinguish, classify and recognise entities or patterns without being explicitly programmed to do so.

Let’s look at an example. If you are developing a game with a built-in strategy to win, there’s no machine learning involved because you, as the programmer, have already defined the strategy. However, if you build a game with only the rules and no strategy, then the machine needs to learn how to win by repeatedly playing (training) until it does. This applies not only to games but also to programs that perform classification and regression (more on that later).

So in a nutshell, machine learning is the process by which a machine continually improves its outcomes based on past experience.

What is Core ML?

The above diagram describes CoreML; it should become more meaningful as you read through this article.

CoreML is a framework developed by Apple for macOS, iOS, watchOS and tvOS that lets us integrate machine learning into our applications. You might be wondering how devices like the iPhone have the capacity to train models or compute predictions. You are right to wonder, because training a model requires lots and lots of data and many GPUs for computation. Because of this limitation, machine learning has traditionally been carried out on remote servers such as AWS or IBM Watson. With the introduction of frameworks such as Metal and CoreML, inference (making predictions with an already trained model) can now be performed on devices locally and offline.

How to setup and use CoreML?

Apple made CoreML very easy to set up and use; it can take as little as 3 lines of code.

  1. Add a trained model to Xcode
  2. Instantiate it
  3. Make predictions

Adding a trained model is as simple as drag and drop. The model should be in the .mlmodel file format, a new open file format that describes the layers in your model, the inputs and outputs, the class labels, and any preprocessing that needs to happen on the data. Once the model is added, Xcode will automatically generate a Swift or Objective-C interface for you to program against. You then simply provide the model with an input and CoreML will take care of running the model (resource optimisation between the GPU and CPU, etc.) and provide a prediction result for your app to use.
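To give a feel for the code Xcode generates, here is a simplified sketch of the shape of that interface for a hypothetical model named MyModel. Everything here is an assumption for illustration: the real generated file imports CoreML and wraps MLModel and MLFeatureProvider, while the stand-in types below are used only so the sketch is self-contained.

```swift
// Simplified stand-ins for the CoreML-backed types Xcode would generate.
// In the real generated file these wrap MLModel and MLFeatureProvider.

/// Output of a prediction: a label plus a confidence per candidate label.
struct MyModelOutput {
    let classLabel: String
    let classLabelProbs: [String: Double]
}

/// The generated model class: instantiate it, then call prediction(...).
struct MyModel {
    func prediction(input: [Double]) throws -> MyModelOutput {
        // The real implementation hands the input to the compiled model;
        // here we return a fixed answer purely for illustration.
        return MyModelOutput(classLabel: "example",
                             classLabelProbs: ["example": 1.0])
    }
}

let model = MyModel()
let output = try model.prediction(input: [1.0, 2.0])
print(output.classLabel)
```

The important point is the shape, not the stubs: you get a class named after your model file, a strongly typed prediction method, and a typed output struct.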

Above is a representation, rendered by Xcode, of a model. As you can see, it shows some useful information such as the author, model description and model size. The size of a model is very important, as some models can be 500MB or more. A large model can significantly increase your app size because, in addition to the auto-generated model interface, the model itself is compiled and bundled into your app.

Xcode also shows other useful information such as the expected input and output parameters and their context; in this case the model takes the number of solar panels, the number of greenhouses and the size of land in acres, and outputs a predicted price. The representation also shows a link to the generated Swift model class, which we can use to instantiate the model and perform classification or regression.
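Assuming the generated class for this pricing model is named something like MarsHabitatPricer (the class name comes from the model file, and the parameter names below are assumptions based on the inputs described above), using it might look roughly like this sketch:

```swift
// Hypothetical usage of the pricing model described above.
// MarsHabitatPricer and its parameter names are illustrative assumptions;
// this will not compile without the corresponding .mlmodel in the project.
let pricer = MarsHabitatPricer()

// Inputs: number of solar panels, number of greenhouses, land size in acres.
if let output = try? pricer.prediction(solarPanels: 2, greenhouses: 3, size: 1000) {
    print("Predicted price: \(output.price)")
}
```

Note how the inputs listed in the Xcode representation map directly onto the arguments of the generated prediction method.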

CoreML is compatible with a wide variety of machine learning tools that are currently available, such as Caffe, Keras and more, which means you can train your own model and migrate it to your application. You can also download pre-trained models from Apple’s machine learning page, which is a great place to start.

An Example

With the brief introduction over, let’s look at a simple example where I will use a pre-trained model to perform predictions on a series of images.

Adding a model

The above screenshot shows how to add a model to your project. The “MobileNet” model above was downloaded from Apple’s machine learning page. As you can see, it is as simple as adding any file and requires no configuration. However, you may need to perform a build for Xcode to generate the associated model class.

Using It

// Convert the UIImage to a CVPixelBuffer.
// pixelBuffer() here is a helper extension on UIImage, not a built-in method;
// weakSelf is self captured weakly inside a closure.
guard let pixelBuffer = weakSelf.photo.pixelBuffer() else {
    return
}

// Instantiate the model
let mobileNet = MobileNet()

// Perform the prediction
let prediction = try? mobileNet.prediction(image: pixelBuffer)

// Obtain the confidence for the predicted category
let confidence = prediction?.classLabelProbs[prediction?.classLabel ?? ""]

Once the associated model class is generated, using the model to perform a prediction is very easy. In this case you first convert the image you want to classify to a CVPixelBuffer and pass it as an argument to the prediction function. Different models require different arguments, which can be identified from the Xcode model representation.

The return type in this case is MobileNetOutput, which provides two properties: classLabel and classLabelProbs. The classLabel property is a String describing the image, for example “sports car” or “chocolate cake”. The classLabelProbs property is a dictionary with String keys and Double values. By looking up the classLabel key you obtain the confidence for that category, which you can use in your business logic, for example deciding whether a picture is of a hot dog or not 😉. The dictionary also contains the probabilities of the other candidate categories the model considered; for example, if the image is of a chocolate cake, the dictionary could also contain predictions for chocolate or icing.
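Since classLabelProbs is an ordinary [String: Double] dictionary, you can sort it to get, say, the top three candidate labels. This is plain Swift and works with any classifier output shaped like this; the sample dictionary below is made up purely for illustration.

```swift
// A made-up classifier output of label → confidence, shaped like
// the classLabelProbs property described above.
let classLabelProbs: [String: Double] = [
    "chocolate cake": 0.82,
    "icing": 0.10,
    "chocolate": 0.05,
    "sports car": 0.01
]

// Sort by confidence, highest first, and keep the top three.
let topThree = classLabelProbs
    .sorted { $0.value > $1.value }
    .prefix(3)

for (label, confidence) in topThree {
    print("\(label): \(confidence)")
}
```

Showing the top few candidates with their confidences is often more useful in a UI than a single label, especially when the top confidence is low.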

The above screenshots are taken from the example I put together, which can be downloaded from here. As you can see, using the MobileNet model I was able to perform classification on a sample picture. When choosing a model it is important to know that there are other models that support the same kind of classification but with better prediction capability and a smaller file size, so it is worth experimenting with several and picking the one best suited to your needs and performance requirements.

Conclusion

So to recap: we were able to run the neural network classifier MobileNet on device using Core ML, all with only a couple of lines of code. In Xcode, as soon as you drag a trained ML model into your application, you get a populated view with information such as the name of the model, its underlying type, its size and anything else the author included. At the bottom you also see the model’s inputs and outputs, which is very helpful when you come to use it. Once you add the model to your app target, you get generated code that you can use to load the model and make predictions. We also saw how simple it was to use: even though the model was a neural network classifier, all we needed to do to instantiate it was call the name of the file. The model type is completely abstracted away, so whether it’s a support vector machine, a linear model or a neural network, you load them all the exact same way. Finally, we noticed that the prediction method took a CVPixelBuffer and was strongly typed, so input errors are caught at compile time rather than at runtime.

With this simple and brief introduction to CoreML, start experimenting with this new and exciting API.

Another simple tutorial is available here.

Peace✌

A Software Engineer with a passion for technology. Working as an iOS Developer @BBC