CNTK 2.5: A C# Primer

If you prefer writing code with DotNet, and wish to get started in machine learning, you’re in luck: CNTK now allows you to build, train, and evaluate models all within your managed code, while still taking advantage of hardware acceleration!

Since the documentation for the C# libraries is poor (at best), I thought it would be useful to put up a quick primer on how to get started.

Here I’ll show how to describe, train, and test a simple model to evaluate a two-input XOR function. There will be a single “hidden” layer and an output layer.

This will NOT be an introduction to the concepts behind machine learning, but rather to the CNTK API that’s available for your DotNet applications.

Getting Started

Within Visual Studio, reference the CNTK.CPUOnly or CNTK.GPU NuGet package. You may use either or both but, if you have a GPU, you might as well use it. They will automatically install all dependencies for you. Ah, the wonders of package management tools.


There are two main names you need to reference: the CNTK namespace and the CNTKLib class it contains. For myself, I prefer adding

using CNTK;

and manually specifying CNTKLib.<whatever>, but that’s personal preference.

The first thing you’ll need to do is grab a reference to the compute device you wish to use. For most of us, this will be one of the following lines of code:

var cpu = DeviceDescriptor.CPUDevice;
var gpu = DeviceDescriptor.GPUDevice(deviceId);

Several functions require a reference to the compute device being used, which is why we need this. CPUDevice is a static property that returns a reference to the CPU-only device, and GPUDevice takes the ID of the GPU to use, which matters on multi-GPU systems.

Since all my work runs on machines with a single GPU that’s vastly more efficient than the CPU, I tend to do the following:

var device = DeviceDescriptor.GPUDevice(0);

and throw it somewhere convenient like a settings container.

Preparing data

Passing data into CNTK is straightforward: specify the shape of the data, a flat list (IEnumerable, actually) and the device to send the data to.

var features = Value.CreateBatch(new int[] { 2 }, new float[] {0,0, 0,1, 1,0, 1,1}, device);
var labels = Value.CreateBatch(new int[] { 2 }, new float[] {1,0, 0,1, 0,1, 1,0}, device);

CreateBatch takes an enumerable of data and batches it for the compute device of choice. Its three arguments are the shape of one sample, the flat data, and the device.

The first operand is an instance of the NDShape class, but an integer array specifying the dimensions of your data converts to it implicitly. If you’re using multi-dimensional data you can specify it here; for instance, to evaluate a game of Connect4, I would use

new int[] { 7, 6, 3 }

since there are 7 columns, 6 rows, and 3 possible values for each cell (black, red or empty). Doing this allows you to use convolutional layers if you desire.

If you have a single shape you’re using repeatedly you could pre-declare it, but I haven’t run into a real need to do so yet. Outside of this example, I haven’t re-used the same shape in more than a few places.
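Since the shape describes one sample and the data is one flat list, it can help to sanity-check how many samples a buffer actually holds. Here’s a minimal sketch in plain C#, with no CNTK calls; ShapeMath and its method names are my own, purely for illustration.

```csharp
using System;
using System.Linq;

public static class ShapeMath
{
    // Number of values in one sample: the product of the shape's dimensions.
    public static int ElementsPerSample(int[] shape) =>
        shape.Aggregate(1, (acc, d) => acc * d);

    // Number of samples in a flat buffer of the given length.
    // Throws if the buffer is not a whole number of samples.
    public static int SampleCount(int flatLength, int[] shape)
    {
        int perSample = ElementsPerSample(shape);
        if (flatLength % perSample != 0)
            throw new ArgumentException("Flat data length is not a multiple of the sample size.");
        return flatLength / perSample;
    }
}
```

For the XOR data, eight values with a shape of { 2 } work out to four samples; a Connect4 board with 7 columns, 6 rows and 3 states per cell would need 126 values per sample.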

Next up is the data: as I said above, it wants an IEnumerable to read the data from. I prefer to keep my data sets as either an array of arrays or a list of lists, but flattening the array is easy enough if that’s what you do.
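If your data lives in an array of arrays, flattening it might look something like this. It’s a sketch in plain C# using LINQ; BatchData.Flatten is a hypothetical helper name, not part of CNTK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchData
{
    // Concatenates per-sample rows into one flat array.
    // Throws if any row does not have the expected per-sample length.
    public static float[] Flatten(IEnumerable<float[]> samples, int sampleLength)
    {
        var rows = samples.ToList();
        if (rows.Any(r => r.Length != sampleLength))
            throw new ArgumentException("Every sample must have " + sampleLength + " values.");
        return rows.SelectMany(r => r).ToArray();
    }
}
```

The resulting array is what you would hand to Value.CreateBatch as the data argument. The length check is there because a single short row would silently shift every sample after it.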

In this example, we have four input cases: 00, 01, 10 and 11. There are also four results: 10, 01, 01 and 10.

Keep in mind that you want an output for each possible classification of your data. In this case, we are using the first signal for False, and the second for True.
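To show where those label pairs come from, here’s a small sketch that builds them from a class index. It’s plain C#; XorLabels, OneHot and XorClass are names I made up for this example.

```csharp
using System;

public static class XorLabels
{
    // Returns a vector of length classCount with 1 at classIndex and 0 elsewhere.
    public static float[] OneHot(int classIndex, int classCount)
    {
        if (classIndex < 0 || classIndex >= classCount)
            throw new ArgumentOutOfRangeException(nameof(classIndex));
        var vector = new float[classCount];
        vector[classIndex] = 1f;
        return vector;
    }

    // XOR of two bits as a class index: 0 = False, 1 = True.
    public static int XorClass(int a, int b) => a ^ b;
}
```

With two classes, False (index 0) becomes 1,0 and True (index 1) becomes 0,1, which matches the label data above.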

Building the model

First, we need variables describing the input and output of the model:

var featureVar = Variable.InputVariable(new int[] { 2 }, DataType.Float, "Features");
var labelVar = Variable.InputVariable(new int[] { 2 }, DataType.Float, "Labels");

The InputVariable factory is fairly simple: It wants the shape of the variable, the data type, and a label. Although the label is optional, labeling variables is a useful enough practice that you probably want to go ahead and do it all the time.

DataType is an enum whose values include .Float, .Double and .Float16.

Variables are meant to be supplied with values, created above. We also have learnable parameters which are adjusted as we train the model. Let’s set those up:

var iWeights = new Parameter(new int[] { 5, 2 }, DataType.Float, CNTKLib.GlorotNormalInitializer());
var iBias = new Parameter(new int[] { 5 }, DataType.Float, 0);

var oWeights = new Parameter(new int[] { 2, 5 }, DataType.Float, CNTKLib.GlorotNormalInitializer());
var oBias = new Parameter(new int[] { 2 }, DataType.Float, 0);

var layer = CNTKLib.Times(iWeights, featureVar) + iBias;
var activation = CNTKLib.ReLU(layer);
var func = (oWeights * activation) + oBias;

Let’s look at the last three lines first, as they determine what the shapes in the lines above them need to be.

First, we have the “layer” object: We multiply a weights variable by the features variable and add a bias. There are five hidden nodes and two input nodes, so the shape of the input Weights parameter needs to be { 5, 2 }. It is also important that the weights start with randomized values, so we’re using an initializer (CNTKLib.GlorotNormalInitializer) which provides a normal distribution of values centered around 0.

The bias parameter only needs to be the size of the hidden layer, so its shape is { 5 }. It’s less important for the bias to be randomly initialized, so we’re setting all the biases to an initial value of 0.

After doing the Multiply + Add, we go through an activation function. In this case, we’ll be using ReLU, but there are others available within the library.

Finally, we’ll be performing another Multiply + Add. Because there are two output nodes, and five intermediate nodes, the shape of this weighting parameter is { 2, 5 }. The bias is being added to two nodes, so its shape needs to be { 2 }.

I’m intentionally showing two different forms of multiplication here, by the way, so you can see they both work. The output at every step, from Times, ReLU, *, and +, is an instance of the Function class. The final result, “func,” is the completed model we will train.
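To make the shape bookkeeping concrete, here’s a rough sketch of what those three model lines compute for a single sample, written in plain C# with no CNTK involved. ForwardPassSketch and its methods are hypothetical names, and this says nothing about how CNTK stores or executes anything internally; it only shows why a { 5, 2 } matrix followed by a { 2, 5 } matrix takes two inputs to two outputs.

```csharp
using System;

public static class ForwardPassSketch
{
    // Computes W * x + b for a weight matrix with shape { rows, cols }.
    public static float[] Affine(float[,] w, float[] x, float[] b)
    {
        int rows = w.GetLength(0), cols = w.GetLength(1);
        if (x.Length != cols || b.Length != rows)
            throw new ArgumentException("Shape mismatch.");
        var y = new float[rows];
        for (int r = 0; r < rows; r++)
        {
            float sum = b[r];
            for (int c = 0; c < cols; c++)
                sum += w[r, c] * x[c];
            y[r] = sum;
        }
        return y;
    }

    // ReLU: negative values become 0, everything else passes through.
    public static float[] ReLU(float[] v) => Array.ConvertAll(v, e => Math.Max(0f, e));

    // Hidden layer (weights { 5, 2 }, bias { 5 }), then output layer (weights { 2, 5 }, bias { 2 }).
    public static float[] Forward(float[,] iW, float[] iB, float[,] oW, float[] oB, float[] x) =>
        Affine(oW, ReLU(Affine(iW, x, iB)), oB);
}
```

If either weight shape were transposed, Affine would throw on the mismatch, which is roughly the kind of error you want to catch before training.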

Creating a trainer, and training

To train a model, we need to provide a loss function, an evaluation function (here, the classification error), and a learning rate. In most cases, the only thing that will change will be the learning rate (let’s be honest, if you’re getting picky about your loss and evaluation functions you probably don’t need this guide).

var loss = CNTKLib.CrossEntropyWithSoftmax(func, labelVar);
var evalError = CNTKLib.ClassificationError(func, labelVar);
var learningRatePerSample = new CNTK.TrainingParameterScheduleDouble(0.001, 1);
var parameterLearners = new List<Learner>() { Learner.SGDLearner(func.Parameters(), learningRatePerSample) };
var trainer = Trainer.CreateTrainer(func, loss, evalError, parameterLearners);

The training data will be passed to the trainer in a dictionary using Variable instances as keys and Value instances as values. We need to specify both the Features and the Labels, so we’ll do it like this:

var trainingData = new Dictionary<Variable, Value>() { { featureVar, features }, { labelVar, labels } };

Training a single epoch is relatively straightforward at this point:

trainer.TrainMinibatch(trainingData, true, device);

However, a single epoch won’t get you very far. We’d like to monitor our model as it trains, and quit when it successfully classifies all inputs correctly. In the real world, of course, you probably don’t want that, as it will indicate over-fitting of the data.

int epochs = 0;
do
{
   trainer.TrainMinibatch(trainingData, true, device);
   epochs++;
} while (trainer.PreviousMinibatchEvaluationAverage() > 0);
Console.WriteLine("Trained for {0} epochs.", epochs);
Console.WriteLine("CrossEntropyLoss: {0}", trainer.PreviousMinibatchLossAverage());
Console.WriteLine("EvaluationCriterion: {0}", trainer.PreviousMinibatchEvaluationAverage());

Depending on the random values the weights were initialized with, as well as your learning rate, it could take anywhere from a few dozen to tens of thousands of epochs to fit our data. On my most recent test run with these values, it took 1,345 epochs.
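Because the epoch count varies that much, and a loop that waits for zero error never exits if the model never gets there, you may want an upper bound. Here’s a sketch of that idea in plain C#; TrainingLoop.RunUntilFit is a hypothetical helper, and the trainStep delegate stands in for one training call followed by reading the evaluation error.

```csharp
using System;

public static class TrainingLoop
{
    // Calls trainStep until it reports an error of 0 or maxEpochs is reached.
    // Returns the number of epochs that were run.
    public static int RunUntilFit(Func<double> trainStep, int maxEpochs)
    {
        int epochs = 0;
        double error;
        do
        {
            error = trainStep();
            epochs++;
        } while (error > 0 && epochs < maxEpochs);
        return epochs;
    }
}
```

With the trainer from above, the step would be a lambda that calls trainer.TrainMinibatch(trainingData, true, device) and returns trainer.PreviousMinibatchEvaluationAverage(). If the returned count equals maxEpochs, check the error before trusting the model.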

Evaluating a trained model

Once your model is trained and you want to use it for evaluating data, it’s fairly straightforward. We create a dictionary of Variable/Value pairs to be evaluated, another dictionary of Variable/Value pairs to store the results in, and call Evaluate.

var inputs = new Dictionary<Variable, Value>() { { featureVar, features } };
var outputs = new Dictionary<Variable, Value>() { { func.Output, null } };
func.Evaluate(inputs, outputs, device);

As the output is placed in a Dictionary, what we actually want to do is get the value that was placed there. If you only have one output in your model (which is typical), you may do this to get to the actual data:

var outputValue = outputs[func.Output];
var results = outputValue.GetDenseData<float>(func.Output);

The return value of GetDenseData will be IList<IList<float>>. In this case, the features data we fed it was a list of eight numbers, but those features were applied to a variable with a shape of { 2 }, meaning four samples were processed in sequence. The results variable, then, has four lists in it, each of which contains two values (the shape of the output).

We can iterate and display them like this:

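// rawLabels holds the same flat label values that were passed to Value.CreateBatch above
var rawLabels = new float[] {1,0, 0,1, 0,1, 1,0};
for (int i = 0; i < 4; i++)
{
   Console.Write("Expected: {0}{1}\t", rawLabels[i * 2], rawLabels[i * 2 + 1]);
   Console.Write(results[i][1] > results[i][0]);
   Console.WriteLine("\tRaw: {0:0.000} {1:0.000}", results[i][0], results[i][1]);
}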

The Expected values are the labels we fed to the trainer above, and are one-hot (a single value of 1, the rest 0s). The results values are all floating point, and the “chosen” output is the one with the highest signal. Normally you’d grab the index of the maximum value; for our example we can just check whether the second value (corresponding to true) is greater than the first (corresponding to false).
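For models with more than two outputs, grabbing the index of the maximum might look like this. It’s a minimal sketch in plain C#; Prediction.ArgMax is my own helper name, not something CNTK provides here.

```csharp
using System;
using System.Collections.Generic;

public static class Prediction
{
    // Returns the index of the largest value; ties go to the lowest index.
    public static int ArgMax(IList<float> values)
    {
        if (values == null || values.Count == 0)
            throw new ArgumentException("values must not be empty.");
        int best = 0;
        for (int i = 1; i < values.Count; i++)
            if (values[i] > values[best])
                best = i;
        return best;
    }
}
```

Each inner list in results can be passed straight in, since it is an IList of float. For our two-output model, an index of 1 means True and 0 means False, and a tie resolves to False, the same as the greater-than check.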

Here’s the output of a run on my machine:

Trained for 1345 epochs.
CrossEntropyLoss: 0.598247468471527
EvaluationCriterion: 0
Expected: 10 False   Raw:  0.153 -0.169
Expected: 01 True    Raw: -0.962 -0.559
Expected: 01 True    Raw: -0.172 -0.068
Expected: 10 False   Raw: -0.603 -0.603

As you can see, what matters here is not the actual values output, but merely which one has the greatest value.
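The raw values look odd because the softmax is applied inside the CrossEntropyWithSoftmax loss, not in the model itself, so func emits unnormalized scores. If you want something that reads like probabilities, you can apply a softmax yourself. Here’s a sketch in plain C#; Scores.Softmax is a hypothetical helper written for this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Scores
{
    // Numerically stable softmax: subtract the maximum before exponentiating.
    public static double[] Softmax(IList<float> raw)
    {
        double max = raw.Max();
        double[] exps = raw.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Softmax preserves ordering, so the winning index is the same before and after; it only rescales the scores so they are positive and sum to 1.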

Saving and loading models

If you’ve made it this far, you’re going to love how easy saving and loading a model is:

func.Save("myModel.dnn");
var newModel = Function.Load("myModel.dnn", device);

Easy, right?

Final Thoughts

Although much of the documentation for CNTK is incomplete, and most of the examples are for Python, it still presents a great opportunity to take advantage of GPU accelerated neural networks in your DotNet code. Once you get used to it, it’s straightforward and effective, and the recent API changes are a huge improvement over using the command line version of CNTK for training.

Now, I need to go update my current tests and projects to make use of the new API…

