DeepLearning4J: Getting Started

Javier Gonzalez
7 min read · Feb 12, 2021


DeepLearning4J (Deep Learning for Java) is a library that provides support for many of the algorithms associated with deep learning, including neural networks 🙂. Let us review the primary tasks for creating, training, and running a neural network using DeepLearning4J.

Scenario

Let us work with an example that we reviewed before: a neural network that learns the XOR operator. In a previous story, we implemented this from scratch. Now, let us use DeepLearning4J. As shown in Figure 1, this neural network has two inputs, a hidden layer with three neurons, and an output layer with one neuron. The input data used to train the neural network and the corresponding known outputs are also shown.

Figure 1. Our neural network for calculating the XOR operator.

Prerequisites

To run DeepLearning4J in our project, we need the following dependencies (libraries):

  • deeplearning4j-core, which contains the neural network implementations.
  • nd4j-native-platform, a versatile library for handling n-dimensional arrays.
  • datavec-api, an auxiliary library for vectorizing and loading data.

Be sure to include these in your project, and we are ready to go.
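For reference, here is a minimal Maven sketch of these dependencies. The version number is illustrative (it is not taken from this story); check Maven Central for the release that fits your project.

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta7</version> <!-- illustrative version -->
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-beta7</version> <!-- illustrative version -->
</dependency>
<dependency>
    <groupId>org.datavec</groupId>
    <artifactId>datavec-api</artifactId>
    <version>1.0.0-beta7</version> <!-- illustrative version -->
</dependency>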

Loading Data

First, we need to load our input data and the corresponding known outputs. We will need them to train the neural network. For this, we use INDArray, an n-dimensional array interface. INDArray is different from standard Java arrays because it uses off-heap memory to store data. Off-heap means that the memory is allocated outside of the Java Virtual Machine, i.e., it is not managed by the garbage collector. Moreover, these memory locations can be passed (via pointers) to the underlying C++ code that performs the ND4J library's operations. Further, thanks to its different backends, ND4J even enables us to use both CPUs and GPUs.

For our small example, this could seem unnecessary, but consider cases where the data is vast. We will need two arrays: one for the input values and one for the known outputs. Two of the most commonly used methods for creating arrays are zeros() and ones(). Yes, the method's name tells us the initial value for the elements in the array. The shape of the array is specified with integer parameters. For example, to create a zero-filled array with four rows and two columns, we use:

INDArray input = Nd4j.zeros(4, 2);

The default data type of the created INDArray is float. In a similar way, to create an array to store our known outputs, we can use:

INDArray knownOutput = Nd4j.zeros(4, 1);

Notice that it is a matrix of size number of samples × number of outputs; even with only a single output column, it needs this two-dimensional shape.

The next step is to fill the arrays with the values shown in Figure 1 (the tables on the left and the right). For our example, a simple approach is just to insert the values into the array one by one. We can use the method putScalar(), which takes two parameters: the first is an array of integers with the coordinates for the new value, and the second is the floating-point value to store in the array. Our input data can be loaded as follows:

input.putScalar(new int[]{0, 0}, 0);
input.putScalar(new int[]{0, 1}, 0);
input.putScalar(new int[]{1, 0}, 0);
input.putScalar(new int[]{1, 1}, 1);
input.putScalar(new int[]{2, 0}, 1);
input.putScalar(new int[]{2, 1}, 0);
input.putScalar(new int[]{3, 0}, 1);
input.putScalar(new int[]{3, 1}, 1);

And, in the same way, our known output values can be loaded as follows:

knownOutput.putScalar(new int[]{0}, 0);
knownOutput.putScalar(new int[]{1}, 1);
knownOutput.putScalar(new int[]{2}, 1);
knownOutput.putScalar(new int[]{3}, 0);

The ND4J library provides an excellent set of functions for working with arrays. I highly recommend you take a look at the ND4J linear algebra API.
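For instance, each of the arrays above can also be built in a single call using Nd4j.create(), which accepts a Java 2-D array; a small sketch:

INDArray input = Nd4j.create(new float[][]{{0, 0}, {0, 1}, {1, 0}, {1, 1}});
INDArray knownOutput = Nd4j.create(new float[][]{{0}, {1}, {1}, {0}});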

Finally, we need to put the inputs and the known outputs together in a DataSet object. DataSet objects are containers for the input data and the known outputs (labels). The DataSet constructor receives the two arrays as follows:

DataSet dataSet = new DataSet(input, knownOutput);
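As a side note, the data-loading code above relies on the following imports (package paths as found in recent ND4J releases):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;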

Building a Model

DeepLearning4J allows us to create neural networks by using a Fluent-Builder pattern. Let me show you the code to implement the neural network in Figure 1, and then we can check the details.

MultiLayerConfiguration cfg = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.UNIFORM)        // weight initialization scheme
    .list()                                // builder for the list of layers
    .layer(0, new DenseLayer.Builder()     // hidden layer
        .activation(Activation.SIGMOID)
        .nIn(2)
        .nOut(3)
        .build())
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)  // output layer
        .activation(Activation.SIGMOID)
        .nIn(3)
        .nOut(1)
        .build())
    .build();
MultiLayerNetwork network = new MultiLayerNetwork(cfg);
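Again, a quick note on imports: the configuration code above uses the following classes (package paths as found in recent DeepLearning4J releases):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;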

The two fundamental classes are MultiLayerConfiguration and MultiLayerNetwork. All the details about the neural network, its layers, inputs, outputs, activation function, and so on are defined in a MultiLayerConfiguration object. Then, that object is used to create the network using MultiLayerNetwork. Let us review MultiLayerConfiguration.

A MultiLayerConfiguration has two parts: (1) the high-level neural network configuration and (2) the configuration of each layer. To create a MultiLayerConfiguration for a neural network, we use the class NeuralNetConfiguration.Builder.

For the high-level neural network configuration:

  • weightInit() defines the weight initialization scheme. In our example, we use a uniform distribution as described in Glorot and Bengio (2010).
  • list() creates a ListBuilder that stores the configuration of each of our layers.
  • layer() adds a new layer; the first parameter is the index of the position where the layer needs to be added, and the second parameter is the layer to add to the network.

A dense layer is a fully connected layer, which means each neuron in the dense layer receives input from every neuron in the previous layer. The output layer is the last layer of neurons and produces the final outputs. Besides the dense layer and the output layer that we use in our example, several other layer types exist, including, among others, GravesLSTM, ConvolutionLayer, RBM, and EmbeddingLayer. Using those layers, we can define (in a similar way) simple neural networks, recurrent neural networks, and convolutional networks. Notice that for the output layer, we specify the error function that we want to use to evaluate the outputs. This error function is also called the loss function. We are using mean squared error (MSE) in our example.

For layers, we configure the following:

  • activation() defines the activation function applied by the layer's neurons. In our example, we use the sigmoid function.
  • nIn() specifies the number of inputs coming from the previous layer. For the first layer, it is the number of values the layer takes from the input layer.
  • nOut() specifies the number of outputs that the layer sends to the next layer. For the output layer, it is the number of outputs the network produces.

We are creating a neural network with two layers (we only count the layers with neurons) — a hidden layer that is a dense layer and an output layer that uses MSE for error calculation. The hidden layer has two inputs and generates three outputs. Since it is a dense layer, all inputs are connected with all neurons, as shown in Figure 1. The output layer has three inputs and generates only one output. The neurons in both layers use the sigmoid function as their activation function.

Once we create our MultiLayerNetwork object, we call the init() method to initialize it. Additionally, we can set the learning rate for all layers in the network to a specified value.

network.init();
network.setLearningRate(0.7);
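As a side note, the learning rate can also be fixed up front in the configuration builder by choosing an updater. Here is a minimal sketch assuming plain stochastic gradient descent (the Sgd updater class lives in org.nd4j.linalg.learning.config):

MultiLayerConfiguration cfg = new NeuralNetConfiguration.Builder()
    .updater(new Sgd(0.7))  // plain SGD with a fixed learning rate of 0.7
    .weightInit(WeightInit.UNIFORM)
    .list()
    // ... same layer configuration as above ...
    .build();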

We can print the configuration of our network using the method summary() as follows:

System.out.println(network.summary());

For our network, the method will print something like this:

=================================================================
LayerName (LayerType)   nIn,nOut   TotalParams   ParamsShape
=================================================================
layer0 (DenseLayer)     2,3        9             W:{2,3}, b:{1,3}
layer1 (OutputLayer)    3,1        4             W:{3,1}, b:{1,1}
-----------------------------------------------------------------
Total Parameters: 13
Trainable Parameters: 13
Frozen Parameters: 0
=================================================================

This is the same information that we have in Figure 1, but as a text table. Notice that we have nine weights (one per input per neuron) and four neurons, each with its own bias value; therefore, 13 parameters to be trained in our neural network.

Training the Model

Our neural network is complete! Time to train it. We do the training by calling the method fit() as follows:

for (int i = 0; i < 10000; i++) {
    network.fit(dataSet);
}

Take a look at my story Neural Networks Demystified for a description of what happens when we run fit(). The fit() method is the approximate equivalent of Lines 10 to 24 in Figure 13 of that story.
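As a side note, if we want to watch the error score go down during those 10,000 iterations, we can attach a listener before calling fit(); a minimal sketch (ScoreIterationListener lives in org.deeplearning4j.optimize.listeners):

network.setListeners(new ScoreIterationListener(1000));  // print the score every 1,000 iterations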

Once the network is trained, we can evaluate it.

Testing the Model

For evaluation, DeepLearning4J provides the class Evaluation. The method eval() compares the known outputs (labels) with the outputs generated by the model, and the method stats() reports the classification statistics. The code is as follows:

INDArray output = network.output(input);
Evaluation eval = new Evaluation();
eval.eval(knownOutput, output);
System.out.println(eval.stats());

The report includes:

  • Confusion matrix entries — in our example, there are two true positives (TP) and two true negatives (TN) predicted.
  • Accuracy — the measure of all the correctly identified cases.
  • Precision — the measure of the correctly identified positive cases out of all the predicted positive cases.
  • Recall — the measure of the correctly identified positive cases out of all the actual positive cases.
  • F1-score — the harmonic mean of precision and recall.

The printed result is as follows:

========================Evaluation Metrics========================
# of classes: 2
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1 Score: 1.0000
Precision, recall & F1: reported for positive class (class 1 - "1") only
=========================Confusion Matrix=========================
0 1
-----
2 0 | 0 = 0
0 2 | 1 = 1
Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
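We can also query the trained network directly. As a small sketch, the following feeds the single sample (1, 0) to the network; since XOR(1, 0) = 1, the printed value should be close to 1:

INDArray testInput = Nd4j.create(new float[]{1, 0}, new int[]{1, 2});  // one sample, two features
INDArray prediction = network.output(testInput);
System.out.println(prediction);  // expect a value close to 1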

We have created a perfect neural network for the XOR operator in approximately 50 lines of code 🙂. You can download the source code for the described example from my GitHub repository. It is just a vanilla example, but it opens the door to working on more exciting projects. For instance, what about image recognition?

Notice that now we can worry about the data itself instead of its storage and handling. Also, we can easily play with multiple hidden layers or activation functions. Further, we could use GPUs to improve the performance of our training process (some configuration is needed, but it is available). However, that is another story.

Thanks for reading. Feel free to leave your feedback and reviews below.
