Anastasia Giachanou, Tina Shahedi
Machine Learning with Python - Utrecht Summer School
In this practical, we'll focus on the Fashion-MNIST dataset, a collection of 60,000 grayscale images representing 10 different categories of fashion items such as T-shirts, trousers, and shoes. This dataset is well suited for understanding and implementing multiclass image classification using deep learning techniques. We'll use the Keras library, a high-level API for neural networks that runs on top of TensorFlow (Google).
We will construct and train neural network models to classify the fashion images, and we will optimise their hyperparameters.
Learning Goals:
TensorFlow is an open-source machine learning library developed by Google. It provides tools to build and train machine learning models — especially deep learning models like neural networks.
Tensors are multi-dimensional arrays that generalize vectors and matrices. They can have any number of dimensions, which makes them suitable for representing diverse types of data — such as images, text, or audio. Tensors are the building blocks of data representation and computation in deep learning models.
They store:
Let's start by installing TensorFlow using !pip install.
First, run the following lines of code to install the libraries. We also install an older version of scikit-learn because of some recent updates to the library.
!pip install scikeras[tensorflow] > /dev/null 2>&1 # gpu compute platform
!pip install scikeras[tensorflow-cpu] > /dev/null 2>&1
!pip install scikeras > /dev/null 2>&1
!pip uninstall -y scikit-learn
!pip install scikit-learn==1.5.2
We used >/dev/null 2>&1 to hide the output.
As usual we will start with importing the required libraries and datasets.
from scikeras.wrappers import KerasClassifier
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pprint as pp # for nicely formatting complex data structures
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, BatchNormalization, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.optimizers import Adam, SGD
from sklearn.model_selection import RandomizedSearchCV
scikeras.wrappers.KerasClassifier is a wrapper class that allows you to use Keras models inside scikit-learn tools like GridSearchCV, RandomizedSearchCV, or cross-validation. It makes a Keras model behave like a scikit-learn estimator.
# Set a random seed for reproducibility
np.random.seed(100)
tf.random.set_seed(221)
Let's load the dataset Fashion-MNIST first. Fashion-MNIST (https://www.tensorflow.org/datasets/catalog/fashion_mnist) is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
1. Load the Fashion-MNIST dataset, which is part of the Keras datasets. First you need to import the fashion_mnist module from the package tensorflow.keras.datasets. Once you do that, you can use the method
load_data() to load the dataset.
The method load_data() will return a tuple that contains two tuples. The first tuple contains the training data and the second tuple contains the test data: test_images and test_labels.
You can load the data into the tuple (sample_images, sample_labels), (test_images, test_labels)
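One possible way to do this, using the variable names suggested above (the print call is only there to confirm the shapes):

from tensorflow.keras.datasets import fashion_mnist

# Load Fashion-MNIST: the first tuple holds the training data, the second the test data
(sample_images, sample_labels), (test_images, test_labels) = fashion_mnist.load_data()
print(sample_images.shape, test_images.shape)  # (60000, 28, 28) (10000, 28, 28)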
Because neural networks need some time to run and be optimised, we decided to randomly select a part of the training images and work with that.
Note: This is a strategy you can use only while developing your code (so that it runs faster) and NOT for model selection, evaluation, etc.
2. Now randomly select 30,000 train images and train labels (from the tuple you made before) and save them into the train_images and train_labels variables. First, you can create random indices that then you can use to sample the data. For the random selection you can use np.random.choice. (https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html)
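A possible sketch for question 2:

# Draw 30,000 random indices without replacement and use them to select
# the corresponding images and labels
indices = np.random.choice(sample_images.shape[0], size=30000, replace=False)
train_images = sample_images[indices]
train_labels = sample_labels[indices]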
We are now going to reshape the image data and convert them into a pandas DataFrame. Run the following lines to reshape the training data and to load them into a dataframe:
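A minimal sketch of such a reshaping step (here we flatten each 28x28 image into a row of 784 pixel values; the DataFrame is only used for inspection, since the model later still works on the original 28x28 arrays):

# Flatten the images from (30000, 28, 28) to (30000, 784) and store them in a DataFrame
train_images_flat = train_images.reshape(train_images.shape[0], -1)
train_df = pd.DataFrame(train_images_flat)
train_df['label'] = train_labels
print(train_df.shape)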
If we want to see what the data look like, we can plot some of the initial images together with their labels.
3. Plot the first 9 images together with their labels. Note that you can use a for loop for that. To show the images, you can use plt.imshow() inside the loop and set the parameter cmap='gray'. If you want to place the images in a 3x3 grid, you can do it with plt.subplot(3, 3, i + 1), where i is the iterator of the for loop. The subplot needs to run before the imshow; a sketch follows the single-image example below.
# As an example this will show just one image
# Display the first image and its label
plt.figure(figsize=(3, 3))
plt.imshow(train_images[0], cmap='gray')
plt.title(f'Label: {train_labels[0]}')
plt.axis('off') # Hide axis ticks
plt.show()
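A possible sketch for question 3, building on the single-image example above:

# Display the first 9 images with their labels in a 3x3 grid
plt.figure(figsize=(6, 6))
for i in range(9):
    plt.subplot(3, 3, i + 1)              # select the position in the grid (before imshow)
    plt.imshow(train_images[i], cmap='gray')
    plt.title(f'Label: {train_labels[i]}')
    plt.axis('off')
plt.show()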
The original pixel values in train_images and test_images are stored as integers ranging from 0 to 255.
With the following lines we will normalise the data. We divide all pixel values by 255.0, which rescales them from the range [0, 255] to [0.0, 1.0]. Normalization helps neural networks train faster and converge more reliably.
Let's normalise the data using the following lines.
X_train = train_images.astype('float32') / 255.0
X_test = test_images.astype('float32') / 255.0
4. Use the tf.keras.utils.to_categorical() function to encode the categorical labels (both the train and test labels) and then split your training data into train and validation sets. Select the first 20,000 observations as the new training set (the code X_train[:20000] will return the first 20,000 observations from X_train) and the rest as the validation set.
We have now finished with data preprocessing and preparation, and we will move on to the modeling part!
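A possible sketch for this step (the names y_train, y_val and X_val are our own suggestions, not fixed by the practical):

# One-hot encode the labels and split the 30,000 training observations
# into 20,000 for training and 10,000 for validation
y_train_full = tf.keras.utils.to_categorical(train_labels, num_classes=10)
y_test = tf.keras.utils.to_categorical(test_labels, num_classes=10)

X_val, y_val = X_train[20000:], y_train_full[20000:]
X_train, y_train = X_train[:20000], y_train_full[:20000]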
In this section, we will build a simple neural network using the Sequential API from Keras. Our goal is to see how well a basic fully connected model (also known as a dense neural network) can perform on the Fashion-MNIST image classification task.
What is the Sequential API? The Sequential API in Keras allows you to create models layer by layer, where each layer has exactly one input and one output (https://www.tensorflow.org/guide/keras/sequential_model)
What if I need more flexibility? The functional API (https://www.tensorflow.org/guide/keras/functional) allows you to create models that have a lot more flexibility as you can define models where layers connect to more than just the previous and next layers. In this way, you can connect layers to (literally) any other layer.
Let's start with a basic example. The following code defines a Sequential neural network for classifying Fashion-MNIST images:
This structure allows the model to learn increasingly abstract patterns from the image data and make predictions about which class each image belongs to.
model = Sequential([
    Flatten(input_shape=(28, 28)),   # Input layer to flatten the images
    Dense(256, activation='relu'),   # Hidden layer with considerable complexity
    Dense(128, activation='relu'),   # Subsequent hidden layer to further refine the learned features
    Dense(10, activation='softmax')  # Output layer with 10 units for each category
])
5. Visualize the architecture of the neural network using the keras.utils.plot_model function. The first parameter of the function is the model. You can also use show_shapes=True to display shape information and dpi=66 to change the resolution.
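One possible call for question 5 (note that plot_model needs the pydot and graphviz packages to be available in your environment):

tf.keras.utils.plot_model(model, show_shapes=True, dpi=66)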
Here, we've built a sequential neural network model for Fashion-MNIST, consisting of flattened input images passed through dense layers with ReLU activation. The flatten layer transforms the multi-dimensional input into a flat vector of 784 elements, preparing it for the network's learning process. Following this, the model features two fully connected layers, with the first comprising 256 neurons and the second 128 neurons, both instrumental in identifying complex data patterns. The final layer uses softmax for classifying into the 10 fashion categories.
We will now compile the model with an optimizer, loss function, and metrics for training. For classification problems, you can use categorical_crossentropy. categorical_crossentropy is a loss function used to measure the difference between the model’s predicted class probabilities and the actual (true) class labels. It is specifically designed for multi-class classification problems where each input belongs to exactly one of multiple categories and labels are one-hot encoded.
6. Compile the neural network model using compile(), with 'categorical_crossentropy' as the loss function (loss='categorical_crossentropy') and Adam as the optimiser (optimizer='adam'). Also set the metrics to accuracy (metrics=['accuracy']).
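A possible call for question 6:

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])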
The Adam optimizer is a popular and efficient gradient descent method and is usually a good default for deep learning tasks.
7. Use the summary() function to get a summary of the model. How many parameters does every layer have? How did we end up with those numbers?
From the summary, we can also see the number of parameters.
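As a check on those numbers: each Dense layer has (number of inputs x number of units) weights plus one bias per unit, while the Flatten layer has no parameters.

model.summary()
# Flatten:    0 parameters (it only reshapes the input)
# Dense(256): 784 * 256 + 256 = 200,960 parameters
# Dense(128): 256 * 128 + 128 =  32,896 parameters
# Dense(10):  128 * 10  + 10  =   1,290 parameters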
Selecting the appropriate neural network architecture and loss function is necessary for compiling the model successfully. The following table categorizes common tasks depending on the respective output types, activation functions, loss functions, and metrics, guiding model development.
| Task | Output Type | Last-layer Activation | Loss Function | Metric(s) |
|---|---|---|---|---|
| Regression | Numerical | Linear | meanSquaredError (MSE), meanAbsoluteError (MAE) | Same as loss |
| Binary Classification | Binary | Sigmoid | binary_crossentropy | Accuracy, precision, recall, sensitivity, TPR, FPR, ROC, AUC |
| Classification: Single Label, Multiple Classes | Categorical | Softmax | categorical_crossentropy | Accuracy, confusion matrix |
| Classification: Multiple Labels, Multiple Classes | Categorical | Sigmoid | binary_crossentropy | Accuracy, precision, recall, sensitivity, TPR, FPR, ROC, AUC |
The task we work on in this practical belongs to the Classification: Single Label, Multiple Classes category. For such tasks, models employ softmax activation and categorical crossentropy loss.
As you may have noticed, up to this point we haven't used the training set yet. In the next step, we will train the model; this involves fitting it to the training data for a specified number of epochs.
What is an Epoch? An epoch is one complete pass through the entire training dataset. When we train for 1 epoch, the model sees each training sample once and updates its internal parameters accordingly. Training for multiple epochs means the model gets multiple chances to learn from the same data
What is Batch Size? The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.
In short, the batch size is a number of samples processed before the model is updated and the number of epochs is the number of complete passes through the training dataset.
The size of a batch must be more than or equal to one and less than or equal to the number of samples in the training dataset.
8. Train your neural network model on the training data using the fit() function. The first 2 parameters are the input predictors (X_train) and the target labels (y_train) of the training set. Set batch_size to 64 and epochs to 10, and set validation_data=(X_test, y_test). A sketch follows the note below.
Note: Be aware that if you run the fit() function again, it continues from the weights already learned in the prior training round. To reset the model's state and begin training anew, call the clear_session() function from Keras' backend like this:
from keras.backend import clear_session
clear_session()
Or
tf.keras.backend.clear_session()
This will ensure that your model starts learning from scratch again.
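A possible call for question 8 (storing the returned History object in model_history, which the plotting code below assumes):

model_history = model.fit(X_train, y_train,
                          batch_size=64,
                          epochs=10,
                          validation_data=(X_test, y_test))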
9. Plot your model's training history to see its performance over epochs. Plot both the 'accuracy' and 'loss' metrics for the training phase, comparing these measures across epochs. Use the following code snippets for plotting accuracy and loss (assuming that you stored the result of fit() in model_history):
# For plotting accuracy
model_history.history['accuracy']
# For plotting loss
model_history.history['loss']
We will now create a function that plots the accuracy and loss. We can call this function whenever we want to plot the training and validation accuracy and loss.
def plot_training_history(model_history):
    plt.figure(figsize=(12, 5))

    # Plotting accuracy
    plt.subplot(1, 2, 1)
    plt.plot(model_history.history['accuracy'], label='Training Accuracy')
    plt.plot(model_history.history['val_accuracy'], label='Validation Accuracy')
    plt.title('Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend(loc='lower right')

    # Plotting loss
    plt.subplot(1, 2, 2)
    plt.plot(model_history.history['loss'], label='Training Loss')
    plt.plot(model_history.history['val_loss'], label='Validation Loss')
    plt.title('Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(loc='upper right')

    plt.show()
10. Evaluate the performance of the trained model. To get the performance on the training set, you can use the history of the model (model_history.history['accuracy'][-1]). Then evaluate the model on the testing data using the model.evaluate() function. This function returns the test loss and the test accuracy. Compare that with the accuracy on the training set.
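A possible sketch for question 10:

# Accuracy on the training set (last epoch) and on the test set
print('Final training accuracy:', model_history.history['accuracy'][-1])
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print('Test accuracy:', test_accuracy)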
As we mentioned, there are two different ways to optimise some parameters. First we will use the train/val/test setup and then we will work again with cross-validation. We do that so we can show you both ways; when you work on a project, it is better to choose one of those options and stick to it.
We will now focus on optimising the learning rate. The learning rate in a deep learning model is a hyperparameter that regulates how frequently the model's weights are changed during training.
11. Now we will find the optimal learning rate using the train/val/test setup. You can start by creating a list of different learning rates. Then you can use a for loop to iterate over the learning rate values. In the body of the loop, you create your model, compile it (here you can set optimizer=Adam(learning_rate=learning_rate)) and then fit it. You can try learning_rates = [0.001, 0.02, 0.1]. Also, you will need to store the histories so that you can compare the performance of the different learning rates.
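A possible sketch for question 11 (storing the histories in a dictionary keyed by learning rate is just one option; the validation split (X_val, y_val) created earlier is assumed):

learning_rates = [0.001, 0.02, 0.1]
model_histories = {}

for learning_rate in learning_rates:
    tf.keras.backend.clear_session()     # start each run from fresh weights
    lr_model = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(256, activation='relu'),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax')
    ])
    lr_model.compile(optimizer=Adam(learning_rate=learning_rate),
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])
    model_histories[learning_rate] = lr_model.fit(X_train, y_train,
                                                  batch_size=64, epochs=10,
                                                  validation_data=(X_val, y_val),
                                                  verbose=0)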
12. Find the model with the best performance and print its validation accuracy. You can do that with a for loop that iterates over the model histories.
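A possible sketch for question 12, assuming the model_histories dictionary from the previous sketch:

# Pick the learning rate with the highest final validation accuracy
best_lr = None
best_val_acc = 0.0
for lr, history in model_histories.items():
    val_acc = history.history['val_accuracy'][-1]
    if val_acc > best_val_acc:
        best_val_acc, best_lr = val_acc, lr
print(f'Best learning rate: {best_lr} (validation accuracy: {best_val_acc:.4f})')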
13. Plot the training and validation accuracy and loss for the model with the learning rate that achieved the best performance. Remember that you can use the plot_training_history() function we created earlier.
If you want to practice more, you can also try to optimise the batch size (remember that this is a parameter of the fit() function). Recall that the batch size is a hyperparameter that defines the number of samples (rows) to work through before updating the internal model parameters.
We suggest that for now you skip to question 16, where we use a cross-validated search (RandomizedSearchCV) to optimise multiple parameters, and come back to this exercise afterwards.
14. We can do the same but now using different batch sizes during training. Try batch sizes of 64, 128, and 256.
15. Find the best batch size and plot the training and validation accuracy and loss.
Fine-tuning hyperparameters is one of the main steps in deep learning. As you have noticed, there are several hyperparameters that we can optimise. Consider that we want to try the following hyperparameter values:
- Dropout rates: 0.3 and 0.2
- Learning rates: 0.001 and 0.01
- Number of neurons: 256 and 128
What is Dropout? Dropout is a regularization technique that randomly “drops” (i.e., disables) a fraction of neurons during training (0.3 → 30% of neurons are turned off). Dropout prevents overfitting by making the model less dependent on any one neuron and encourages the model to learn redundant and robust features.
What is Learning Rate? It controls how big the steps the optimizer takes when updating the model’s weights.
What Are Neurons? Neurons are the units in a layer that process information (via weights and activation functions).
Now we will fine-tune the hyperparameters of the neural network model (e.g., the number of hidden layers, the number of neurons per layer) to optimize performance on the validation set. Follow these steps to do it.
16. Create a function that takes these different hyperparameters as input. In this function you can build the model and compile it as well.
Your function can start like this:
def create_model(num_units, dropout_rate, learning_rate):
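A possible sketch for question 16 (the exact architecture is up to you; here we assume two hidden layers with a dropout layer after each):

def create_model(num_units=256, dropout_rate=0.3, learning_rate=0.001):
    model = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(num_units, activation='relu'),
        Dropout(dropout_rate),
        Dense(num_units // 2, activation='relu'),
        Dropout(dropout_rate),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model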
17. Create a grid with the different values of the hyperparameters
param_grid = {
'num_units': [128, 256], # Different neuron counts
'dropout_rate': [0.2, 0.3], # Diverse dropout rates
'learning_rate': [0.001, 0.01] # Several learning rates
}
18. Then create a KerasClassifier based on the defined model (model = KerasClassifier(model=create_model, ...)). You can set epochs to 15 and batch_size to 64. Also, add the hyperparameters that you will tune; as the value for each, you can use the first one from the param_grid.
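A possible sketch for question 18 (the starting values are the first entry of each list in param_grid):

model = KerasClassifier(model=create_model,
                        epochs=15,
                        batch_size=64,
                        num_units=128,
                        dropout_rate=0.2,
                        learning_rate=0.001,
                        verbose=0)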
19. Finally, perform the search using RandomizedSearchCV (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html), set n_iter to 3, and fit it on the training set.
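A possible sketch for question 19 (y_train here is the one-hot encoded target created during preprocessing; RandomizedSearchCV uses its default 5-fold cross-validation):

random_search = RandomizedSearchCV(estimator=model,
                                   param_distributions=param_grid,
                                   n_iter=3)
random_search_result = random_search.fit(X_train, y_train)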
Random search cross-validation is a technique that searches for the optimal hyperparameters of a model by evaluating the model's performance on random combinations of hyperparameter values. The idea is to define a set of hyperparameters and a range of values for each hyperparameter, and then randomly sample values from these ranges to create different combinations of hyperparameters. This process is repeated a specified number of times, and the best combination of hyperparameters that produces the best performance on a validation set is selected. The number of parameter settings that are tried is given by n_iter.
With the few values that we have in the parameter grid, we could also use GridSearchCV; GridSearchCV performs an exhaustive search over all the combinations of hyperparameters specified in param_grid and selects the best combination based on cross-validation performance.
However, it is useful to know about RandomizedSearchCV because you may build a model that has many hyperparameters and want to try out multiple values for each. In that case we suggest RandomizedSearchCV.
20. Print the optimal hyperparameters and the best score that was obtained with those parameters.
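A possible sketch for question 20, assuming the fitted search from the previous step is stored in random_search_result:

print('Best parameters:', random_search_result.best_params_)
print('Best cross-validation score:', random_search_result.best_score_)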
Optional. Now go back to question 16 and try to add more parameters and check if the optimal parameters change.
End of the Practical