Practical: Deep Learning Models with Fashion-MNIST¶

Anastasia Giachanou, Tina Shahedi

Machine Learning with Python - Utrecht Summer School

In this practical, we'll focus on the Fashion-MNIST dataset, a collection of 70,000 grayscale images (60,000 for training and 10,000 for testing) representing 10 categories of fashion items such as T-shirts, trousers, and shoes. This dataset is well suited for understanding and implementing multiclass image classification with deep learning. We'll use Keras, a high-level API for neural networks that runs on top of TensorFlow (developed by Google).

We will also construct and train neural network models to classify the fashion images accurately, and we will tune their hyperparameters.

Learning Goals:

  • Understand and implement a basic neural network using TensorFlow/Keras.
  • Learn how to preprocess and handle image data.
  • Explore model optimization through hyperparameter tuning (e.g. learning rate).
  • Evaluate and visualize model performance.

Let's get started¶

Let's start by installing TensorFlow using !pip install.

TensorFlow is an open-source machine learning library developed by Google. It provides tools to build and train machine learning models — especially deep learning models like neural networks.

Tensors are multi-dimensional arrays that generalize vectors and matrices. They can have any number of dimensions, which makes them suitable for representing diverse types of data — such as images, text, or audio. Tensors are the building blocks of data representation and computation in deep learning models.

They store:

  • Input data
  • Intermediate values during processing
  • Model parameters (weights and biases)
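
To make the idea of rank and shape concrete, here is a small sketch that uses NumPy arrays as stand-ins for tensors (TensorFlow tensors carry the same notions of shape and number of dimensions). The array contents and the variable names are arbitrary placeholders chosen for illustration.

```python
import numpy as np

scalar = np.array(5.0)                  # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])      # rank 1: e.g. a bias vector
matrix = np.zeros((784, 256))           # rank 2: e.g. a weight matrix
image_batch = np.zeros((64, 28, 28))    # rank 3: 64 grayscale images of 28x28 pixels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image_batch", image_batch)]:
    print(f"{name:12s} rank={t.ndim} shape={t.shape}")
```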
In [1]:
!pip install scikeras[tensorflow] > /dev/null 2>&1     # gpu compute platform
!pip install scikeras[tensorflow-cpu] > /dev/null 2>&1
!pip install scikeras > /dev/null 2>&1

!pip uninstall -y scikit-learn
!pip install scikit-learn==1.5.2
Found existing installation: scikit-learn 1.5.2
Uninstalling scikit-learn-1.5.2:
  Successfully uninstalled scikit-learn-1.5.2
Collecting scikit-learn==1.5.2
  Using cached scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Requirement already satisfied: numpy>=1.19.5 in /usr/local/lib/python3.11/dist-packages (from scikit-learn==1.5.2) (2.0.2)
Requirement already satisfied: scipy>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn==1.5.2) (1.15.3)
Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn==1.5.2) (1.5.1)
Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn==1.5.2) (3.6.0)
Using cached scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.3 MB)
Installing collected packages: scikit-learn
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
umap-learn 0.5.9.post2 requires scikit-learn>=1.6, but you have scikit-learn 1.5.2 which is incompatible.
Successfully installed scikit-learn-1.5.2

We used > /dev/null 2>&1 to hide the output of the scikeras installation commands. We can also check which TensorFlow version was installed (for example with tf.__version__ after importing it).
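
One way to check installed versions without importing the packages is the standard-library importlib.metadata module. This is only a sketch: the helper name installed_version is made up for this example, and the package names in the loop are the ones installed above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

for package in ["tensorflow", "scikeras", "scikit-learn"]:
    print(package, installed_version(package))
```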

As usual we will start with importing the required libraries and datasets.

In [2]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pprint as pp # for nicely formatting complex data structures
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, BatchNormalization, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.optimizers import Adam, SGD
from sklearn.model_selection import RandomizedSearchCV

from scikeras.wrappers import KerasClassifier

scikeras.wrappers.KerasClassifier is a wrapper class that allows you to use Keras models inside scikit-learn tools like GridSearchCV, RandomizedSearchCV, or cross-validation. It makes a Keras model behave like a scikit-learn estimator.

In [3]:
# Set a random seed for reproducibility
np.random.seed(100)
tf.random.set_seed(221)

Let's load the dataset Fashion-MNIST first. Fashion-MNIST (https://www.tensorflow.org/datasets/catalog/fashion_mnist) is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

1. Load Fashion-MNIST, which is part of the Keras datasets. First, access the fashion_mnist module from the package tensorflow.keras.datasets. Once you have done that, you can use its load_data() function to load the dataset.

The load_data() function returns a tuple that contains two tuples. The first tuple contains the training data (images and labels) and the second tuple contains the test data (test_images and test_labels). You can unpack the result into (sample_images, sample_labels), (test_images, test_labels).

In [4]:
# Load the dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(sample_images, sample_labels), (test_images, test_labels) = fashion_mnist.load_data()

Because neural networks need some time to run and to be optimised, we decided to randomly select a part of the training images and work with that. This is a common strategy when we have to develop code and we have a lot of data.

Note: This strategy is only meant to make the code run faster during development; do NOT use it for model selection or evaluation.

2. Now randomly select 30,000 training images and training labels (from the tuple you made before) and save them into the train_images and train_labels variables. First, create random indices that you can then use to sample the data. For the random selection you can use np.random.choice (https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html).

In [5]:
# Randomly choose 30,000 indices from the range of sample_images length
# replace: whether to sample with replacement. The default is True, meaning that a value
# can be selected multiple times; here we need sampling without replacement.
indices = np.random.choice(sample_images.shape[0], 30000, replace=False)

# Use these indices to sample images and labels
train_images = sample_images[indices]
train_labels = sample_labels[indices]

# Now train_images and train_labels contain your 30,000 samples
print("train_images shape:", train_images.shape)
print("train_labels shape:", train_labels.shape)
train_images shape: (30000, 28, 28)
train_labels shape: (30000,)
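
Indexing images and labels with the same index array keeps each image paired with its label. The sketch below illustrates this on toy data, not on Fashion-MNIST; the toy arrays are constructed so that every pixel of an image equals its label, and np.random.default_rng is used here instead of np.random.choice purely as an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(100)  # seeded generator so the sketch is repeatable

# Toy stand-ins: 10 "images" of 2x2 pixels, where every pixel equals the image's label
toy_labels = np.arange(10)
toy_images = np.repeat(toy_labels, 4).reshape(10, 2, 2)

# Draw 5 distinct positions, then index images and labels with the same positions
toy_indices = rng.choice(toy_images.shape[0], size=5, replace=False)
subset_images = toy_images[toy_indices]
subset_labels = toy_labels[toy_indices]

print("no duplicate indices:", len(np.unique(toy_indices)) == len(toy_indices))
print("labels still match images:", np.array_equal(subset_images[:, 0, 0], subset_labels))
```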

If we want to see what the data look like, we can print some of the first images together with their labels.

3. Plot the first 9 images together with their labels. You can use a for loop for that. To show the images, use plt.imshow() inside the loop and set the parameter cmap='gray'. If you want to place the images in a 3x3 grid, you can do so with plt.subplot(3, 3, i + 1), where i is the loop variable. The call to subplot needs to run before the call to imshow.

In [6]:
# As an example this will show just one image

# Display the first image and its label
plt.figure(figsize=(3, 3))
plt.imshow(train_images[0], cmap='gray')
plt.title(f'Label: {train_labels[0]}')
plt.axis('off')  # Hide axis ticks
plt.show()
In [7]:
# Display the first few images and their labels
plt.figure(figsize=(5, 5))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(train_images[i], cmap='gray')
    plt.title(f'Label: {train_labels[i]}')
    plt.axis('off')
plt.show()
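
The labels are integers from 0 to 9. To make plots easier to read, they can be mapped to category names. The names below follow the class list published with the Fashion-MNIST dataset; the class_names list and the label_to_name helper are introduced here for illustration only.

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def label_to_name(label):
    """Map an integer label (0-9) to its clothing category."""
    return class_names[int(label)]

print(label_to_name(0))
print(label_to_name(9))
```

In the plotting loop, plt.title(label_to_name(train_labels[i])) would then show the category name instead of the number.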

The original pixel values in train_images and test_images are stored as integers ranging from 0 to 255.

With the following lines we will normalise the data. We divide all pixel values by 255.0, which rescales them from the range [0, 255] → [0.0, 1.0]. Normalization helps neural networks:

  • Converge faster during training
  • Avoid issues with large gradient values
  • Improve overall stability and performance

Let's normalise the data using the following lines.

In [8]:
X_train = train_images.astype('float32') / 255.0
X_test = test_images.astype('float32') / 255.0
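
A quick way to convince yourself that the rescaling behaves as described is to apply it to a tiny array. The sketch below uses a made-up 2x2 array of pixel values rather than the real images; converting to float32 before dividing keeps the result in float32 rather than NumPy's default float64.

```python
import numpy as np

toy_pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)
toy_scaled = toy_pixels.astype('float32') / 255.0

print(toy_scaled.dtype, toy_scaled.min(), toy_scaled.max())
```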
4. We will now convert the class labels (e.g., 0–9) into one-hot encoded vectors and split the data. Use the tf.keras.utils.to_categorical() function to encode the categorical labels (both the train and test labels) and then split your training data into training and validation sets. Select the first 20,000 observations as the new training set (X_train[:20000] returns the first 20,000 observations of X_train) and the rest as the validation set.
In [9]:
# One-hot encode the labels, if train_labels[0] = 3, it becomes: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = tf.keras.utils.to_categorical(train_labels, num_classes=10)
y_test = tf.keras.utils.to_categorical(test_labels, num_classes=10)


# Split the training data into training and validation sets
X_train, X_val = X_train[:20000], X_train[20000:]
y_train, y_val = y_train[:20000], y_train[20000:]
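
To see what one-hot encoding does, here is a minimal NumPy sketch of the same idea. It is not the Keras implementation; the one_hot helper is a hypothetical stand-in written for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype='float32')[labels]

toy_encoded = one_hot([3, 0, 9], num_classes=10)
print(toy_encoded[0])
print(toy_encoded.argmax(axis=1))  # argmax recovers the original integer labels
```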

We have now finished the data preprocessing and preparation, and we will move on to the modeling part!

Build a model¶

Sequential neural network¶

In this section, we will build a simple neural network using the Sequential API from Keras. Our goal is to see how well a basic fully connected model (also known as a dense neural network) can perform on the Fashion-MNIST image classification task.

What is the Sequential API? The Sequential API in Keras allows you to create models layer by layer, where each layer has exactly one input and one output (https://www.tensorflow.org/guide/keras/sequential_model)

What if I need more flexibility? The functional API (https://www.tensorflow.org/guide/keras/functional) allows you to create models that have a lot more flexibility as you can define models where layers connect to more than just the previous and next layers. In this way, you can connect layers to (literally) any other layer.

Let's start with a basic example. The following code defines a Sequential neural network for classifying Fashion-MNIST images:

  • The input images are 28×28 grayscale pixels.
  • We use a Flatten layer to convert each image into a 784-element vector.
  • This is followed by two dense hidden layers:
    • The first has 256 neurons
    • The second has 128 neurons
    • Both use the ReLU activation function to introduce non-linearity.
  • Finally, we use a softmax output layer with 10 neurons (one for each clothing category).

This structure allows the model to learn increasingly abstract patterns from the image data and make predictions about which class each image belongs to.

In [10]:
model = Sequential([
    Flatten(input_shape=(28, 28)),  # Input layer to flatten the images
    Dense(256, activation='relu'),  # Hidden layer with considerable complexity
    Dense(128, activation='relu'),  # Subsequent hidden layer to further refine the learned features
    Dense(10, activation='softmax')  # Output layer with 10 units for each category
])
/usr/local/lib/python3.11/dist-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)

5. Visualize the architecture of the neural network using the tf.keras.utils.plot_model function. The first parameter of the function is the model. You can also use show_shapes=True to display shape information and dpi=66 to change the resolution.

In [11]:
tf.keras.utils.plot_model(model, show_shapes=True, dpi=66)
Out[11]:

Here, we've built a sequential neural network model for Fashion-MNIST, consisting of flattened input images passed through dense layers with ReLU activation. The flatten layer transforms the multi-dimensional input into a flat vector of 784 elements, preparing it for the network's learning process. Following this, the model features two fully connected layers, with the first comprising 256 neurons and the second 128 neurons, both instrumental in identifying complex data patterns. The final layer uses softmax for classifying into the 10 fashion categories.

We will now compile the model with an optimizer, loss function, and metrics for training. For multi-class classification problems, you can use categorical_crossentropy. categorical_crossentropy is a loss function used to measure the difference between the model’s predicted class probabilities and the actual (true) class labels. It is specifically designed for multi-class classification problems where each input belongs to exactly one of multiple categories and labels are one-hot encoded.

6. Compile the neural network model using the compile() function, with categorical_crossentropy as the loss function (loss='categorical_crossentropy') and Adam as the optimiser (optimizer='adam'). Also set the metrics to accuracy (metrics=['accuracy']).

The Adam optimizer is a popular and efficient gradient descent method and is usually a good default for deep learning tasks.

In [12]:
# model.compile(...) sets up the learning process before you start training with model.fit().
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

7. Use the summary() function to get a summary of the model. How many parameters does every layer have? How did we end up with those numbers?

In [13]:
# Print the model summary
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten (Flatten)               │ (None, 784)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 256)            │       200,960 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 128)            │        32,896 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 10)             │         1,290 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 235,146 (918.54 KB)
 Trainable params: 235,146 (918.54 KB)
 Non-trainable params: 0 (0.00 B)

How did we end up with those parameters?

Here is how those numbers are computed. For example, the first dense layer receives 784 inputs (from the flattened image) and has 256 neurons. Each neuron requires 784 weights and 1 bias, so 784×256 (weights) + 256 (biases) = 200,960.

The total number of parameters for each layer and for the whole model is therefore:

Layer   | Input Units | Output Neurons | Parameters
Flatten | 28×28       | 784            | 0
Dense1  | 784         | 256            | 784×256 + 256 = 200,960
Dense2  | 256         | 128            | 256×128 + 128 = 32,896
Dense3  | 128         | 10             | 128×10 + 10 = 1,290
Total   | -           | -              | 235,146
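
The same arithmetic can be reproduced in a few lines of plain Python. The helper name dense_params is made up for this sketch; the layer sizes are the ones of the model above.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 256, 128, 10]   # flattened input, two hidden layers, output
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)
print(sum(per_layer))
```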

From the summary, we can also see the total number of parameters. This simple model has more than 235,000 parameters. By way of comparison, large language models have millions or billions of parameters; GPT-3 has 175 billion.

Selecting the appropriate neural network architecture and loss function is necessary for compiling the model successfully. The following table categorizes common tasks depending on the respective output types, activation functions, loss functions, and metrics, guiding model development.

Task | Output Type | Last-layer Activation | Loss Function | Metric(s)
Regression | Numerical | Linear | mean_squared_error (MSE), mean_absolute_error (MAE) | Same as loss
Binary Classification | Binary | Sigmoid | binary_crossentropy | Accuracy, precision, recall, sensitivity, TPR, FPR, ROC, AUC
Classification: Single Label, Multiple Classes | Categorical | Softmax | categorical_crossentropy | Accuracy, confusion matrix
Classification: Multiple Labels, Multiple Classes | Categorical | Sigmoid | binary_crossentropy | Accuracy, precision, recall, sensitivity, TPR, FPR, ROC, AUC

The task we work on in this practical belongs to the Classification: Single Label, Multiple Classes category. For such tasks, models employ softmax activation and categorical crossentropy loss.
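
To illustrate how softmax and categorical crossentropy fit together, here is a minimal NumPy sketch. It is not the Keras implementation: the function names, the clipping constant eps, and the toy scores are assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the samples in the batch."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

toy_logits = np.array([[2.0, 1.0, 0.1],    # highest score for class 0
                       [0.0, 0.0, 0.0]])   # no preference: uniform probabilities
toy_targets = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])

toy_probs = softmax(toy_logits)
print(toy_probs.sum(axis=1))
print(categorical_crossentropy(toy_targets, toy_probs))
```

Because the targets are one-hot, only the predicted probability of the true class contributes to the loss for each sample.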

As you may have noticed, up to this point we haven't used the training set yet. In the next step, we will train the model; this involves fitting it to the training data for a specified number of epochs.

What is an Epoch? An epoch is one complete pass through the entire training dataset. When we train for 1 epoch, the model sees each training sample once and updates its internal parameters accordingly. Training for multiple epochs means the model gets multiple chances to learn from the same data.

What is Batch Size? The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.

In short, the batch size is the number of samples processed before the model is updated, and the number of epochs is the number of complete passes through the training dataset.

The size of a batch must be more than or equal to one and less than or equal to the number of samples in the training dataset.
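
The relationship between dataset size, batch size, and the number of weight updates per epoch can be sketched as follows. The helper name steps_per_epoch is made up for this example; 20,000 is the size of our training set after the split. The result for a batch size of 64 corresponds to the 313/313 counter shown in the training log.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

for bs in [64, 128, 256]:
    print(bs, steps_per_epoch(20000, bs))
```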

8. Train your neural network model on the training data using the fit() function. The first two parameters are the input predictors (X_train) and the target labels (y_train) of the training set. Set batch_size to 64 and epochs to 10. Set validation_data=(X_val, y_val).

In [14]:
model_history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_val, y_val))
tf.keras.backend.clear_session()
Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 10s 18ms/step - accuracy: 0.7084 - loss: 0.8310 - val_accuracy: 0.8375 - val_loss: 0.4496
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8352 - loss: 0.4541 - val_accuracy: 0.8409 - val_loss: 0.4280
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8555 - loss: 0.3920 - val_accuracy: 0.8576 - val_loss: 0.3927
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8709 - loss: 0.3489 - val_accuracy: 0.8633 - val_loss: 0.3787
Epoch 5/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8796 - loss: 0.3237 - val_accuracy: 0.8659 - val_loss: 0.3787
Epoch 6/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8857 - loss: 0.2993 - val_accuracy: 0.8704 - val_loss: 0.3727
Epoch 7/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8938 - loss: 0.2784 - val_accuracy: 0.8699 - val_loss: 0.3672
Epoch 8/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9003 - loss: 0.2619 - val_accuracy: 0.8740 - val_loss: 0.3664
Epoch 9/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9053 - loss: 0.2470 - val_accuracy: 0.8682 - val_loss: 0.3866
Epoch 10/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9142 - loss: 0.2343 - val_accuracy: 0.8801 - val_loss: 0.3480

Each line represents one epoch (a full pass through the training data), and includes:

  • loss = training loss (categorical crossentropy)
  • accuracy = training accuracy
  • val_loss = loss on the validation set
  • val_accuracy = accuracy on the validation set

Some observations we can make here:

  • Training accuracy improves steadily → the model is learning patterns in the data.
  • Validation accuracy also improves, but levels off at around 87–88% from epoch 6 onwards.
  • Validation loss stops improving steadily after about epoch 6 and fluctuates (it rises to 0.3866 at epoch 9 before dropping to 0.3480 at epoch 10), while training loss keeps falling. This widening gap could indicate early signs of overfitting: the model is starting to memorize training data rather than generalize.
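
These observations can also be checked numerically. The sketch below copies the per-epoch values as displayed in the training log above into NumPy arrays (in your own notebook you would read them from model_history.history instead) and computes the epoch with the lowest validation loss and the gap between training and validation accuracy.

```python
import numpy as np

# Values copied from the training log above (epochs 1-10)
train_acc = np.array([0.7084, 0.8352, 0.8555, 0.8709, 0.8796,
                      0.8857, 0.8938, 0.9003, 0.9053, 0.9142])
val_acc = np.array([0.8375, 0.8409, 0.8576, 0.8633, 0.8659,
                    0.8704, 0.8699, 0.8740, 0.8682, 0.8801])
val_loss = np.array([0.4496, 0.4280, 0.3927, 0.3787, 0.3787,
                     0.3727, 0.3672, 0.3664, 0.3866, 0.3480])

best_epoch = int(np.argmin(val_loss)) + 1   # +1 because epochs are numbered from 1
gap = train_acc - val_acc                   # positive gap: better on training than on validation

print("epoch with lowest validation loss:", best_epoch)
print("accuracy gap per epoch:", np.round(gap, 4))
```

A gap that starts negative and grows steadily positive is the pattern to watch for when looking for overfitting.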

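Note: Be aware that if you call fit() again on the same model object, training continues from the weights already learned in the previous round. To train from scratch, rebuild and recompile the model so that its weights are re-initialised. In addition, you can call the clear_session() function from the Keras backend, which resets the global state that Keras keeps (such as layer-name counters) and frees memory held by old models:

from keras.backend import clear_session
clear_session()

Or

tf.keras.backend.clear_session()

Note that clear_session() on its own does not re-initialise the weights of a model object you have already created; a newly built model is what starts learning from scratch.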

9. Plot your model's training history to see its performance over epochs. Plot both the 'accuracy' and 'loss' metrics, comparing these measures across epochs. Use the following code snippets to access accuracy and loss (assuming that you named the history object returned by fit() model_history):

# For plotting accuracy
model_history.history['accuracy']
# For plotting loss
model_history.history['loss']
In [15]:
plt.figure(figsize=(12, 5))

# Plotting model accuracy
plt.subplot(1, 2, 1)
plt.plot(model_history.history['accuracy'], label='Train Accuracy')
plt.plot(model_history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(loc='upper left')

# Plotting model loss
plt.subplot(1, 2, 2)
plt.plot(model_history.history['loss'], label='Train Loss')
plt.plot(model_history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(loc='upper left')

plt.show()

We will now wrap this plotting code in a function, so that we can call it whenever we want to plot the training and validation accuracy and loss.

In [16]:
def plot_training_history(model_history):

    plt.figure(figsize=(12, 5))

    # Plotting accuracy
    plt.subplot(1, 2, 1)
    plt.plot(model_history.history['accuracy'], label='Training Accuracy')
    plt.plot(model_history.history['val_accuracy'], label='Validation Accuracy')
    plt.title('Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend(loc='lower right')

    # Plotting loss
    plt.subplot(1, 2, 2)
    plt.plot(model_history.history['loss'], label='Training Loss')
    plt.plot(model_history.history['val_loss'], label='Validation Loss')
    plt.title('Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(loc='upper right')

    plt.show()

10. Evaluate the performance of the trained model. To get the performance on the training set, you can use the history of the model (model_history.history['accuracy'][-1]). Then evaluate the model on the test data using the model.evaluate() function, which returns the test loss and the test accuracy. Compare that with the accuracy on the training set.

In [17]:
# Evaluate the model on test dataset
print('Train accuracy:', model_history.history['accuracy'][-1])
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=False)
print('Test accuracy:', test_acc)
Train accuracy: 0.913349986076355
Test accuracy: 0.8676000237464905

You can see that the accuracy on the training set is 91% versus 87% on the test set. A drop of this kind is expected, because the model has never seen the test images. However, we can also tune some of the hyperparameters to see whether the performance changes.

Tune learning rate¶

As we mentioned, there are two different ways to tune hyperparameters. First we will use the train/validation/test setup, and then we will work again with GridSearchCV. We do this to show you both ways; when you work on a project, it is better to choose one of these options and stick to it.

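We will now focus on optimising the learning rate. The learning rate in a deep learning model is a hyperparameter that controls how much the model's weights are changed at each update step during training: a small value gives slow but stable learning, while a large value gives faster learning at the risk of overshooting good solutions.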
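
The effect of the step size can be seen on a toy problem. The sketch below minimises f(w) = w² with plain gradient descent (not Adam, and not a neural network); the function, the starting point, and the three learning rates are assumptions chosen only to show slow convergence, fast convergence, and divergence.

```python
def gradient_descent(learning_rate, w_start=1.0, n_steps=20):
    """Minimise f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    w = w_start
    for _ in range(n_steps):
        w = w - learning_rate * 2 * w
    return w

for lr in [0.01, 0.1, 1.1]:
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4g}")
```

With the smallest rate, w moves only slowly towards the minimum at 0; with the middle rate it gets close; with the largest rate every step overshoots and w grows in magnitude.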

11. Now we will find the optimal learning rate using the train/validation/test split. Start by creating a list of different learning rates. Then use a for loop to iterate over the learning rate values. In the body of the loop, create your model, compile it (here you can set optimizer=Adam(learning_rate=learning_rate)) and then fit it. You can try learning_rates = [0.001, 0.02, 0.1]. Also store the histories so that you can compare the performance across the different learning rates.

In [18]:
# Experiment with different learning rates to find the optimal one
learning_rates = [0.001, 0.02, 0.1]
model_histories = {}

for lr in learning_rates:
    print(f"Training model with learning rate: {lr}")

    model = Sequential([
              Flatten(input_shape=(28, 28)),
              Dense(256, activation='relu'),
              Dense(128, activation='relu'),
              Dense(10, activation='softmax')
    ])

    # Compile the model with a specified learning rate
    model.compile(optimizer=Adam(learning_rate=lr),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model and save the history
    model_history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_val, y_val))

    # Store the history
    model_histories[lr] = model_history

    # Clear the TensorFlow backend to reset model state
    tf.keras.backend.clear_session()
Training model with learning rate: 0.001
Epoch 1/10
/usr/local/lib/python3.11/dist-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
313/313 ━━━━━━━━━━━━━━━━━━━━ 5s 9ms/step - accuracy: 0.7163 - loss: 0.8057 - val_accuracy: 0.8311 - val_loss: 0.4566
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8372 - loss: 0.4487 - val_accuracy: 0.8443 - val_loss: 0.4224
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8562 - loss: 0.3878 - val_accuracy: 0.8633 - val_loss: 0.3839
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8706 - loss: 0.3497 - val_accuracy: 0.8709 - val_loss: 0.3692
Epoch 5/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8787 - loss: 0.3224 - val_accuracy: 0.8752 - val_loss: 0.3617
Epoch 6/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8884 - loss: 0.2974 - val_accuracy: 0.8779 - val_loss: 0.3574
Epoch 7/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8957 - loss: 0.2790 - val_accuracy: 0.8801 - val_loss: 0.3495
Epoch 8/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9028 - loss: 0.2607 - val_accuracy: 0.8800 - val_loss: 0.3512
Epoch 9/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9080 - loss: 0.2512 - val_accuracy: 0.8816 - val_loss: 0.3489
Epoch 10/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9128 - loss: 0.2335 - val_accuracy: 0.8836 - val_loss: 0.3452
Training model with learning rate: 0.02
Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6305 - loss: 1.6214 - val_accuracy: 0.8147 - val_loss: 0.5256
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7970 - loss: 0.5665 - val_accuracy: 0.8309 - val_loss: 0.4992
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8265 - loss: 0.4996 - val_accuracy: 0.8342 - val_loss: 0.4761
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8309 - loss: 0.4728 - val_accuracy: 0.7984 - val_loss: 0.5818
Epoch 5/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8300 - loss: 0.4841 - val_accuracy: 0.8309 - val_loss: 0.4898
Epoch 6/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8419 - loss: 0.4455 - val_accuracy: 0.8406 - val_loss: 0.4813
Epoch 7/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8395 - loss: 0.4550 - val_accuracy: 0.8043 - val_loss: 0.5573
Epoch 8/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8370 - loss: 0.4590 - val_accuracy: 0.8380 - val_loss: 0.4970
Epoch 9/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8414 - loss: 0.4388 - val_accuracy: 0.8243 - val_loss: 0.5658
Epoch 10/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8421 - loss: 0.4436 - val_accuracy: 0.8414 - val_loss: 0.4809
Training model with learning rate: 0.1
Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.4187 - loss: 19.3952 - val_accuracy: 0.4133 - val_loss: 1.5945
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.4656 - loss: 1.3649 - val_accuracy: 0.5738 - val_loss: 1.1986
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4474 - loss: 1.4342 - val_accuracy: 0.5404 - val_loss: 1.1808
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4621 - loss: 1.3402 - val_accuracy: 0.5282 - val_loss: 1.2398
Epoch 5/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4259 - loss: 1.4996 - val_accuracy: 0.4551 - val_loss: 1.2825
Epoch 6/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4352 - loss: 1.3020 - val_accuracy: 0.4289 - val_loss: 1.3803
Epoch 7/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4289 - loss: 1.3770 - val_accuracy: 0.4509 - val_loss: 1.2908
Epoch 8/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4499 - loss: 1.2989 - val_accuracy: 0.4467 - val_loss: 1.2884
Epoch 9/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4519 - loss: 1.2821 - val_accuracy: 0.4305 - val_loss: 1.3381
Epoch 10/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3949 - loss: 1.9914 - val_accuracy: 0.4139 - val_loss: 1.2921

12. Find the model with the best performance and print its validation accuracy. You can do that with a for loop that iterates over model_histories.

In [19]:
best_val_acc = 0
for lr, history in model_histories.items():
    # Compare learning rates on the validation accuracy of the final epoch
    val_acc = history.history['val_accuracy'][-1]
    print(lr, f"{val_acc:.4f}")
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        max_key = lr

print("Validation accuracy for the best model is:", f"{best_val_acc:.4f}")
0.001 0.8836
0.02 0.8414
0.1 0.4139
Validation accuracy for the best model is: 0.8836
  • learning_rate = 0.001: This is a safe default for the Adam optimizer. Learning is gradual and stable. The model converges well and generalizes best (about 88% validation accuracy).
  • learning_rate = 0.02: This is faster, but may overshoot optimal values during training. Result: decent learning, but not optimal (about 84%).
  • learning_rate = 0.1: This is too high for most models. It causes the model to diverge or make erratic updates. Result: poor accuracy (about 41%); the model did not converge.
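
The selection step can also be written compactly with max() and a key function. The sketch below uses the final-epoch validation accuracies as displayed in the training logs above; with the real history objects, the dictionary could be built as {lr: h.history['val_accuracy'][-1] for lr, h in model_histories.items()}. The names final_val_acc and best_lr are introduced here for illustration.

```python
# Final-epoch validation accuracies as displayed in the training logs above
final_val_acc = {0.001: 0.8836, 0.02: 0.8414, 0.1: 0.4139}

# max() iterates over the keys and ranks them by their dictionary value
best_lr = max(final_val_acc, key=final_val_acc.get)
print(best_lr, final_val_acc[best_lr])
```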

13. Plot the training and validation accuracy and loss for the model with the learning rate that achieved the best performance. Remember that you can use the plot_training_history() function we created earlier.

In [20]:
plot_training_history(model_histories[max_key])

At early epochs it is quite common to have a higher validation accuracy compared to the training accuracy. Eventually, training accuracy usually surpasses validation accuracy as the model learns and begins to overfit.

Batch sizes (Optional Part)¶

If you want to practice more, you can also try to optimise the batch size (remember that this is a parameter of the fit() function). The batch size is a hyperparameter that defines the number of samples (rows) to work through before updating the internal model parameters.

We suggest that you skip to question 16, where we use grid search with cross-validation to optimise multiple hyperparameters, before working on this part.

14. We can do the same but now using different batch sizes during training. Try batch sizes of 64, 128 and 256.

In [21]:
# Define different batch sizes to experiment with
batch_sizes = [64, 128, 256]
model_histories = {}

# Ensure clear separation in the output for each batch size experiment
print("\nStarting batch size experiments...\n" + "-"*50)

# Iterate over different batch sizes
for batch_size in batch_sizes:
    print(f"\nTraining model with batch size: {batch_size}")

    # Define the model architecture
    model = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(256, activation='relu'),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax')
    ])

    # Compile the model
    model.compile(optimizer='Adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Fit the model
    model_history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=10,
                        validation_data=(X_val, y_val))

    # Store the history of each training run, keyed by batch size
    model_histories[batch_size] = model_history

    # Clear the TensorFlow backend to reset model state
    tf.keras.backend.clear_session()
Starting batch size experiments...
--------------------------------------------------

Training model with batch size: 64
Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.7090 - loss: 0.8417 - val_accuracy: 0.8372 - val_loss: 0.4454
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8377 - loss: 0.4497 - val_accuracy: 0.8459 - val_loss: 0.4117
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8548 - loss: 0.3882 - val_accuracy: 0.8602 - val_loss: 0.3811
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8697 - loss: 0.3507 - val_accuracy: 0.8693 - val_loss: 0.3705
Epoch 5/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8777 - loss: 0.3224 - val_accuracy: 0.8689 - val_loss: 0.3729
Epoch 6/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8891 - loss: 0.3007 - val_accuracy: 0.8758 - val_loss: 0.3633
Epoch 7/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8977 - loss: 0.2802 - val_accuracy: 0.8771 - val_loss: 0.3581
Epoch 8/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9022 - loss: 0.2663 - val_accuracy: 0.8785 - val_loss: 0.3524
Epoch 9/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9027 - loss: 0.2560 - val_accuracy: 0.8854 - val_loss: 0.3441
Epoch 10/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9107 - loss: 0.2368 - val_accuracy: 0.8849 - val_loss: 0.3502

Training model with batch size: 128
Epoch 1/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - accuracy: 0.6681 - loss: 0.9627 - val_accuracy: 0.8347 - val_loss: 0.4811
Epoch 2/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8354 - loss: 0.4642 - val_accuracy: 0.8495 - val_loss: 0.4428
Epoch 3/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8549 - loss: 0.4044 - val_accuracy: 0.8517 - val_loss: 0.4180
Epoch 4/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8705 - loss: 0.3661 - val_accuracy: 0.8609 - val_loss: 0.3929
Epoch 5/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8790 - loss: 0.3386 - val_accuracy: 0.8696 - val_loss: 0.3704
Epoch 6/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8892 - loss: 0.3167 - val_accuracy: 0.8741 - val_loss: 0.3583
Epoch 7/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8937 - loss: 0.2953 - val_accuracy: 0.8738 - val_loss: 0.3563
Epoch 8/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9002 - loss: 0.2789 - val_accuracy: 0.8787 - val_loss: 0.3463
Epoch 9/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9033 - loss: 0.2657 - val_accuracy: 0.8801 - val_loss: 0.3489
Epoch 10/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9073 - loss: 0.2530 - val_accuracy: 0.8767 - val_loss: 0.3649

Training model with batch size: 256
Epoch 1/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 4s 22ms/step - accuracy: 0.6551 - loss: 1.0585 - val_accuracy: 0.8250 - val_loss: 0.5056
Epoch 2/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8269 - loss: 0.4931 - val_accuracy: 0.8492 - val_loss: 0.4396
Epoch 3/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8507 - loss: 0.4225 - val_accuracy: 0.8596 - val_loss: 0.4106
Epoch 4/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8638 - loss: 0.3876 - val_accuracy: 0.8635 - val_loss: 0.3892
Epoch 5/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8707 - loss: 0.3626 - val_accuracy: 0.8678 - val_loss: 0.3786
Epoch 6/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8796 - loss: 0.3366 - val_accuracy: 0.8727 - val_loss: 0.3647
Epoch 7/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8897 - loss: 0.3146 - val_accuracy: 0.8744 - val_loss: 0.3607
Epoch 8/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8926 - loss: 0.2985 - val_accuracy: 0.8743 - val_loss: 0.3561
Epoch 9/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8965 - loss: 0.2854 - val_accuracy: 0.8745 - val_loss: 0.3590
Epoch 10/10
79/79 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9010 - loss: 0.2737 - val_accuracy: 0.8746 - val_loss: 0.3602

15. Find the best batch size and plot the training and validation accuracy and loss.

In [22]:
best_accuracy = 0  # avoid shadowing the built-in max()
for key, history in model_histories.items():
    final_accuracy = history.history['accuracy'][-1]  # training accuracy after the last epoch
    print(key, final_accuracy)
    if final_accuracy > best_accuracy:
        best_accuracy = final_accuracy
        max_key = key

# Note: 'accuracy' is the training accuracy; use 'val_accuracy' to select on validation accuracy instead
print("Final training accuracy for the best model is: ", best_accuracy)
64 0.9122999906539917
128 0.9097999930381775
256 0.9027000069618225
Final training accuracy for the best model is:  0.9122999906539917
In [23]:
plot_training_history(model_histories[max_key])

Hyperparameter Optimization¶

Fine-tuning hyperparameters is one of the main steps in deep learning. As you noticed, there are several hyperparameters that we can optimise. Suppose that we want to try the following values:

  • Dropout rates: 0.2 and 0.3.
  • Learning rates: 0.001 and 0.01.
  • Number of neurons: 128 and 256.

What is Dropout? Dropout is a regularization technique that randomly “drops” (i.e., disables) a fraction of neurons during training (0.3 → 30% of neurons are turned off at each update). It is only active during training; at prediction time all neurons are used. Dropout helps prevent overfitting by making the model less dependent on any one neuron and encourages it to learn redundant and robust features.
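The mechanism can be sketched in a few lines of plain Python. This is an illustration of "inverted" dropout (kept activations are scaled by 1 / (1 - rate), which is also how the Keras Dropout layer is documented to behave), not the Keras implementation itself; the function name dropout and the seed are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and scale the survivors
    by 1 / (1 - rate) so the expected value stays the same."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)           # hypothetical seed, only for repeatability
activations = [1.0] * 100_000    # toy layer output: every neuron outputs 1.0

dropped = dropout(activations, rate=0.3, rng=rng)
fraction_zeroed = sum(1 for a in dropped if a == 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)

print("fraction of neurons zeroed:", round(fraction_zeroed, 3))
print("mean activation after dropout:", round(mean_activation, 3))
```

Roughly 30% of the activations are zeroed, while the mean stays close to 1.0 because of the rescaling.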

What is Learning Rate? It controls the size of the steps the optimizer takes when updating the model’s weights.
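A toy example can make the role of the step size visible. The sketch below runs plain gradient descent (not Adam) on the one-dimensional function f(w) = w², whose minimum is at w = 0; the function name gradient_descent and the chosen rates are illustrative assumptions and do not correspond to the learning rates that work well for the network in this practical.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent, starting from w."""
    for _ in range(steps):
        grad = 2 * w                    # derivative of w**2
        w = w - learning_rate * grad    # step against the gradient
    return w


for lr in (0.01, 0.1, 1.1):
    print(f"learning_rate={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With a small rate the weight moves slowly towards 0, with a moderate rate it gets there quickly, and with a rate that is too large every step overshoots the minimum and the weight grows in magnitude instead of shrinking.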

What Are Neurons? Neurons are the units in a layer that process information (via weights and activation functions).

  • 256: larger layer, more capacity to capture features, but more parameters to train
  • 128: smaller layer, fewer parameters and faster training, but less capacity
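To see what the number of neurons means in terms of model size, the sketch below counts the trainable parameters of a network with 28×28 inputs, two hidden Dense layers of num_units neurons each, and 10 outputs (the layout used in this practical; Flatten and Dropout layers add no parameters). The helper names dense_params and mlp_params are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected (Dense) layer."""
    return n_inputs * n_units + n_units


def mlp_params(num_units, n_inputs=28 * 28, n_classes=10):
    """Trainable parameters of: input -> Dense -> Dense -> Dense(n_classes)."""
    return (dense_params(n_inputs, num_units)
            + dense_params(num_units, num_units)
            + dense_params(num_units, n_classes))


for num_units in (128, 256):
    print(f"num_units={num_units}: {mlp_params(num_units):,} trainable parameters")
```

Doubling the layer width more than doubles the parameter count, because the hidden-to-hidden weight matrix grows quadratically with num_units.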

Now we will fine-tune the hyperparameters of the neural network model (e.g., number of hidden layers, number of neurons per layer) to optimize performance on the validation set. Follow the steps below.

16. Create a function that takes those hyperparameters as input. Inside this function, build the model and compile it.

Your function can start like:

def create_model(num_units, dropout_rate, learning_rate):
In [24]:
def create_model(num_units, dropout_rate, learning_rate):
    model = Sequential()
    model.add(Flatten(input_shape=(28, 28)))
    model.add(Dense(num_units, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_units, activation='relu'))  # Reusing num_units for simplicity
    model.add(Dropout(dropout_rate))
    model.add(Dense(10, activation='softmax'))

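    # Pass learning_rate to the optimizer; the string 'Adam' would silently use the default rate
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])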
    return model

17. Create a grid with the different values of the hyperparameters

param_grid = {
    'num_units': [128, 256],               # Different neuron counts
    'dropout_rate': [0.2, 0.3],            # Diverse dropout rates
    'learning_rate': [0.001, 0.01]       # Several learning rates
}
In [25]:
param_grid = {
    'num_units': [128, 256],               # Different neuron counts
    'dropout_rate': [0.2, 0.3],            # Diverse dropout rates
    'learning_rate': [0.001, 0.01]        # Several learning rates
}
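As a quick check of how large this search space is, the sketch below enumerates every combination in the grid using only the standard library. It is an illustration of what an exhaustive search would have to cover, not how scikit-learn enumerates the grid internally; the variable name combinations is an assumption.

```python
from itertools import product

param_grid = {
    'num_units': [128, 256],
    'dropout_rate': [0.2, 0.3],
    'learning_rate': [0.001, 0.01],
}

# Cartesian product of all value lists: one dict per candidate configuration
combinations = [dict(zip(param_grid, values))
                for values in product(*param_grid.values())]

print("number of combinations:", len(combinations))
for combo in combinations:
    print(combo)
```

With two values for each of three hyperparameters there are 2 × 2 × 2 = 8 candidate configurations.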

18. Then create a KerasClassifier based on the defined model (model = KerasClassifier(model=create_model, ...)). You can set epochs to 15 and batch_size to 64. Also add the hyperparameters that you will tune as arguments; their default values can be any value from the param_grid, for example the first one.

In [26]:
# Wrap the Keras model so that scikit-learn can use it as an estimator
# Hyperparameters to be tuned need to be added as arguments to KerasClassifier from scikeras (https://adriangb.com/scikeras/stable/migration.html#default-arguments-in-build-fn-model)
model = KerasClassifier(model=create_model,
                  epochs = 15,
                  batch_size=64,
                  num_units = 128,
                  dropout_rate = 0.2,
                  learning_rate = 0.01,
                  verbose=True)

19. Finally, perform the hyperparameter search using RandomizedSearchCV (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html), set n_iter to 3 and fit on the training set.

Random search cross-validation looks for good hyperparameters by evaluating the model on random combinations of hyperparameter values. You define a set of hyperparameters and a range (or list) of values for each one, and combinations are then sampled at random from these ranges. This is repeated a specified number of times, and the combination with the best cross-validated performance is selected. The number of parameter settings that are tried is given by n_iter.

With the few values that we have in our parameter grid, we could also use GridSearchCV. GridSearchCV performs an exhaustive search over all combinations of hyperparameters specified in param_grid and selects the best combination based on cross-validation performance.

However, it is useful to know RandomizedSearchCV, because you may build a model with many hyperparameters and want to try several values for each. In that case an exhaustive search quickly becomes too expensive, so we suggest RandomizedSearchCV.
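The cost difference between the two strategies can be sketched with the standard library alone. The example assumes n_iter=3, 5-fold cross-validation, a batch size of 64 and a training set of 20,000 images (the last value is an assumption based on the step counts Keras reports in this practical); it mimics the idea of sampling candidates without replacement and is not scikit-learn's implementation.

```python
import math
import random
from itertools import product

param_grid = {
    'num_units': [128, 256],
    'dropout_rate': [0.2, 0.3],
    'learning_rate': [0.001, 0.01],
}

# Every combination an exhaustive grid search would evaluate
all_combinations = [dict(zip(param_grid, values))
                    for values in product(*param_grid.values())]

# A randomized search evaluates only n_iter of them, drawn without replacement
n_iter, cv = 3, 5
rng = random.Random(42)  # hypothetical seed, only to make the sketch repeatable
sampled = rng.sample(all_combinations, n_iter)

grid_fits = len(all_combinations) * cv   # 8 candidates x 5 folds = 40
random_fits = n_iter * cv                # 3 candidates x 5 folds = 15
print("grid search model fits:      ", grid_fits)
print("randomized search model fits:", random_fits)

# Each fit trains on (cv - 1) / cv of the data and validates on the rest
n_train = 20_000  # assumption: size of the training set
fold_train = n_train * (cv - 1) // cv
fold_val = n_train - fold_train
print("samples per fit (train / validation):", fold_train, "/", fold_val)
print("training steps per epoch at batch size 64:", math.ceil(fold_train / 64))
```

Each of these fits trains a full network for all epochs, so reducing 40 fits to 15 saves most of the computation at the price of not trying every combination.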

In [27]:
grid = RandomizedSearchCV(
    estimator=model, # This is the model you want to tune.
    param_distributions=param_grid, # This is a dictionary of hyperparameters you want to search over
    n_iter=3,   # Number of random combinations
    cv=5, # Sets 5-fold cross-validation
    verbose=2, # Controls the level of log output during training.
    n_jobs=1  # Number of parallel jobs; n_jobs=1 uses a single core
)
In [28]:
grid_result = grid.fit(X_train, y_train)
Fitting 5 folds for each of 3 candidates, totalling 15 fits
Epoch 1/15
/usr/local/lib/python3.11/dist-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6572 - loss: 0.9639
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8194 - loss: 0.5072
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8335 - loss: 0.4520
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8499 - loss: 0.4071
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8546 - loss: 0.3866
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8647 - loss: 0.3670
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8735 - loss: 0.3465
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8762 - loss: 0.3401
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8782 - loss: 0.3279
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8834 - loss: 0.3134
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8843 - loss: 0.3037
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8863 - loss: 0.2919
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8883 - loss: 0.2912
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8968 - loss: 0.2750
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8953 - loss: 0.2778
63/63 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step
[CV] END dropout_rate=0.2, learning_rate=0.01, num_units=256; total time=  14.5s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.6586 - loss: 0.9627
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8172 - loss: 0.5165
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8418 - loss: 0.4465
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8493 - loss: 0.4106
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8595 - loss: 0.3819
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8691 - loss: 0.3563
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8735 - loss: 0.3465
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8742 - loss: 0.3370
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8807 - loss: 0.3249
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8862 - loss: 0.3071
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8904 - loss: 0.2994
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8931 - loss: 0.2888
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8926 - loss: 0.2823
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8970 - loss: 0.2810
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9008 - loss: 0.2681
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[CV] END dropout_rate=0.2, learning_rate=0.01, num_units=256; total time=  12.9s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6531 - loss: 0.9726
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8182 - loss: 0.5144
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8390 - loss: 0.4467
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8510 - loss: 0.4064
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8585 - loss: 0.3813
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8686 - loss: 0.3620
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8722 - loss: 0.3429
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8757 - loss: 0.3266
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8871 - loss: 0.3088
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8819 - loss: 0.3155
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8879 - loss: 0.2962
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8920 - loss: 0.2835
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8911 - loss: 0.2848
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8954 - loss: 0.2718
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8980 - loss: 0.2689
63/63 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step
[CV] END dropout_rate=0.2, learning_rate=0.01, num_units=256; total time=  14.5s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6586 - loss: 0.9534
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.8179 - loss: 0.5037
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8432 - loss: 0.4358
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8539 - loss: 0.3946
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8661 - loss: 0.3731
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8754 - loss: 0.3451
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8750 - loss: 0.3324
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8823 - loss: 0.3264
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8902 - loss: 0.3014
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8854 - loss: 0.3022
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8951 - loss: 0.2892
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8964 - loss: 0.2829
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8947 - loss: 0.2748
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9000 - loss: 0.2640
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9028 - loss: 0.2636
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
[CV] END dropout_rate=0.2, learning_rate=0.01, num_units=256; total time=  15.1s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6570 - loss: 0.9741
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8128 - loss: 0.5160
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8371 - loss: 0.4448
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8524 - loss: 0.4022
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8622 - loss: 0.3734
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8704 - loss: 0.3543
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8766 - loss: 0.3381
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8771 - loss: 0.3303
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8827 - loss: 0.3116
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8856 - loss: 0.3047
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8899 - loss: 0.2944
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8919 - loss: 0.2867
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8957 - loss: 0.2775
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8948 - loss: 0.2715
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9042 - loss: 0.2616
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[CV] END dropout_rate=0.2, learning_rate=0.01, num_units=256; total time=  14.2s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.6640 - loss: 0.9456
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8168 - loss: 0.5176
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8414 - loss: 0.4432
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8464 - loss: 0.4157
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8536 - loss: 0.3934
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8643 - loss: 0.3667
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8693 - loss: 0.3538
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8748 - loss: 0.3388
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8777 - loss: 0.3262
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8826 - loss: 0.3176
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8818 - loss: 0.3037
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8890 - loss: 0.2930
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8936 - loss: 0.2813
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8939 - loss: 0.2786
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8982 - loss: 0.2721
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[CV] END dropout_rate=0.2, learning_rate=0.001, num_units=256; total time=  13.2s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.6549 - loss: 0.9676
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8157 - loss: 0.5099
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8390 - loss: 0.4440
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8475 - loss: 0.4089
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8623 - loss: 0.3809
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8689 - loss: 0.3605
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8747 - loss: 0.3460
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8768 - loss: 0.3379
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8853 - loss: 0.3119
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8887 - loss: 0.3064
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8932 - loss: 0.2930
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8971 - loss: 0.2837
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8986 - loss: 0.2752
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8954 - loss: 0.2712
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9043 - loss: 0.2610
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
[CV] END dropout_rate=0.2, learning_rate=0.001, num_units=256; total time=  14.0s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.6465 - loss: 0.9859
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8177 - loss: 0.5148
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8351 - loss: 0.4447
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8516 - loss: 0.4091
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8628 - loss: 0.3800
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8647 - loss: 0.3616
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8721 - loss: 0.3400
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8793 - loss: 0.3316
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8821 - loss: 0.3189
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8837 - loss: 0.3130
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8890 - loss: 0.2982
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8910 - loss: 0.2898
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8932 - loss: 0.2809
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8947 - loss: 0.2747
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9006 - loss: 0.2643
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
[CV] END dropout_rate=0.2, learning_rate=0.001, num_units=256; total time=  13.2s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.6585 - loss: 0.9721
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8236 - loss: 0.4986
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8449 - loss: 0.4343
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8542 - loss: 0.4050
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8613 - loss: 0.3724
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8674 - loss: 0.3530
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8747 - loss: 0.3392
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8801 - loss: 0.3211
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8815 - loss: 0.3121
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8880 - loss: 0.3004
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8899 - loss: 0.2880
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8919 - loss: 0.2824
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8979 - loss: 0.2726
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8996 - loss: 0.2619
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9022 - loss: 0.2541
63/63 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step
[CV] END dropout_rate=0.2, learning_rate=0.001, num_units=256; total time=  14.2s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.6527 - loss: 0.9671
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8161 - loss: 0.5171
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8361 - loss: 0.4436
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8488 - loss: 0.4069
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8603 - loss: 0.3786
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8617 - loss: 0.3590
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8720 - loss: 0.3443
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8816 - loss: 0.3195
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8824 - loss: 0.3140
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8856 - loss: 0.3086
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8912 - loss: 0.2951
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8967 - loss: 0.2802
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8982 - loss: 0.2726
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9028 - loss: 0.2622
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8971 - loss: 0.2707
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[CV] END dropout_rate=0.2, learning_rate=0.001, num_units=256; total time=  14.7s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.5623 - loss: 1.2127
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7856 - loss: 0.6047
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8144 - loss: 0.5221
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8289 - loss: 0.4760
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8383 - loss: 0.4567
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8417 - loss: 0.4364
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8458 - loss: 0.4113
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8552 - loss: 0.3932
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8610 - loss: 0.3790
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8586 - loss: 0.3751
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8709 - loss: 0.3581
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8726 - loss: 0.3518
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8718 - loss: 0.3429
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8725 - loss: 0.3387
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8813 - loss: 0.3229
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
[CV] END dropout_rate=0.3, learning_rate=0.01, num_units=128; total time=  14.8s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.5811 - loss: 1.1942
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7838 - loss: 0.5980
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8203 - loss: 0.5193
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8311 - loss: 0.4713
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8347 - loss: 0.4432
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8489 - loss: 0.4220
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8517 - loss: 0.4107
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8557 - loss: 0.3974
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8660 - loss: 0.3717
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8675 - loss: 0.3719
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8677 - loss: 0.3586
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8709 - loss: 0.3419
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8744 - loss: 0.3393
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8785 - loss: 0.3327
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8785 - loss: 0.3251
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[CV] END dropout_rate=0.3, learning_rate=0.01, num_units=128; total time=  13.4s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.5591 - loss: 1.2297
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7867 - loss: 0.6000
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8156 - loss: 0.5222
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8335 - loss: 0.4775
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8380 - loss: 0.4470
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8488 - loss: 0.4234
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8543 - loss: 0.4022
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8556 - loss: 0.3967
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8674 - loss: 0.3733
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8665 - loss: 0.3665
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8683 - loss: 0.3533
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8690 - loss: 0.3521
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8716 - loss: 0.3453
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8758 - loss: 0.3402
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8739 - loss: 0.3380
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
[CV] END dropout_rate=0.3, learning_rate=0.01, num_units=128; total time=  14.1s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.5845 - loss: 1.1694
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7909 - loss: 0.5883
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8236 - loss: 0.5006
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8327 - loss: 0.4652
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8423 - loss: 0.4336
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8497 - loss: 0.4132
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8599 - loss: 0.3915
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8592 - loss: 0.3851
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8645 - loss: 0.3702
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8703 - loss: 0.3589
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8687 - loss: 0.3536
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8748 - loss: 0.3449
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8710 - loss: 0.3423
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8767 - loss: 0.3326
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8827 - loss: 0.3216
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[CV] END dropout_rate=0.3, learning_rate=0.01, num_units=128; total time=  17.7s
Epoch 1/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.5786 - loss: 1.1728
Epoch 2/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7861 - loss: 0.5920
Epoch 3/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8129 - loss: 0.5264
Epoch 4/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8281 - loss: 0.4717
Epoch 5/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8373 - loss: 0.4440
Epoch 6/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8390 - loss: 0.4259
Epoch 7/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8538 - loss: 0.4044
Epoch 8/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8576 - loss: 0.3841
Epoch 9/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8615 - loss: 0.3668
Epoch 10/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8654 - loss: 0.3632
Epoch 11/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8703 - loss: 0.3555
Epoch 12/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8733 - loss: 0.3449
Epoch 13/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8727 - loss: 0.3356
Epoch 14/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8767 - loss: 0.3289
Epoch 15/15
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8806 - loss: 0.3187
63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[CV] END dropout_rate=0.3, learning_rate=0.01, num_units=128; total time=  16.8s
Epoch 1/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 5s 9ms/step - accuracy: 0.6787 - loss: 0.9059
Epoch 2/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8254 - loss: 0.4849
Epoch 3/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8426 - loss: 0.4301
Epoch 4/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8566 - loss: 0.3944
Epoch 5/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8673 - loss: 0.3676
Epoch 6/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8674 - loss: 0.3511
Epoch 7/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8779 - loss: 0.3331
Epoch 8/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8816 - loss: 0.3216
Epoch 9/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8854 - loss: 0.3099
Epoch 10/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8857 - loss: 0.3019
Epoch 11/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8893 - loss: 0.2932
Epoch 12/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8937 - loss: 0.2808
Epoch 13/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8952 - loss: 0.2706
Epoch 14/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8972 - loss: 0.2721
Epoch 15/15
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8976 - loss: 0.2624
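
The step counter at the start of each progress line is 250 during the cross-validation fits, 63 for each validation prediction, and 313 in the final run. Keras reports the number of batches per epoch, which is the number of samples divided by the batch size, rounded up. The jump from 250 to 313 is what one would expect when the search refits the best model on the whole training set instead of on the training folds only. The sketch below reproduces these three counts under assumed values (20,000 training images, 5 folds, batch size 64). These values are only one combination consistent with the log, not necessarily the settings chosen earlier in the practical.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches Keras processes per epoch (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


# Assumed values, chosen only because they match the step counts in the log.
n_train = 20_000
n_folds = 5
batch_size = 64

n_val_fold = n_train // n_folds        # samples held out in each fold
n_train_fold = n_train - n_val_fold    # samples used for fitting in each fold

cv_fit_steps = steps_per_epoch(n_train_fold, batch_size)
cv_predict_steps = steps_per_epoch(n_val_fold, batch_size)
refit_steps = steps_per_epoch(n_train, batch_size)

print(cv_fit_steps, cv_predict_steps, refit_steps)  # 250 63 313
```

Under the same assumed batch size, a test set of 10,000 images would give 157 prediction steps.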

20. Print the optimal hyperparameters and the best cross-validation score obtained with them.

In [29]:
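# Mean cross-validation score of the best hyperparameter combination
print(grid_result.best_score_)
# Hyperparameter values that achieved that score
print(grid_result.best_params_)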
0.873
{'num_units': 256, 'learning_rate': 0.01, 'dropout_rate': 0.2}
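
In scikit-learn search objects, `best_score_` is the mean validation score across folds for the winning combination, and `best_params_` is the entry at position `best_index_` in `cv_results_`. The sketch below shows that relationship on a small synthetic problem. To keep it fast, it swaps in `LogisticRegression` and `GridSearchCV` for the Keras model and the search object used in this practical; the data, the estimator, and the grid over `C` are illustrative assumptions. `RandomizedSearchCV` exposes the same attributes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not Fashion-MNIST), kept small so the search is quick.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Stand-in estimator and grid; the practical tunes a Keras model instead.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

results = search.cv_results_
mean_scores = results["mean_test_score"]  # one mean validation score per combination

# List all combinations from best to worst mean validation score.
for i in np.argsort(-mean_scores):
    print(results["params"][i], round(float(mean_scores[i]), 4))

# best_params_ and best_score_ are simply the top-ranked row of cv_results_.
best = search.best_index_
print(search.best_params_ == results["params"][best])
print(search.best_score_ == mean_scores[best])
```

Inspecting the full ranking, rather than only the winner, shows whether the best combination is clearly ahead or only marginally better than its neighbours.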
In [30]:
# Evaluate the refitted best model on the held-out test set
test_accuracy = grid.score(X_test, y_test)
test_accuracy
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
Out[30]:
0.8679

Optional. Go back to question 16, add more hyperparameters to the search, and check whether the optimal values change.
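
Before extending the search, it helps to estimate its cost: every added hyperparameter multiplies the number of combinations, and each combination is fitted once per fold. The sketch below counts the fits for a baseline grid and for an extended one. The value lists, the extra `epochs` entry, and the 5-fold setting are hypothetical and are not taken from question 16.

```python
from itertools import product


def count_combinations(param_grid: dict) -> int:
    """Number of distinct hyperparameter combinations in a grid."""
    return len(list(product(*param_grid.values())))


# Hypothetical grids for illustration only.
base_grid = {
    "num_units": [128, 256],
    "learning_rate": [0.001, 0.01],
    "dropout_rate": [0.2, 0.3],
}
extended_grid = {**base_grid, "epochs": [10, 15]}

n_folds = 5  # assumed number of cross-validation folds

for name, param_grid in [("base", base_grid), ("extended", extended_grid)]:
    n_combos = count_combinations(param_grid)
    n_fits = n_combos * n_folds + 1  # +1 for the final refit on all training data
    print(f"{name}: {n_combos} combinations, {n_fits} model fits")
```

An exhaustive grid search fits every combination, whereas a randomized search fits only `n_iter` sampled combinations, so its cost stays fixed as the search space grows.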

End of the Practical