Practical 9: Movie review classification using Active learning¶

Tina Shahedi, Anastasia Giachanou

Machine Learning with Python - Utrecht Summer School

In this practical, we’ll work with Active Learning using the IMDB dataset, which has 50,000 movie reviews split into positive and negative sentiments. We’ll explore three strategies:

  1. Simple Evaluation Study: We'll use pool-based active learning with uncertainty sampling, where the model queries the most uncertain samples and retrains iteratively.

  2. Multi-annotator Pool-based Active Learning: This simulates multiple annotators with varying noise levels, using a SingleAnnotatorWrapper with probabilistic active learning. It highlights how multiple annotators impact model performance.

  3. Stream-based Active Learning: Here, we will implement a stream-based approach using StreamRandomSampling and StreamProbabilisticAL, ideal for real-time decision-making as data continuously flows in.

To cover a variety of applications and input data, in this practical we will work with text data. In a text mining application, our input features are the words in the text. Therefore, the first steps of the practical deal with converting the text into a format that we can use as input for the classification model.

Our task will be to classify movie reviews as positive or negative using their text.

Let's get started¶

We will use the scikit-activeml library. This library is built on scikit-learn. We'll show how it works by classifying IMDB reviews using the active learning cycle. Let's start by installing the library with pip install scikit-activeml and importing the needed packages from scikit-learn and scikit-activeml.

In [77]:
!pip install scikit-activeml > /dev/null 2>&1
!pip install numpy==1.24.4 scipy==1.10.1
In [78]:
import numpy as np
import matplotlib as mlp
import matplotlib.pyplot as plt
import pandas as pd
import re
import string
import skactiveml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from skactiveml.classifier import SklearnClassifier, ParzenWindowClassifier
from skactiveml.pool import UncertaintySampling, ProbabilisticAL, RandomSampling
from skactiveml.pool.multiannotator import SingleAnnotatorWrapper
from skactiveml.stream import StreamRandomSampling, StreamProbabilisticAL
from skactiveml.utils import unlabeled_indices, labeled_indices, MISSING_LABEL, majority_vote, call_func
from skactiveml.visualization import plot_utilities, plot_decision_boundary
from collections import deque
from scipy.ndimage import gaussian_filter1d
from sklearn.manifold import TSNE

Loading the IMDB Dataset¶

We'll be using the IMDB dataset, featuring 50,000 movie reviews from the Internet Movie Database, for our experiments. Now it is time to load the dataset.

In [79]:
df = pd.read_csv("IMDB Dataset.csv")

When loading real-world datasets, you may encounter a ParserError. This usually happens when loading a large or irregular CSV file into pandas with the read_csv function. The solution is to use the engine='python' parameter in the read_csv call to handle complex CSV structures, and the on_bad_lines parameter to skip problematic lines, like this:

# Load the IMDB dataset with proper handling for encoding and skipping bad lines
df = pd.read_csv("IMDB Dataset.csv", engine="python", on_bad_lines='skip')

Another solution (on Google Colab) is to load the data by mounting Google Drive, which can also help with issues that might lead to a ParserError.

In [80]:
#from google.colab import drive
#drive.mount('/content/drive', force_remount=True)

# Load the IMDB dataset
#df = pd.read_csv('/content/drive/My Drive/IMDB Dataset.csv')

Let's see the first lines of the dataframe

In [81]:
df.head()
Out[81]:
review sentiment
0 One of the other reviewers has mentioned that ... positive
1 A wonderful little production. <br /><br />The... positive
2 I thought this was a wonderful way to spend ti... positive
3 Basically there's a family where a little boy ... negative
4 Petter Mattei's "Love in the Time of Money" is... positive

When working with large datasets, starting with a smaller subset for initial testing helps us to develop and test our code faster. Here, we reduce the IMDB dataset to 10,000 samples using Pandas' sample method. Run the following code to sample part of the dataset.

In [82]:
# Reduce the dataset size for initial testing
df = df.sample(10000, random_state=42)

Pre-processing the Text Data¶

In this practical, we will work with text data. When we have text data, we need to do some pre-processing to bring it into a format that machines can understand, i.e., to convert it into numbers.

We also have to apply some pre-processing steps, such as lowercasing, punctuation removal, stemming and stop word removal.

For more information on how to work with text data, please refer to the A Beginner's Guide to Dealing with Text Data tutorial.
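The preprocessing we implement below only lowercases the text and strips punctuation and extra whitespace. If you would like to experiment with stemming and stop word removal as well, a minimal optional sketch could look like the following (it uses NLTK, an extra dependency that is not installed or imported in this practical):

# Optional sketch (not used in the rest of this practical): stemming and stop word
# removal with NLTK. NLTK is an assumed extra dependency (pip install nltk).
import nltk
nltk.download('stopwords', quiet=True)
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

def stem_and_remove_stopwords(text):
    # Drop stop words and reduce the remaining words to their stems
    return ' '.join(stemmer.stem(w) for w in text.split() if w not in stop_words)

print(stem_and_remove_stopwords("this movie was surprisingly moving and beautifully acted"))
# -> something like: "movi surprisingli move beautifulli act"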

Text Preprocessing¶

At the beginning of this practical, we imported two essential libraries for text preprocessing: re and string. The re library (https://docs.python.org/3/library/re.html) supports regular expressions for pattern matching, and the string library (https://docs.python.org/3/library/string.html) allows us to perform common string operations and gives us access to string-related constants such as the punctuation characters. The preprocess_text function, which we define below, converts text to lowercase, removes punctuation with re.sub(), and eliminates extra whitespace with re.sub() followed by .strip(). We will apply this function to each review in the dataset to clean the text.

In [83]:
# Preprocess the text data
def preprocess_text(text):
    text = text.lower()  # Lowercase text
    text = re.sub(f'[{re.escape(string.punctuation)}]', '', text)  # Remove punctuation
    text = re.sub(r'\s+', ' ', text).strip()  # Remove extra whitespace
    return text

df['review'] = df['review'].apply(preprocess_text)

df.describe()
Out[83]:
review sentiment
count 10000 10000
unique 9978 2
top you would probably get something like this im ... positive
freq 2 5039

Next, we convert the sentiment labels to binary values, where 'positive' is mapped to 1 and 'negative' to 0.

In [84]:
# Convert labels to binary
df['sentiment'] = df['sentiment'].map({'positive': 1, 'negative': 0})

Let's split the data into training and test sets with an 80/20 ratio using train_test_split. This results in X_train and y_train for training, and X_test and y_test for testing, ensuring that the model is trained and evaluated on separate data.

In [85]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df['review'], df['sentiment'], test_size=0.2, random_state=42)

TF-IDF and Vectorization¶

Once the text data is preprocessed, it needs to be converted into a numerical format that machine learning algorithms can work with. This process is known as vectorization. One of the most common methods for vectorization is the TF-IDF (Term Frequency-Inverse Document Frequency) approach.

TF-IDF is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus). The TF-IDF value is large when a term appears many times in a document and few times in the collection of documents.

Term Frequency (TF): Measures how frequently a term appears in a document. It is calculated by dividing the number of times a term appears in a document by the total number of terms in that document.

$$ \text{TF}(t,d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d} $$

Inverse Document Frequency (IDF): Measures how important a term is given a collection of documents. It is calculated by taking the logarithm of the number of documents in the corpus divided by the number of documents containing the term.

$$ \text{IDF}(t) = \log \left( \frac{\text{Total number of documents}}{\text{Number of documents containing term } t} \right) $$

TF-IDF Score: The TF-IDF score is the product of the TF and IDF scores. It reflects the importance of a term in a document within the corpus.

Formula: $$ \text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t) $$
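To make these formulas more concrete, here is a small illustration (not part of the practical's pipeline, assuming a recent scikit-learn version) of TfidfVectorizer on a toy corpus of three short "reviews". Note that scikit-learn smooths the IDF and L2-normalises each document vector, so the values are not exactly the TF × IDF products above, but the intuition carries over: words that occur in many documents get lower weights.

# Illustration only: TF-IDF weights for a tiny toy corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

toy_corpus = [
    "great movie great acting",
    "terrible movie",
    "great soundtrack",
]
toy_vectorizer = TfidfVectorizer()
toy_tfidf = toy_vectorizer.fit_transform(toy_corpus)
# Each row is a document, each column a term; frequent-in-document but rare-in-corpus
# terms get the highest weights.
print(pd.DataFrame(toy_tfidf.toarray(),
                   columns=toy_vectorizer.get_feature_names_out()).round(2))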

1. To apply TF-IDF, create an instance of TfidfVectorizer() with max_features=5000 which you can name vectorizer. This means that we will only consider the 5,000 most frequent terms. Use fit_transform() on the training data to learn the vocabulary and convert the text into TF-IDF vectors. Then, apply transform() on the test data to vectorize it using the same vocabulary.

In [86]:
vectorizer = TfidfVectorizer(max_features=5000)
X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)

Initialize the active learning¶

In this practical, we will use the LogisticRegression classifier.

2. Create the Logistic Regression model. We will wrap it in skactiveml's SklearnClassifier, since this wrapper can handle missing labels.

In [87]:
clf = SklearnClassifier(LogisticRegression(max_iter=1000))

Our goal is to classify the observations into two classes. To do so, we introduce a vector y_train_initial to store the labels that we acquire from the oracle (y_train). The vector y_train_initial is unlabeled at the beginning.

3. Create y_train_initial. This is an array of the same shape as y_train, filled with MISSING_LABEL, using np.full. Then, randomly select 10 indices from the training data (y_train) and assign the true labels to these initially selected indices in y_train_initial.

In [88]:
# Initialize Training Labels
y_train_initial = np.full(y_train.shape, fill_value=MISSING_LABEL)
print(np.isnan(y_train).sum())
print(np.isnan(y_train_initial).sum())
initial_idx = np.random.choice(np.arange(len(y_train)), size=10, replace=False)
y_train_initial[initial_idx] = y_train.iloc[initial_idx]

print(np.isnan(y_train_initial).sum())
0
8000
7990

Set Up the Query Strategy¶

4. Now set up the query strategy (qs) using UncertaintySampling with method='entropy' and random_state=42.

As a method we use 'entropy', which measures the uncertainty of the model's predictions by calculating the entropy of the predicted class probabilities.

In [89]:
qs = UncertaintySampling(method='entropy', random_state=42)
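To get a feeling for what the entropy criterion computes, here is a small standalone illustration. UncertaintySampling does this internally on the classifier's predicted class probabilities, so this code is not needed for the practical:

# Illustration only: entropy of predicted class probabilities.
# A prediction of (0.5, 0.5) is maximally uncertain; (0.99, 0.01) is almost certain.
import numpy as np

def prediction_entropy(probs):
    probs = np.asarray(probs, dtype=float)
    # Treat 0 * log(0) as 0
    return -np.sum(np.where(probs > 0, probs * np.log(probs), 0.0))

print(prediction_entropy([0.5, 0.5]))    # ~0.693, the maximum for two classes
print(prediction_entropy([0.99, 0.01]))  # ~0.056, the model is nearly certain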

You can explore additional classifiers and query strategies available in scikit-learn and the skactiveml library for more options. Detailed information on the classifiers and all implemented query strategies can be found in the scikit-activeml documentation.

Pool-based Active Learning - Simple Evaluation Study¶

Now, we will implement 10 iterations of the Active Learning (AL) cycle. In each iteration, we will select 10 unlabeled samples (batch_size = 10) to be labeled using uncertainty sampling. Since we set up our uncertainty metric as entropy, the method will select the top 10 instances with the highest entropy for labeling.

5. Implement the 10 iterations. In each iteration:

  • Fit the logistic regression model on the training data with the partially missing labels
  • Run qs.query to determine the indices of the selected samples (query_idx)
  • Assign their labels from y_train to the corresponding missing entries in y_train_initial
  • Make predictions on the test set
  • Evaluate the classifier's performance on the test set after each iteration of the AL cycle
In [90]:
n_iterations = 10

for i in range(n_iterations):
    clf.fit(X_train_vect.toarray(), y_train_initial)
    query_idx = qs.query(X=X_train_vect.toarray(), y=y_train_initial, clf=clf, batch_size=10)
    y_train_initial[query_idx] = y_train.iloc[query_idx]
    # print(np.isnan(y_train_initial).sum())
    # Evaluate the classifier on the test set
    y_pred = clf.predict(X_test_vect.toarray())
    acc = accuracy_score(y_test, y_pred)
    print(f'Simple Evaluation Iteration {i + 1}/{n_iterations}, Accuracy: {acc:.4f}')
Simple Evaluation Iteration 1/10, Accuracy: 0.5485
Simple Evaluation Iteration 2/10, Accuracy: 0.4995
Simple Evaluation Iteration 3/10, Accuracy: 0.5010
Simple Evaluation Iteration 4/10, Accuracy: 0.5035
Simple Evaluation Iteration 5/10, Accuracy: 0.5290
Simple Evaluation Iteration 6/10, Accuracy: 0.6035
Simple Evaluation Iteration 7/10, Accuracy: 0.5825
Simple Evaluation Iteration 8/10, Accuracy: 0.6540
Simple Evaluation Iteration 9/10, Accuracy: 0.6615
Simple Evaluation Iteration 10/10, Accuracy: 0.6405

6. For comparison, you can train the classifier on the fully labeled training set and evaluate its accuracy when there are no missing labels.

In [91]:
# Final evaluation
clf.fit(X_train_vect.toarray(), y_train)
y_pred = clf.predict(X_test_vect.toarray())
final_acc = accuracy_score(y_test, y_pred)
print(f'Simple Evaluation Final accuracy: {final_acc:.4f}')
Simple Evaluation Final accuracy: 0.8700

Multi-annotator Pool-based Active Learning¶

Suppose we have 5 annotators to label the samples. The annotators have different accuracies for labeling the samples (they will make different errors).

In this part, we simulate multi-annotator active learning. We start by initializing multiple annotators with varying noise levels and generating noisy labels.

We will use the following code to initialize multiple annotators with different noise levels and generate noisy labels. Here are the steps we will take:

  • We start by defining a variable for the number of annotators (n_annotators) and set it to 5.
  • Create an array y_annot with dimensions (number of training samples, number of annotators) and fill it with zeros to store annotator labels.
  • Initialize a random number generator rng with a fixed seed (e.g., 0) for reproducibility.
  • Then, we generate noise levels, linearly spaced between 0.0 and 0.3, with a total of n_annotators values. Each annotator will have a different noise level. (np.linspace(0.0, 0.3, num=n_annotators)).
  • Next, we generate a matrix of the same shape as y_annot using a binomial distribution and the rng generator (rng.binomial). Each value is either 0 or 1, representing whether the label is flipped or not, according to the respective annotator's noise level.
  • Then we apply noise to the true labels. For this you will need the XOR operation (^) which flips the true label based on the noise matrix value (1 flips the label, 0 keeps it unchanged).
In [92]:
# Number of annotators
n_annotators = 5

# Generate noisy labels for each annotator
y_annot = np.zeros(shape=(X_train_vect.shape[0], n_annotators), dtype=int)
rng = np.random.default_rng(seed=0)

# Noise levels
noise_levels = np.linspace(0.0, 0.3, num=n_annotators)

# Generate noise for all annotators simultaneously
y_noise_matrix = rng.binomial(1, noise_levels[:, np.newaxis], size=(n_annotators, X_train_vect.shape[0])).T

# Apply noise to the true labels
y_annot = y_noise_matrix ^ y_train.values[:, np.newaxis]

# Initialize training labels with missing values
y = np.full(shape=(X_train_vect.shape[0], n_annotators), fill_value=MISSING_LABEL)

We want to label these samples using a ParzenWindowClassifier. We query the samples using probabilistic active learning (ProbabilisticAL), and select the annotators at random using the SingleAnnotatorWrapper.

7. Create a clf object with ParzenWindowClassifier, passing the unique classes of y_train and setting metric="rbf". Then set up the single-annotator query strategy ProbabilisticAL, and pass it as an argument to the SingleAnnotatorWrapper.

In [93]:
# Create the classifier
clf = ParzenWindowClassifier(classes=np.unique(y_train.values), metric="rbf", random_state=0)

# Set up the query strategy
sa_qs = ProbabilisticAL(random_state=0, prior=0.001)
ma_qs = SingleAnnotatorWrapper(sa_qs, random_state=0)

8. Perform one iteration of the active learning cycle. In this iteration, query a batch of unlabeled samples (batch_size=100), each to be labeled by 3 annotators. Assign their labels to the initially missing labels in y. After updating the labels, retrain the classifier on the updated training data, and evaluate its performance on the test set.

In [94]:
# Function to be able to index via an array of indices
idx = lambda A: (A[:, 0], A[:, 1])

# Randomly select an initial set of labeled samples
initial_idx = np.random.choice(np.arange(len(y_train)), size=50, replace=False)
for i in initial_idx:
    y[i, :] = y_train.iloc[i]

print(np.isnan(y).sum())
39750
In [95]:
# Initial fit of the classifier
clf.fit(X_train_vect.toarray(), majority_vote(y))

# Perform one active learning cycle
print("Cycle 1/1")

# Query indices for labeling
query_params_dict = {"clf": clf}
query_idx = ma_qs.query(X_train_vect.toarray(), y, batch_size=100, n_annotators_per_sample=3, clf=clf)

# Update labels
y[idx(query_idx)] = y_annot[idx(query_idx)]
#print(np.isnan(y).sum())

# Retrain classifier
clf.fit(X_train_vect.toarray(), majority_vote(y, random_state=0))

# Evaluate the classifier on the test set
y_pred = clf.predict(X_test_vect.toarray())
acc = accuracy_score(y_test, y_pred)
print(f'Multi-annotator Iteration 1/1, Accuracy: {acc:.4f}')
Cycle 1/1
Multi-annotator Iteration 1/1, Accuracy: 0.5005

Practice: Implement more iterations of the active learning cycle. Use the above code as a reference to perform 10 iterations, similar to how it was done in the pool-based active learning section.
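A minimal sketch of one possible solution is given below; it reuses the objects defined above (clf, ma_qs, y, y_annot and the idx helper), and the batch size and number of annotators per sample are just example values.

# Sketch only: 10 iterations of the multi-annotator active learning cycle,
# reusing clf, ma_qs, y, y_annot and idx from the cells above.
n_cycles = 10
for c in range(n_cycles):
    # Query a new batch of (sample, annotator) pairs with the current classifier
    query_idx = ma_qs.query(X_train_vect.toarray(), y, batch_size=100,
                            n_annotators_per_sample=3, clf=clf)
    # Add the (noisy) annotator labels for the queried entries
    y[idx(query_idx)] = y_annot[idx(query_idx)]
    # Retrain on the majority vote of all annotations collected so far
    clf.fit(X_train_vect.toarray(), majority_vote(y, random_state=0))
    # Evaluate on the held-out test set
    y_pred = clf.predict(X_test_vect.toarray())
    print(f'Multi-annotator Iteration {c + 1}/{n_cycles}, '
          f'Accuracy: {accuracy_score(y_test, y_pred):.4f}')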

Stream-based Active Learning¶

In this part, we will show how stream-based active learning strategies are used and compare them to one another. For this purpose we will follow the next four steps:

  1. Set Up Query Strategies
  2. Initialize Classifier and Training Data
  3. Create Stream-based Active Learning Loop
  4. Calculate and Track Accuracy

We will divide each step into substeps for better clarity and ease of implementation. So let's start!

Set Up Query Strategies¶

9. Now it's time to set up the query strategies, i.e., StreamRandomSampling and StreamProbabilisticAL, for our stream-based active learning. For this purpose, follow the steps below:

  1. Define the length of the data stream to be 1000 samples
  2. Initialize the query strategies with a fixed random_state=0, set training_size to 200 and fit_clf to False (this flag determines whether the classifier is refitted on X and y inside the query call). Then create an empty dictionary accuracies = {} to store the accuracy results for each query strategy.
In [105]:
stream_length = 1000
X_stream = X_train_vect.toarray()[:stream_length]
y_stream = y_train.values[:stream_length]
In [106]:
query_strategies = {
    'StreamRandomSampling': StreamRandomSampling(random_state=0),
    'StreamProbabilisticAL': StreamProbabilisticAL(random_state=0)
}

training_size = 200
fit_clf = False
accuracies = {}

Initialize Classifier and Training Data¶

10. For each query strategy:

  1. Create a ParzenWindowClassifier with the unique classes from y_train.values. Set up X_train_stream and y_train_stream deques with a maximum length of training_size and initialize them with the first 10 samples from X_stream and y_stream.
  2. Fit the classifier with this initial data.
In [107]:
for query_strategy_name, query_strategy in query_strategies.items():
    clf = ParzenWindowClassifier(classes=np.unique(y_train.values), random_state=0)

    # Initialize the training data
    X_train_stream = deque(maxlen=training_size)
    X_train_stream.extend(X_stream[:10])

    y_train_stream = deque(maxlen=training_size)
    y_train_stream.extend(y_stream[:10])


    # Fit the classifier with this initial data.
    clf.fit(X_train_stream, y_train_stream)

Initial Buffering: The purpose of initializing the deque with the first 10 instances is to start the training process with a small, initial set of labeled data. This helps in quickly building a model that can begin learning from the early data points.

Sliding Window Mechanism: With maxlen=training_size, the deque acts as a sliding window. As new data instances arrive and are added to the deque, the oldest data instances are automatically removed once the buffer reaches its maximum size. In this way the model is trained on the most recent data without growing the buffer indefinitely.
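The following tiny standalone example illustrates this sliding-window behaviour of a deque with maxlen (here with a window of 3 for readability):

# Illustration only: a deque with maxlen acts as a sliding window.
from collections import deque

window = deque(maxlen=3)
for value in [1, 2, 3, 4, 5]:
    window.append(value)
    print(list(window))
# Prints [1], [1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]:
# once the window is full, the oldest item is dropped automatically.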

Create Stream-based Active Learning Loop¶

11. Create Stream-based Active Learning Loop by folowing steps:

  1. To keep track of the number of queried samples and to track the accuracy of its predictions set up:
    correct_classifications = []
    count = 0
    
  2. Now start a loop from index 10 to the end of X_stream (since the first 10 samples were used to initialize the classifier).
  3. Reshape the current sample (X_stream[t]) to be a 2D array with one sample.
  4. Refit the classifier with the current training data. Use clf.predict for predicting the label for the current sample (X_cand), and compare it to the true label (y_cand). Then, use correct_classifications.append to append the result (True if correct, False if incorrect).
  5. Use call_func to call the strategy's query method and obtain the selected samples (sampled_indices) together with their associated utilities. Start by defining the parameters you want to pass to the query method: the candidates (X_cand), the classifier (clf), and the flags return_utilities and fit_clf.
  6. Create a dictionary budget_manager_param_dict to hold the utilities information.
  7. Use call_func to dynamically call the update method on query_strategy, by passing the parameters which you've defined earlier.
  8. Add the number of newly queried samples to the count variable by
    count += len(sampled_indices)
    
  9. Update the training data by adding the current sample and its label. If the sample was queried, add its true label; otherwise, add a missing label.
In [108]:
    correct_classifications = []
    count = 0
    for t in range(10, len(X_stream)): #`t` is the index of the current sample in the stream
        # Reshape the current sample for compatibility with the classifier's predict method, which expects a 2D array
        X_cand = X_stream[t].reshape(1, -1)
        y_cand = y_stream[t]

        # Refit the classifier and predict the current sample's label
        clf.fit(X_train_stream, y_train_stream)
        correct_classifications.append(clf.predict(X_cand)[0] == y_cand)

        # Update the query strategy with the selected samples
        sampled_indices, utilities = call_func(query_strategy.query, candidates=X_cand, clf=clf, return_utilities=True, fit_clf=fit_clf)

        # Create a dictionary budget_manager_param_dict
        budget_manager_param_dict = {"utilities": utilities}

        # Dynamically call the update method on `query_strategy`
        call_func(query_strategy.update, candidates=X_cand, queried_indices=sampled_indices, budget_manager_param_dict=budget_manager_param_dict)

        # Track the number of queried samples
        count += len(sampled_indices)

        # Update the training data with new samples and labels
        # Update the training data with the new sample and its (possibly missing) label
        X_train_stream.append(X_stream[t])
        y_train_stream.append(y_cand if len(sampled_indices) > 0 else clf.missing_label)

Calculate and Track Accuracy¶

We need to measure how well the classifier is performing overall.

12. Use np.mean(correct_classifications) to calculate the average accuracy, and store the correct_classifications list in the accuracies dictionary under the name of each query strategy. This will allow you to keep track of how each strategy performed over time.

In [109]:
# Calculate and print the average accuracy for each query strategy
avg_accuracy = np.mean(correct_classifications)
accuracies[query_strategy_name] = correct_classifications

Now let's run the code from beginning to end.

In [110]:
# Stream-based learning setup
stream_length = 1000
X_stream = X_train_vect.toarray()[:stream_length]
y_stream = y_train.values[:stream_length]

# Set up query strategies
query_strategies = {
    'StreamRandomSampling': StreamRandomSampling(random_state=0),
    'StreamProbabilisticAL': StreamProbabilisticAL(random_state=0)
}

training_size = 200
fit_clf = False
accuracies = {}

for query_strategy_name, query_strategy in query_strategies.items():
    clf = ParzenWindowClassifier(classes=np.unique(y_train.values), random_state=0)

    # Initialize the training data
    X_train_stream = deque(maxlen=training_size)
    y_train_stream = deque(maxlen=training_size)

    # Initialize with the first 10 samples
    X_train_stream.extend(X_stream[:10])
    y_train_stream.extend(y_stream[:10])

    clf.fit(X_train_stream, y_train_stream)
    correct_classifications = []
    count = 0
    for t in range(10, len(X_stream)):
        # Reshape the current sample for compatibility
        X_cand = X_stream[t].reshape(1, -1)
        y_cand = y_stream[t]

        # Refit the classifier and predict the current sample's label
        clf.fit(X_train_stream, y_train_stream)
        correct_classifications.append(clf.predict(X_cand)[0] == y_cand)

        # Query the classifier
        sampled_indices, utilities = call_func(query_strategy.query, candidates=X_cand, clf=clf, return_utilities=True, fit_clf=fit_clf)
        budget_manager_param_dict = {"utilities": utilities}
        call_func(query_strategy.update, candidates=X_cand, queried_indices=sampled_indices, budget_manager_param_dict=budget_manager_param_dict)

        # Update the training data with new samples and labels
        X_train_stream.append(X_stream[t])
        y_train_stream.append(y_cand if len(sampled_indices) > 0 else clf.missing_label)

        # Track the number of queried samples
        count += len(sampled_indices)

    # Calculate and print the average accuracy for each query strategy
    avg_accuracy = np.mean(correct_classifications)
    print(f"Query Strategy: {query_strategy_name}, Avg Accuracy: {avg_accuracy:.4f}, Acquisition count: {count}")
    accuracies[query_strategy_name] = correct_classifications
Query Strategy: StreamRandomSampling, Avg Accuracy: 0.4939, Acquisition count: 107
Query Strategy: StreamProbabilisticAL, Avg Accuracy: 0.4848, Acquisition count: 102

The acquisition count tells you how many samples were selected for labeling during the active learning process.

This count shows how many times each strategy asked for more information to improve the model.

13. Let's plot the accuracy over time for each query strategy, using a Gaussian filter.

In [111]:
for query_strategy_name, correct_classifications in accuracies.items():
    plt.plot(gaussian_filter1d(np.array(correct_classifications, dtype=float), 50), label=query_strategy_name)
plt.legend();
plt.xlabel('Iteration')
plt.ylabel('Accuracy')
plt.title('Accuracy over time for different query strategies')
plt.show()

14. Repeat the stream-based active learning process by adding the other strategies (e.g., FixedUncertainty, VariableUncertainty, Split, StreamDensityBasedAL, CognitiveDualQueryStrategyRan, CognitiveDualQueryStrategyFixUn, CognitiveDualQueryStrategyRanVarUn, CognitiveDualQueryStrategyVarUn, PeriodicSampling), then compare your results. Make sure to import them from skactiveml.stream beforehand!
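As a possible starting point, the sketch below only changes the query_strategies dictionary; the loop from the previous cell can then be reused unchanged. Only a subset of the listed strategies is shown, and the constructors are used with their defaults apart from random_state, which is an assumption; check the skactiveml documentation for strategy-specific parameters.

# Sketch only: extend the comparison with a few additional stream-based strategies.
from skactiveml.stream import FixedUncertainty, VariableUncertainty, Split, PeriodicSampling

query_strategies = {
    'StreamRandomSampling': StreamRandomSampling(random_state=0),
    'StreamProbabilisticAL': StreamProbabilisticAL(random_state=0),
    'FixedUncertainty': FixedUncertainty(random_state=0),
    'VariableUncertainty': VariableUncertainty(random_state=0),
    'Split': Split(random_state=0),
    'PeriodicSampling': PeriodicSampling(random_state=0),
}
# Then rerun the loop from the cell above to compare average accuracies and
# acquisition counts across all strategies.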

End of Practical!