Forays Into AI

The only way to discover the limits of the possible is to go beyond them into the impossible. - Arthur C. Clarke

Hyperparameter Tuning

Hyperparameter tuning is a critical process in machine learning where a model's configuration settings are adjusted to get the best performance out of it. Think of it as tweaking the knobs on a radio to get the best reception: hyperparameters are the settings of the model that we choose before training begins, and they can greatly affect how well the model learns and performs.

What are hyperparameters?

Hyperparameters are settings that are not learned from the data during training; they are set before training begins and remain fixed throughout the process. Some common examples, illustrated in the sketch after this list, include:

  • Learning rate: Controls the size of each update the model makes during training, i.e. how quickly it learns from the data.
  • Number of epochs: Dictates how many times the model will see the entire dataset during training.
  • Batch size: Determines how many data points are used in each iteration of training.
  • Model architecture specifics: For example, the number of layers and units in a neural network, or parameters like the number of estimators and maximum tree depth in a random forest.
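
In scikit-learn, for instance, most of these appear as arguments passed to the model's constructor before training ever starts. The sketch below is illustrative only; the specific values are not recommendations:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

# Hyperparameters are fixed when the model is constructed; the model's
# parameters (the trees, or the linear weights) are learned later in fit().
forest = RandomForestClassifier(
    n_estimators=100,  # number of trees in the forest
    max_depth=10       # maximum depth of each tree
)

sgd = SGDClassifier(
    learning_rate='constant',
    eta0=0.01,   # the learning rate used for every update
    max_iter=20  # number of passes (epochs) over the training data
)

Calling fit() afterwards learns the model's parameters from the data, while these settings stay exactly as we chose them.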

Hyperparameter Tuning Process

  1. Define Hyperparameters to Tune: Identify which hyperparameters are most influential for your model and need adjustment.
  2. Choose a Tuning Strategy:
    • Grid Search: Evaluates all possible combinations of hyperparameter values specified in a grid. It’s exhaustive but can be slow.
    • Random Search: Randomly selects combinations of hyperparameter values to evaluate, which is faster but less thorough.
    • Bayesian Optimisation: Uses past results to guide the search for optimal hyperparameters, usually more efficient than grid or random search (a short sketch appears after the example code below).
  3. Evaluation: Use a validation set or cross-validation to assess model performance with different hyperparameter combinations and prevent overfitting (a brief sketch follows this list).
  4. Select the Best Combination: Choose the combination of hyperparameters that performs best on the validation set.
  5. Test the Model: Evaluate the model with the chosen hyperparameters on a separate test set to ensure it generalises well to new data.
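
To make step 3 concrete before the full example in the next section, a single candidate combination can be scored with cross-validation using scikit-learn's cross_val_score; the hyperparameter values here are purely illustrative:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Score one candidate combination of hyperparameters with 5-fold cross-validation
candidate = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=42)
scores = cross_val_score(candidate, X, y, cv=5, scoring='accuracy')
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")

Repeating this for every candidate and keeping the best-scoring one is essentially what the search utilities in the example below automate.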

Example Code for Hyperparameter Tuning

Below is example Python code demonstrating both random search and grid search optimisation, using a RandomForestClassifier on the iris dataset:

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import time


def random_search_optimisation(X_train, X_test, y_train, y_test):
    """
    Perform random search optimisation on a random forest classifier

    :param X_train: features in training set
    :param X_test: features in test set
    :param y_train: target in training set
    :param y_test: target in test set
    """
    model = RandomForestClassifier()

    # 3 x 3 x 2 = 18 candidate combinations; random search samples only n_iter of them
    param_distributions = {
        'n_estimators': [10, 50, 100],
        'max_depth': [10, 20, 30],
        'max_features': ['sqrt', 'log2']
    }

    random_search = RandomizedSearchCV(
        estimator=model,
        param_distributions=param_distributions,
        n_iter=10,  # number of combinations to sample (scikit-learn's default)
        cv=10,
        scoring='accuracy',
        random_state=42)

    random_search.fit(X_train, y_train)

    best_model = random_search.best_estimator_
    predictions = best_model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    print('-' * 100)
    print('Random search results')
    print("Best params found: ", random_search.best_params_)
    print("Best accuracy found: ", random_search.best_score_)
    print("Accuracy on test set: ", accuracy)
    print('-' * 100)


def grid_search_optimisation(X_train, X_test, y_train, y_test):
    """
    Perform grid search optimisation on a random forest classifier

    :param X_train: features in training set
    :param X_test: features in test set
    :param y_train: target in training set
    :param y_test: target in test set
    """
    model = RandomForestClassifier()

    # 3 x 3 = 9 combinations, every one of which grid search evaluates
    param_grid = {
        'n_estimators': [10, 50, 100],
        'max_depth': [10, 20, 30]
    }

    grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=10, scoring='accuracy')
    grid_search.fit(X_train, y_train)

    best_model = grid_search.best_estimator_
    predictions = best_model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    print('=' * 100)
    print('Grid search results')
    print("Best params found: ", grid_search.best_params_)
    print("Best accuracy found: ", grid_search.best_score_)
    print("Accuracy on test set: ", accuracy)
    print('=' * 100)


"""
Load iris dataset and split into training and test sets
"""
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
start_time = time.time()
grid_search_optimisation(X_train, X_test, y_train, y_test)
print(f"Grid search took: {time.time() - start_time} seconds")
start_time = time.time()
random_search_optimisation(X_train, X_test, y_train, y_test)
print(f"Random search took: {time.time() - start_time} seconds")

Tagged: Hyperparameter Tuning, Machine Learning, Model Optimization