brain_age_prediction.utils package
Submodules
brain_age_prediction.utils.chek_model_type module
Function to check that the ‘model_type’ argument used by many functions has an accepted value. It must be one of the strings ‘structural’, ‘functional’ or ‘joint’.
- brain_age_prediction.utils.chek_model_type.check_model_type(model_type)
Check if the specified ‘model_type’ is valid.
- Parameters:
model_type (str) – String indicating the model type.
- Raises:
AssertionError – If ‘model_type’ is not one of ‘structural’, ‘functional’, or ‘joint’.
This function ensures that the provided ‘model_type’ is one of the allowed values: ‘structural’, ‘functional’, or ‘joint’. If ‘model_type’ is not valid, an AssertionError is raised with a descriptive error message.
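The check described above can be sketched in a few lines. This is a minimal illustration and not the package's actual source: the constant name ALLOWED_MODEL_TYPES and the wording of the error message are assumptions.

```python
# Hypothetical constant; the real module may store the allowed values differently.
ALLOWED_MODEL_TYPES = ("structural", "functional", "joint")


def check_model_type(model_type):
    """Raise AssertionError unless model_type is one of the allowed strings."""
    assert model_type in ALLOWED_MODEL_TYPES, (
        f"model_type must be one of {ALLOWED_MODEL_TYPES}, got {model_type!r}"
    )


check_model_type("joint")  # a valid value passes silently

try:
    check_model_type("Structural")  # the comparison is case-sensitive
except AssertionError as err:
    print(err)
```

Because the check relies on an assert statement, it is skipped when Python runs with the -O flag.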
brain_age_prediction.utils.custom_models module
Functions needed to create the neural network (NN) models.
- brain_age_prediction.utils.custom_models.create_functional_model(dropout, hidden_neurons, hidden_layers)
Create and compile a functional model.
- Parameters:
dropout (float) – The dropout rate for regularization.
hidden_neurons (int) – The number of neurons in each hidden layer.
hidden_layers (int) – The number of hidden layers.
- Returns:
The compiled model with Mean Absolute Error (MAE) as the loss function and Adam optimizer with a learning rate of 0.001.
- Return type:
keras.models.Sequential
This function constructs a sequential neural network model with dropout regularization, batch normalization, and specified hidden layers and neurons. The output layer has a linear activation function, and the model is compiled using the Mean Absolute Error (MAE) loss function and the Adam optimizer.
- brain_age_prediction.utils.custom_models.create_joint_model(dropout, hidden_neurons, hidden_layers, model_selection=False)
Create and compile a joint model that combines structural and functional features.
- Parameters:
dropout (float) – The dropout rate for regularization.
hidden_neurons (int) – The number of neurons in each hidden layer.
hidden_layers (int) – The number of hidden layers.
model_selection (bool) – If True, the model is created for model selection purposes.
- Returns:
The compiled joint model with Mean Absolute Error (MAE) as the loss function and Adam optimizer with a learning rate of 0.01.
- Return type:
keras.models.Model
The joint model consists of two branches, which are essentially the structural and functional models, with hyperparameters equal to those individually selected during model selection. The two branches are joined by a concatenate layer. After the concatenate layer, a number of hidden layers equal to ‘hidden_layers’ is added, each with ‘hidden_neurons’ units. Dropout at rate ‘dropout’ and batch normalization are also applied.
The ‘model_selection’ argument is a boolean that indicates whether the created model is to be used for model selection. This must be specified because the scikit-learn wrappers used for model selection do not support multi-input models. In that case a workaround is used: the first layer is a single input layer that takes the concatenated structural and functional features, and this input is then split through Lambda layers. From that point on, the structure is the same as described above.
The model is compiled using the Mean Absolute Error (MAE) loss function and the Adam optimizer with a learning rate of 0.01.
- brain_age_prediction.utils.custom_models.create_structural_model(dropout, hidden_neurons, hidden_layers)
Creates and compiles the structural model.
- Parameters:
dropout (float) – The dropout rate for regularization.
hidden_neurons (int) – The number of neurons in each hidden layer.
hidden_layers (int) – The number of hidden layers.
- Returns:
The compiled model with Mean Absolute Error (MAE) as the loss function and Adam optimizer with a learning rate of 0.001.
- Return type:
keras.models.Sequential
This function constructs a sequential neural network model with dropout regularization, batch normalization, and specified hidden layers and neurons. The output layer has a linear activation function, and the model is compiled using the Mean Absolute Error (MAE) loss function and the Adam optimizer.
- brain_age_prediction.utils.custom_models.load_best_hyperparams()
Return the best hyperparameters found for both the structural and functional models.
- Returns:
Tuple containing the best hyperparameters for the structural and functional models.
- Return type:
tuple
This function loads and returns the best hyperparameters found for both the structural and functional models. The hyperparameters are stored in separate pickle files.
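The pickle round trip behind this function might look like the sketch below. The helper names, the file names, the choice of dictionaries as containers and the hyperparameter values are all assumptions made for illustration; the package's actual paths and storage layout are not documented here.

```python
import pickle
import tempfile
from pathlib import Path


def save_hyperparams(params, path):
    """Write one set of hyperparameters to a pickle file (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(params, handle)


def load_hyperparams_pair(structural_path, functional_path):
    """Read both pickle files and return (structural, functional) as a tuple."""
    loaded = []
    for path in (structural_path, functional_path):
        with open(path, "rb") as handle:
            loaded.append(pickle.load(handle))
    return tuple(loaded)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names and illustrative values only.
    s_path = Path(tmp) / "structural_hyperparams.pkl"
    f_path = Path(tmp) / "functional_hyperparams.pkl"
    save_hyperparams({"dropout": 0.2, "hidden_neurons": 32, "hidden_layers": 2}, s_path)
    save_hyperparams({"dropout": 0.5, "hidden_neurons": 64, "hidden_layers": 1}, f_path)
    structural_params, functional_params = load_hyperparams_pair(s_path, f_path)

print(structural_params, functional_params)
```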
- brain_age_prediction.utils.custom_models.load_model(model_type)
Load a saved Keras model and compile it.
- Parameters:
model_type (str) – The type of model to load (‘structural’, ‘functional’, or ‘joint’).
- Returns:
The compiled Keras model.
- Return type:
keras.models.Model
This function loads a saved Keras model and its weights based on the provided ‘model_type’ (either ‘structural’, ‘functional’, or ‘joint’). The model is then compiled using the Mean Absolute Error (MAE) loss function and the Adam optimizer with a learning rate of 0.001.
Note: Make sure that the saved model files are present in the specified paths.
brain_age_prediction.utils.loading_data module
Functions used to load the datasets and preprocess the data.
- brain_age_prediction.utils.loading_data.load_dataset(dataset_name)
Load the dataset as pandas dataframes and return two different dataframes: one for the TD group and one for the ASD group.
- Parameters:
dataset_name (str) – The name of the dataset.
- Returns:
Two pandas dataframes for the TD and ASD groups, respectively.
- Return type:
tuple
This function loads a dataset from a CSV file into a pandas dataframe. It then separates the dataframe into two based on the diagnostic group (TD or ASD). The resulting dataframes are returned as a tuple.
- brain_age_prediction.utils.loading_data.load_train_test(split=0.3, seed=7)
Load both the structural and functional datasets. Apply preprocessing to input features. Split the data into train and test according to the “split” variable.
- Parameters:
split (float) – The ratio of the dataset to include in the test split.
seed (int) – Seed for the random state of the train_test_split function (for reproducibility).
- Returns:
Tuple containing training and test sets for structural and functional data.
- Return type:
tuple
This function loads both structural and functional datasets, preprocesses the input features using the preprocessing function, and then splits the data into training and testing sets.
- brain_age_prediction.utils.loading_data.preprocessing(df)
Takes a pandas dataframe as input and returns a numpy array of the features used as input for the learning process. Additionally, it applies RobustScaler preprocessing to the input features.
- Parameters:
df (pandas.DataFrame) – The input dataframe.
- Returns:
The preprocessed numpy array of features.
- Return type:
numpy.ndarray
This function extracts relevant features from the input dataframe, converts them to a numpy array, and applies RobustScaler for preprocessing to handle outliers.
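To show why this scaling is robust to outliers, the sketch below reproduces the arithmetic of RobustScaler with its default settings (center on the median, divide by the interquartile range) for a single feature column, using only the standard library. The function name robust_scale is hypothetical, and the real preprocessing function applies scikit-learn's RobustScaler to all feature columns at once.

```python
from statistics import median, quantiles


def robust_scale(column):
    """Center one feature column on its median and divide by its IQR."""
    # method="inclusive" uses linear interpolation between data points.
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    center = median(column)
    iqr = q3 - q1
    scale = iqr if iqr != 0 else 1.0  # guard against a constant column
    return [(value - center) / scale for value in column]


values = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
print(robust_scale(values))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier stays far from the rest after scaling, but it does not shift the center or inflate the scale, so the other values keep a usable spread.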
brain_age_prediction.utils.model_selection_utils module
Functions used to perform Model Selection
- brain_age_prediction.utils.model_selection_utils.model_selection(search_space, x_train, y_train, model_type, max_epochs=300)
Perform k-fold cross-validation for model selection using grid search.
- Parameters:
search_space (list) – A list of lists defining the combination of possible hyperparameters.
x_train (numpy.ndarray) – Input features.
y_train (numpy.ndarray) – Targets.
model_type (str) – The type of model to perform model selection for (‘structural’, ‘functional’, or ‘joint’).
max_epochs (int) – Maximum number of training epochs (default=300).
This function performs k-fold cross-validation using grid search to find the optimal hyperparameters for the specified model type. It saves the optimal hyperparameters to a file.
- brain_age_prediction.utils.model_selection_utils.print_grid_search_results(grid_result, filename)
Prints the results of the grid search and saves them to a file.
- Parameters:
grid_result – The output of grid_search.fit.
filename (str) – The name to assign to the saved file.
This function prints the best score and parameters found during a grid search, along with the mean and standard deviation of test scores for each combination of hyperparameters. It also saves the best hyperparameters to a file.
brain_age_prediction.utils.stats_utils module
Useful statistical tools
- brain_age_prediction.utils.stats_utils.correlation(x, y, permutation_number=1000)
Calculate Pearson correlation coefficient and its p-value between two arrays.
- Parameters:
x (array-like) – First array for correlation.
y (array-like) – Second array for correlation.
permutation_number (int) – Number of permutations for computing the empirical p-value. Default is 1000.
- Returns:
Tuple containing the Pearson correlation coefficient and its empirical p-value.
- Return type:
tuple
This function calculates the Pearson correlation coefficient (r) between two arrays ‘x’ and ‘y’. Additionally, it performs a permutation test to estimate the empirical p-value of the correlation coefficient.
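A permutation test for a correlation coefficient can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the names pearson_r and correlation_sketch and the seed argument are hypothetical, and the two-sided comparison on the absolute value of r is an assumption, since the documentation does not say whether the test is one-sided or two-sided.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_sketch(x, y, permutation_number=1000, seed=0):
    """Return (r, empirical p-value) using a permutation test on y."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    r_observed = pearson_r(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(permutation_number):
        rng.shuffle(y_shuffled)  # break any pairing between x and y
        if abs(pearson_r(x, y_shuffled)) >= abs(r_observed):
            hits += 1
    return r_observed, hits / permutation_number


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * value + 1.0 for value in x]  # perfectly linear relationship
r, p_value = correlation_sketch(x, y)
print(r, p_value)
```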
- brain_age_prediction.utils.stats_utils.empirical_p_value(group1, group2, num_permutations=100000)
Calculate the empirical p-value for the difference in means between two groups using permutation testing.
- Parameters:
group1 (array-like) – Data for the first group.
group2 (array-like) – Data for the second group.
num_permutations (int) – Number of permutations to perform for the permutation test. Default is 100,000.
- Returns:
Empirical p-value for the observed difference in means.
- Return type:
float
This function performs a permutation test to estimate the empirical p-value for the difference in means between two groups. The observed test statistic is the difference in means between group2 and group1.
The function generates permuted test statistics by randomly permuting the data between the two groups and calculates the difference in means for each permutation. The empirical p-value is then calculated as the proportion of permuted differences in means that are greater than or equal to the observed difference in means.
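The procedure described above can be sketched with the standard library alone. The function name empirical_p_value_sketch, the seed argument and the smaller default number of permutations are assumptions made to keep the example quick and reproducible; the one-sided comparison follows the description, counting permuted differences that are greater than or equal to the observed one.

```python
import random
from statistics import fmean


def empirical_p_value_sketch(group1, group2, num_permutations=2000, seed=0):
    """One-sided permutation test for mean(group2) - mean(group1)."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    observed = fmean(group2) - fmean(group1)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # reassign values to the two groups at random
        permuted = fmean(pooled[n1:]) - fmean(pooled[:n1])
        if permuted >= observed:
            hits += 1
    return hits / num_permutations


low = [0.0, 1.0, 2.0, 3.0, 4.0]
high = [10.0, 11.0, 12.0, 13.0, 14.0]
print(empirical_p_value_sketch(low, high))  # small: group2 mean is clearly larger
print(empirical_p_value_sketch(high, low))  # close to 1: group2 mean is smaller
```

Since the test is one-sided, swapping the two groups changes the result: a small p-value only indicates that the second group has the larger mean.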