GAparsimony.util package
GAparsimony.util.complexity module
Complexity module.
This module contains predefined complexity functions for some of the most popular algorithms in the scikit-learn library:
linearModels_complexity: Any algorithm from ‘sklearn.linear_model’. Returns: 10^9·nFeatures + (sum of the squared coefs).
svm_complexity: Any algorithm from ‘sklearn.svm’. Returns: 10^9·nFeatures + (number of support vectors).
knn_complexity: Any algorithm from ‘sklearn.neighbors’. Returns: 10^9·nFeatures + 1/(number of neighbors).
mlp_complexity: Any algorithm from ‘sklearn.neural_network’. Returns: 10^9·nFeatures + (sum of the ANN squared weights).
randomForest_complexity: Any algorithm from ‘sklearn.ensemble.RandomForestRegressor’ or ‘sklearn.ensemble.RandomForestClassifier’. Returns: 10^9·nFeatures + (the average of tree leaves).
xgboost_complexity: XGBoost sklearn model. Returns: 10^9·nFeatures + (the average of tree leaves * number of trees) (Experimental).
decision_tree_complexity: Any algorithm from ‘sklearn.tree’. Returns: 10^9·nFeatures + (number of leaves) (Experimental).
Otherwise:
generic_complexity: Any algorithm. Returns: the number of input features (nFeatures).
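Other complexity functions can be defined with the following interface:
    def complexity(model, nFeatures, **kwargs):
        # Compute a numeric complexity score from the fitted model here;
        # this placeholder simply uses the number of input features.
        value = nFeatures
        return value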
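As an illustration of this interface, the sketch below defines a custom complexity function that follows the same pattern as the predefined ones: the number of features is scaled by 10^9 so that it dominates, and a model-internal term (here, the sum of squared coefficients) only breaks ties. The name `squared_coef_complexity` and the stand-in model objects are hypothetical; a fitted scikit-learn linear model exposing a flat `coef_` would play the same role.

```python
from types import SimpleNamespace


def squared_coef_complexity(model, nFeatures, **kwargs):
    """Hypothetical custom complexity: 10^9 * nFeatures + sum of squared coefs."""
    internal = sum(float(c) ** 2 for c in model.coef_)
    return nFeatures * 10**9 + internal


# Stand-in objects with a flat coef_ attribute (not real scikit-learn estimators).
small_model = SimpleNamespace(coef_=[0.5, -2.0, 1.0])      # 3 features, large weights
large_model = SimpleNamespace(coef_=[0.1, 0.1, 0.1, 0.1])  # 4 features, small weights

c_small = squared_coef_complexity(small_model, nFeatures=3)
c_large = squared_coef_complexity(large_model, nFeatures=4)

# The feature count dominates: the 3-feature model is less complex even though
# its coefficients are larger, as long as the internal term stays below 10^9.
print(c_small < c_large)
```

Under this scheme two models are first compared by how many features they use, and the internal term only matters when the feature counts are equal.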
- GAparsimony.util.complexity.decision_tree_complexity(model, nFeatures, **kwargs)
Complexity function for decision tree model.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
10^9·nFeatures + (number of leaves)
- Return type
int
- GAparsimony.util.complexity.generic_complexity(model, nFeatures, **kwargs)
Generic complexity function.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
nFeatures.
- Return type
int
- GAparsimony.util.complexity.knn_complexity(model, nFeatures, **kwargs)
Complexity function for KNN models.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
10^9·nFeatures + 1/(number of neighbors)
- Return type
int
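To see how the formula above behaves, here is a minimal sketch that reads the `n_neighbors` attribute (which scikit-learn neighbor estimators expose) from a stand-in object. The function name `knn_complexity_sketch` is hypothetical, and the library's own implementation may scale the second term differently.

```python
from types import SimpleNamespace


def knn_complexity_sketch(model, nFeatures, **kwargs):
    """Sketch of the documented formula: 10^9 * nFeatures + 1 / n_neighbors."""
    return nFeatures * 10**9 + 1.0 / model.n_neighbors


# Stand-ins for fitted KNN estimators (only n_neighbors is needed here).
few_neighbors = SimpleNamespace(n_neighbors=1)
many_neighbors = SimpleNamespace(n_neighbors=20)

c_few = knn_complexity_sketch(few_neighbors, nFeatures=2)
c_many = knn_complexity_sketch(many_neighbors, nFeatures=2)

# With the same number of features, more neighbors gives a smoother model
# and therefore a lower complexity value.
print(c_many < c_few)
```

Because of the 1/(number of neighbors) term, the value computed this way is a float rather than a whole number.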
- GAparsimony.util.complexity.linearModels_complexity(model, nFeatures, **kwargs)
Complexity function for linear models.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
10^9·nFeatures + (sum of the model squared coefs).
- Return type
int
- GAparsimony.util.complexity.mlp_complexity(model, nFeatures, **kwargs)
Complexity function for MLP models.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
10^9·nFeatures + (sum of the ANN squared weights)
- Return type
int
- GAparsimony.util.complexity.randomForest_complexity(model, nFeatures, **kwargs)
Complexity function for Random Forest models.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
10^9·nFeatures + (the average of tree leaves)
- Return type
int
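The documented formula can be sketched with the `estimators_` list of a fitted forest and the `get_n_leaves()` method of each tree, both of which scikit-learn provides. The function name `random_forest_complexity_sketch` and the stub objects are hypothetical, and this is not necessarily how the library computes the value.

```python
from types import SimpleNamespace


def random_forest_complexity_sketch(model, nFeatures, **kwargs):
    """Sketch: 10^9 * nFeatures + average number of leaves over the trees."""
    leaves = [tree.get_n_leaves() for tree in model.estimators_]
    return nFeatures * 10**9 + sum(leaves) / len(leaves)


def _stub_tree(n_leaves):
    # Stand-in for a fitted decision tree; only get_n_leaves() is mimicked.
    return SimpleNamespace(get_n_leaves=lambda: n_leaves)


# Stand-in forest with three trees of 8, 12 and 10 leaves.
forest = SimpleNamespace(estimators_=[_stub_tree(n) for n in (8, 12, 10)])
c_forest = random_forest_complexity_sketch(forest, nFeatures=5)
print(c_forest)
```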
- GAparsimony.util.complexity.svm_complexity(model, nFeatures, **kwargs)
Complexity function for SVM models.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
10^9·nFeatures + (number of support vectors)
- Return type
int
- GAparsimony.util.complexity.xgboost_complexity(model, nFeatures, **kwargs)
Complexity function for XGBoost model.
- Parameters
model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.
- Returns
10^9·nFeatures + (the average of tree leaves * number of trees) (Experimental)
- Return type
int
GAparsimony.util.fitness module
- GAparsimony.util.fitness.getFitness(algorithm, metric, complexity, cv=RepeatedKFold(n_repeats=5, n_splits=10, random_state=42), minimize=False, test_size=0.2, random_state=42, n_jobs=-1, ignore_warnings=True)
Fitness function for GAparsimony.
- Parameters
algorithm (object) – The machine learning function to optimize.
metric (function) – A function that computes the fitness value.
complexity (function) – A function that calculates the complexity of the model. There are some functions available in GAparsimony.util.complexity.
cv (object, optional) – An sklearn.model_selection cross-validation splitter. Defaults to RepeatedKFold(n_splits=10, n_repeats=5, random_state=42).
minimize (bool, optional) – True if the objective is to minimize the metric; False to maximize it. Default False.
test_size (float, int, None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the model is not evaluated on a test split and fitness_test is returned as np.inf. Default 0.2.
random_state (int, optional) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. Default 42.
n_jobs (int, optional) – Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the cross-validation splits. -1 means using all processors. Default -1.
Examples
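Usage example for a classification model
    from sklearn.model_selection import RepeatedKFold
    from sklearn.svm import SVC
    from sklearn.metrics import cohen_kappa_score
    from GAparsimony import getFitness
    from GAparsimony.util import svm_complexity

    cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=42)
    # Cohen's kappa is a score to maximize.
    fitness = getFitness(SVC, cohen_kappa_score, svm_complexity, cv, minimize=False, test_size=0.2, random_state=42, n_jobs=-1)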
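The metric argument can be any scoring function. As a sketch, assuming the metric is called as `metric(y_true, y_pred)` in the same way as the scikit-learn metrics (an assumption, since the call signature is not stated here), an error measure where lower is better could be written as follows. The name `rmse` is hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# One prediction is off by 2, the other two are exact.
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(round(score, 4))
```

A metric of this kind is one to minimize, in contrast to a score such as `cohen_kappa_score`, which is maximized.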
GAparsimony.util.order module
- GAparsimony.util.order.order(obj, kind='heapsort', decreasing=False, na_last=True)
Function to order vectors
This function wraps the numpy.argsort sorting method, adding support for increasing or decreasing order and for placing NaN values either at the end or at the beginning.
- Parameters
obj (numpy.array) – Array to order.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.
decreasing (bool, optional) – If True, sort in decreasing order. Default False.
na_last (bool, optional) – Controls the treatment of NAs. If True, missing values in the data are put last; if False, they are put first. Default True.
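The ordering semantics described above can be sketched in pure Python. This is an illustration of the behavior of the `decreasing` and `na_last` flags only, not the library's implementation (which builds on numpy.argsort); the name `order_sketch` is hypothetical.

```python
import math


def order_sketch(values, decreasing=False, na_last=True):
    """Return the indices that sort `values`, with NaN entries first or last."""
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    ok_idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    # list.sort() is stable, so ties keep their original relative order.
    ok_idx.sort(key=lambda i: values[i], reverse=decreasing)
    return ok_idx + nan_idx if na_last else nan_idx + ok_idx


data = [3.0, float("nan"), 1.0, 2.0]
print(order_sketch(data))                   # [2, 3, 0, 1]
print(order_sketch(data, decreasing=True))  # [0, 3, 2, 1]
print(order_sketch(data, na_last=False))    # [1, 2, 3, 0]
```

As with argsort, the result is a list of positions into the original array rather than the sorted values themselves.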
GAparsimony.util.parsimony_monitor module
- GAparsimony.util.parsimony_monitor.parsimony_monitor(object, digits=7, *args)
Function for monitoring the evolution of the GA-PARSIMONY algorithm
Prints summary statistics of the fitness values at each iteration of a GA search.
- Parameters
object (object of GAparsimony) – The GAparsimony object that we want to monitor.
digits (int) – Minimal number of significant digits.
*args – Further arguments passed to or from other methods.
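To illustrate what a per-iteration summary with a given number of significant digits might look like, here is a small sketch. The choice of statistics (max, mean, min), the output layout, and the name `summarize_fitness` are assumptions made for illustration; the actual output of the monitor is not reproduced here.

```python
import statistics


def summarize_fitness(fitness_values, digits=7):
    """Format max, mean and min of the fitness values with `digits` significant digits."""
    stats = {
        "max": max(fitness_values),
        "mean": statistics.mean(fitness_values),
        "min": min(fitness_values),
    }
    # The 'g' format keeps at most `digits` significant digits.
    return "  ".join(f"{name}={value:.{digits}g}" for name, value in stats.items())


line = summarize_fitness([0.81234567891, 0.7, 0.65], digits=4)
print(line)  # max=0.8123  mean=0.7208  min=0.65
```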
- GAparsimony.util.parsimony_monitor.parsimony_summary(object, *args)
GAparsimony.util.population module
- class GAparsimony.util.population.Chromosome(params, name_params, const, cols, name_cols)
Bases: object
- __init__(params, name_params, const, cols, name_cols)
This class defines a chromosome which includes the hyperparameters, the constant values, and the feature selection.
- Parameters
params (numpy.array) – The algorithm hyperparameter values.
name_params (list of str) – The names of the hyperparameters.
const (numpy.array) – A dictionary with the constants to include in the chromosome.
cols (numpy.array) – The probabilities for selecting the input features (selected if prob>0.5).
name_cols (list of str) – The names of the input features.
- params
A dictionary with the parameter values (hyperparameters and constants).
- Type
dict
- columns
A boolean vector with the selected features.
- Type
numpy.array of bool
- property columns
- property params
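The relationship between the `cols` probabilities and the boolean `columns` vector (a feature is selected if its probability is greater than 0.5) can be sketched as follows. The helper name `select_features` and the feature names are hypothetical, and plain lists are used here in place of numpy arrays.

```python
def select_features(cols, name_cols, threshold=0.5):
    """Turn selection probabilities into a boolean mask and the selected names."""
    if len(cols) != len(name_cols):
        raise ValueError("cols and name_cols must have the same length")
    mask = [p > threshold for p in cols]
    selected = [name for name, keep in zip(name_cols, mask) if keep]
    return mask, selected


mask, selected = select_features(
    [0.91, 0.12, 0.50, 0.73],
    ["age", "height", "weight", "income"],
)
print(mask)      # [True, False, False, True]
print(selected)  # ['age', 'income']
```

Note that a probability of exactly 0.5 does not select the feature, because the comparison is strict.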
- class GAparsimony.util.population.Population(params, columns, population=None)
Bases: object
- CATEGORICAL = 2
- CONSTANT = 3
- FLOAT = 1
- INTEGER = 0
- __init__(params, columns, population=None)
This class is used to create the GA populations. It allows chromosomes to hold int, float, and constant values.
- Parameters
params (dict) – A dictionary with the model’s hyperparameters to be tuned and their search space:
    {
        "<< hyperparameter name >>": {
            "range": [<< minimum value >>, << maximum value >>],
            "type": GAparsimony.FLOAT/GAparsimony.INTEGER
        },
        "<< hyperparameter name >>": {
            "value": << constant value >>,
            "type": GAparsimony.CONSTANT
        }
    }
columns (int or list of str) – The number of features/columns in the dataset or a list with their names.
population (numpy.array, optional) – It is a float matrix that represents the population. Default None.
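A concrete instance of the params dictionary might look like the sketch below. The hyperparameter names are taken from scikit-learn's SVC, the ranges are arbitrary example values, and the integer constants are stand-ins that mirror the values of the type constants listed on this class; in real code the library's own constants would be used instead.

```python
# Stand-ins mirroring the type constants listed on the Population class.
INTEGER, FLOAT, CATEGORICAL, CONSTANT = 0, 1, 2, 3

params = {
    "C": {"range": [0.0001, 99.9999], "type": FLOAT},
    "gamma": {"range": [0.00001, 0.99999], "type": FLOAT},
    "kernel": {"value": "rbf", "type": CONSTANT},
}

# Entries with a "range" are searched by the GA; CONSTANT entries are fixed.
tunable = {
    name: spec["range"] for name, spec in params.items() if spec["type"] != CONSTANT
}
constants = {
    name: spec["value"] for name, spec in params.items() if spec["type"] == CONSTANT
}

print(tunable)    # {'C': [0.0001, 99.9999], 'gamma': [1e-05, 0.99999]}
print(constants)  # {'kernel': 'rbf'}
```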
- population
The population.
- Type
- _min
A vector of length params+columns with the smallest value each position can take.
- Type
numpy.array
- _max
A vector of length params+columns with the largest value each position can take.
- Type
numpy.array
- _params
Dict with the parameter values.
- Type
dict
- const
Dict with the constant values.
- Type
dict
- colsnames
List with the column names.
- Type
list of str
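As a rough sketch of how bound vectors like `_min` and `_max` could be derived, the example below concatenates the hyperparameter ranges with one entry per column. It assumes that constants are left out of the vectors and that the column positions range over [0, 1] because they hold selection probabilities; both are assumptions made for illustration, as are the hyperparameter and column names.

```python
# Stand-ins mirroring the type constants listed on the Population class.
INTEGER, FLOAT, CONSTANT = 0, 1, 3

params = {
    "n_neighbors": {"range": [1, 30], "type": INTEGER},
    "p": {"range": [1, 3], "type": INTEGER},
    "weights": {"value": "uniform", "type": CONSTANT},
}
columns = ["x1", "x2", "x3"]

# Assumption: only non-constant hyperparameters take part in the search.
tunable = [spec for spec in params.values() if spec["type"] != CONSTANT]

# Assumption: each column position is a probability between 0 and 1.
lower = [spec["range"][0] for spec in tunable] + [0.0] * len(columns)
upper = [spec["range"][1] for spec in tunable] + [1.0] * len(columns)

print(lower)  # [1, 1, 0.0, 0.0, 0.0]
print(upper)  # [30, 3, 1.0, 1.0, 1.0]
```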
- getChromosome(key)
This method returns a chromosome from the population.
- Parameters
key (int) – Chromosome row index.
- Returns
A Chromosome object.
- Return type
Chromosome
- property paramsnames
- property population
- update_to_feat_thres(popSize, feat_thres)
Module contents
- GAparsimony.util.order(obj, kind='heapsort', decreasing=False, na_last=True)
Function to order vectors
This function wraps the numpy.argsort sorting method, adding support for increasing or decreasing order and for placing NaN values either at the end or at the beginning.
- Parameters
obj (numpy.array) – Array to order.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.
decreasing (bool, optional) – If True, sort in decreasing order. Default False.
na_last (bool, optional) – Controls the treatment of NAs. If True, missing values in the data are put last; if False, they are put first. Default True.
- GAparsimony.util.parsimony_monitor(object, digits=7, *args)
Function for monitoring the evolution of the GA-PARSIMONY algorithm
Prints summary statistics of the fitness values at each iteration of a GA search.
- Parameters
object (object of GAparsimony) – The GAparsimony object that we want to monitor.
digits (int) – Minimal number of significant digits.
*args – Further arguments passed to or from other methods.