GAparsimony.util package

GAparsimony.util.complexity module

Complexity module.

This module contains predefined complexity functions for some of the most popular algorithms in the scikit-learn library:

linearModels_complexity: Any algorithm from 'sklearn.linear_model'. Returns: 10^9·nFeatures + (sum of the squared coefficients).
svm_complexity: Any algorithm from 'sklearn.svm'. Returns: 10^9·nFeatures + (number of support vectors).
knn_complexity: Any algorithm from 'sklearn.neighbors'. Returns: 10^9·nFeatures + 1/(number of neighbors).
mlp_complexity: Any algorithm from 'sklearn.neural_network'. Returns: 10^9·nFeatures + (sum of the ANN squared weights).
randomForest_complexity: Any algorithm from 'sklearn.ensemble.RandomForestRegressor' or 'sklearn.ensemble.RandomForestClassifier'. Returns: 10^9·nFeatures + (average number of leaves per tree).
xgboost_complexity: XGBoost sklearn model. Returns: 10^9·nFeatures + (average number of leaves per tree * number of trees). (Experimental)
decision_tree_complexity: Any algorithm from 'sklearn.tree'. Returns: 10^9·nFeatures + (number of leaves). (Experimental)

Otherwise:

generic_complexity: Any algorithm. Returns: the number of input features (nFeatures).
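
The formulas listed here share the term 10^9·nFeatures, so a model with fewer features ranks as simpler, and the model-specific term only separates models with the same number of features. The sketch below illustrates that ordering with made-up numbers. The helper name `composite_complexity` and the candidate values are assumptions for illustration, not part of GAparsimony, and the sketch assumes the model-specific term stays below 10^9.

```python
def composite_complexity(n_features, internal):
    """Hypothetical helper: 10^9 * nFeatures + model-specific complexity."""
    return 10**9 * n_features + internal

# (label, number of features, model-specific term, e.g. support vectors)
candidates = [
    ("A", 5, 120),
    ("B", 4, 900),
    ("C", 4, 300),
]

# Sort from least to most complex.
ranked = sorted(candidates, key=lambda c: composite_complexity(c[1], c[2]))
order_of_labels = [label for label, _, _ in ranked]
print(order_of_labels)
```

Models B and C use four features and therefore rank ahead of A, even though A has the smallest model-specific term; C ranks ahead of B because of its smaller model-specific term.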

Other complexity functions can be defined with the following interface:

def complexity(model, nFeatures, **kwargs):
    value = ...  # compute the complexity of the fitted model here
    return value
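
As a minimal sketch of a user-defined complexity function with this interface, the example below follows the same 10^9·nFeatures convention as the predefined functions. The names `StubLinearModel` and `l1_complexity` are hypothetical, the stub only mimics an estimator that exposes a `coef_` attribute, and capping the model-specific term below 10^9 is an assumption made here so that it cannot outweigh a difference of one feature.

```python
class StubLinearModel:
    """Hypothetical stand-in for a fitted estimator exposing `coef_`."""

    def __init__(self, coef):
        self.coef_ = coef


def l1_complexity(model, nFeatures, **kwargs):
    # Model-specific term: sum of absolute coefficients, capped below 10^9
    # (an assumption of this sketch) so the feature count always dominates.
    internal = min(sum(abs(c) for c in model.coef_), 10**9 - 1)
    return 10**9 * nFeatures + internal


model = StubLinearModel([0.5, -2.0, 1.5])
value = l1_complexity(model, nFeatures=3)
print(value)
```

With a real estimator, the stub would be replaced by the fitted model and the function passed wherever a complexity function is expected.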

GAparsimony.util.complexity.decision_tree_complexity(model, nFeatures, **kwargs)

Complexity function for decision tree model.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (number of leaves)

Return type

int

GAparsimony.util.complexity.generic_complexity(model, nFeatures, **kwargs)

Generic complexity function.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

nFeatures.

Return type

int

GAparsimony.util.complexity.knn_complexity(model, nFeatures, **kwargs)

Complexity function for KNN models.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + 1/(number of neighbors)

Return type

int

GAparsimony.util.complexity.linearModels_complexity(model, nFeatures, **kwargs)

Complexity function for linear models.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (sum of the model's squared coefficients).

Return type

int

GAparsimony.util.complexity.mlp_complexity(model, nFeatures, **kwargs)

Complexity function for MLP models.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (sum of the ANN squared weights)

Return type

int

GAparsimony.util.complexity.randomForest_complexity(model, nFeatures, **kwargs)

Complexity function for Random Forest models.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (average number of leaves per tree)

Return type

int

GAparsimony.util.complexity.svm_complexity(model, nFeatures, **kwargs)

Complexity function for SVM models.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (number of support vectors)

Return type

int

GAparsimony.util.complexity.xgboost_complexity(model, nFeatures, **kwargs)

Complexity function for XGBoost model.

Parameters

model (model) – The model for calculating complexity.
nFeatures (int) – The number of input features the model has been trained with.
**kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (average number of leaves per tree * number of trees) (Experimental)

Return type

int

GAparsimony.util.fitness module

GAparsimony.util.fitness.getFitness(algorithm, metric, complexity, cv=RepeatedKFold(n_repeats=5, n_splits=10, random_state=42), minimize=False, test_size=0.2, random_state=42, n_jobs=-1, ignore_warnings=True)

Fitness function for GAparsimony.

Parameters

algorithm (object) – The machine learning function to optimize.
metric (function) – A function that computes the fitness value.
complexity (function) – A function that calculates the complexity of the model. There are some functions available in GAparsimony.util.complexity.
cv (object, optional) – An sklearn.model_selection cross-validation object. Default RepeatedKFold(n_splits=10, n_repeats=5, random_state=42).
minimize (bool, optional) – Set to True if the objective is to minimize the metric, or to False to maximize it. Default False.
test_size (float, int, None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the model is not evaluated on a test split and fitness_test=np.inf is returned. Default 0.2.
random_state (int, optional) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. Default 42.
n_jobs (int, optional) – Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the cross-validation splits. -1 means using all processors. Default -1.

Examples

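Usage example for a classification model

from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import RepeatedKFold

from GAparsimony import getFitness
from GAparsimony.util import svm_complexity

# Cross-validation scheme; these values match the documented default.
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=42)

# Cohen's kappa is a score to maximize, so minimize=False.
fitness = getFitness(SVC, cohen_kappa_score, svm_complexity, cv, minimize=False, test_size=0.2, random_state=42, n_jobs=-1)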

GAparsimony.util.order module

GAparsimony.util.order.order(obj, kind='heapsort', decreasing=False, na_last=True)

Function to order vectors

This function wraps the numpy.argsort sorting method. It supports both increasing and decreasing order and allows NaN values to be placed either at the end or at the beginning.

Parameters

obj (numpy.array) – Array to order.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.
decreasing (bool, optional) – If we want decreasing order.
na_last (bool, optional) – Controls the treatment of missing values (NaN). If True, missing values in the data are put last; if False, they are put first.
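
The behaviour described by decreasing and na_last can be sketched in pure Python as follows. This is an illustration of the documented semantics under stated assumptions, not the library's implementation: the name `order_sketch` is hypothetical, the `kind` argument is left out, and the input is a plain list of floats rather than a numpy.array.

```python
import math


def order_sketch(values, decreasing=False, na_last=True):
    """Return the indices that sort `values`, with NaN positions grouped at one end."""
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    ok_idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Sort only the non-NaN positions by their value.
    ok_idx.sort(key=lambda i: values[i], reverse=decreasing)
    # Place the NaN positions last or first, as requested.
    return ok_idx + nan_idx if na_last else nan_idx + ok_idx


data = [3.0, float("nan"), 1.0, 2.0]
print(order_sketch(data))
print(order_sketch(data, decreasing=True))
print(order_sketch(data, na_last=False))
```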

GAparsimony.util.parsimony_monitor module

GAparsimony.util.parsimony_monitor.parsimony_monitor(object, digits=7, *args)

Functions for monitoring GA-PARSIMONY algorithm evolution

Functions to print summary statistics of fitness values at each iteration of a GA search.

Parameters

object (object of GAparsimony) – The GAparsimony object that we want to monitor.
digits (int) – Minimal number of significant digits.
*args – Further arguments passed to or from other methods.

GAparsimony.util.parsimony_monitor.parsimony_summary(object, *args)

GAparsimony.util.population module

class GAparsimony.util.population.Chromosome(params, name_params, const, cols, name_cols)

Bases: object

__init__(params, name_params, const, cols, name_cols)

This class defines a chromosome which includes the hyperparameters, the constant values, and the feature selection.

Parameters

params (numpy.array) – The algorithm hyperparameter values.
name_params (list of str) – The names of the hyperparameters.
const (numpy.array) – A dictionary with the constants to include in the chromosome.
cols (numpy.array) – The probabilities for selecting the input features (selected if prob>0.5).
name_cols (list of str) – The names of the input features.

params

A dictionary with the parameter values (hyperparameters and constants).

Type: dict

columns

A boolean vector with the selected features.

Type: numpy.array of bool

property columns

property params
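
To make the relationship between the constructor arguments and the two properties concrete, here is a small stand-in class. It is a sketch based only on the descriptions given for this class (params combines hyperparameters and constants; a feature is selected when its probability is above 0.5). The name `ChromosomeSketch`, the sample values, and the use of plain lists instead of numpy arrays are assumptions for illustration.

```python
class ChromosomeSketch:
    """Illustrative stand-in; not the GAparsimony Chromosome class."""

    def __init__(self, params, name_params, const, cols, name_cols):
        # Pair each hyperparameter name with its value, then add the constants.
        self._params = dict(zip(name_params, params))
        self._params.update(const)
        self._cols = list(cols)
        self.name_cols = list(name_cols)

    @property
    def params(self):
        return self._params

    @property
    def columns(self):
        # A feature is selected when its probability exceeds 0.5.
        return [p > 0.5 for p in self._cols]


chrom = ChromosomeSketch(
    params=[10.0, 0.01],
    name_params=["C", "gamma"],
    const={"kernel": "rbf"},
    cols=[0.9, 0.2, 0.51],
    name_cols=["x1", "x2", "x3"],
)
selected = [name for name, keep in zip(chrom.name_cols, chrom.columns) if keep]
print(chrom.params, selected)
```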

class GAparsimony.util.population.Population(params, columns, population=None)

Bases: object

CATEGORICAL = 2

CONSTANT = 3

FLOAT = 1

INTEGER = 0

__init__(params, columns, population=None)

This class is used to create the GA populations. It allows chromosomes to have int, float, and constant values.

Parameters

params (dict) –

A dictionary with the model’s hyperparameters to be tuned and the search space of each one.

{
    "<< hyperparameter name >>": {
        "range": [<< minimum value >>, << maximum value >>],
        "type": GAparsimony.FLOAT/GAparsimony.INTEGER
    },
    "<< hyperparameter name >>": {
        "value": << constant value >>,
        "type": GAparsimony.CONSTANT
    }
}

columns (int or list of str) – The number of features/columns in the dataset or a list with their names.
population (numpy.array, optional) – It is a float matrix that represents the population. Default None.

population

The population.

Type: Population

_min

A vector of length params+columns with the smallest value that each position can take.

Type: numpy.array

_max

A vector of length params+columns with the largest value that each position can take.

Type: numpy.array

_params

Dict with the parameter values.

Type: dict

const

Dict with the constant values.

Type: dict

colsnames

List with the column names.

Type: list of str
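
The following sketch shows a concrete params dictionary in the format described for the constructor and derives lower and upper bound vectors in the spirit of _min and _max. It is not the library's code: the type codes are local copies of the class constants listed for Population, the hyperparameter names are made up, and two points are assumptions of this sketch, namely that constants are excluded from the bound vectors and that each feature gene ranges from 0 to 1.

```python
# Local copies of the type codes documented on Population.
INTEGER, FLOAT, CATEGORICAL, CONSTANT = 0, 1, 2, 3

params = {
    "C": {"range": [0.1, 100.0], "type": FLOAT},
    "max_iter": {"range": [100, 1000], "type": INTEGER},
    "kernel": {"value": "rbf", "type": CONSTANT},
}
columns = ["x1", "x2", "x3"]

# Constants carry no search range, so they are kept apart from tunable entries.
tunable = {k: v for k, v in params.items() if v["type"] != CONSTANT}
const = {k: v["value"] for k, v in params.items() if v["type"] == CONSTANT}

# Assumption: feature genes are selection probabilities bounded by 0 and 1.
lower = [v["range"][0] for v in tunable.values()] + [0.0] * len(columns)
upper = [v["range"][1] for v in tunable.values()] + [1.0] * len(columns)

print(lower)
print(upper)
print(const)
```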

getChromosome(key)

This method returns a chromosome from the population.

Parameters: key (int) – Chromosome row index.
Returns: A Chromosome object.
Return type: Chromosome

property paramsnames

property population

update_to_feat_thres(popSize, feat_thres)

Module contents

GAparsimony.util.order(obj, kind='heapsort', decreasing=False, na_last=True)

Function to order vectors

This function wraps the numpy.argsort sorting method. It supports both increasing and decreasing order and allows NaN values to be placed either at the end or at the beginning.

Parameters

obj (numpy.array) – Array to order.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.
decreasing (bool, optional) – If we want decreasing order.
na_last (bool, optional) – Controls the treatment of missing values (NaN). If True, missing values in the data are put last; if False, they are put first.

GAparsimony.util.parsimony_monitor(object, digits=7, *args)

Functions for monitoring GA-PARSIMONY algorithm evolution

Functions to print summary statistics of fitness values at each iteration of a GA search.

Parameters

object (object of GAparsimony) – The GAparsimony object that we want to monitor.
digits (int) – Minimal number of significant digits.
*args – Further arguments passed to or from other methods.