GAparsimony.util package

GAparsimony.util.complexity module

Complexity module.

This module contains predefined complexity functions for some of the most popular algorithms in the scikit-learn library:

  • linearModels_complexity: Any algorithm from ‘sklearn.linear_model’. Returns: 10^9·nFeatures + (sum of the squared coefs).

  • svm_complexity: Any algorithm from ‘sklearn.svm’. Returns: 10^9·nFeatures + (number of support vectors).

  • knn_complexity: Any algorithm from ‘sklearn.neighbors’. Returns: 10^9·nFeatures + 1/(number of neighbors).

  • mlp_complexity: Any algorithm from ‘sklearn.neural_network’. Returns: 10^9·nFeatures + (sum of the ANN squared weights).

  • randomForest_complexity: Any algorithm from ‘sklearn.ensemble.RandomForestRegressor’ or ‘sklearn.ensemble.RandomForestClassifier’. Returns: 10^9·nFeatures + (the average of tree leaves).

  • xgboost_complexity: XGBoost sklearn model. Returns: 10^9·nFeatures + (the average of tree leaves * number of trees). (Experimental)

  • decision_tree_complexity: Any algorithm from ‘sklearn.tree’. Returns: 10^9·nFeatures + (number of leaves). (Experimental)

Otherwise:

  • generic_complexity: Any algorithm. Returns: the number of input features (nFeatures).
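
The formulas above share one pattern: the feature count is scaled by 10^9, so it dominates the score, and the model-specific term only separates models that use the same number of features. A minimal sketch of that arithmetic follows, assuming the model-specific term stays below 10^9. The helper names are illustrative and are not part of GAparsimony.

```python
FEATURE_WEIGHT = 10**9  # scale factor taken from the formulas above


def combined_complexity(n_features, internal):
    """Combine the feature count and a model-specific term into one score."""
    return FEATURE_WEIGHT * n_features + internal


def split_complexity(score):
    """Recover (n_features, internal), assuming internal < 10**9."""
    return divmod(score, FEATURE_WEIGHT)


few_features = combined_complexity(3, 250)   # 3 features, internal term 250
many_features = combined_complexity(4, 10)   # 4 features, internal term 10

# The model with fewer features scores lower despite its larger internal term.
print(few_features < many_features)
print(split_complexity(few_features))
```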

Other complexity functions can be defined with the following interface:

def complexity(model, nFeatures, **kwargs):
    # Placeholder: replace with the model-specific complexity calculation.
    complexity = nFeatures
    return complexity

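
As an illustration of this interface, the sketch below defines a custom complexity function that counts non-zero coefficients. The function name and the stand-in model object are hypothetical. The only assumption about the model is that it exposes a flat `coef_` sequence, as fitted scikit-learn linear regressors commonly do.

```python
from types import SimpleNamespace


def nonzero_coef_complexity(model, nFeatures, **kwargs):
    """Hypothetical complexity: 10^9 * nFeatures + number of non-zero coefficients."""
    n_nonzero = sum(1 for c in model.coef_ if c != 0)
    return int(nFeatures * 10**9 + n_nonzero)


# Stand-in for a fitted model; a real estimator would be passed here instead.
toy_model = SimpleNamespace(coef_=[0.5, 0.0, -1.2])
print(nonzero_coef_complexity(toy_model, nFeatures=3))
```

A function written this way could be passed wherever one of the predefined complexity functions is expected.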
GAparsimony.util.complexity.decision_tree_complexity(model, nFeatures, **kwargs)

Complexity function for decision tree model.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (number of leaves)

Return type

int

GAparsimony.util.complexity.generic_complexity(model, nFeatures, **kwargs)

Generic complexity function.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

nFeatures.

Return type

int

GAparsimony.util.complexity.knn_complexity(model, nFeatures, **kwargs)

Complexity function for KNN models.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + 1/(number of neighbors)

Return type

int

GAparsimony.util.complexity.linearModels_complexity(model, nFeatures, **kwargs)

Complexity function for linear models.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (sum of the model squared coefs).

Return type

int

GAparsimony.util.complexity.mlp_complexity(model, nFeatures, **kwargs)

Complexity function for MLP models.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (sum of the ANN squared weights)

Return type

int

GAparsimony.util.complexity.randomForest_complexity(model, nFeatures, **kwargs)

Complexity function for Random Forest models.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (the average of tree leaves)

Return type

int

GAparsimony.util.complexity.svm_complexity(model, nFeatures, **kwargs)

Complexity function for SVM models.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (number of support vectors)

Return type

int

GAparsimony.util.complexity.xgboost_complexity(model, nFeatures, **kwargs)

Complexity function for XGBoost model.

Parameters
  • model (model) – The model for calculating complexity.

  • nFeatures (int) – The number of input features the model has been trained with.

  • **kwargs – A variable number of named arguments.

Returns

10^9·nFeatures + (the average of tree leaves * number of trees) (Experimental)

Return type

int

GAparsimony.util.fitness module

GAparsimony.util.fitness.getFitness(algorithm, metric, complexity, cv=RepeatedKFold(n_repeats=5, n_splits=10, random_state=42), minimize=False, test_size=0.2, random_state=42, n_jobs=-1, ignore_warnings=True)

Fitness function for GAparsimony.

Parameters
  • algorithm (object) – The machine learning function to optimize.

  • metric (function) – A function that computes the fitness value.

  • complexity (function) – A function that calculates the complexity of the model. There are some functions available in GAparsimony.util.complexity.

  • cv (object, optional) – A cross-validation object from sklearn.model_selection. Default RepeatedKFold(n_splits=10, n_repeats=5, random_state=42).

  • minimize (bool, optional) – True if the objective is to minimize the metric; False to maximize it. Default False.

  • test_size (float, int, None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, model is not tested with testing split returning fitness_test=np.inf. Default 0.2.

  • random_state (int, optional) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. Default 42.

  • n_jobs (int, optional) – Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the cross-validation splits. -1 means using all processors. Default -1.
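
The three accepted forms of test_size can be read as in the sketch below. It illustrates the documented semantics only and is not GAparsimony's code: the helper name is hypothetical, and rounding a fractional size up is an assumption borrowed from scikit-learn's train_test_split.

```python
import math


def resolve_test_count(test_size, n_samples):
    """Return how many samples would be held out for testing."""
    if test_size is None:
        return 0  # no test split; fitness_test is then reported as np.inf
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError("a float test_size must lie between 0.0 and 1.0")
        return math.ceil(test_size * n_samples)  # assumption: round up
    return int(test_size)  # an int is an absolute number of test samples


print(resolve_test_count(0.2, 100))
print(resolve_test_count(30, 100))
print(resolve_test_count(None, 100))
```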

Examples

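Usage example for a classification model

from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import RepeatedKFold

from GAparsimony import getFitness
from GAparsimony.util import svm_complexity

# Cross-validation scheme; these settings match the documented default.
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=42)

# Higher kappa is better, so minimize stays False.
fitness = getFitness(SVC, cohen_kappa_score, svm_complexity, cv=cv, minimize=False, test_size=0.2, random_state=42, n_jobs=-1)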

GAparsimony.util.order module

GAparsimony.util.order.order(obj, kind='heapsort', decreasing=False, na_last=True)

Function to order vectors

This function wraps the numpy.argsort sorting method, adding support for increasing or decreasing order and for placing NaN values either at the end or at the beginning.

Parameters
  • obj (numpy.array) – Array to order.

  • kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.

  • decreasing (bool, optional) – If True, sort in decreasing order. Default False.

  • na_last (bool, optional) – Controls the treatment of NaN values. If True, missing values in the data are put last; if False, they are put first. Default True.
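
The ordering semantics described above can be sketched in plain Python. This is an illustration of the documented behavior, not the library's implementation (which builds on numpy.argsort); the name order_sketch is hypothetical, and the kind argument is omitted because it only selects the sorting algorithm.

```python
import math


def order_sketch(values, decreasing=False, na_last=True):
    """Return the indices that would sort `values`, with NaN positions grouped."""
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    missing = [i for i, v in enumerate(values) if math.isnan(v)]
    # Sort only the non-NaN positions by their value.
    valid.sort(key=lambda i: values[i], reverse=decreasing)
    return valid + missing if na_last else missing + valid


data = [3.0, float("nan"), 1.0, 2.0]
print(order_sketch(data))                                  # [2, 3, 0, 1]
print(order_sketch(data, decreasing=True, na_last=False))  # [1, 0, 3, 2]
```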

GAparsimony.util.parsimony_monitor module

GAparsimony.util.parsimony_monitor.parsimony_monitor(object, digits=7, *args)

Functions for monitoring GA-PARSIMONY algorithm evolution

Functions to print summary statistics of fitness values at each iteration of a GA search.

Parameters
  • object (object of GAparsimony) – The GAparsimony object that we want to monitor.

  • digits (int) – Minimal number of significant digits.

  • *args – Further arguments passed to or from other methods.

GAparsimony.util.parsimony_monitor.parsimony_summary(object, *args)

GAparsimony.util.population module

class GAparsimony.util.population.Chromosome(params, name_params, const, cols, name_cols)

Bases: object

__init__(params, name_params, const, cols, name_cols)

This class defines a chromosome which includes the hyperparameters, the constant values, and the feature selection.

Parameters
  • params (numpy.array) – The algorithm hyperparameter values.

  • name_params (list of str) – The names of the hyperparameters.

  • const (dict) – A dictionary with the constants to include in the chromosome.

  • cols (numpy.array) – The probabilities for selecting the input features (selected if prob>0.5).

  • name_cols (list of str) – The names of the input features.

params

A dictionary with the parameter values (hyperparameters and constants).

Type

dict

columns

A boolean vector with the selected features.

Type

numpy.array of bool

property columns
property params
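
How the constructor arguments relate to the params and columns attributes can be sketched as follows. This is a minimal illustration of the description above, not the class's actual code; decode_chromosome is a hypothetical helper, and plain lists stand in for numpy arrays.

```python
def decode_chromosome(params, name_params, const, cols, name_cols):
    """Return (parameter dict, boolean feature mask, selected feature names)."""
    # Hyperparameters are paired with their names, then merged with the constants.
    param_dict = {**dict(zip(name_params, params)), **const}
    # A feature is selected when its probability exceeds 0.5.
    mask = [p > 0.5 for p in cols]
    selected = [name for name, keep in zip(name_cols, mask) if keep]
    return param_dict, mask, selected


param_dict, mask, selected = decode_chromosome(
    params=[10.0, 3],
    name_params=["C", "degree"],
    const={"kernel": "poly"},
    cols=[0.9, 0.2, 0.51],
    name_cols=["x1", "x2", "x3"],
)
print(param_dict)
print(mask)
print(selected)
```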
class GAparsimony.util.population.Population(params, columns, population=None)

Bases: object

CATEGORICAL = 2
CONSTANT = 3
FLOAT = 1
INTEGER = 0
__init__(params, columns, population=None)

This class is used to create the GA populations. It allows chromosomes to have int, float, and constant values.

Parameters
  • params (dict) –

    A dictionary with the model’s hyperparameters to be tuned and their search space.

    {
        "<< hyperparameter name >>": {
            "range": [<< minimum value >>, << maximum value >>],
            "type": GAparsimony.FLOAT/GAparsimony.INTEGER
        },
        "<< hyperparameter name >>": {
            "value": << constant value >>,
            "type": GAparsimony.CONSTANT
        }
    }
    

  • columns (int or list of str) – The number of features/columns in the dataset or a list with their names.

  • population (numpy.array, optional) – It is a float matrix that represents the population. Default None.
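
A concrete search space in this format might look like the sketch below. The hyperparameter names and ranges are hypothetical, and local integers mirror the type codes listed for this class so the snippet runs without GAparsimony installed. The bound vectors show one way that lower and upper limits of length params+columns could be assembled, assuming each feature gene is a probability between 0 and 1.

```python
# Type codes mirroring the Population class constants listed above.
INTEGER, FLOAT, CATEGORICAL, CONSTANT = 0, 1, 2, 3

# Hypothetical search space for an SVM-like model.
params = {
    "C": {"range": [0.0001, 1000.0], "type": FLOAT},
    "degree": {"range": [1, 5], "type": INTEGER},
    "kernel": {"value": "poly", "type": CONSTANT},
}
columns = ["x1", "x2", "x3"]

# Separate the tunable hyperparameters from the fixed constants.
tunable = {k: v for k, v in params.items() if v["type"] != CONSTANT}
constants = {k: v["value"] for k, v in params.items() if v["type"] == CONSTANT}

# Assumption: feature-selection genes are probabilities bounded by 0 and 1.
lower = [v["range"][0] for v in tunable.values()] + [0.0] * len(columns)
upper = [v["range"][1] for v in tunable.values()] + [1.0] * len(columns)

print(lower)
print(upper)
print(constants)
```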

population

The population.

Type

Population

_min

A vector of length params+columns with the smallest value each position can take.

Type

numpy.array

_max

A vector of length params+columns with the highest value each position can take.

Type

numpy.array

_params

Dict with the parameter values.

Type

dict

const

Dict with the constant values.

Type

dict

colsnames

List with the column names.

Type

list of str

getChromosome(key)

This method returns a chromosome from the population.

Parameters

key (int) – Chromosome row index.

Returns

A Chromosome object.

Return type

Chromosome

property paramsnames
property population
update_to_feat_thres(popSize, feat_thres)

Module contents

GAparsimony.util.order(obj, kind='heapsort', decreasing=False, na_last=True)

Function to order vectors

This function wraps the numpy.argsort sorting method, adding support for increasing or decreasing order and for placing NaN values either at the end or at the beginning.

Parameters
  • obj (numpy.array) – Array to order.

  • kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.

  • decreasing (bool, optional) – If True, sort in decreasing order. Default False.

  • na_last (bool, optional) – Controls the treatment of NaN values. If True, missing values in the data are put last; if False, they are put first. Default True.

GAparsimony.util.parsimony_monitor(object, digits=7, *args)

Functions for monitoring GA-PARSIMONY algorithm evolution

Functions to print summary statistics of fitness values at each iteration of a GA search.

Parameters
  • object (object of GAparsimony) – The GAparsimony object that we want to monitor.

  • digits (int) – Minimal number of significant digits.

  • *args – Further arguments passed to or from other methods.