gp

The gp sub-package contains a genetic programming framework that produces Push programs.

The genetic programming framework is heavily inspired by the scikit-learn machine learning framework. PyshGP is designed to be usable through its base classes alone, many of which are sklearn estimators. The framework aims to be easy to embed in data science and machine learning pipelines rather than to serve as a standalone tool.

Aside from the general goal of Inductive Program Synthesis, the PyshGP genetic programming framework does not have a single intended application (e.g., regression or classification).

Base

TODO: Module docstring

class pyshgp.gp.base.PyshBase(atom_generators='default', operators='default', error_threshold=0, max_generations=1000, population_size=300, selection_method='lexicase', n_jobs=1, initial_max_genome_size=50, program_growth_cap=100, verbose=0, epsilon='auto', tournament_size=7, simplification_steps=2000, keep_linear=False)

Base class for all PushGP evolvers.

TODO: Add validation checks.

Parameters:

atom_generators : list or str, optional (default=’default’)

Atom generators used to generate random Push programs. If 'default' then all atom generators are used.

operators : list or str, optional (default=’default’)

List of tuples. Each tuple contains a VariationOperator and a float. The float determines the relative probability of using the VariationOperator to produce a child. If 'default' a commonly used set of genetic operators is used.

error_threshold : int or float, optional (default=0)

If a program’s total error is ever less than or equal to this value, the program is considered a solution.

max_generations : int, optional (default=1000)

Maximum number of generations before stopping evolution.

population_size : int, optional (default=300)

Number of Individuals to have in the population at any given generation.

selection_method : str, optional (default=’lexicase’)

Method to use when selecting parents. Supported options are ‘lexicase’, ‘epsilon_lexicase’, and ‘tournament’.

n_jobs : int or str, optional (default=1)

Number of processes to run at once during program evaluation. If -1 the number of processes will be equal to the number of cores.

initial_max_genome_size : int, optional (default=50)

Max number of genes to have in each randomly generated genome.

program_growth_cap : int, optional (default=100)

TODO: Implement this feature.

verbose : int, optional (default=0)

If 0, prints nothing during evolution. If 1, prints minimal information while evolving. If 2, prints as much information as possible during evolution, which might slightly impact runtime.

epsilon : float or str, optional (default=’auto’)

The value of epsilon when using ‘epsilon_lexicase’ as the selection method. If ‘auto’, epsilon is set to be equal to the Median Absolute Deviation of each error.

tournament_size : int, optional (default=7)

The size of each tournament when using ‘tournament’ selection.

simplification_steps : int, optional (default=2000)

Number of steps of automatic program simplification to perform.
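The n_jobs convention described in the parameter list above can be illustrated with a small sketch. The helper name resolve_n_jobs is hypothetical and is not part of PyshGP; it only assumes that -1 means one process per available core.

```python
import os

def resolve_n_jobs(n_jobs):
    """Hypothetical helper: map an n_jobs setting to a process count."""
    if n_jobs == -1:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs

print(resolve_n_jobs(1))
print(resolve_n_jobs(-1))
```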

choose_genetic_operator()

Normalizes operator probabilities so that values sum to 1.
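As a rough illustration of how relative operator weights can be normalized and then used to pick an operator, consider the sketch below. The function names are hypothetical, strings stand in for VariationOperator instances, and PyshGP's own implementation may differ.

```python
import random

def normalize_weights(operators):
    """Scale (operator, weight) pairs so that the weights sum to 1."""
    total = sum(weight for _, weight in operators)
    if total <= 0:
        raise ValueError("operator weights must sum to a positive number")
    return [(op, weight / total) for op, weight in operators]

def choose_operator(operators, rng=random):
    """Pick one operator with probability proportional to its weight."""
    normalized = normalize_weights(operators)
    ops = [op for op, _ in normalized]
    probs = [p for _, p in normalized]
    return rng.choices(ops, weights=probs, k=1)[0]

# Strings stand in for VariationOperator instances in this sketch.
ops = [("mutation", 2.0), ("alternation", 1.0), ("reproduction", 1.0)]
print(normalize_weights(ops))
print(choose_operator(ops))
```

Because only the ratios matter, weights such as (2, 1, 1) and (0.5, 0.25, 0.25) describe the same distribution.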

init_executor()

Initializes a pool of processes.

This requires pathos.multiprocessing because the standard multiprocessing library does not support pickling lambdas and non-top-level functions. Pathos specifically makes use of the dill package.

Todo

TODO: If there is a way around using pathos, it would be great to remove this dependency.
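The pickling limitation mentioned above can be observed with the standard library alone. The sketch below uses hypothetical helper names (can_pickle, make_error_function) and does not involve pathos or dill; it only checks what the built-in pickle module accepts.

```python
import pickle

def can_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def make_error_function(target):
    # A nested (non-top-level) function, similar to a closure-based error function.
    def error(output):
        return abs(output - target)
    return error

print(can_pickle(abs))                     # built-in, importable by name
print(can_pickle(lambda x: x + 1))         # lambda
print(can_pickle(make_error_function(3)))  # nested function
```

Functions are pickled by reference to an importable name, which is why the lambda and the nested function are rejected while the built-in is accepted.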

init_population()

Generate random population of Individuals with Push programs.

make_spawner(num_inputs)

Creates a spawner object used to generate random code.

Parameters:

num_inputs : int

The number of input instructions to generate and add to the Spawner. This should be set to the number of input values (features) that will be supplied to Push programs during evaluation.

output_types : list

A list of pysh types. The spawner will include instructions which output a list of outputs with the corresponding type at each index.

print_monitor(generation)

Prints a basic set of values that can be used to manually monitor run health.

TODO: Add validation check for if population exists.

Parameters:

generation : int

The generation number.

print_monitor_verbose(generation)

Prints all implemented values that can be used to manually monitor run health.

TODO: Add validation check for if population exists.

Parameters:

generation : int

The generation number.

class pyshgp.gp.base.PyshEstimatorMixin

A Mixin class for the Sklearn estimators included in pyshgp.

evolve(X, y)

Main evolutionary loop for the sklearn estimators in pyshgp.

Parameters:

X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

y : {array-like, sparse matrix}, shape = (n_samples, 1)

Target values.
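To make the shape of a generational loop concrete, here is a minimal sketch on a toy problem. It is not PyshGP's evolve method: genomes are plain lists of integers rather than Push programs, and every name (evolve_sketch, init_genome, error_fn, mutate) is hypothetical. It only assumes the general pattern of evaluating, checking the error threshold, selecting parents, and producing children.

```python
import random

def evolve_sketch(init_genome, error_fn, mutate, population_size=20,
                  max_generations=50, error_threshold=0,
                  tournament_size=3, seed=0):
    """Toy generational loop: evaluate, check for a solution, select, vary."""
    rng = random.Random(seed)
    population = [init_genome(rng) for _ in range(population_size)]
    best = None
    for generation in range(max_generations):
        # Evaluate every individual; lower total error is better.
        scored = [(error_fn(g), g) for g in population]
        best = min(scored, key=lambda pair: pair[0])
        if best[0] <= error_threshold:
            break  # a solution was found
        # Build the next generation from tournament-selected parents.
        next_population = []
        for _ in range(population_size):
            tournament = rng.sample(scored, tournament_size)
            parent = min(tournament, key=lambda pair: pair[0])[1]
            next_population.append(mutate(parent, rng))
        population = next_population
    # best is the (error, genome) pair from the last evaluated generation.
    return best, generation

def init_genome(rng):
    return [rng.randint(-5, 5) for _ in range(5)]

def error_fn(genome):
    # Toy objective: the genes should sum to 7.
    return abs(sum(genome) - 7)

def mutate(genome, rng):
    child = list(genome)
    child[rng.randrange(len(child))] += rng.choice([-1, 1])
    return child

(best_error, best_genome), generation = evolve_sketch(init_genome, error_fn, mutate)
print(best_error, best_genome, generation)
```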

pyshgp.gp.base.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Parameters:

a : 1-D array-like or int

If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

replace : boolean, optional

Whether the sample is with or without replacement

p : 1-D array-like, optional

The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

Returns:

samples : single item or ndarray

The generated random samples

Raises:

ValueError

If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size

See also

randint, shuffle, permutation

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4])
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0])
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],
      dtype='|S11')

pyshgp.gp.base.random() → x in the interval [0, 1).

Population

Classes that represent Individuals and Populations in evolutionary algorithms.

class pyshgp.gp.population.Individual(genome)

Holds all information about an individual in the PushGP framework.

The main role of an Individual is to hold a Push program that determines the Individual’s behavior. An Individual’s Push program comes from a Plush genome, which is also stored in the Individual. Genomes are what pyshgp’s VariationOperators manipulate. An Individual is created from a genome, and its program is set by translating the genome into a program.

Parameters:

genome : list of genes

List of plush genes.

Attributes

genome Plush Genome of individual.
program Push program of individual.
error_vector (list) A list of numeric error values.
total_error (float) A single numeric error value, generally some aggregate of the error_vector.

genome

Plush Genome of individual.

program

Push program of individual. Taken from Plush genome.

run_program(inputs, output_types, print_trace=False)

Runs the Individual’s program.

Parameters:

inputs : list

List of input values that can be accessed by the Individual’s program.

print_trace : bool, optional

If True, prints the current program element and the state of the stack at each step of executing the program.

output_types : list

A list of pysh types. The spawner will include instructions which output a list of outputs with the corresponding type at each index.

Returns:

The final state of the push Interpreter after executing the program.

class pyshgp.gp.population.Population

Pyshgp population of Individuals.

average_error()

Returns:

The average total error found in the population.

best_program()

Returns:

The program of the Individual with the lowest total error.

best_program_error_vector()

Returns:

The error vector of the Individual with the lowest total error.

epsilon_lexicase_selection(epsilon='auto')

Returns an individual that does the best on the fitness cases when considered one at a time in random order.

Parameters:

epsilon : float, array-like or str, optional (default=’auto’)

If an individual is within epsilon of being elite, it will remain in the selection pool. If ‘auto’, epsilon is set at the start of each selection event to be equal to the Median Absolute Deviation of each error.

Returns:

individual : Individual

An individual from the population selected using lexicase selection.
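The two ingredients that distinguish epsilon-lexicase from plain lexicase selection, the automatic epsilon and the relaxed per-case filter, can be sketched as follows. All function names are hypothetical, error vectors are plain nested lists, and only a scalar epsilon per case is handled; this is an illustration under those assumptions rather than PyshGP's implementation.

```python
import statistics

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def auto_epsilons(error_vectors):
    """One epsilon per fitness case: the MAD of that case's errors."""
    n_cases = len(error_vectors[0])
    return [median_absolute_deviation([ev[c] for ev in error_vectors])
            for c in range(n_cases)]

def filter_within_epsilon(candidates, error_vectors, case, epsilon):
    """Keep candidates whose error on `case` is within epsilon of the best."""
    best = min(error_vectors[i][case] for i in candidates)
    return [i for i in candidates if error_vectors[i][case] <= best + epsilon]

# error_vectors[i][c] is the error of individual i on fitness case c.
errors = [[0.0, 5.0], [0.1, 5.0], [3.0, 0.0], [3.0, 4.0]]
eps = auto_epsilons(errors)
print(eps)
print(filter_within_epsilon([0, 1, 2, 3], errors, 0, eps[0]))
```

With an epsilon of 0 the filter reduces to the strict "keep only the best" rule of plain lexicase selection.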

evaluate_by_dataset(X, y, mode, pool=None)

Evaluates the population based on the specified mode.

Parameters:

X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

y : {array-like, sparse matrix}, shape = (n_samples, 1)

Target values.

mode : str

Valid options include “regression” and “classification”.

pool : pathos.multiprocessing.Pool, optional

Pool of processes to evaluate in parallel.

evaluate_by_function(error_function, pool=None)

Evaluates every individual in the population, if the individual has not been previously evaluated.

Parameters:

error_function : function

The error function, which takes a Push program as input and returns an error vector.

pool : pathos.multiprocessing.Pool, optional

Pool of processes to evaluate in parallel.

lexicase_selection()

Returns an individual that does the best on the fitness cases when considered one at a time in random order.

http://faculty.hampshire.edu/lspector/pubs/lexicase-IEEE-TEC.pdf

Returns:

individual : Individual

An individual from the population selected using lexicase selection.
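A minimal sketch of lexicase selection as described above: the fitness cases are shuffled, and the candidate pool is filtered case by case to those with the best error on that case. The function name and the nested-list representation of error vectors are assumptions made for illustration.

```python
import random

def lexicase_select(error_vectors, rng=random):
    """Return the index of one individual chosen by lexicase selection.

    error_vectors[i][c] is the error of individual i on fitness case c.
    """
    candidates = list(range(len(error_vectors)))
    cases = list(range(len(error_vectors[0])))
    rng.shuffle(cases)  # consider the cases in random order
    for c in cases:
        if len(candidates) == 1:
            break
        best = min(error_vectors[i][c] for i in candidates)
        candidates = [i for i in candidates if error_vectors[i][c] == best]
    # Any remaining ties are broken at random.
    return rng.choice(candidates)

errors = [[0, 5], [1, 5], [3, 0], [3, 4]]
print(lexicase_select(errors))
```

In this example only individuals 0 and 2 can be selected, since each is the sole best performer on one of the two cases. Individual 3 is never chosen even though its total error (7) is not the worst.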

lowest_error()

Returns:

The lowest total error found in the population.

select(method='lexicase', epsilon='auto', tournament_size=7, cap=2)

Selects an individual from the population with the given selection method.

Parameters:

method : str, optional (default=’lexicase’)

The selection method to be used when selecting parents. Supported options are ‘lexicase’, ‘epsilon_lexicase’, and ‘tournament’.

epsilon : int, str, optional (default=’auto’)

The value of epsilon when using ‘epsilon_lexicase’ as the selection method. If ‘auto’, epsilon is set to be equal to the Median Absolute Deviation of each error.

tournament_size : int, optional (default=7)

The size of each tournament when using ‘tournament’ selection.

tournament_selection(tournament_size=7)

Returns the individual with the lowest error within a random tournament.

Parameters:

tournament_size : int, optional (default=7)

Size of each tournament.

Returns:

individual : Individual

An individual from the population selected using tournament selection.
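Tournament selection can be sketched in a few lines. This version assumes entrants are drawn without replacement and that each individual is summarized by a single total error; the function name is hypothetical and PyshGP's implementation may differ on both points.

```python
import random

def tournament_select(total_errors, tournament_size=7, rng=random):
    """Return the index of the lowest-error entrant in a random tournament."""
    size = min(tournament_size, len(total_errors))
    entrants = rng.sample(range(len(total_errors)), size)
    return min(entrants, key=lambda i: total_errors[i])

total_errors = [5, 3, 9, 1, 7]
print(tournament_select(total_errors, tournament_size=2))
```

Larger tournaments increase selection pressure: when the tournament covers the whole population, the individual with the lowest error always wins.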

unique()

Returns:

The number of unique programs found in the population.

Simplification

The simplification module contains functions that help when automatically simplifying Push genomes and Push programs.

TODO: function parameter docstrings

pyshgp.gp.simplification.choice(a, size=None, replace=True, p=None)

Same function as pyshgp.gp.base.choice; see its entry in the Base section above for parameters, return values, and examples.

pyshgp.gp.simplification.noop_n_random_genes(genome, n)

Returns a new genome that is identical to input genome, with n genes replaced with noop instructions.

Parameters:

genome : list of Genes

List of Plush genes.

n : int

Number of genes to switch to noop.
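The behavior described above (a new genome with n genes replaced, the input left untouched) might look like the following sketch. The Gene dataclass, the NOOP placeholder name, and the function name are all hypothetical stand-ins; PyshGP's actual gene representation is not shown in this document.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Gene:
    """Hypothetical stand-in for a Plush gene."""
    instruction: str
    closes: int = 0
    silent: bool = False

NOOP = "noop"  # placeholder name for a do-nothing instruction

def noop_n_random_genes_sketch(genome, n, rng=random):
    """Return a copy of genome with n randomly chosen genes turned into noops."""
    n = min(n, len(genome))
    chosen = set(rng.sample(range(len(genome)), n))
    return [replace(g, instruction=NOOP) if i in chosen else g
            for i, g in enumerate(genome)]

genome = [Gene("a"), Gene("b", closes=1), Gene("c"), Gene("d"), Gene("e")]
print(noop_n_random_genes_sketch(genome, 2))
```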

pyshgp.gp.simplification.randint(low, high=None, size=None, dtype='l')

Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

Parameters:

low : int

Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).

high : int, optional

If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

dtype : dtype, optional

Desired dtype of the result. All dtypes are determined by their name, i.e., ‘int64’, ‘int’, etc, so byteorder is not available and a specific precision may have different C types depending on the platform. The default value is ‘np.int’.

New in version 1.11.0.

Returns:

out : int or ndarray of ints

size-shaped array of random integers from the appropriate distribution, or a single such random int if size not provided.

See also

random.random_integers
similar to randint, only for the closed interval [low, high], and 1 is the lowest value if high is omitted. In particular, this other one is the one to use to generate uniformly distributed discrete non-integers.

Examples

>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Generate a 2 x 4 array of ints between 0 and 4, inclusive:

>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])

pyshgp.gp.simplification.silent_n_random_genes(genome, n)

Returns a new genome that is identical to input genome, with n genes marked as silent.

Parameters:

genome : list of Genes

List of Plush genes.

n : int

Number of genes to switch to silent.

pyshgp.gp.simplification.simplify_by_dataset(individual, X, y, mode, steps=1000, verbose=0)

Simplifies the genome (and program) of the individual based on a dataset by randomly removing some elements of the program and confirming that the total error remains the same or lower. This is achieved by silencing some genes in the individual’s genome.

Parameters:

individual : Individual

The individual to simplify.

X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Samples.

y : {array-like, sparse matrix}, shape = (n_samples, 1)

Labels.

mode : str

Valid options include “regression” and “classification”

steps : int, optional (default=1000)

Number of simplification steps to perform.

verbose : int, optional (default=0)

When greater than 0, verbose printing is enabled.

pyshgp.gp.simplification.simplify_by_function(individual, error_function, steps=1000, verbose=0)

Simplifies the genome (and program) of the individual based on a function by randomly removing some elements of the program and confirming that the total error remains the same or lower. This is achieved by silencing some genes in the individual’s genome.

Parameters:

individual : Individual

The individual to simplify.

error_function : function

Error function used to evaluate the individual’s program.

steps : int, optional (default=1000)

Number of simplification steps to perform.

verbose : int, optional (default=0)

When greater than 0, verbose printing is enabled.
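The simplification procedure described above amounts to a hill climb that accepts a change only when the error does not get worse. The sketch below illustrates that idea under simplifying assumptions: a genome is a list of (instruction, silent) pairs, the error function takes the genome directly, and the names simplify_sketch and toy_error are hypothetical.

```python
import random

def simplify_sketch(genome, error_function, steps=1000, rng=random):
    """Hill-climbing simplification: keep a change only if error does not increase.

    genome is a list of (instruction, silent) pairs in this sketch.
    """
    best = list(genome)
    best_error = error_function(best)
    for _ in range(steps):
        active = [i for i, (instr, silent) in enumerate(best) if not silent]
        if not active:
            break
        # Silence between 1 and 3 randomly chosen active genes.
        k = min(rng.randint(1, 3), len(active))
        candidate = list(best)
        for i in rng.sample(active, k):
            candidate[i] = (candidate[i][0], True)
        candidate_error = error_function(candidate)
        if candidate_error <= best_error:
            best, best_error = candidate, candidate_error
    return best

def toy_error(genome):
    # Toy "program": the sum of all active (non-silent) values should be 10.
    return abs(sum(value for value, silent in genome if not silent) - 10)

original = [(10, False), (0, False), (0, False), (5, False), (-5, False)]
simplified = simplify_sketch(original, toy_error, steps=200, rng=random.Random(0))
print(simplified)
```

In the toy problem the gene holding 10 can never be silenced, because every candidate without it has a higher error than the current best and is rejected.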

pyshgp.gp.simplification.simplify_once(genome)

Silences or noops between 1 and 3 random genes.

Parameters:

genome : list of Genes

List of Plush genes.

Variation

The variation module defines classes for variation operators (aka genetic operators). These operators are used in evolution to create new children from selected parents.

class pyshgp.gp.variation.Alternation(rate=0.01, alignment_deviation=10)

Uniformly alternates between the two parents.

More information can be found on this Push-Redux page.

Parameters:

rate : float, optional (default=0.01)

The probability of switching which parent program elements are being copied from. Must be 0 <= rate <= 1. Defaults to 0.01.

alignment_deviation : int, optional (default=10)

The standard deviation of how far alternation may jump between indices when switching between parents.

produce(parents, spawner=None)

Produces a child using the Alternation operator.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner, optional

A spawner that can be used to create random Push code. Not used by this operator.
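The roles of rate and alignment_deviation can be illustrated with a sketch of one common formulation of alternation. It operates on plain lists, the function name is hypothetical, and details such as the iteration budget are assumptions; PyshGP's own implementation may differ.

```python
import random

def alternation_sketch(genome_a, genome_b, rate=0.01, alignment_deviation=10,
                       rng=random):
    """Build a child by copying genes from one parent at a time."""
    parents = (genome_a, genome_b)
    source = rng.randrange(2)               # parent currently being copied
    i = 0                                   # read position shared by both parents
    budget = len(genome_a) + len(genome_b)  # guards against endless jumping
    child = []
    while i < len(parents[source]) and budget > 0:
        if rng.random() < rate:
            # Switch parents and jump to a nearby index.
            source = 1 - source
            i = max(0, i + round(rng.gauss(0, alignment_deviation)))
        else:
            child.append(parents[source][i])
            i += 1
        budget -= 1
    return child

a = [1, 2, 3]
b = [4, 5, 6, 7]
print(alternation_sketch(a, b, rate=0.3, alignment_deviation=1))
```

With rate set to 0 the child is a copy of whichever parent was chosen first. Higher rates produce more switches, and a larger alignment_deviation lets each switch land further from the current index.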

class pyshgp.gp.variation.FlipBooleanMutation(rate=0.01)

Randomly flips the boolean literal genes.

class pyshgp.gp.variation.Genesis(max_genome_size)

Creates an entirely new (and random) genome.

class pyshgp.gp.variation.LiteralMutation(pysh_type, rate=0.01)

Base class for all constant mutators.

produce(parents, spawner)

Produces a child by mutating some literals in the parent.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner

A spawner that can be used to create random Push code.

class pyshgp.gp.variation.PerturbCloseMutation(rate=0.01, standard_deviation=1)

Randomly perturbs the number of close markers on each gene.

produce(parents, spawner=None)

Produces a child by perturbing the number of close markers on some genes in the parent.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner

A spawner that can be used to create random Push code.

Returns:

A child Individual.

class pyshgp.gp.variation.PerturbFloatMutation(rate=0.01, standard_deviation=1)

Randomly perturbs the genes containing float literals.

class pyshgp.gp.variation.PerturbIntegerMutation(rate=0.01, standard_deviation=1)

Randomly perturbs the genes containing integer literals.
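A Gaussian perturbation of numeric literals, governed by rate and standard_deviation, might be sketched as follows. The genome here is a plain list in which strings stand in for instructions and numbers for literals; the function name and the rounding of integers are assumptions for illustration only.

```python
import random

def perturb_numeric_literals(genome, rate=0.01, standard_deviation=1, rng=random):
    """Add Gaussian noise to numeric literals, each with probability `rate`."""
    child = []
    for gene in genome:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(gene, (int, float)) and not isinstance(gene, bool)
        if is_number and rng.random() < rate:
            noise = rng.gauss(0, standard_deviation)
            # Keep integers integral by rounding the perturbed value.
            gene = round(gene + noise) if isinstance(gene, int) else gene + noise
        child.append(gene)
    return child

genome = ["int_add", 3, 2.5, True]
print(perturb_numeric_literals(genome, rate=1.0))
```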

class pyshgp.gp.variation.RandomAdditionMutation(rate=0.01)

Randomly adds new genes.

produce(parents, spawner)

Produces a child by randomly adding new genes to the parent’s genome.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner

A spawner that can be used to create random Push code.

class pyshgp.gp.variation.RandomDeletionMutation(rate=0.01)

Randomly removes some genes.

produce(parents, spawner)

Produces a child by randomly removing some genes from the parent’s genome.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner

A spawner that can be used to create random Push code.

class pyshgp.gp.variation.RandomReplaceMutation(rate=0.01)

Randomly replaces genes.

produce(parents, spawner)

Produces a child by randomly replacing some genes in the parent’s genome.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner

A spawner that can be used to create random Push code.

class pyshgp.gp.variation.Reproduction

Clones the parent genome.

class pyshgp.gp.variation.TweakStringMutation(rate=0.01, char_tweak_rate=0.1)

Randomly tweaks the string values in string literal genes.

class pyshgp.gp.variation.UniformMutation(rate=0.01, literal_tweak_rate=0.5, float_standard_deviation=1.0, int_standard_deviation=1.0, string_char_tweak_rate=0.1)

A simple mutation operator that mutates all genes.

produce(parents, spawner)

Produces a child by mutating genes in the parent’s genome.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner

A spawner that can be used to create random Push code.

class pyshgp.gp.variation.VariationOperator(num_parents)

The base class for all variation operators.

Parameters:

num_parents : int

Number of parent Individuals the operator needs to produce a child Individual.

produce(parents, spawner)

Produces a child.

class pyshgp.gp.variation.VariationOperatorPipeline(operators)

Variation operator that chains together other variation operators.

Parameters:

operators : list of VariationOperators

A list of operators to apply in order to produce the child Individual.

produce(parents, spawner)

Produces a child using the VariationOperatorPipeline.

Parameters:

parents : list of Individuals

A list of parents to use when producing the child.

spawner : pyshgp.push.spawn.Spawner

A spawner that can be used to create random Push code.
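Chaining operators in order can be sketched with plain callables. This simplified version passes a single genome through each step and ignores parents and the spawner; the class and helper names are hypothetical and do not reflect the real produce(parents, spawner) signature.

```python
class PipelineSketch:
    """Hypothetical chain of operators, each a callable genome -> genome."""

    def __init__(self, operators):
        self.operators = list(operators)

    def produce(self, genome):
        child = list(genome)
        # Feed the output of each operator into the next one.
        for operator in self.operators:
            child = operator(child)
        return child

def drop_last(genome):
    return genome[:-1]

def double_all(genome):
    return [g * 2 for g in genome]

pipeline = PipelineSketch([drop_last, double_all])
print(pipeline.produce([1, 2, 3]))  # [2, 4]
```

Order matters: applying double_all before drop_last would give the same result here, but operators that add or delete genes generally do not commute with operators that modify them.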