Actions

dANN

Genetic Wavelets

From Syncleus Wiki

dANN Information
General
Description An Artificial Intelligence Library written in Java.
Last Activity Today
License OSCL Type C
Homepage dANN
Download
Distributions 2.1
Documentation Documentation Snapshot
Development http://gerrit.syncleus.com/dANN
JIRA Bug Tracking
Sonar Technical Debt
Support
IRC Room #dANN on irc.freenode.org
Mailing Lists Directory of Mailing Lists

The Genetic Wavelet Algorithm is a novel form of evolutionary algorithm that is heavily biologically-inspired, involving concepts such as DNA, enhancers, promoters, transacting/transcription factors, as well as the gene regulatory network, receptors, signals, keys, and more. Genetic Wavelets are to classical AI algorithms what biological genetics are to neurons.

The Genetic Wavelet algorithm has a population of individuals that are evaluated by a fitness function, and it uses recombination operators to create new generations of the population, therefore it can be classified as an evolutionary algorithm. However, there are more details that go into the Genetic Wavelets than classic GAs. Below is a flow chart with the basic steps of the Genetic Wavelets.

In relation to other EAs & GAs, Genetic Wavelets is similar. It follows the basic steps paradigm of population, crossover, and mutation. The large difference comes in the way Genetic Wavelets represents individuals in the population and their chromosomes. The steps of population evaluation(fitness function), convergence check, population size, and the step of mating are all the parts that need to be implemented when solving any problem. Before implementation, one must know the details of Genetic Wavelets. The steps and specifications are explained below.

Population Initialization

Error creating thumbnail: File missing
Genetic Wavelets flowchart

Individuals are represented as organisms. An organism is made up of one or more cells. A cell has a nucleus which contains one or more chromosomes. There can be single-cellular implementations of Genetic Wavelets or multi-cellular. An example of a multi-cellular implementation would be a neural network.

The implementation of the organism is dependent on the problem and how the programmer wants the genes to be represented in the algorithm. This includes whether it is going to be a single-cellular or a multi-cellular implementation. This organism also is where genes would be mapped to traits.

Regardless of how an individual is implemented, a cell is initiated the same way. A cell starts off with one chromosome, and a mutability factor, a floating point variable between zero and ten. A cell can have more chromosomes if mutation occurs; when it is instantiated, a random number(between zero and ten) is tested against its mutation factor, if the random number is less than the mutation factor, another chromosome is added. This continues until a random number greater than the mutability factor is chosen.

Genetic Wavelets represents chromosomes as a pair of chromatids. Each chromatid has a set of genes. There is a right chromatid and a left chromatid. They are joined at what is called the centromere position, which dictates where the gene will be split for crossover. Each chromatid is instantiated with its own mutability factor, then one or more gene is added to the chromatid, depending on the same method above for adding chromosomes to the cell.

The chromosome representation is done as objects in lists, therefore an allele is not dependent on a position in a string, but on an index in a list. At each index of the list is a gene object. There are three types of genes: promoter genes, signal genes, and external signal genes. Each gene has a list containing zero or more receptors. Each gene has an output which is computed by its expression function. This output is a floating point value and is essentially its value, however the output of can also be represented by a set of wavelets, more on this further on. Before additional detail on the types of genes, it is important to know about keys.

The step of population initialization starts with filling the population with randomly generated individuals. Each one of those individuals is created with one or more cells, each containing one or more chromosomes. Each chromosome is initialized with one or more gene, which could be of any type, selected randomly.

Keys

Genetic Wavelets use keys, which use schemata to encode their values. A key is a variable length string that is evolved during the genetic wavelet algorithm using x as a wild card character. A key, specifically a signal type key, might look like this:

Error creating thumbnail: File missing
Genetic Wavelets Signal Key Example: 01x01x10

A key is a schema[1], but it is not used for chromosome representation, instead it is used to model communications between cells. Keys are used to represent two important components of Genetic Wavelets - receptors and signals. The interaction is loosely modeled after hormones binding to cells.

Keys are a way of bypassing the complexities involved in modeling biological signal key, receptor, and protein folding - to provide a similarly functioning model that can be emulated with much simpler computations.

There are restrictions for signals binding to receptors. A signal can bind to a receptor if and only if the receptor matches the signal on all non-x points. For example, the signal key "101x11" can bind to the receptor "10xx11," but "101x1" cannot bind to the same receptor.

Error creating thumbnail: File missing
Genetic Wavelets Key Bind Example.
Signal 01101x10 successfully binds to receptor 110xx10.
Error creating thumbnail: File missing
Genetic Wavelets Key Mismatch Example.
Signal 0101xx1 fails to bind with receptor 110xx10.

Keys do not have to be equal in length in order to bind to one another; a smaller length receptor would bind with more signals, because it would bind to more sites on a signal. These signal and receptors are an integral part of Genetic Wavelets. Keys can mutate along with the system, however they are not changed on crossover. A key is initialized with one or more values depending on the mutation factor of the gene that is initializing it. All of the genes in Genetic Wavelets use keys in some way.

Genes

There are three types of genes; they all have a few things in common. First, they all have one or more receptors. Receptors have a receptor key. A gene starts off with at least one receptor, depending on mutation. Every gene also has an expression function, as mentioned before. The expression function is calculated using the receptors to map each one to a dimension in a wavelet, more details on how the expression function is calculated can be found in the next section. The floating point value output of the expression function is known as the current activity of a gene – it is the value that it outputs.

The first type of gene is the signal gene. Signal genes are responsible for creating a signal key that may or may not bind to other genes' or its own receptors. Signals genes each have a concentration which is calculated in the tick step. A signal gene outputs its signal local to the cell it resides in. A signal binding to a receptor will increase the output of that receptor by its concentration in the affected gene’s expression function.

An external signal gene is similar to a signal gene, but it has a direction that it is facing – it can either be inward facing or outward facing. If it is outward facing, it outputs signals to other cells, and its receptors receive signals from within the cell. If it is inward facing, then it outputs signals to the local cell, and receives signals from other external signal genes in other cells. These genes are only used in multi-cellular implementations.

The last type of gene is a promoter gene. Promoter genes exist to promote a gene that is a certain distance away from it on the same chromatid. Promotion of a gene is calculated and affects the current activity of a gene. The current activity is increased by the product of its expression function output and how much it is being promoted. The gene that a promoter gene affects can change due to mutation.

Pre-Tick, Tick, Wavelets, and The Expression Function

The pre-tick step is the first step in the simulation phase. This is where each gene calculates its new activity from its expression function. This is stored as its pending activity, because it is not applied as its expression yet. Each signal that is being expressed in the cell is tested to see if it binds to each receptor on the cell. If it does bind to a cell, then its value is changed, if not its value is zero.

This step in the simulation calculates and applies the promotion value. The current activity is set to the sum of the pending activity and the product of the pending activity and its gene’s promotion. Promotion does not affect value of the expression function, just changes its output.

The expression function is the core of the Genetic Wavelets algorithm, and the origin of the algorithm’s name. The expression function is a collection of one or more wavelets. A wavelet is a wave that starts out with an amplitude of zero which increases and then goes back to zero. A common example of a wavelet is the wave caused by a heartbeat. A wavelet is described by a few different properties: amplitude, center point, phase, form, and distribution. When a gene is initialized, the expression function is created with one initial wavelet that is generated randomly. The number of wavelets in the expression function is dictated by mutation. When a gene mutates, its expression function has a chance to add, remove, or modify wavelets. Below are two examples of what the expression function could look like. The first is a two dimensional one (two receptors) with one wavelet, the second one is what it would look like if another random wavelet were to be added to it.

Error creating thumbnail: File missing
Wavelet example
Error creating thumbnail: File missing
Wavelet overlay

The expression function is multi-dimensional, each dimension corresponds with a receptor, and the receptors value. A receptors value is zero, unless a signal binds to it. Each wave in the expression function will have the same dimensional values. So, in the above examples you see each wave has two dimensions – x and y. Therefore if these waves were expression functions, the genes that they were from would only have two receptors on them. Or they could have twenty receptors and only have two of them actually be bound to signals.

The expression function can be used in two ways, through its floating point output, and as a single function that it can represent. Because the expression function is a function, it can be used as part of the solution. For instance, by using any wavelet transform technique[2], multiple wavelets can be combined into a single wavelet; using a technique on the two wavelets above could result in the wavelet below.

Wavelets are specifically for defining how transcription factors effect expression enhancement/suppression - allowing for complex patterns of expression to emerge from particular combinations of transcription factors. Wavelets are away of bypassing the parts of nature that are not effieciently modeled in computers. Normally if we replicated biology we would need to model the complexities of proteins and protein folding, which we unfortunately need to avoid. But we are still able to model the effect on expression by "bypassing" this aspect, which is not necessary anyway because we are not modeling the constraints of a physical world.

Using a gene’s expression function as a way to map the organisms behavior depending on different input could yield powerful results. Again, the way the genes affect an organism is up to the details of the problem and its implementation. Either way, the expression function’s method of adding multiple multi-dimensional wavelets is an effective way of building a diverse function. The algorithm can also be implemented by using the expression function’s floating point output.

The way the expression function finds the floating point output corresponds with the multiple wavelets. The floating point output comes from the sum of an output from each individual wavelet. The output for each wavelet collapses the wavelet, the formula for the collapse is below.
Error creating thumbnail: File missing
Wavelet Transform Example

Where a is the amplitude of the wavelet; d is the distance from the center (the sum of each dimensional value’s distance from center on that dimension); f is the form of the wavelet; p is the phase of the wavelet; r is the distribution of the wavelet. Let Failed to parse (Missing <code>texvc</code> executable. Please see math/README to configure.): c(a,d,p)=a*Sin(2*\pi *(d+{\frac {p}{360}})) . The floating point output for the wavelet is o, and Failed to parse (Missing <code>texvc</code> executable. Please see math/README to configure.): o(a,c,d,f)=d*{\frac {c}{\left\vert c\right\vert }}*|a*(\left\vert {\frac {c}{a}}\right\vert *f)| .

Whether using the floating point output or the wave function output, the expression function is what is used to calculate the values of each gene depending on the signals that exist in a cell, and the receptors that those signals bind to. This process was modeled off of Gene Regulatory Networks, which is the structure by which organisms propagate their genes.

Evaluating the population and Convergence Check

The fitness function can be implemented like that in any other GA, and it will differ greatly depending on the problem that is trying to be solved. Each gene has two potential outputs that could be used in the fitness function as mentioned above. So it needs to be decide whether the implementation is going to be dealing with graphs or values. The evaluation step starts with going through each individual, and evaluating them. Evaluation has to do with how well their genes perform in the fitness function. These details are dependent upon the problem being solved by the algorithm. The convergence check is also dependent on the implementation, traditional checks for convergence were outlined in previous chapters.

Mate and Mutate

The mating step is another item that needs to be implemented, but it is not necessarily dependent upon the problem. The crossover operator is already defined for chromosomes. What needs to be defined is which individuals will mate with other individuals. When this occurs, mutation in the genes is simulated using each chromatid’s mutability factor. The number of chromosomes can also be mutated based upon the mutation factor in each cell of the organism.

Which individuals get chosen for the next generation of the population also needs to be decided by whoever is implementing the algorithm. There are many methods of choosing this that have been described in previous chapters, any of which could possibly be implemented.

See Also


Cite error: <ref> tags exist, but no <references/> tag was found