Neural Networks That Autonomously Create and Discover
(US Patent 5,659,666)
Stephen L. Thaler, Ph.D.
President and CEO, Imagination Engines, Inc. (Adapted from "Neural Networks that Create and Discover," PC AI, May/June 1996)
When the internal architecture of a trained artificial neural network is gradually relaxed or destroyed, that network tends to spontaneously produce a succession of "impressions" from its learned knowledge domain. I refer to this state as "dreaming." A dreaming net's output stream often holds a mixture of both straightforward and hybridized exemplars from its training set. If we allow a second neural network to watch for any useful concepts that emerge from the first, we form a so-called "Creativity Machine." Creativity Machines may perform remarkable feats of invention and discovery, ranging from the composition of music to the prediction of totally new ultrahard materials.
The history of Artificial Intelligence is replete with attempts at duplicating many aspects of human ingenuity. For the most part, researchers have emulated creativity via algorithmic schemes, laboriously developed over months or years to demonstrate specific features of creativity within well-delineated problem domains (Rowe and Partridge, 1993). BACON, for instance, gleaned mathematical relationships from bodies of raw astronomical data, rediscovering Kepler's third law of planetary motion. AM and EURISKO attempted to discover mathematical concepts, while GLAUBER and STAHL set out to deduce qualitative laws, particularly within the field of chemistry.
Although each of these efforts was admirable within its time period and technical context, all utilized heuristic and algorithmic search mechanisms, busily following decision trees constructed from hard-won rules and models. Most of the tedium of constructing these systems involved the definition of many domain-specific rules (and perhaps procedures for generating rules from those rules).
The advent of artificial neural networks may greatly accelerate such efforts to reproduce human creativity, since these newer systems self-organize to form their own rules about what they experience. That is, during training we need only expose an artificial neural network to a few judiciously chosen data points from a given database for it to generalize accurately the relationships among all the parameters involved. In the midst of such training, connection weights between processing units grow and dissolve to capture the underlying schema and mechanisms. Specific pathways develop within the network, embodying sufficient logical, comparative, and algebraic relationships to generalize accurately what it has experienced. We thereby reduce the production of models to the iterative presentation of historical examples to the network, freeing the creativity researcher from the tricky and often time-consuming ordeal of gleaning all of the whys and wherefores within a given problem domain. In short, the design of artificially creative systems receives a significant boost, allowing us to concentrate on the mechanisms of creativity rather than the acquisition of problem-specific knowledge.
A further boon to the investigation of creativity is that, once we develop such neural-network-based systems, we attain a purely connectionist paradigm to compare with the likewise connectionist neurobiology. After all, we anticipate that all brain function, including creativity, somehow originates in the collective behavior of neurons. Therefore, if we can attain any degree of originality from these artificial systems using a limited palette of neurobiological analogs, such as computational neurons and connection weights, then we may more readily make contact and comparisons with brain studies. With such network-based creativity systems in hand, we may eventually gauge how far they succeed in modeling human inventiveness and then identify exactly what elements are needed to bridge the gap.
In this article, we preview just such a connectionist paradigm for creativity. It starts from the raw, essential ingredients of simulated biological neural nets: artificial neurons (processing units) and synapses (connection weights). To this elementary mix we add just one more easily implemented ingredient: noise.
Imagination from Noise
Noise (a.k.a. chaos) is everywhere in the physical world. We would be naive to believe that it is not part of biological networks. Within the artificial neural network community, research on noise mostly deals with improvements to the self-organizational process of neural networks (LeCun et al., 1990), the simulation of brain pathologies (Plaut, 1993), and the modeling of unpredictability in human decision-making (Aihara, in Freeman, 1994).
No one, as far as I know, has tried to harness the effect of noise within neural nets to produce any type of useful information. In the discussion that follows, I describe how such characteristically human accomplishments as invention, discovery, and general creativity may arise from sources of noise inside biological neural networks.
We generally believe that neural networks require inputs to function. That is, a given network may represent a mathematical mapping between some set of inputs and a corresponding set of outputs. In biology, the inputs are sensory impressions from the external environment, appropriately vectorized and passed on to a cascaded system of neural networks that activates a complex chain of associations. Thus, an associative network may categorize a taste vector arising at the tongue (consisting of the four taste components of sweetness, saltiness, sourness, and bitterness) as a fudge sundae. Subsequent nets within the cascade may register suitable pleasure at the experience, while others may associate the sizeable caloric intake with an expanding waistline. Feedback effects may even change our perception of the taste by meddling with the magnitudes of the original taste input components, perhaps producing an unpleasant taste to curb our appetites.
Suppose we deny all inputs to an artificial network. This negation of information is analogous to cutting off sensory input to the brain (as in a sensory deprivation tank). If we now add to the original neural network some form of random disturbance* consisting of small, variable modulations to connection weights, the result (in my view) is equivalent to the process of internal imagery within the mind. That is, the network visits tastes, or taste vectors, it has previously experienced, while occasionally experimenting with new combinations of the elementary sensory categories. Ever so small proportions of saltiness may be blended with a pure sweetness vector to produce a previously untasted flavor.
If such spontaneous hybridization of tastes occurs within networks of the human brain, the resulting "taste imagery" could motivate an individual to rummage through the refrigerator or to concoct a new recipe from raw kitchen ingredients. The plan for acquiring the snack may likewise arise spontaneously within the brain's movement-planning centers, as noise produces novel alternative plans of action to choose from. The fascinating aspect of this process is that the image of this admittedly humble culinary invention, as well as the correct series of motions to acquire it, could originate from internal neurobiological chaos.
A Trivial Example of the Virtual Input Effect
Consider a simple feedforward network that maps the polar angle θ of a point on the unit circle to the corresponding Cartesian coordinates, x and y. Through training, the network absorbs the implicit constraint relationship
$$x^2 + y^2 = 1$$
within its connection weights and biases, ensuring that the output vector (x, y) lies on the unit circle. If we now hold the input at a constant value of 45 degrees while applying small random perturbations to the connection weights, we witness a series of output vectors, V, that generally obey the above constraint equation. For all intents and purposes, it appears as though a series of θ values were being applied to the network input. We call these apparent θ input values "virtual inputs," and the network's false perception that a series of θ values is being applied, the "virtual input effect." By progressively increasing the magnitude of the random disturbances applied to the connection weights, we see output vectors such as V' and V" that gradually begin to depart from the unit circle. In essence, V represents either a vector presented to the network in training or a novel vector obeying the cumulatively learned constraint relation. The novel vectors V' and V" represent small departures from the knowledge domain represented by the unit circle.
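The following minimal Python sketch makes the virtual input effect concrete. It is a deliberately simplified toy, not the author's setup. Instead of training a network, it hand-sets a tiny one: a single linear "angle" unit feeds cosine and sine output units with gains `gx` and `gy`, so the network reproduces the unit-circle mapping exactly. The names `TRAINED`, `forward`, and `dream` are hypothetical.

- Noise confined to the input-side weights `w` and `b` leaves every output on the circle while the apparent input angle wanders.
- Noise that also reaches the output gains pushes outputs off the circle, in the manner of V' and V".

```python
import math
import random

# Hypothetical hand-built network (not trained): one linear "angle" unit feeding
# cosine/sine output units with gains gx, gy. With w=1, b=0, gx=gy=1 it maps an
# input angle theta exactly onto the unit circle, x^2 + y^2 = 1.
TRAINED = {"w": 1.0, "b": 0.0, "gx": 1.0, "gy": 1.0}

def forward(theta, p):
    z = p["w"] * theta + p["b"]                      # hidden pre-activation
    return p["gx"] * math.cos(z), p["gy"] * math.sin(z)

def dream(theta_deg, sigma, noisy, steps=5, seed=0):
    """Hold the input fixed, add N(0, sigma) noise to the weights named in `noisy`,
    and report (virtual input in degrees, |x^2 + y^2 - 1|) for each step."""
    rng = random.Random(seed)
    theta = math.radians(theta_deg)
    results = []
    for _ in range(steps):
        p = dict(TRAINED)                            # restore, then perturb afresh
        for name in noisy:
            p[name] += rng.gauss(0.0, sigma)
        x, y = forward(theta, p)
        virtual = math.degrees(math.atan2(y, x))     # angle the output "looks like"
        results.append((virtual, abs(x * x + y * y - 1.0)))
    return results

# Noise on the input-side weights only: outputs stay on the circle but read
# as if other angles had been applied (the virtual input effect).
for v, err in dream(45.0, 0.1, ["w", "b"]):
    print(f"virtual input {v:6.2f} deg, off-circle {err:.1e}")

# Larger noise on every weight: outputs begin to drift off the circle.
for v, err in dream(45.0, 0.3, ["w", "b", "gx", "gy"]):
    print(f"virtual input {v:6.2f} deg, off-circle {err:.1e}")
```

In a trained multilayer network, the split between "on-circle" and "off-circle" noise is not this clean. The toy separates the two effects only so that each can be seen in isolation.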
I have referred to this process as the "virtual input effect," wherein an artificial or a biological network tumbles through a set of both plausible and slightly mutated possibilities within its knowledge domain as a result of internal noise (Thaler, 1993, 1995). (See the sidebar.) I've seen this effect in thousands of networks, both simple and complex, with myriad forms of connectivity and activation.
Figure 1. The Virtual Dancer.
In Figure 1, for instance, we see the effect of randomly altering the connection weights within a neural network that has "seen," through training, twelve arbitrary poses of a human volunteer. The network outputs the x-y projections of the major skeletal joints and terminal positions. Here I've set the network inputs to constant values, which would ordinarily freeze the network outputs. In essence, the network is "blindfolded." If I now apply small, variable disturbances to randomly selected connection weights, we see a progression of network outputs corresponding to various stick-figure configurations. Note that none of the positions shown is among the exemplars shown to the network in training. Nevertheless, as we allow such disturbances to sporadically hop from one connection weight to another, we see a succession of plausible body positions, as though we were viewing a series of still photos or a movie of a would-be dancer, as intimated within the Figure.
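The "hopping" disturbance can be sketched in Python under a few stated assumptions:

- The network is a small one-hidden-layer tanh network with no biases.
- Its weights are random stand-ins. In the article they would come from training on the twelve poses.
- The inputs are frozen at constant values.
- A hypothetical `hop` helper adds Gaussian noise to `k` randomly chosen weights, working on a fresh copy each call. The disturbance therefore moves from weight to weight, and the trained values are never overwritten.

```python
import math
import random

def forward(x, w1, w2):
    """One-hidden-layer tanh network; w1 and w2 are lists of weight rows (no biases, for brevity)."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w2]

def hop(weights, k, sigma, rng):
    """Return a perturbed copy in which k randomly chosen weights receive N(0, sigma) noise.

    `weights` is a list of weight matrices. The originals are never modified, so each
    call starts again from the trained network and the disturbance "hops" to new sites.
    """
    sites = [(m, i, j) for m, mat in enumerate(weights)
             for i, row in enumerate(mat) for j in range(len(row))]
    copy = [[row[:] for row in mat] for mat in weights]
    for m, i, j in rng.sample(sites, k):
        copy[m][i][j] += rng.gauss(0.0, sigma)
    return copy

rng = random.Random(1)
# Stand-in weights (hypothetical); 4 outputs = x-y coordinates of two joints.
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
w2 = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
frozen_input = [0.2, -0.5, 0.9]           # inputs held constant ("blindfolded")

# Each frame comes from a different small disturbance of the same trained weights.
frames = [forward(frozen_input, *hop([w1, w2], k=3, sigma=0.2, rng=rng)) for _ in range(5)]
for frame in frames:
    print([round(v, 3) for v in frame])
```

Raising `k` and `sigma` corresponds to turning up "the magnitude and the numbers of such hopping disturbances."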
Within this choreography, we see realistic motion, with all joints moving as a unit in realistic net translations of the stick figure, and surprisingly physically sound positioning of the figure's center of gravity. We observe that this fidelity to realistic motion did not result from hours of tedious analysis and equation-writing. The sequence is instead the product of a minute's training of a simple feedforward network. As a result of this training, the network has captured all the implicit rules of body kinematics that keep all terminations and joints within realistic range and relationship to one another. In effect, the network generalizes what constitutes realistic body motion. We have simply utilized internal network noise to interrogate the net for numerous instances of that motion.
Figure 2. Virtual Auto Designs
This general process may be repeated for diverse problems such as automobile body design. For the sake of discussion, I have shown a network the silhouettes of a dozen late-model automobiles, representing their shapes as a series of simple polygons defined through the x-y coordinates of their vertices. Again, as in the case of the dancer, as we allow small-magnitude disturbances to hop randomly from synapse to synapse, we see a progression of body shapes the network has never before seen. As we turn up the magnitude and number of such hopping disturbances, we finally produce a nonsensical shape, more like a Picasso creation than any Detroit design. (This progression from the plausible to the absurd shows that only limited regimes of degradation to a network mapping yield useful notions.)
In a related yet more sophisticated exercise, we can train a neural network to map automobile performance characteristics to the required design specifications. Again, the training technique is rather straightforward, requiring that we repeatedly expose a network to examples of performance inputs and corresponding design outputs. The network then self-organizes to capture all of the implicit rules of automotive design. Months of interviewing a Detroit designer are unnecessary.
If we now subject this trained automotive network to a similar sequence of random weight modulations, with inputs either fixed at indeterminate values or free to wander, we find that the network's outputs produce many possible automobile designs. Rather than generating myriad nonsensical combinations of design specifications, the soft constraints represented within the slightly perturbed connection weights guarantee the plausibility of the car designs that emerge from the net. For instance, we see how such a free-wheeling network may concoct sensible combinations of engine displacement and number of cylinders, so that all four-cylinder cars imagined by the net have displacements between 1.5 and 2.5 liters at approximately 100 horsepower. This is just one of many constraint relationships locked within the network and then generalized in the emerging car designs.
It's important to realize that the invention of new automobile types originates within the progressive relaxation of these once rigorous constraint relationships. We therefore see known car types smoothly transitioning to those never before seen, yet generally satisfying the underlying principles of automotive design. Essentially, through the gradual degradation of the network mapping, through its sundry connection weights and biases, we are systematically departing from the known universe of automobile characteristics and into the surrounding shell of plausible, yet radical car designs. This process is very much akin to the procedure for expanding an analytical function of a single variable about a point on a line as in a Taylor series expansion. Here, we are "expanding" a very complex, nonlinear vector function, specifying some concept, about a region of multidimensional space. In contrast to the exorbitant task of permuting all values of each parameter, producing a nightmarish combinatorial explosion of unrealistic possibilities, we very efficiently narrow the search space in the quest for any new concept.
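The Taylor-series analogy can be made concrete by expanding each network output $y_k$ about the trained weights $\mathbf{w}_0$ in powers of a weight perturbation $\delta\mathbf{w}$. This is our own gloss, not a formula from the original article:

```latex
y_k(\mathbf{w}_0 + \delta\mathbf{w}) \approx y_k(\mathbf{w}_0)
  + \sum_i \frac{\partial y_k}{\partial w_i}\bigg|_{\mathbf{w}_0} \delta w_i
  + \frac{1}{2} \sum_{i,j} \frac{\partial^2 y_k}{\partial w_i \, \partial w_j}\bigg|_{\mathbf{w}_0} \delta w_i \, \delta w_j
```

For small $\delta\mathbf{w}$, the outputs stay close to $y_k(\mathbf{w}_0)$, that is, within the learned mapping. As the perturbation magnitude grows, the second- and higher-order terms grow faster than the linear term, and the outputs move into the "surrounding shell" of novel designs around the known region.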
As an example of this "generalized expansion about a concept," consider the effect of progressively increasing the disturbance to each connection weight in the automobile network, thereby relaxing the rule that 1.5 to 2.5 liter displacements correspond to four-cylinder engines. Instead, we produce a softened rule in which a five-cylinder engine may fall within the displacement range ordinarily associated with four-cylinder engines. Whether such a design concept is useful or practical remains to be seen.
Suppose we were to train a network to judge dance choreographies. We could obtain a training set by having a panel of dance aficionados watch a hundred performances of the rather unpredictable neural dancer described above and offer esthetically driven numerical scores. The network trained on this data would then map, say, a series of ten consecutive dance steps, ascertained through successive joint coordinates, to a numerical score ranging from 0 to 10, reflecting the gamut between awkward and pleasing movements. In effect, the network has learned the implicit rules and patterns of the test panel's choreographic tastes.
If we now connect the inputs of this network to the outputs of the randomly perturbed network that is "dreaming" various bodily movements, we gain the unique advantage of capturing any appealing dance sequences emerging from the noise. If we cyclically refresh the dreaming network, periodically restoring its connection weights and then reapplying synaptic chaos, we obtain a system that may autonomously invent new dance choreographies. The possibilities are practically endless. Even in the most conservative case, taking connection-weight pruning as the internal perturbation, a modestly sized network of 50 weights admits 50!, or roughly $3 \times 10^{64}$, different destruction scenarios, and a proportional number of dance sequences to consider.
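The count of 50! quoted above is easy to check. Under one reading (our interpretation), a "destruction scenario" is an order in which all 50 weights are pruned one at a time. The number of scenarios is then the number of such orderings:

```python
import math

n_weights = 50
# Number of distinct orders in which 50 weights can be pruned one by one.
scenarios = math.factorial(n_weights)
print(f"{scenarios:.3e}")  # prints 3.041e+64
```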
Figure 3. The Fundamental Creativity Machine Architecture
Figure 4. Ultrahard Materials Predicted by Creativity Machine
This general architecture, consisting of a first network subjected to internal noise sources (dreaming, if you will) and a second network watching the notions produced by the first, is what I call the "Creativity Machine" or "Creativity Machine Paradigm" (Figure 3). Further, I refer to the first network as an "imagination engine" (IE) and the second as an "alert associative center" (AAC). Just such a scheme is shown in Figure 2, above, where we have depicted the car silhouette network described above subjected to internal noise disturbances, thereby producing a sequence of simplistic auto designs. In like manner to the choreographic machine described above, a second, supervising network has been trained to duplicate a test panel's subjective responses to, say, the esthetic appeal and utility of various candidate designs. In Figure 3, we see the imagination engine visiting just three possibilities, A, B, and C. Designs A and B do not simultaneously satisfy the AAC's criteria of appeal and utility, whereas possibility C does. (Blue neuron coloration signifies low output, whereas red represents high output.) Of course, in a real computer implementation, the IE would be producing hundreds or thousands of different design possibilities per second, with the AAC, acting as a neural surrogate for the test panel, capturing and then storing to file a multitude of both appealing and useful car silhouettes.
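The IE/AAC arrangement, together with the cyclic refresh described for the choreography machine, reduces to a short loop. The Python sketch below is a simplified illustration, not the patented implementation:

- `ie_forward` stands in for the imagination engine's forward pass.
- `aac_score` stands in for the trained alert associative center. Here it is a hand-written scoring function rather than a network.
- The toy two-parameter "design", the threshold, and all names are hypothetical.

```python
import random

def creativity_machine(ie_weights, ie_forward, aac_score, threshold,
                       steps=1000, sigma=0.05, refresh_every=50, seed=0):
    """Perturb the IE's weights one at a time, let the AAC score each output, archive
    outputs scoring at or above `threshold`, and periodically restore the trained weights."""
    rng = random.Random(seed)
    archive = []
    w = list(ie_weights)
    for step in range(steps):
        if step % refresh_every == 0:
            w = list(ie_weights)              # cyclic refresh: back to the trained weights
        i = rng.randrange(len(w))
        w[i] += rng.gauss(0.0, sigma)         # noise accumulates until the next refresh
        candidate = ie_forward(w)
        if aac_score(candidate) >= threshold:
            archive.append(candidate)         # "store to file" in the article's wording
    return archive

# Hypothetical toy IE: three "trained" weights produce a two-parameter design.
def toy_ie(w):
    return (w[0] + w[1], w[1] * w[2])

# Hypothetical toy AAC: highest score at the design (1.7, 1.2).
def toy_aac(design):
    return 1.0 - abs(design[0] - 1.7) - abs(design[1] - 1.2)

trained = [1.0, 0.5, 2.0]                     # unperturbed IE output (1.5, 1.0) scores about 0.6
found = creativity_machine(trained, toy_ie, toy_aac, threshold=0.8)
print(len(found), found[:3])
```

The unperturbed IE never passes the AAC's threshold in this toy. Anything archived is therefore a product of the noise, which is the point of the architecture.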
Similarly, we could train an AAC network to map car design parameters to performance values, taking the stream of imagined automobile designs emerging from a dreaming network, and handing those outputs to this policing network so that it may instantaneously evaluate each design's performance features. Therefore, to design an automobile that accelerates from 0-60 mph in at most 7 seconds, achieves a highway mileage of at least 30 mpg, and costs less than $30,000, we capture designs that emerge from the dreaming network and fall within the range of these AAC evaluated performance values.
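The performance screen described above amounts to a filter on the AAC's predictions. In this Python sketch, the field names, the lookup-table "AAC", and its numbers are hypothetical placeholders, not results. Only the three limits come from the text.

```python
from dataclasses import dataclass

@dataclass
class Predicted:
    """Performance values an AAC network would assign to one imagined design (hypothetical fields)."""
    zero_to_60_s: float
    highway_mpg: float
    cost_usd: float

def meets_spec(p: Predicted) -> bool:
    # 0-60 mph in at most 7 s, at least 30 mpg highway, costing less than $30,000
    return p.zero_to_60_s <= 7.0 and p.highway_mpg >= 30.0 and p.cost_usd < 30_000

def capture(designs, aac):
    """Keep imagined designs whose AAC-predicted performance meets the specification."""
    return [d for d in designs if meets_spec(aac(d))]

# Stand-in AAC: a lookup table in place of a trained network (illustrative numbers only).
fake_aac = {
    "A": Predicted(6.5, 32, 28_000),
    "B": Predicted(8.1, 35, 24_000),   # too slow
    "C": Predicted(6.9, 29, 27_500),   # too thirsty
}.get
print(capture(["A", "B", "C"], fake_aac))  # prints ['A']
```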
Some Preliminary Creativity Machine Accomplishments (1996)
In a manner similar to that used to generate new dances and automobile designs, we have readily shifted between diverse creative activities, ranging from the generation of myriad musical "hooks," to the devising of new stock market strategies, to the synthesis of novel materials. In Figure 4, for instance, we see the results of a very short run of a Creativity Machine designed to predict new ultrahard materials. To construct this machine, we trained the IE network by exposing it to about 200 examples of chemically correct formulas such as H2O, Fe2O3, etc. Thus trained, the network developed its own internal connection strengths emulating the complex rules of how any elements A and B are plausibly combined into the formula AxBy (with integers x and y representing the naturally constrained proportions of these elements). With the application of noise to its architecture, the network output consists of a stream of plausible chemical formulas the network had never seen, such as H2O2 and Fe3O4. We trained the AAC network for this Creativity Machine to map chemical formulas such as AxBy to hardness by exposing it to 200 examples of chemical compounds and their literature "Knoop" hardness values. Once connected to the outputs of the noise-activated IE network, the AAC evaluated the hardness of each of the emerging hypothetical compounds, with the hardest of these recorded within an archival file to generate the plot of Figure 4.
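The article does not specify how formulas are presented to the networks. One plausible scheme, assumed here purely for illustration, works in two steps:

1. Each compound is represented as a vector of subscripts over a fixed element vocabulary.
2. The IE's noisy real-valued outputs are rounded back to integer subscripts to yield a candidate formula for the AAC.

The vocabulary `ELEMENTS` and the helpers `encode` and `decode` are hypothetical.

```python
import re

# Hypothetical element vocabulary; its order fixes the order symbols are written back out.
ELEMENTS = ["Fe", "B", "Be", "C", "N", "H", "O"]

def encode(formula):
    """Map a formula such as 'Fe2O3' to a vector of per-element subscripts."""
    vec = [0.0] * len(ELEMENTS)
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        vec[ELEMENTS.index(symbol)] += float(count or 1)
    return vec

def decode(vec):
    """Round a (possibly noisy) output vector to integer subscripts and rebuild a formula."""
    parts = []
    for symbol, value in zip(ELEMENTS, vec):
        n = round(value)
        if n > 0:
            parts.append(symbol + (str(n) if n > 1 else ""))
    return "".join(parts)

print(encode("Fe2O3"))
print(decode([0.1, 0.0, 0.0, 2.9, 4.2, 0.0, 0.0]))  # prints C3N4
```

Rounding plays the role of the "naturally constrained proportionality" here: only integer subscripts survive, so every decoded output is at least a well-formed formula.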
There, simply for the price of electricity and the short time needed to train the two networks, we have rediscovered many of the known ultrahard materials, including cubic boron nitride and boron carbide. In Figure 4, for instance, we have labeled several materials as "inventions," meaning that neither component network in the Creativity Machine had "seen" the respective materials in training. Therefore, whereas humans discovered these ultrahard materials slowly over the last century, the Creativity Machine, suitably focused on the problem of ultrahard materials, was able to zero in on them in a matter of seconds or minutes. Among these discoveries is a controversial new material conjectured to be the beta phase of carbon nitride, C3N4. The machine (not I) also takes the daring step of proposing a series of ultrahard hydrogen-deficient polymers of carbon, boron, and beryllium. From a materials scientist's perspective, these polymers make sense, since even diamond may be considered a carbon polymer from which successively more hydrogen has been subtracted, until the carbons bond with each other to form a dense three-dimensional network. The synthetically discovered hydrocarbon polymer looks surprisingly like the well-known "amorphic" or "diamond-like carbons," which are characteristically very hard when hydrogen is present in atomic fractions below 5-10%. Both boron and beryllium could potentially produce analogous ultrahard networks. (Realize, of course, that the plot in Figure 4 represents just a very short run of this Creativity Machine and that the more prolonged and informative runs contain information highly proprietary to the materials trade.)
In a sense, this process of ultrahard materials discovery could be likened to data mining, in which the AAC examines a large, fixed database of known chemical compounds, identifying the hardest of these. Such an approach would be well within the realm of known neural network technology. The unique advantage offered by the Creativity Machine approach is that the database is not real and static, but virtual and dynamic. The IE network manufactures an ever-growing number of plausible chemical compounds, many of which may not have been previously synthesized or discovered, thus sidestepping the problem of acquiring vast databases through hands-on research. Instead, we rely upon a few strategically chosen examples for network training. In the case of simple binary compounds, this advantage may not at first seem significant, since 100 elements yield only 100 × 100 = 10,000 simple pairings, but stop to consider the staggering number of potential compounds of carbon and hydrogen alone (i.e., the hydrocarbons).
Especially evident in this materials synthesis problem is the economy of effort in efficiently generating plausible chemical formulas to be considered by the AAC network mapping chemical formula to hardness. We observe that significant effort has gone into a genetic algorithm (GA) approach (Venkatasubramanian, 1995) to attempt what this chemical Creativity Machine has already accomplished. The difficulty with applying a GA to this problem is combinatorial explosion, in which highly unlikely chemical species are produced. Again, this is a non-problem for the Creativity Machine, since its component networks contain all of the required constraints to effectively narrow the problem's search space. I believe that this unique advantage will repeat itself over and over again within diverse applications.
Creativity is in essence a search process, something that a number of past AI systems have been able to carry out with various levels of efficiency. Some produce a plethora of possibilities without regard to the problem's specific requirements and then enlist specialized algorithms to sift through the extensive possibilities amassed, in a so-called "Neo-Darwinian" search. Alternatively, some search engines laboriously generate possible solutions that rigorously obey the relevant constraints, in "Neo-Lamarckian" schemes. Not only are both of these approaches time-consuming, but the required software development may also consume extraordinary schedule and resources as programmers convert the underlying rules and reasons into extensive tables of if-then statements.
Here, I have successfully utilized a purely connectionist plan to implement a search technique that is an efficient compromise between the Neo-Darwinian and Neo-Lamarckian schemes. Such a preferred discovery scheme is termed "multi-stage" (Rowe and Partridge, 1993), employing both partial censoring (the IE output) and then suppression (in the AAC stage). More importantly, we train all components by example, oftentimes reducing to minutes what would have required months or years to accomplish by algorithmic means. To accentuate this latter point, I have demonstrated the flexibility and turn-around capacity of the concept by embarking upon tasks as diverse as musical composition and materials synthesis. In view of the purely neural means used to attain these successes, I maintain that this new and very general scheme may be a potential model of human creativity and of the stream of consciousness that fuels it. Intuitively, the application of noise sources to the internal architecture of neural networks appears to model the very desultory progression of human thought and invention. Moreover, detailed study of the temporal distribution of both Creativity Machine output and articulated human thought shows these rhythms to be identical (Thaler, 1995).
Although the cognitive scheme advanced may fan the fires of debate within both the AI and consciousness communities, the so-called "Creativity Machine" architecture, utilizing the virtual input effect, has proven to be an entirely neuronal approach to the problem of autonomous discovery, deviating radically in plan from past rule- and model-based attempts. Rather than being a particular system, it may be considered to be a general computational paradigm that may be very flexibly applied to just about any problem domain imaginable. Its only drawback will be the inevitable sociological battles to demonstrate the novelty and utility of concepts it produces. But isn't this the plight of any creative human mind?
* The Creativity Machine, or "Device for the Autonomous Generation of Useful Information," is the patented concept of Imagination Engines, Inc. (US 5,659,666). Licensing and consulting are currently available through that business entity.
Freeman, D. H. (1994). Brainmakers (pp. 117-118). New York, NY: Simon and Schuster.
LeCun, Y., Denker, J. S., & Solla, S. A. (1990). Optimal brain damage. In D. S. Touretzky (Ed.), Advances in neural information processing systems 2 (pp. 598-605). San Mateo, CA: Morgan Kaufmann.
Plaut, D. C. (1993). Deep dyslexia: A case of connectionist neuropsychology. Cognitive Neuropsychology, 10(5), 377-500.
Rowe, J., & Partridge, G. (1993). Creativity: A survey of AI approaches. Artificial Intelligence Review, 7, 43-70.
Thaler, S. (1993). 4-2-4 encoder death. Proceedings of the World Congress on Neural Networks (Vol. 2, pp. 180-183). Portland, Oregon.
Thaler, S. (1995). 'Virtual input phenomena' within the death of a simple pattern associator. Neural Networks, 8(1), 55-65.
Thaler, S. (1995). Network cavitation as a model of stream of consciousness. Under review at Neural Networks.
Venkatasubramanian, V. (1995). Computer-aided molecular design using neural networks and genetic algorithms. Bulletin of the American Physical Society, 40(1), 591.
Stephen Thaler is the founder and President of St. Louis-based Imagination Engines, Inc.
* Perturbations which you might find within a biological network, such as fluctuations in cell membrane potential, neuronal cross-talk, and quantum mechanical effects.