Confabulation (neural networks)

A confabulation, also known as a false, degraded, or corrupted memory, is a stable pattern of activation in an artificial neural network or neural assembly that does not correspond to any previously learned pattern. The term is also applied to the (nonartificial) neural mistake-making process that leads to a false memory (confabulation).

Cognitive science

In cognitive science, the generation of confabulatory patterns is symptomatic of some forms of brain trauma.^[1] In this sense, confabulations are pathologically induced neural activation patterns that depart from direct experience and learned relationships. In computational models of such damage, related brain pathologies such as dyslexia and hallucination result from simulated lesioning^[2] and neuron death.^[3] Forms of confabulation in which missing or incomplete information is incorrectly filled in by the brain are generally modelled by the well-known neural network process called pattern completion.^[4]
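Pattern completion, and the way a stable but never-learned state can arise alongside it, can be illustrated with a small Hopfield-style associative memory. The sketch below is an illustration only and is not taken from the cited sources: the Hebbian storage rule, the three 8-unit patterns, and the function names train and recall are all assumptions chosen for the example.

```python
# Toy Hopfield-style associative memory (illustrative assumption, not a
# model from the cited literature). Units take values +1 or -1.

def train(patterns):
    """Hebbian storage: w[i][j] sums p[i] * p[j] over patterns, zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, cue, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    s = list(cue)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * s[j] for j in range(n))
            new_state = 1 if field >= 0 else -1
            if new_state != s[i]:
                s[i] = new_state
                changed = True
        if not changed:
            break
    return s

# Three mutually orthogonal patterns (hypothetical "learned memories").
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
w = train(stored)

# Pattern completion: a cue with one corrupted unit settles on the stored pattern.
cue = [-1, 1, 1, 1, -1, -1, -1, -1]
completed = recall(w, cue)
print(completed == stored[0])  # True

# A majority-vote mixture of the three patterns was never stored,
# yet it is also a stable state of the same network.
mixture = [1 if a + b + c > 0 else -1 for a, b, c in zip(*stored)]
settled = recall(w, mixture)
print(settled == mixture, mixture in stored)  # True False
```

The first recall repairs a corrupted cue, which is pattern completion. The mixture state is stable under the same dynamics even though it matches none of the stored patterns, which is the sense of confabulation given in the definition at the top of this article.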

Neural networks

Confabulation is central to a theory of cognition and consciousness by S. L. Thaler, in which thoughts and ideas originate in both biological and synthetic neural networks as false or degraded memories that nucleate upon various forms of neuronal and synaptic fluctuation and damage.^[5]^[6] Such novel patterns of neural activation are promoted to ideas when other neural nets perceive utility or value in them (i.e., the thalamo-cortical loop).^[7]^[8] The exploitation of these false memories by other artificial neural networks forms the basis of inventive artificial intelligence systems currently utilized in product design,^[9]^[10] materials discovery^[11] and improvisational military robots.^[12] Compound confabulatory systems of this kind^[13] have been used as sensemaking systems for military intelligence and planning,^[12] as self-organizing control systems for robots and space vehicles,^[14] and in entertainment.^[12] The concept of such opportunistic confabulation grew out of experiments with artificial neural networks that simulated brain cell apoptosis.^[15] It was discovered that novel perception, ideation, and motor planning could arise from either reversible or irreversible neurobiological damage.^[16]^[17]

Computational inductive reasoning

The term confabulation is also used by Robert Hecht-Nielsen in describing inductive reasoning accomplished via Bayesian networks.^[18] Confabulation is used to select the expectancy of the concept that follows a particular context. This is not an Aristotelian deductive process, although it yields simple deduction when memory holds only unique events. Most events and concepts, however, occur in multiple, conflicting contexts, so confabulation yields a consensus on an expected event that may be only minimally more likely than many other events. Given the winner-take-all constraint of the theory, that event, symbol, concept, or attribute is nevertheless the one that is then expected. This parallel computation on many contexts is postulated to occur in less than a tenth of a second. Confabulation grew out of vector analysis of data retrieval like that of latent semantic analysis and support vector machines. It is being implemented computationally on parallel computers.
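The winner-take-all selection described above can be sketched as follows. This is a toy illustration under stated assumptions, not Hecht-Nielsen's implementation: it assumes that each candidate conclusion is scored by the product of per-item link strengths p(context item | conclusion), computed here as a sum of logarithms, and that the single highest-scoring candidate wins. The table LINKS, its values, the floor parameter, and the function name confabulate are all hypothetical.

```python
import math

# Hypothetical link strengths p(context item | conclusion); values are invented
# purely for illustration.
LINKS = {
    "cat":    {"black": 0.30, "purring": 0.60},
    "coffee": {"black": 0.50, "purring": 0.01},
    "hole":   {"black": 0.40, "purring": 0.01},
}

def confabulate(links, context, floor=1e-6):
    """Return the winning conclusion and all scores for a list of context items.

    Each score is the sum of log link strengths, i.e. the log of their product.
    Missing links fall back to a small floor value (an assumption of this sketch).
    """
    scores = {}
    for conclusion, table in links.items():
        scores[conclusion] = sum(
            math.log(table.get(item, floor)) for item in context
        )
    winner = max(scores, key=scores.get)  # winner-take-all
    return winner, scores

winner_two, _ = confabulate(LINKS, ["black", "purring"])
winner_one, scores_one = confabulate(LINKS, ["black"])
print(winner_two)  # cat
print(winner_one)  # coffee
```

With the single context item "black", the winner "coffee" (0.50) is only slightly ahead of "hole" (0.40), yet winner-take-all still commits to it. This mirrors the point that the expected event may be only minimally more likely than its competitors.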

References

^ Gazzaniga, M. S. (1995). "The Cognitive Neurosciences", A Bradford Book, The MIT Press, Cambridge, Massachusetts.
^ Plaut, D.C. (1993). "Deep Dyslexia: A case of connectionist neuropsychology" Archived 2016-03-03 at the Wayback Machine. Cognitive Neuropsychology, 10(5), 377-500.
^ Yam, P. (1993). "Daisy, Daisy" Do computers have near-death experience, Scientific American, May 1993.
^ "Neural associative memory". rni.org. Archived from the original on 2008-10-20. Retrieved 2009-07-22.
^ Thaler, S. L. (1997a). U.S. 5,659,666, "Device for the Autonomous Generation of Useful Information", Issued 8/19/1997.
^ Thaler, S. L. (1997b). "A Quantitative Model of Seminal Cognition: the creativity machine paradigm", Proceedings of the Mind II Conference, Dublin, Ireland, 1997.
^ Thaler, S. L. (2011). "The Creativity Machine Paradigm: Withstanding the Argument from Consciousness" Archived 2012-11-15 at the Wayback Machine
^ Thaler, S. L. (2013). "The Creativity Machine Paradigm", Encyclopedia of Creativity, Invention, Innovation, and Entrepreneurship, (ed.) E.G. Carayannis, Springer Science+Business Media
^ Pickover, C. A. (2005). Sex, Drugs, Einstein, & Elves, SmartPublications, Petaluma, CA.
^ Plotkin, R. (2009). The Genie in the Machine: How Computer-Automated Inventing is Revolutionizing Law and Business, Stanford University Press
^ Thaler, S. L. (1998). Predicting ultra-hard binary compounds via cascaded auto- and hetero-associative neural networks, Journal of Alloys and Compounds, 279(1998), 47-59.
^ Hesman, T. (2004). The Machine That Invents, St. Louis Post-Dispatch, Jan. 25, 2004.
^ Thaler, S. L. (1996). A Proposed Symbolism for Network-Implemented Discovery Processes, In Proceedings of the World Congress on Neural Networks, (WCNN’96), Lawrence Erlbaum, Mahwah, NJ.
^ Patrick, M. C., Stevenson-Chavis, K., Thaler, S. L. (2007). "Demonstration of Self-Training Autonomous Neural Networks in Space Vehicle Docking Simulations", Aerospace Conference, 2007, 3–10 March 2007 IEEE
^ Yam, P. (1995). As They Lay Dying, Scientific American, May 1995.
^ Thaler, S. L. (1995). Death of a gedanken creature, Journal of Near-Death Studies, 13(3), Spring 1995.
^ Thaler, S. L. (2012). The Creativity Machine Paradigm: Withstanding the Argument from Consciousness, APA Newsletter, Volume 11, Number 2, Spring 2012.
^ Hecht-Nielsen, R (2005). "Cogent confabulation" Archived 2016-01-14 at the Wayback Machine. Neural Networks 18:111-115.

This page was last edited on 5 April 2024, at 01:19

From Wikipedia, the free encyclopedia