Submitted by jrohwer on Tue, 2006-03-21 22:17
Although, as Doug pointed out in class, there are important differences between connectionist networks on computers and neurons in the brain, I still think that the name "Neural Networks" is justified by a fundamental similarity between the two. This similarity, which I think is very important and which also suggests many promising possibilities for simulated neural nets (based on phenomena in the brain) that have not yet been explored in AI research, is the way information is manipulated: some of it is destroyed, and the result is then copied and distributed to create new information. That is, when the activations of input nodes in a computer network are summed in a node, the information about which node each signal came from is lost. But the information that remains, the combined strength of those inputs, is then copied and distributed to nodes in the next layer, where the process is repeated. I think that this method of information manipulation is an incredibly important concept in and of itself; therefore, we should acknowledge connectionist networks' debt to the structure of the brain.
Furthermore, I think there is (or will eventually be) a lot more we can do with neural nets based on observations of how the brain works. Recently in one of my biopsych classes we've been studying learning at the synaptic level. Sadly, current understanding of the mechanism of learning at a higher level is pretty limited, at least from what I can tell. However, what seems important is that the brain somehow accomplishes the learning of complex behaviors (through mechanisms of which we do have some rudimentary knowledge, making eventual understanding not improbable), and these behaviors certainly involve highly complex recurrent circuitry, something that, unless I am mistaken, we do not have an effective learning algorithm for.
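To make the point about summation concrete, here is a minimal Python sketch of a single node. The function names and the numbers are made up for illustration; this is not taken from any particular simulator. Two different input patterns produce the same sum, so the node's output no longer says which input was responsible, and that one number is what gets copied to every node in the next layer.

```python
def node_output(inputs, weights):
    """Weighted sum of incoming activations: one number leaves the node."""
    return sum(x * w for x, w in zip(inputs, weights))


def fan_out(value, n_targets):
    """Copy the node's single output to every node in the next layer."""
    return [value] * n_targets


# Equal weights, chosen only to keep the arithmetic obvious (an assumption).
weights = [1.0, 1.0, 1.0]

# Two different patterns of input activation...
pattern_a = [0.5, 0.25, 0.0]
pattern_b = [0.0, 0.25, 0.5]

# ...collapse to the same combined strength inside the node.
sum_a = node_output(pattern_a, weights)
sum_b = node_output(pattern_b, weights)

# The next layer only ever sees copies of that combined value.
next_layer_inputs = fan_out(sum_a, 3)
```

Here `sum_a` and `sum_b` are equal even though the patterns differ, which is the "destroy some info, then copy the result" step in miniature.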
Backprop only "really" works for feed-forward networks--it can train an Elman network (simple recurrent network), but this is just an approximation of true recurrent processing. I hope someone will correct me if I've got this wrong, but a fully recurrent network can't be trained using backpropagation because when the flow of signals goes through loops it becomes impossible (or is it just very difficult?) to keep track of which nodes were responsible for the final error. Anyway, my point is that somehow the brain accomplishes the training of such complex networks, and it has something to do with nitric oxide--we actually understand pretty well how release of NO at the synapse is responsible for strengthening the connection. It is known that if a neuron is highly stimulated (saturated for a period of time with lots of inputs, so that it is kept at its action potential and keeps on firing) the synaptic connection will be strengthened by NO acting on various mechanisms, resulting in LTP (long-term potentiation) of that connection, which is a form of learning. I won't go into the details of the process. But a question that I have not found an answer to is: how does the brain know to repeat the behavior in the first place so that it can be strengthened? How does the information that it is a "good" behavior affect the learning mechanism in the first place? I wish there were detailed knowledge of this, because I think it would be directly applicable to training fully recurrent artificial networks. Somehow the brain's reward circuits (which I guess act as a sort of fitness function, evaluating the success of an action based on pre-programmed criteria such as satiation of hunger or whatever) encourage repetition of the behavior/circuit activation that was rewarded, leading to strengthening of the connections involved. Obviously we understand how this works at an abstract level--the question is: what is the molecular mechanism?? 
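The strengthening of a connection under heavy stimulation is often abstracted in simulated networks as a Hebbian-style rule. The sketch below is that standard abstraction, not a model of nitric oxide signalling; the `threshold` and `rate` parameters are hypothetical values picked for illustration.

```python
def hebbian_update(weight, pre, post, rate=0.1, threshold=0.5):
    """Strengthen a connection only when the receiving unit is strongly active.

    pre  : activation of the sending unit
    post : activation of the receiving unit
    The threshold stands in, very loosely, for "highly stimulated".
    """
    if post >= threshold:
        return weight + rate * pre * post
    return weight


w = 0.5
# (pre, post) activations over three moments in time.
for pre, post in [(1.0, 1.0), (1.0, 0.25), (1.0, 1.0)]:
    w = hebbian_update(w, pre, post)
```

Note that nothing in this rule knows whether the behavior was "good": the weight grows whenever both units are active together. That missing reward signal is exactly the gap the questions above are pointing at.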
A further question is how the behavior is refined to approximate the desired output more closely. At the risk of making this post a little too long, here is a hypothesis that may or may not have some small degree of merit or applicability to artificial networks.
First, there are a few different ways to classify the training of artificial networks (I got these definitions from Wikipedia). Some tasks, such as the ones we studied in class, are best addressed with supervised learning, in which each input is paired with a desired output and the fitness function measures the network's deviation from this target output. There is also unsupervised learning, in which the inputs are not associated with a specific target output, and thus the fitness function must be based on the network's deviation from some other, more general function (more general than a one-to-one pairing of outputs to inputs as in supervised learning, that is). Finally, there is reinforcement learning, where neither the input nor the target output is picked out beforehand. The system interacts with some environment and receives feedback on its performance, which acts as a fitness function. This last method is clearly the type of training our brains must undergo (although we are still capable of learning tasks better suited to the first two).
The point is that the brain's reward pathways give us feedback (according to criteria that are, at least at first and at the most basic level, biologically preprogrammed) regarding our performance as we interact with the world, and that we modify our behavior accordingly, even though this behavior is based on enormously complex neural circuits. I think a fundamental part of being able to act on this feedback must be short-term memory. Actually, that sounds pretty obvious, but I'll try to relate it to how recurrent neural nets could learn. How do we learn that a sequence of motor movements results in a reward?
Or, to use a well-studied example, how do rats learn that the sequence of motor movements involved in pushing a lever leads to their receiving food, which in turn leads to satiation of their hunger? I'll ignore the hunger-satiation part because I think they learn pretty early on, or are born knowing, that foods that smell a certain way are good, or something. But still, they must somehow remember what they did to get the food so that when they get it (that is, when they experience the reward) they can repeat the behavior. A context bank like Elman's sounds like the right idea here, but it would be insufficient to retain memory of a complex sequence of actions. So, maybe each node in the network could keep a running record of its past activations for some time period (or for some number n of impulses) determined by the complexity of the behavior we're trying to train. In fact, there could be a mechanism somewhat like this in the brain: neurons are pretty complicated, and it's not inconceivable that some could be specialized so that even a moderate level of stimulation easily primes them for future action potentials. This is purely speculation. Anyway, it doesn't really matter how it's done in the brain if this would work for simulated neural nets. Actually, this whole thing depends on a pretty complex system that interacts with an environment in at least some approximation of real time, so maybe it's not that relevant to what we've been discussing in class.
But here is a final idea on this, or at least an attempt to make what I've been saying relate to a coherent idea. We tend to remember sequences of actions in abstract terms, like "the rat pressed the lever." We don't remember the activation of all the neurons in all the circuits in the rat's brain that were responsible for the action. So what if we could design a simulated neural net that could do the same sort of thing?
What if we gave each input node and each highly recurrently linked node a memory of its past activations, and then allowed some part of the network to process that list of past activations in conjunction with feedback on performance, with the goal of finding patterns of activation that result in rewards? (I single out the highly recurrent nodes because they would probably take a few iterations to stabilize at whatever level of activation was optimal for accomplishing the task, so we could try letting them just "jump ahead" to that state of activation in order to repeat the task.) This would be easier than the reinforcement learning situation, too, because it would basically be a set of inputs (an array of activations of lots of nodes at a few different points in time) corresponding to outputs (the times when the network's performance was evaluated positively or negatively by the fitness function). Finally, the trained output of that second section of the network would be an input to the first section; if training was successful, this input would contain the (possibly complex) pattern(s) of activation that led to the desired complex behavior. These patterns could, and probably must, exist across time or across iterations, but it seems feasible that the output of the second section could somehow be synchronized with the first section (for example, by creating a simple counter at the input of the first section so that the inputs could choose which part of the pattern, or which term of the sequence, they should be following, depending on what iteration they were on). Thus, the encouragement of complex rewarding behaviors could be accomplished.
I wonder: does this make sense to anyone? I think there are lots of things like this that we could play around with to try to give neural nets better problem-solving capabilities with just a little help in their design. After all, the brain doesn't start as a randomly connected feed-forward network with only one section.
There are lots of genetically coded, special purpose modules that interact in set ways to accomplish the intelligent behavior of which we are capable. So why not try this with neural nets?
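As a rough illustration of the proposal above, here is a Python sketch under heavy assumptions. Each node keeps its last few activations, and a separate "second section" is trained by ordinary supervised learning to associate those remembered activations with the reward that followed. Every name here (`RememberingNode`, `run_episode`, `train_reward_predictor`) is invented for this example, and the second section is reduced to a single linear unit trained with the delta rule, which is far simpler than what the idea would really need. Feeding the result back into the first section, and the counter for synchronizing the two, are not shown.

```python
from collections import deque


class RememberingNode:
    """A node that keeps its last n_steps activations (hypothetical design)."""

    def __init__(self, n_steps):
        # deque with maxlen silently drops the oldest entry when full.
        self.history = deque(maxlen=n_steps)

    def record(self, activation):
        self.history.append(activation)


def snapshot(nodes):
    """Flatten every node's remembered activations into one input vector."""
    return [a for node in nodes for a in node.history]


def run_episode(step_activations, n_steps):
    """Play a sequence of per-step activations into fresh nodes."""
    nodes = [RememberingNode(n_steps) for _ in step_activations[0]]
    for step in step_activations:
        for node, activation in zip(nodes, step):
            node.record(activation)
    return snapshot(nodes)


def predict(weights, features):
    """Output of the single linear unit standing in for the second section."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_predictor(pairs, epochs=50, rate=0.1):
    """Delta-rule training on (remembered activations, reward) pairs."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for features, reward in pairs:
            error = reward - predict(weights, features)
            weights = [w + rate * error * x for w, x in zip(weights, features)]
    return weights


# Two nodes, two time steps per episode. The activation values and the
# rewards are made up: "node 0 then node 1" is rewarded, the reverse is not.
rewarded = run_episode([[1.0, 0.0], [0.0, 1.0]], n_steps=2)
unrewarded = run_episode([[0.0, 1.0], [1.0, 0.0]], n_steps=2)

weights = train_reward_predictor([(rewarded, 1.0), (unrewarded, 0.0)])
```

After training, `predict(weights, rewarded)` comes out close to 1 and `predict(weights, unrewarded)` stays at 0, so in this toy case the remembered activations are enough to tell the rewarded sequence from the unrewarded one. A single linear unit can only pick up linearly separable patterns, so anything resembling a complex behavior would need a real network in its place.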