Chapter 19: Learning in Neural and Belief Nets
Slides for the May 30th lecture on artificial neural networks may be
found here. If you have difficulty
viewing these PostScript slides, please send mail to the TA at dnoelle@cs.ucsd.edu.
- How the Brain Works
- The neuron:
- soma: cell body
- dendrites: numerous branching fibers that carry signals into the soma
- axon: a single long fiber that carries signals away from the soma
- synapse: junction where one neuron's axon connects to a dendrite or soma of another neuron
- May be excitatory or inhibitory
- action potential: electrical pulse sent down the axon
- The brain proper
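- cerebral cortex: outer layer of the brain, where most higher-level processing takes place
- aphasia: impaired ability to produce or understand language, caused by damage to particular brain areas
- Mappings between brain areas and the body parts they control or receive sensation from: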
- may change with time
- may have multiple maps
- The brain exhibits graceful degradation: performance declines gradually, rather than failing outright, as neurons are lost
- Neural Networks
- Have units connected by links
- Links have associated weights
- Units have time-varying activation levels
- Computed by applying an activation function g to a linear input function (the weighted sum of the unit's inputs)
- g may be a step function, a sign function, or a sigmoid
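A minimal sketch of a single unit, assuming the notation above: the unit computes in = sum of W[j] * a[j] over its inputs, then outputs g(in). The function names `step`, `sign`, `sigmoid`, and `unit_output` are illustrative, not taken from the notes.

```python
import math
from typing import Callable, Sequence

def step(x: float, threshold: float = 0.0) -> int:
    """Step function: 1 if the input reaches the threshold, else 0."""
    return 1 if x >= threshold else 0

def sign(x: float) -> int:
    """Sign function: +1 or -1 (0 is mapped to +1 here, an arbitrary choice)."""
    return 1 if x >= 0 else -1

def sigmoid(x: float) -> float:
    """Logistic sigmoid: a smooth, differentiable version of the step function."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights: Sequence[float], inputs: Sequence[float],
                g: Callable[[float], float]) -> float:
    """Linear input function (weighted sum) followed by the activation function g."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Same weighted input (0.5*1 - 1.0*1 = -0.5) under three activation functions.
print(unit_output([0.5, -1.0], [1.0, 1.0], step))     # 0
print(unit_output([0.5, -1.0], [1.0, 1.0], sign))     # -1
print(unit_output([0.5, -1.0], [1.0, 1.0], sigmoid))  # about 0.378
```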
- Network structures:
- Feed forward:
- Links unidirectional
- no loops
- no internal state (other than weights)
- Recurrent:
- Links can form loops
- May have internal state (activation can persist across time steps)
- Computation may become unstable or oscillate
- Examples:
- Hopfield Nets:
- bidirectional connections with symmetric weights (W[i,j] = W[j,i])
- activation function is the sign function (+1 or -1)
- All units are both input and output units
- Act as an associative memory:
- After training, a query pattern causes the network to settle into the stored training pattern that most resembles the query.
- Each weight encodes part of every stored pattern (a distributed representation)
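A minimal Hopfield sketch, assuming the common Hebbian storage rule (W[i][j] = sum over stored patterns of p[i] * p[j], with a zero diagonal) and asynchronous sign-function updates. The notes do not specify a training rule, so treat this as one standard choice; `train_hopfield` and `recall` are hypothetical names.

```python
from typing import List

def train_hopfield(patterns: List[List[int]]) -> List[List[float]]:
    """Hebbian storage of +1/-1 patterns; every weight mixes all patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w  # symmetric by construction: w[i][j] == w[j][i]

def recall(w: List[List[float]], query: List[int], max_sweeps: int = 100) -> List[int]:
    """Update one unit at a time with the sign function until nothing changes."""
    s = list(query)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new = 1 if h > 0 else -1 if h < 0 else s[i]  # keep current state on a tie
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # reached a stable state
            break
    return s

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([p1, p2])
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first unit flipped
print(recall(w, noisy) == p1)  # True
```

Because each weight mixes all stored patterns, capacity is limited: roughly 0.138N patterns for a net of N units.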
- Boltzmann machines
- symmetric weights
- Include hidden units that are neither input nor output units
- stochastic activation: the probability that a unit outputs 1 is a function of its total weighted input
- State transitions resemble a simulated-annealing search
- Are formally a special case of belief nets evaluated with a
stochastic simulation algorithm
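A rough sketch of the stochastic update, assuming the standard formulation with 0/1 units, symmetric weights, one bias per unit, and a temperature T that is lowered over time (the annealing schedule). The energy function, the logistic form of P(unit = 1), the schedule, and all names are assumptions, since the notes only say "some function of weighted input". This shows sampling with annealing only, not Boltzmann-machine learning.

```python
import math
import random
from typing import List

def energy(w: List[List[float]], b: List[float], s: List[int]) -> float:
    """E(s) = -sum_{i<j} W[i][j] s_i s_j - sum_i b_i s_i (lower energy = more probable)."""
    n = len(s)
    pairs = sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pairs - sum(b[i] * s[i] for i in range(n))

def p_on(net: float, t: float) -> float:
    """P(unit = 1) = 1 / (1 + exp(-net / T)), computed without overflow."""
    x = net / t
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def anneal(w: List[List[float]], b: List[float], s: List[int],
           temperatures: List[float], rng: random.Random) -> List[int]:
    """Resample each unit in turn while the temperature is gradually lowered."""
    s = list(s)
    n = len(s)
    for t in temperatures:
        for i in range(n):
            net = sum(w[i][j] * s[j] for j in range(n) if j != i) + b[i]
            s[i] = 1 if rng.random() < p_on(net, t) else 0
    return s

# Two units that "like" to be on together: the energy minimum is s = [1, 1].
w = [[0.0, 4.0], [4.0, 0.0]]
b = [-1.0, -1.0]
schedule = [10.0 * 0.9 ** k for k in range(60)]  # high T (random) down to low T (nearly greedy)
print(anneal(w, b, [0, 0], schedule, random.Random(0)))
print(energy(w, b, [1, 1]))  # -2.0
```

At high T the units flip almost at random; as T falls, moves that raise the energy become rare, which is the link to simulated annealing.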
- Local encoding vs distributed encoding
- Perceptrons
- Single-layer, feed-forward: inputs connect directly to the output units
- activation is a step function (0 or 1) applied to the weighted sum of the inputs
- Advantages:
- Can represent AND, OR, NOT, and the majority function
- Represent the majority function far more compactly than a decision tree, which needs O(2^n) nodes for n inputs
- Disadvantages:
- Each input can push the output in only one direction (up for a positive weight, down for a negative one), regardless of the other inputs' values
- Can represent only linearly separable functions
- Cannot represent XOR
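The representational claims can be checked with hand-chosen weights and thresholds. The values below are one illustrative choice among many; the unit outputs 1 when the weighted input sum reaches the threshold t, and the function names are hypothetical.

```python
from typing import Sequence

def perceptron(weights: Sequence[float], t: float, inputs: Sequence[int]) -> int:
    """Step-function unit: 1 if the weighted input sum reaches threshold t, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= t else 0

def and_gate(x1: int, x2: int) -> int:
    return perceptron([1, 1], 1.5, [x1, x2])

def or_gate(x1: int, x2: int) -> int:
    return perceptron([1, 1], 0.5, [x1, x2])

def not_gate(x: int) -> int:
    return perceptron([-1], -0.5, [x])

def majority(bits: Sequence[int]) -> int:
    """n weights of 1 and threshold n/2: a single unit, versus O(2^n) decision-tree nodes."""
    return perceptron([1] * len(bits), len(bits) / 2, bits)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_gate(x1, x2), or_gate(x1, x2))
print(not_gate(0), not_gate(1))                             # 1 0
print(majority([1, 1, 1, 0, 0]), majority([1, 0, 0, 0, 1]))  # 1 0
```

No choice of weights works for XOR: XOR(0,0) = 0 forces t > 0, and XOR(1,0) = XOR(0,1) = 1 force w1 >= t and w2 >= t, so w1 + w2 >= 2t > t and the unit would output 1 on (1,1), yet XOR(1,1) = 0.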
- Learning
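- Rosenblatt (1960) showed that the perceptron learning rule can learn any linearly separable function, i.e., it converges to weights matching any training data that a perceptron can represent
- The rule is a form of gradient descent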
- Algorithm:
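- Initialize the weights randomly
- In each epoch, present every training example and update all weights; repeat epochs until the examples are classified correctly or another stopping criterion is met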
- W[j] += alpha * I[j] * Err
- where:
- alpha = learning rate (usually a small constant)
- I[j] = value of input j
- Err = (correct output) - (actual output)
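A minimal sketch of the learning rule above, assuming the convention of folding the threshold into an extra weight W[0] on a fixed input I[0] = -1. The epoch cap, the stopping test (an epoch with no errors), the initial weight range, and the names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]

def predict(w: Sequence[float], x: Sequence[int]) -> int:
    """Step unit; I[0] = -1 is a fixed bias input, so W[0] acts as the threshold."""
    inputs = [-1, *x]
    return 1 if sum(wj * ij for wj, ij in zip(w, inputs)) >= 0 else 0

def train_perceptron(examples: List[Example], alpha: float = 0.1,
                     max_epochs: int = 1000, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    n = len(examples[0][0]) + 1
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # randomize weights
    for _ in range(max_epochs):
        errors = 0
        for x, target in examples:
            err = target - predict(w, x)  # Err = (correct) - (actual)
            if err:
                errors += 1
                inputs = [-1, *x]
                for j in range(n):
                    w[j] += alpha * inputs[j] * err  # W[j] += alpha * I[j] * Err
        if errors == 0:  # an epoch with no mistakes: stop
            break
    return w

OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_or = train_perceptron(OR_DATA)
print([predict(w_or, x) for x, _ in OR_DATA])    # [0, 1, 1, 1]
w_xor = train_perceptron(XOR_DATA)
print([predict(w_xor, x) for x, _ in XOR_DATA])  # never [0, 1, 1, 0]
```

OR is linearly separable, so the rule converges; on XOR it keeps making mistakes until the epoch cap runs out.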
- Multilayer Feed-Forward
- Backpropagation: the standard learning method
- Hidden units:
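- Too few: faster learning, but the network may not be able to represent the target function
- Too many: slower learning, and the network may memorize the examples instead of generalizing
- Use cross-validation to choose the number of hidden units
- Weight updating (links from hidden node j to output node i):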
- W[j,i] += alpha * a[j] * Err[i] * g'(in[i])
- where:
- i = output node
- j = hidden node
- alpha = learning rate
- a[j] = activation of hidden node j
- Err[i] = (correct output value) - (actual output value) at output node i
- g' = derivative of activation function
- in[i] = weighted sum of the inputs to output node i
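A worked instance of the update rule, with made-up numbers (alpha = 0.5, a[j] = 0.6, in[i] = 0.2, correct output 1, current weight 0.3) and assuming a sigmoid g, whose derivative is g'(x) = g(x) * (1 - g(x)). The function names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x: float) -> float:
    g = sigmoid(x)
    return g * (1.0 - g)

def update_output_weight(w_ji: float, alpha: float, a_j: float,
                         target: float, in_i: float) -> float:
    """W[j,i] += alpha * a[j] * Err[i] * g'(in[i]) for a link from hidden j to output i."""
    err_i = target - sigmoid(in_i)  # Err[i] = correct - actual
    return w_ji + alpha * a_j * err_i * sigmoid_prime(in_i)

# Output is g(0.2) ~ 0.550, so Err ~ 0.450 and g'(0.2) ~ 0.248;
# the weight grows by about 0.5 * 0.6 * 0.450 * 0.248 ~ 0.033.
print(update_output_weight(0.3, 0.5, 0.6, 1.0, 0.2))  # about 0.3334
```

Because a[j] and g' are positive here, the sign of Err[i] alone decides whether the weight moves up or down.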
- Algorithm:
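- Pass an example through the net, computing every unit's activation
- Compute Delta[i] = Err[i] * g'(in[i]) for each output node i
- Working back from the output layer toward the input layer:
- propagate the Delta values back: for hidden node j, Delta[j] = g'(in[j]) * (sum over i of W[j,i] * Delta[i])
- update the weights between the layers (e.g., W[k,j] += alpha * I[k] * Delta[j] for input k feeding hidden node j)
- Keep the Delta values around: each layer's values are reused to compute those of the layer before it, which saves time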
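- Is a gradient descent search in weight space, minimizing the squared error:
- The gradient computation is divided among the units, so each unit can update its weights using only local information
- A sigmoid activation is generally used, since it is differentiable: g'(x) = g(x) * (1 - g(x))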
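A compact backpropagation sketch for a 2-input, 1-output network with one sigmoid hidden layer, trained online on XOR (which a perceptron cannot represent). The hidden-layer size, learning rate, epoch count, bias inputs fixed at 1, initial weight range, and names are all assumptions. Because gradient descent can get stuck in local minima, reaching a correct XOR network is not guaranteed for every seed.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def train_backprop(data, hidden=4, alpha=0.5, epochs=5000, seed=1):
    """Online backpropagation for one sigmoid hidden layer and one sigmoid output."""
    rng = random.Random(seed)
    n_in = len(data[0][0]) + 1  # +1: bias input fixed at 1
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]  # W[k,j]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]  # W[j,i]; last entry is the bias weight

    def forward(x):
        inp = list(x) + [1]
        a = [sigmoid(sum(w * v for w, v in zip(row, inp))) for row in w_hid] + [1]
        out = sigmoid(sum(w * aj for w, aj in zip(w_out, a)))
        return inp, a, out

    def error():
        return 0.5 * sum((t - forward(x)[2]) ** 2 for x, t in data)

    start = error()
    for _ in range(epochs):
        for x, target in data:
            inp, a, out = forward(x)  # pass the pattern through the net
            delta_out = (target - out) * out * (1 - out)  # Err[i] * g'(in[i]) for sigmoid g
            # Propagate back using the old W[j,i]: Delta[j] = g'(in[j]) * W[j,i] * Delta[i]
            delta_hid = [a[j] * (1 - a[j]) * w_out[j] * delta_out for j in range(hidden)]
            for j in range(hidden + 1):  # hidden -> output weights
                w_out[j] += alpha * a[j] * delta_out
            for j in range(hidden):  # input -> hidden weights
                for k in range(n_in):
                    w_hid[j][k] += alpha * inp[k] * delta_hid[j]
    return start, error(), (lambda x: forward(x)[2])

start, end, net = train_backprop(XOR_DATA)
print(f"squared error before: {start:.3f}, after: {end:.3f}")
print([round(net(x), 2) for x, _ in XOR_DATA])  # ideally close to [0, 1, 1, 0]
```

With a single output node, the "sum over i" in the hidden-node Delta reduces to one term; the hidden Delta values are computed before W[j,i] is changed so that every update uses the same gradient.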
- Discussion:
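- Expressiveness: attribute-based, so they lack the expressive power of general logical representations
- Computational efficiency: depends mainly on training time, which can be long; a trained network computes its output quickly
- Generalization: generally good when the output varies smoothly with the input
- Sensitivity to noise: generally tolerant of noisy data
- Transparency: poor. A trained network is a "black box" whose weights are hard to interpret or reverse engineer
- Prior knowledge: hard to incorporate, because the representation is not transparent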
- Applications of Neural Nets
- Pronunciation: NETtalk (Sejnowski and Rosenberg 1987)
- Handwritten character recognition