# Symboic and N

CS623: Introduction to Computing with Neural Nets (lecture-4) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay Weights in a ff NN wmn is the weight of the connection from the nth neuron to the mth neuron E vs W surface is a complex surface in the space defined by

the weights wij E w gives the direction in which a movement of the operating point in the wmn co-ordinate space will result in maximum decrease in error mn m wmn

n wmn E wmn Sigmoid neurons Gradient Descent needs a derivative computation - not possible in perceptron due to the discontinuous step function used! Sigmoid neurons with easy-to-compute derivatives used!

y 1 as x y 0 as x Computing power comes from non-linearity of sigmoid function. Derivative of Sigmoid function 1 y x 1 e x dy

1 e x ( e ) x 2 x 2 dx (1 e ) (1 e ) 1 x

1 e 1 y (1 y ) 1 x 1 e Training algorithm Initialize weights to random values. For input x = , modify weights as follows Target output = t, Observed output = o

E wi wi 1 E (t o) 2 Iterate until E 2< (threshold)threshold) Calculation of wi n 1 E E net where : net wi xi Wi net Wi i 0 E o net o net Wi (t o)o(1 o) xi

E wi ( learning constant, 0 1) wi wi (t o)o(1 o) xi Observations Does the training technique support our intuition? The larger the xi, larger is wi Error burden is borne by the weight values corresponding to large input values

Observations contd. wi is proportional to the departure from target Saturation behaviour when o is 0 or 1 If o < t, wi > 0 and if o > t, wi < 0 which is consistent with the Hebbs law Hebbs law nj wji ni

If nj and ni are both in excitatory state (threshold)+1) Then the change in weight must be such that it enhances the excitation The change is proportional to both the levels of excitation wji e(threshold)nj) e(threshold)ni) If ni and nj are in a mutual state of inhibition (threshold) one is +1 and the other is -1), Then the change in weight is such that the inhibition is enhanced (threshold)change in weight is negative) Saturation behavior

The algorithm is iterative and incremental If the weight values or number of input values is very large, the output will be large, then the output will be in saturation region. The weight values hardly change in the saturation region If Sigmoid Neurons Are Used, Do We Need MLP? Does sigmoid have the power of separating non-linearly separable data? Can sigmoid solve the X-OR problem

(threshold)X-ority is non-linearly separable data) link O 1 yu O = 1 / 1+ e -net yl net

O = 1 if O > yu O = 0 if O < yl Typically yl << 0.5 , yu >> 0.5 Inequalities O = 1 / (threshold)1+ e net ) W2 X2 W1 W0

X0=-1 X1 <0, 0> O=0 i.e 0 < yl 1 / 1 + e(threshold)w1x1- w2x2+w0) < yl i.e. (threshold)1 / (threshold)1+ ewo)) < yl (threshold)1)

<0, 1> O=1 i.e. 0 > yu 1/(threshold)1+ e (threshold)w1x1- w2x2 + w0)) > yu (threshold)1 / (threshold)1+ e-w2+w0)) > yu (threshold)2) <1, 0> O=1 i.e. (threshold)1/1+ e-w1+w0) > yu (threshold)3)

<1, 1> O=0 i.e. 1/(threshold)1+ e-w1-w2+w0) < yl (threshold)4) Rearranging, 1 gives 1/(threshold)1+ ewo) < yl i.e. 1+ ewo > 1 / yl i.e. Wo > ln (threshold)(threshold)1- yl) / yl)

(threshold)5) 2 Gives 1/1+ e-w2+w0 > yu i.e. 1+ e-w2+w0 < 1 / yu i.e. e-w2+w0 < 1-yu / yu i.e. -W2 + Wo < ln (threshold)1-yu) / yu i.e. W2 - Wo > ln (threshold)yu / (threshold)1yu)) (threshold)6) 3 Gives W1 - Wo > ln (threshold)yu / (threshold)1- yu))

(threshold)7) 4 Gives -W1 W2 + Wo > ln (threshold)(threshold)1- yl)/ yl) (threshold)8) 5 + 6 + 7 + 8 Gives 0 > 2ln (threshold)1- yl )/ yl + 2 ln yu / (threshold)1 yu ) i.e. 0 > ln [ (threshold)1- yl )/ yl * yu / (threshold)1 yu )] i.e. (threshold)(threshold)1- yl ) / yl)

* (threshold)yu / (threshold)1 yu )) < 1 i. [(threshold)1- yl ) / (threshold)1- yy )] * [yu / yl] < 1 ii. 2) Yu >> 0.5 iii. 3) Yl << 0.5 From i, ii and iii; Contradiction, hence sigmoid cannot compute X-OR

Exercise Use the fact that any non-linearly separable function has positive linear combination to study if sigmoid can compute any non-linearly separable function. Non-linearity is the source of power y y = m1(h1.w1 + h2.w2) + c1 h1 = m2(w3.x1 + w4.x2) + c2

w2 w1 h2 w6 x1 h2 = m3(w5.x1 + w6.x2) + c3 h1 Substituting h1 & h2 y = k1x1 + k2x2 + c

w5 w4 w3 x2 Thus a multilayer network can be collapsed into an eqv. 2 layer n/w without the hidden layer Can a linear neuron compute X-OR?

yL yU y = mx + c y > yU is regarded as y = 1 y < yL is regarded as y = 0 yU > yL Linear Neuron & X-OR We want

y w2 x2 y = w1x1 + w2x2 + c w1 x1 Linear Neuron 1/4 for (1,1), (0,0) y < yL For (0,1), (1,0) y > yU yU > yL Can (w1, w2, c) be found Linear Neuron 2/4 (0,0) y = w1.0 + w2.0 + c =c y < yL c < yL - (1) (0,1) y = w1.1 + w2.0 + c y > yU w1 + c > yU - (2) Linear Neuron 3/4 1,0 w2 + c > yU - (3) 1,1 w1 + w2 + c < yL - (4) yU > yL - (5) Linear Neuron 4/4 c < yL - (1) w1 + c > yU

- (2) w2 + c > yU - (3) w1 + w2 + c < yL - (4) yU > yL - (5)

Inconsistent Observations A linear neuron cannot compute XOR A multilayer network with linear characteristic neurons is collapsible to a single linear neuron. Therefore addition of layers does not contribute to computing power. Neurons in feedforward network must be non-linear Threshold elements will do iff we can linearize a nonlinearly function.

## Recently Viewed Presentations

• FAYOL'S PRINCIPLES OF MANAGEMENT. DEFINITION OF MANAGEMENT. A few definitions by some of the prominent writers on the subject are given below: (1) "Management is the art of knowing what you want to do and then seeing that it is...
• Horn Clause Computation with DNA Molecules Satoshi Kobayashi Journal of Combinatorial Optimization, v.3 pp.277-299, 1999 Summarized by In-hee Lee 1.Introduction Horn program Subclass of the formulas of first-order logic Have cloas relation to PROLOG language. Finite set of Horn clauses...
• Recent results from Belle: Reflections on Beauty Kay Kinoshita University of Cincinnati Belle Collaboration CP Asymmetry in Standard Model Weak interaction of quarks (u,c,t), (d,s,b) {mass <- CKM matrix -> weak} eigenstates CKM - 3x3, unitary by definition -> 4...
• Garamond Arial Wingdings Calibri Tekton Pro Ripple 1_Ripple Naming Ionic Compounds Simple Ionic Compounds Slide 3 Ions of Non-metals Example You Try Ionic Compound with Multivalent Elements Slide 8 You Try Writing a Chemical Formula Three Line Method Example Example...
• to define real world problems of interest, research the causes of those problems using real-time global texts and then create solutions to address the . problems. Will . advance . students' understanding of comprehension, analysis, and evaluation of text as...
• RCA-branchesâ€¦. 7) Posterior descending artery-The PDA courses in the inferior interventricular groove gives rise to a number of small inferior septal branches which supply lower part of IVS and interdigitate with superior septal branches from LAD.
• About the Speakerâ€¦Kurt Rosentreter. CPA, CA, CFP, CLU, TEP, FMA, CIMA, FCSI, CIM. CPA Canada course instructor for the last ten years (Ontario, B.C., Manitoba). Twenty-five years of personal finance industry experience. Past co-founder of the \$2 Billion Investment Counsel...
• Writing Test Review Ernest Hemingway, when asked what was the most frightening thing he ever encountered, answered: "A blank sheet of paper."