Neural Networks - courses.cs.washington.edu

Neural Networks
Geoff Hulten

The Human Brain (According to a computer scientist)
- Neurons send electro-chemical signals
- Network of ~100 billion neurons
- Each neuron has ~1,000-10,000 connections
- Activation time is ~10 ms, so a chain of only ~100 neurons can fire in 1 second
(Image from Wikipedia)

Artificial Neural Network
- A grossly simplified approximation of how the brain works
- Built from artificial neurons (sigmoid units)
- Features are used as input to an initial set of artificial neurons
- The output of artificial neurons is used as input to others
- The output of the network is used as the prediction
- Mid-2010s image processing networks: ~50-100 layers, ~10-60 million artificial neurons
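A minimal sketch of a single artificial neuron (sigmoid unit): a weighted sum of the inputs plus a bias, squashed through the logistic function. The weights and inputs below are illustrative values, not taken from any slide.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_unit(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, passed through a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)

# Illustrative example with two inputs.
print(sigmoid_unit([1.0, 0.0], weights=[0.5, -1.0], bias=0.0))  # ~0.62
```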

50 million artificial neurons vs. ~100 billion neurons in the brain is about 0.05%.

Example Neural Network
- Fully connected network with a single hidden layer
- Input: 576 pixels (normalized)
- Each hidden node has 1 connection per pixel + a bias
- 2,313 weights to learn
(Figure: network diagram showing the input layer, hidden layer, and output layer)
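Where the 2,313 comes from, assuming the hidden layer has 4 nodes (consistent with the 2,308 + 5 weight counts shown with the diagram): each hidden node has one weight per pixel plus a bias, and the output node has one weight per hidden node plus a bias:

(576 + 1) × 4 + (4 + 1) = 2,308 + 5 = 2,313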

(Figure, continued: 2,308 weights into the hidden layer, 5 weights into the output node P(y = 1))

Decision Boundary for Neural Networks
- A neural network with a single node in the output layer and no hidden layer is a linear model
- Adding hidden nodes (e.g. 20 of them) gives a non-linear decision boundary that can follow the concept
  - Enabled by the non-linear (sigmoid) activation
  - Complexity comes from the network structure

(Figure: decision boundaries for the true concept, a linear model, and 1-hidden-layer networks with 20, 10, 6, and 4 hidden nodes; with too few hidden nodes the network underfits)

Example of Predicting with a Neural Network
- Each input is connected to every hidden node; each hidden node applies the sigmoid function to its weighted sum
- In the example, the two hidden activations come out to ~0.5 and ~0.75, and the output node combines them to give P(y = 1) ≈ 0.82 (see the sketch below)
(Figure: input layer, hidden layer, and output layer annotated with the example weights and activations)

Activations Example for the Blink Task
- Very limited feature engineering on the input: just scale and normalize
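A sketch of the forward propagation in this example. The exact weights in the slide figure are not recoverable here, so the values below are assumptions chosen only to roughly reproduce the activations shown (~0.5 and ~0.75 in the hidden layer, ~0.82 at the output).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """Forward propagation through one hidden layer of sigmoid units."""
    h = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
         for ws, b in zip(hidden_weights, hidden_biases)]
    y_hat = sigmoid(out_bias + sum(w * hi for w, hi in zip(out_weights, h)))
    return h, y_hat

# Assumed weights (not the slide's exact values).
x = [1.0, 0.0]
hidden_weights = [[0.5, -1.0], [1.0, 1.5]]
hidden_biases = [-0.5, 0.1]
out_weights = [1.0, 1.0]
out_bias = 0.25

h, y_hat = forward(x, hidden_weights, hidden_biases, out_weights, out_bias)
print(h, y_hat)  # [~0.50, ~0.75], ~0.82
```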

- Hidden nodes learn useful features so you don't have to
- The trick is figuring out how many neurons to use and how to organize them
(Figure: input image (normalized) alongside visualizations of the weights from hidden node 1 and hidden node 2, with positive and negative weights highlighted; the output P(y = 1) is logistic regression with the hidden-node responses as input)

Multi-Layer Neural Networks

- Fully connected network with two hidden layers
- 2,333 weights to learn
- Filters on filters, for example maybe:
  - Layer 1 learns eye shapes
  - Layer 2 learns combinations of them
(Figure: 576 pixels (normalized) feeding the first hidden layer with 1 connection per pixel + a bias per node (2,308 weights), then a second hidden layer (20 weights), then the output layer P(y = 1) (5 weights))
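The 2,333 count follows the same arithmetic as before, assuming 4 nodes in each hidden layer:

(576 + 1) × 4 + (4 + 1) × 4 + (4 + 1) = 2,308 + 20 + 5 = 2,333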

Decision Boundary for Multi-Layer Neural Networks
(Figure: decision boundaries for the true concept, 1-hidden-layer networks with 20, 10, 6, and 4 hidden nodes, and 2-hidden-layer networks with 20, 10, 6, and 4 nodes per layer)

(Figure, continued: panels labeled "Did not converge", "Linear Model", and "Best Fit" for comparison)

Multi-layer networks are much more powerful, but:
- Difficult to converge
- Easy to overfit
- Later lecture: how to adapt

Multiple Outputs
- A single network (and training run) can serve multiple tasks
- The prediction ŷ is a vector, not a single value
- Hidden nodes learn generally useful filters
(Figure: 576 pixels (normalized) feeding a shared hidden layer and an output layer with several output nodes)

Neural Network Architectures/Concepts (will explore in more detail later)
- Fully connected layers
- Recurrent networks (LSTM & attention)
- Convolutional layers
- Embeddings
- MaxPooling
- Residual networks
- Activation (ReLU)
- Batch normalization
- Softmax
- Dropout

Loss for Neural Networks
- Mean squared error (MSE), used in the book:
  MSE = (1/n) Σᵢ (ŷᵢ − yᵢ)²
  Example: predictions 0.5 and 0.1 with labels 1 and 0 give ((0.5 − 1)² + (0.1 − 0)²) / 2 = 0.13
- Binary cross entropy (BCE), use for the assignment:
  BCE = −(1/n) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]
  Example: predictions 0.1 and 0.95, both with label 1
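A small sketch of both losses, evaluated on the example predictions and labels from the slide. The function names are mine, not from the course code.

```python
import math

def mse(predictions, labels):
    """Mean squared error over a set of predictions."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(predictions)

def bce(predictions, labels):
    """Binary cross entropy (log loss) over a set of predictions."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(predictions, labels)) / len(predictions)

# MSE example from the slide: predictions 0.5 and 0.1 with labels 1 and 0.
print(mse([0.5, 0.1], [1, 0]))   # ~0.13
# Cross entropy on the other predictions from the slide: 0.1 and 0.95, both labeled 1.
print(bce([0.1, 0.95], [1, 1]))  # ~1.18
```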

Optimizing Neural Nets: Back Propagation
- Gradient descent over the entire network's weight vector
- Easy to adapt to different network architectures
- Converges to a local minimum (usually won't find the global minimum)
- Training can be very slow! For this week's assignment, sorry
- Next week we'll use public neural network software
- In general, very well suited to run on a GPU

Conceptual Backprop with MSE
1. Forward propagation: figure out how much error the network makes on the sample.
2. Back propagation: figure out how much each part of the network contributes to that error.
   - Output node: δ_out = ŷ(1 − ŷ)(y − ŷ)
   - Hidden node: δ_h = h(1 − h) · w_h,out · δ_out
3. Update weights: step each weight to reduce the error it is contributing to.

Backprop Example (learning rate η = 0.1)
1. Forward propagation: hidden activations h1 ≈ 0.5 and h2 ≈ 0.75, output ŷ ≈ 0.82; the label is y = 1, so the error is ≈ 0.18.
2. Back propagation:
   - Output node: δ_out = 0.82 · (1 − 0.82) · (1 − 0.82) ≈ 0.027
   - Hidden node 2: δ_h2 = h2(1 − h2) · w_h2,out · δ_out ≈ 0.005
3. Update weights (see the sketch below).
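The same example as a sketch in code. The activations, label, and learning rate come from the slide; the output-layer weights are assumptions chosen to be consistent with those activations.

```python
# Single-sample backprop step with MSE and sigmoid units.
learning_rate = 0.1
h = [0.5, 0.75]          # hidden-node activations from forward propagation
out_weights = [1.0, 1.0] # assumed weights from the hidden nodes to the output node
y_hat, y = 0.82, 1.0     # network output and true label

# 1. Error of the output node (sigmoid derivative times the MSE error term).
delta_out = y_hat * (1 - y_hat) * (y - y_hat)            # ~0.027

# 2. Error attributed to each hidden node.
delta_h = [hi * (1 - hi) * w * delta_out
           for hi, w in zip(h, out_weights)]             # ~[0.007, 0.005]

# 3. Step each output-layer weight in the direction that reduces its error.
out_weights = [w + learning_rate * delta_out * hi
               for w, hi in zip(out_weights, h)]
print(delta_out, delta_h, out_weights)
```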

Backprop Algorithm
- Initialize all weights to small random numbers (−0.05 to 0.05)
- While it is not time to stop, repeatedly loop over the training data:
  - Input a single training sample to the network and calculate the activation of every neuron
  - Back propagate the errors from the output to every neuron (each neuron's error is its downstream error times this node's effect on that error)
  - Update every weight in the network
- Stopping criteria:
  - # of epochs (passes through the data)
  - Training set loss stops going down
  - Accuracy on validation data

Backprop with a Hidden Layer (or multiple outputs)
(Figure: forward propagation, back propagation, and weight updates through a network with hidden nodes h1,1, h1,2, h2,1, h2,2)

In the general case each node's error sums over everything downstream of it:
- Output node: δ_o = ŷ(1 − ŷ)(y − ŷ)
- Hidden node: δ_h = h(1 − h) · Σ_k w_h,k · δ_k, summing over the nodes k that h feeds into
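Putting the algorithm together: a sketch of per-sample backprop training for a one-hidden-layer network of sigmoid units with a single output and MSE loss. The layer size, learning rate, epoch count, and toy dataset are illustrative choices, not values from the course assignment.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, n_hidden=4, learning_rate=0.1, epochs=100):
    n_inputs = len(samples[0])
    rand = lambda: random.uniform(-0.05, 0.05)  # small random initial weights
    w_hidden = [[rand() for _ in range(n_inputs + 1)] for _ in range(n_hidden)]  # index 0 is the bias
    w_out = [rand() for _ in range(n_hidden + 1)]                                # index 0 is the bias

    for _ in range(epochs):                     # stopping criterion: fixed number of epochs
        for x, y in zip(samples, labels):
            # Forward propagation: the activation of every neuron.
            h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
            y_hat = sigmoid(w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], h)))

            # Back propagation: error of the output node, then of each hidden node.
            delta_out = y_hat * (1 - y_hat) * (y - y_hat)
            delta_h = [hi * (1 - hi) * wi * delta_out for hi, wi in zip(h, w_out[1:])]

            # Update every weight in the network.
            w_out[0] += learning_rate * delta_out
            for j, hi in enumerate(h):
                w_out[j + 1] += learning_rate * delta_out * hi
            for dj, w in zip(delta_h, w_hidden):
                w[0] += learning_rate * dj
                for i, xi in enumerate(x):
                    w[i + 1] += learning_rate * dj * xi
    return w_hidden, w_out

# Tiny usage example: a noisy OR-like concept.
data = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 1]
train(data, labels)
```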

Stochastic Gradient Descent
- Gradient descent: calculate the gradient on all samples, then step
- Stochastic gradient descent: calculate a per-sample gradient on some samples, then step
- Stochastic can make progress faster (with a large training set)
- Stochastic takes a less direct path to convergence
- Mini-batches: batch size N instead of 1
(Figure: gradient descent vs. stochastic gradient descent paths toward the optimum)
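A sketch of the two update schemes side by side. Here `gradient(weights, batch)` is a hypothetical stand-in for whatever computes the loss gradient on a set of samples (for a neural network, backprop); it is not part of the course code.

```python
import random

def batch_gradient_descent(weights, data, gradient, learning_rate, steps):
    """Each step uses the gradient computed on ALL samples."""
    for _ in range(steps):
        g = gradient(weights, data)
        weights = [w - learning_rate * gi for w, gi in zip(weights, g)]
    return weights

def stochastic_gradient_descent(weights, data, gradient, learning_rate, epochs, batch_size=1):
    """Each step uses the gradient computed on SOME samples (a mini-batch of size N)."""
    for _ in range(epochs):
        random.shuffle(data)
        for start in range(0, len(data), batch_size):
            g = gradient(weights, data[start:start + batch_size])
            weights = [w - learning_rate * gi for w, gi in zip(weights, g)]
    return weights
```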

Local Optima and Momentum
(Figure: loss as a function of the parameters, with a local optimum marked)
- Gradient descent settles into a local optimum. Why is this okay? In practice, neural networks overfit.
- Momentum:
  - Power through local optima
  - Converge faster (?)

Dead Neurons & Vanishing Gradients
- Neurons can die: large weights (positive or negative) cause gradients to vanish
- Test: assert if this condition occurs (see the sketch below)
- What causes this?
  - Poor initialization of the weights
  - Optimization that gets out of hand
  - Unnormalized input variables
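One way to implement the "assert if this condition occurs" test: check that sigmoid activations are not pinned near 0 or 1 and that gradients have not collapsed to zero. The thresholds below are arbitrary illustrative choices.

```python
def assert_not_dead(activations, gradients, saturation_eps=1e-3, grad_eps=1e-6):
    """Raise if any sigmoid unit looks saturated or any gradient has vanished."""
    for a in activations:
        # A sigmoid output pinned at ~0 or ~1 means very large weights and a near-zero gradient.
        assert saturation_eps < a < 1 - saturation_eps, f"saturated neuron: activation={a}"
    for g in gradients:
        assert abs(g) > grad_eps, f"vanishing gradient: {g}"

# Example: the second activation is saturated, so this call would raise an AssertionError.
# assert_not_dead([0.51, 0.9999], [0.02, 1e-9])
```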

What should you do with Neural Networks?
- As a model (similar to the others we've learned):
  - Fully connected networks
  - Few hidden layers (1, 2, 3)
  - A few dozen nodes per hidden layer
  - Do some feature engineering (normalization)
  - Tune parameters: # of layers, # of nodes per layer
  - Be careful of overfitting; simplify if not converging
- Leveraging recent breakthroughs:
  - Understand standard architectures
  - Get some GPU acceleration
  - Get lots of data
  - Craft a network architecture
  - More on this next class

Summary of Artificial Neural Networks
- A model that very crudely approximates the way human brains work
- Neural networks learn features (which we might have hand-crafted without them)
- Each artificial neuron is a linear model with a non-linear activation function
- Many options for network architectures
- Neural networks are very expressive and can learn complex concepts (and overfit)
- Backpropagation is a flexible algorithm for learning neural networks
