Segmentation and tracking of the upper body model from range data with applications in hand gesture recognition
Navin Goel
Intel Corporation
Department of Computer Science, University of Nevada, Reno

Overview: Introduction, Overall System, Upper Body Model, Segmentation Problem, Tracking, Color Based Segmentation, Results, Conclusion and Future Work.

Introduction
Applications: 3D editing systems and HCI systems, American Sign Language recognition, entertainment, industrial control, video coding, teleconferencing.
Requirements: independence from the background and illumination; robustness to occlusions and self-occlusions of the body components; robust hands-free initialization; robust tracking.

Overall System
[Figure: overall system diagram. Blocks: stereo (RGB+Z) video sequence, initial segmentation, tracking (x, y, z), valid/invalid track, upper body model, color video sequence, color-based segmentation, hue moments calculation (h1, h2, ..., h6), training and recognition.]

Upper Body Model
Each pixel $(i, j)$ yields an observation vector with a depth part (the 3D point) and a color part (the hue):
$$O_{i,j} = [O^d_{i,j}, O^c_{i,j}] = [(x_{ij}, y_{ij}, z_{ij}), h_{ij}]$$
Its likelihood under the model $\lambda$ is obtained by summing over the joint positions $J$, the anthropological measures $L$, and the state assignments $q_{ij}$ in the set of body components $C$:
$$P(O_{ij} \mid \lambda) = \sum_{J, L, q_{ij} \in C} P(O_{ij}, q_{ij}, J, L \mid \lambda)$$

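The joint probability factorizes as
$$P(O_{ij}, q_{ij}, J, L \mid \lambda) = P(O_{ij} \mid q_{ij}, L)\, P(q_{ij} \mid J, L)\, P(J \mid L)\, P(L)$$
[Figure: upper body model. Components: head (He), torso (T), upper arms (U_l, U_r), forearms (F_l, F_r), hands (Ha_l, Ha_r). Joints: neck (N), shoulders (S_l, S_r), elbows (E_l, E_r), wrists (W_l, W_r). Anthropological measures: L_He, L_T, L_U, L_F, L_Ha.]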
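Because every factor enters the joint as a product, working in log space turns the factorization into a sum, and choosing the state of a single pixel only needs the two terms that depend on $q_{ij}$: $P(J \mid L)\,P(L)$ is the same for every candidate state. A minimal Python sketch, assuming the per-component log-likelihoods and log-priors for one pixel are already available as dictionaries; the function names and the numbers are hypothetical, not taken from the original system:

```python
import math


def log_joint(log_p_obs, log_p_state, log_p_joints, log_p_measures):
    """log P(O_ij, q_ij, J, L | lambda) as the sum of the four factor logs:
    log P(O_ij | q_ij, L) + log P(q_ij | J, L) + log P(J | L) + log P(L)."""
    return log_p_obs + log_p_state + log_p_joints + log_p_measures


def best_state(obs_loglik, state_logprior):
    """Return the component q maximising log P(O_ij | q, L) + log P(q | J, L).

    obs_loglik and state_logprior map component names (e.g. "He", "T") to log
    values. log P(J | L) + log P(L) is identical for every q, so it cannot change
    the argmax and is left out. Components missing from state_logprior are
    treated as having prior probability 0 (log prior of -inf).
    """
    return max(obs_loglik,
               key=lambda q: obs_loglik[q] + state_logprior.get(q, -math.inf))


# Hypothetical numbers for one pixel: the torso wins once the prior is included.
obs = {"He": -1.0, "T": -1.2}
prior = {"He": math.log(0.1), "T": math.log(0.9)}
print(best_state(obs, prior))  # T
```

The observation term alone would favor the head here; the state prior tips the decision, which is why both $q$-dependent factors are kept.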

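Upper Body Model: component models

Head, normal component model:
$$P(O^d_{i,j} \mid He, L_{He}) = K_{He}\, \mathcal{N}(O^d_{i,j} \mid \mu_{He}, C_{He})\, U(O^d_{i,j} \mid \mu_{He}, L_{He})$$

Torso, planar component model:
$$P(O^d_{i,j} \mid T, L_T) = K_T \frac{1}{\sqrt{2\pi\sigma_z^2}} \exp\left(-\frac{(z_{ij} - \tilde z_{ij})^2}{2\sigma_z^2}\right) U\left([x_{ij}, y_{ij}]^T \mid [\mu_{T_x}, \mu_{T_y}]^T, [L_{T_x}, L_{T_y}]^T\right), \qquad \tilde z_{ij} = a x_{ij} + b y_{ij} + c$$

Upper arms and forearms, linear component models:
$$P(O^d_{i,j} \mid A) = K_A \exp\left(-\frac{d_{ij}^2}{2\sigma_A^2}\right) U\left(\sqrt{\lVert O^d_{i,j} \rVert^2 - d_{ij}^2} \;\middle|\; \frac{r_{\max}}{2}, \frac{r_{\max}}{2}\right), \qquad d_{ij} = d_{ij}(r, \theta, \phi)$$
where $d_{ij}$ is the distance from $O^d_{i,j}$ to the axis of the arm segment $A$, whose parameters are $(r, \theta, \phi)$.

Linear pdf parameters:
$$P(A \mid J_c, J_p) = \delta\left([r_{\max}, \theta, \phi]^T - [r_{J_c}, \theta_{J_c}, \phi_{J_c}]^T\right)$$
where $(r_{J_c}, \theta_{J_c}, \phi_{J_c})$ are the spherical coordinates of the child joint $J_c$ with the origin at its parent joint $J_p$.

The conditional probability of a joint $J_c$ given its parent joint $J_p$ and the anthropological measure $L$ is
$$P(J_c \mid J_p, L) = \begin{cases} K_{J_c} & \text{if } \theta_{J_c} \in [\theta_{\min}, \theta_{\max}],\ \phi_{J_c} \in [\phi_{\min}, \phi_{\max}] \text{ and } r_{J_c} = L \\ 0 & \text{otherwise} \end{cases}$$
where $K_{J_c}$ is a normalization constant, and $\theta_{\min}, \phi_{\min}$ and $\theta_{\max}, \phi_{\max}$ are the minimum and maximum values of the parameters $\theta_{J_c}$ and $\phi_{J_c}$.

The Segmentation Problem
Searching over all possible joint configurations is computationally impractical, so segmentation takes place in two stages.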
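The component densities and the joint prior of the upper body model can be sketched as plain Python functions. This is a minimal sketch under stated assumptions, not the original implementation: the normalization constants $K$ are set to 1; $U$ is taken to be the indicator of an axis-aligned box centered at its first argument with side lengths given by its second; the head covariance is diagonal; the arm's uniform term is assumed to be nonzero for positions $0 \le |t| \le r_{\max}$ along the axis, measured from the parent joint; and the spherical convention $x = r\sin\theta\cos\phi$, $y = r\sin\theta\sin\phi$, $z = r\cos\theta$ is assumed. All function names and the tolerance `tol` are hypothetical.

```python
import math


def box_indicator(x, center, size):
    """Assumed form of U(x | center, size): 1.0 inside the axis-aligned box with
    side lengths `size` centred at `center`, else 0.0 (normalisation left to K)."""
    inside = all(abs(xi - ci) <= si / 2 for xi, ci, si in zip(x, center, size))
    return 1.0 if inside else 0.0


def head_lik(o, mu, var, size, k=1.0):
    """Head (normal component): K_He * N(o | mu, diag(var)) * U(o | mu, size)."""
    log_n = sum(-0.5 * math.log(2 * math.pi * v) - (oi - mi) ** 2 / (2 * v)
                for oi, mi, v in zip(o, mu, var))
    return k * math.exp(log_n) * box_indicator(o, mu, size)


def torso_lik(o, plane, sigma_z, mu_xy, size_xy, k=1.0):
    """Torso (planar component): Gaussian in z around z~ = a*x + b*y + c,
    times a uniform box in (x, y)."""
    x, y, z = o
    a, b, c = plane
    z_plane = a * x + b * y + c
    gauss = (math.exp(-(z - z_plane) ** 2 / (2 * sigma_z ** 2))
             / math.sqrt(2 * math.pi * sigma_z ** 2))
    return k * gauss * box_indicator((x, y), mu_xy, size_xy)


def axis_direction(theta, phi):
    """Unit vector for polar angle theta and azimuth phi (assumed convention)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def arm_lik(o, parent, r_max, theta, phi, sigma, k=1.0):
    """Arm segment (linear component): Gaussian in the distance d from o to the
    segment axis, times a uniform term that is nonzero when the position along
    the axis, |t| = sqrt(|o - parent|^2 - d^2), is at most r_max (assumed)."""
    v = [oi - p for oi, p in zip(o, parent)]
    u = axis_direction(theta, phi)
    t = sum(vi * ui for vi, ui in zip(v, u))         # signed projection on the axis
    d2 = max(sum(vi * vi for vi in v) - t * t, 0.0)  # squared distance to the axis
    inside = 1.0 if abs(t) <= r_max else 0.0
    return k * math.exp(-d2 / (2 * sigma ** 2)) * inside


def joint_prior(child, parent, length, theta_range, phi_range, k=1.0, tol=1e-6):
    """P(J_c | J_p, L): K inside the angle limits with r = L, else 0.
    `tol` is a hypothetical tolerance for testing r = L on floating-point data."""
    dx, dy, dz = (c - p for c, p in zip(child, parent))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0:
        return 0.0
    theta = math.acos(max(-1.0, min(1.0, dz / r)))
    phi = math.atan2(dy, dx)
    ok = (theta_range[0] <= theta <= theta_range[1]
          and phi_range[0] <= phi <= phi_range[1]
          and abs(r - length) <= tol)
    return k if ok else 0.0


print(arm_lik((0, 0, 0.5), (0, 0, 0), 1.0, 0.0, 0.0, 0.1))  # 1.0
```

The box-shaped $U$ factors give each component a hard spatial extent, while the Gaussian factors grade points by how close they lie to the blob center, the torso plane, or the arm axis.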

Simplifying assumptions: only one user is visible and his or her torso is the largest body component; the torso plane is perpendicular to the camera; and the head is in a vertical position.

Notation: $Q_A, Q_B$ and $J_A, J_B$ denote the state assignments and joints for the arm region and for the body (head and torso) region, respectively.

The Upper Body Segmentation: Stage I
Step 1. Estimate the torso plane parameters from all data using EM. Estimate the torso and head bounding boxes, and the plane that includes N.
Step 2. Estimate the head blob parameters from all data using EM.
Step 3. Compute the state assignments
$$\tilde q_{ij} = \arg\max_{q_{ij} \in B} \log P(O_{ij}, q_{ij}, J_B)$$
Step 4. Estimate the joints:
$$N = \left[\mu_{He_x},\ \mu_{He_y} + \frac{L_{He_y}}{2},\ a N_x + b N_y + c\right]^T$$
$$S_l = \left[\mu_{T_x} - \frac{L_{T_x}}{2},\ N_y,\ a S_{l_x} + b S_{l_y} + c\right]^T$$
$$S_r = \left[\mu_{T_x} + \frac{L_{T_x}}{2},\ N_y,\ a S_{r_x} + b S_{r_y} + c\right]^T$$
Step 5. Repeat Steps 3 and 4 until $\log P(O_F \mid J_B, Q_B)$ converges.

The Upper Body Segmentation: Stage II
Given the fixed positions of $S_l$ and $S_r$, we subsample the joint space to obtain $N_E = 18$ possible positions for each of the elbow joints $E_l$ and $E_r$. For each elbow position, we search over $N_W = 16$ possible positions for each of the wrist joints $W_l$ and $W_r$.
Step 1. For each possible set of arm parameters, estimate the means of the linear pdfs for the upper arms and forearms, and the mean of the normal pdf for the hands.
Step 2. For each joint configuration $J_A$: (a) compute the best state assignment of the observation vectors given the joint configuration; (b) compute the observation likelihood given the joint configuration.
Step 3. Find the maximum likelihood over all joint configurations, and determine the best set of joints and the corresponding best state assignment.

Arm Tracking
For each parent joint $J_p$, we build a set $[J_{c1}, J_{c2}, J_{c3}, J_{c4}, J_{c5}]$ of five possible child joint positions, each lying on the surface of the sphere centered at the parent joint. With $(\tilde\theta^{t-1}, \tilde\phi^{t-1})$ the child joint's angles in the previous frame and $\Delta\theta$, $\Delta\phi$ the angular steps:
$$J_{c1} = (r, \tilde\theta^{t-1}, \tilde\phi^{t-1}) \quad \text{(joint center from the last frame)}$$
$$J_{c2} = (r, \tilde\theta^{t-1} - \Delta\theta, \tilde\phi^{t-1}), \qquad J_{c5} = (r, \tilde\theta^{t-1} + \Delta\theta, \tilde\phi^{t-1})$$
$$J_{c3} = (r, \tilde\theta^{t-1}, \tilde\phi^{t-1} + \Delta\phi), \qquad J_{c4} = (r, \tilde\theta^{t-1}, \tilde\phi^{t-1} - \Delta\phi)$$
[Figure: the five candidate child joint positions on the sphere around the parent joint, shown with X, Y, Z axes.]
Step 1. Estimate the means of the linear pdfs for the upper arms and forearms, and the mean of the normal pdf for the hands.
Step 2. For each joint configuration, determine the best state assignment of the observations:
$$\tilde q_{ij}(J_A) = \arg\max_{q_{ij} \in A} \log P(O_{ij}, J_A, q_{ij}), \qquad ij \in A$$
$$Q_A(J_A) = \{\tilde q_{ij}(J_A) \mid ij \in A\}$$
Step 3. The maximum log likelihood $\log P^*(O_A, Q_A, J_A)$ determines the best joint configuration.

Color Based Segmentation
Depth Segmentation
Pixels with no depth information cannot be assigned to body components by the segmentation algorithm above. The depth of all pixels must therefore be estimated and a global segmentation performed:
$$q^*_{ij} = \arg\max_{\text{all } q_{ij}} P(O_{ij} \mid q_{ij})$$
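Returning to the Arm Tracking procedure, its five-candidate update can be sketched in Python. This is a hypothetical illustration, not the original code: `score` stands in for the maximum log likelihood $\log P^*(O_A, Q_A, J_A)$ of the configuration that contains a candidate, the step sizes `d_theta` and `d_phi` are free parameters, and the spherical convention $x = r\sin\theta\cos\phi$, $y = r\sin\theta\sin\phi$, $z = r\cos\theta$ is assumed.

```python
import math


def candidate_child_joints(parent, r, theta, phi, d_theta, d_phi):
    """Five candidate child-joint positions on the sphere of radius r around `parent`.

    (theta, phi) are the child's angles from the previous frame; d_theta and d_phi
    are hypothetical angular step sizes. Returns (theta, phi, (x, y, z)) tuples."""
    angles = [
        (theta, phi),            # J_c1: joint position from the last frame
        (theta - d_theta, phi),  # J_c2
        (theta, phi + d_phi),    # J_c3
        (theta, phi - d_phi),    # J_c4
        (theta + d_theta, phi),  # J_c5
    ]
    px, py, pz = parent
    return [
        (th, ph, (px + r * math.sin(th) * math.cos(ph),
                  py + r * math.sin(th) * math.sin(ph),
                  pz + r * math.cos(th)))
        for th, ph in angles
    ]


def track_joint(parent, r, theta, phi, d_theta, d_phi, score):
    """Return the candidate whose 3D position maximises `score`, a hypothetical
    stand-in for log P*(O_A, Q_A, J_A) of the joint configuration it produces."""
    candidates = candidate_child_joints(parent, r, theta, phi, d_theta, d_phi)
    return max(candidates, key=lambda cand: score(cand[2]))


# Toy usage: score candidates by closeness to a made-up observed elbow position.
target = (math.sin(math.pi / 2 + 0.1), 0.0, math.cos(math.pi / 2 + 0.1))


def toy_score(p):
    return -sum((a - b) ** 2 for a, b in zip(p, target))


best = track_joint((0.0, 0.0, 0.0), 1.0, math.pi / 2, 0.0, 0.1, 0.1, toy_score)
print(best[:2])
```

Keeping $r$ fixed means every candidate respects the segment length, so tracking only searches the two angles, which is what makes it a cheap special case of the initialization search.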

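For a hypothesized depth $Z \in [z_{\min}, z_{\max}]$, pixel $(i, j)$ back-projects to $X_Z = iZ/f$, $Y_Z = jZ/f$, where $f$ is the focal length, and
$$P(O_{ij} \mid q_{ij} = k) = P\left(\frac{iZ}{f}, \frac{jZ}{f}, Z \mid k\right) P(O^c_{ij} \mid k)$$
Since $Z$ is unknown, the depth term integrates over it, requiring every other component $l \neq k$ to lie behind that depth ($W > Z$):
$$P(O^d_{ij} \mid q_{ij} = k) = \int_Z P(X_Z, Y_Z, Z \mid k, (i, j)) \prod_{l \neq k} \int_{W > Z} P(X_W, Y_W, W \mid l, (i, j))\, dW\, dZ$$

Color Based Segmentation
In practice, the depth of component $k$ at pixel $(i, j)$ is taken as its most likely value:
$$\tilde Z_k = \arg\max_{\text{all } Z_k} P\left(\frac{iZ_k}{f}, \frac{jZ_k}{f}, Z_k \mid k\right) P(O^c_{ij} \mid k) \prod_{l \neq k} P\left(\frac{iZ_l}{f}, \frac{jZ_l}{f}, Z_l \mid l\right) P(O^c_{ij} \mid l), \qquad Z_l > Z_k$$
For example, if $k$ is the left forearm, then $l$ ranges over all body components except the left forearm, and if $Z_k = a$ then $Z_l \in [z_{\min}, z_{\max}]$ with $Z_l > a$. The likelihood then becomes
$$P(O_{ij} \mid q_{ij} = k) = P\left(\frac{i\tilde Z_k}{f}, \frac{j\tilde Z_k}{f}, \tilde Z_k \mid q_{ij} = k\right) P(O^c_{ij} \mid q_{ij} = k)$$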
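One way to make the depth term concrete is to discretize depth into bins ordered from near to far, so the integrals become sums. The Python sketch below assumes, hypothetically, that for one pixel each component's back-projected density $P(X_Z, Y_Z, Z \mid k)$ has already been converted into probability masses per depth bin, and it multiplies only component $k$'s own color likelihood. Setting `use_max=True` replaces the outer sum over $Z$ by a maximum, which is one reading of the "in practice" approximation rather than the original system's exact computation; all names are illustrative.

```python
def occlusion_free_mass(depth_pmf, k, n):
    """Discrete stand-in for prod_{l != k} integral_{W > Z} P(X_W, Y_W, W | l) dW:
    the probability that every other component lies behind depth bin n."""
    prod = 1.0
    for l, pmf in depth_pmf.items():
        if l != k:
            prod *= sum(pmf[n + 1:])
    return prod


def depth_term(depth_pmf, k, use_max=False):
    """Discretised depth likelihood of component k at one pixel.

    use_max=False sums over depth bins (the integral over Z); use_max=True keeps
    only the best bin and also returns its index (the depth estimate for k)."""
    terms = [p * occlusion_free_mass(depth_pmf, k, n)
             for n, p in enumerate(depth_pmf[k])]
    if use_max:
        n_best = max(range(len(terms)), key=terms.__getitem__)
        return terms[n_best], n_best
    return sum(terms), None


def label_pixel(depth_pmf, color_lik, use_max=False):
    """q*_ij = argmax_k (depth term of k) * P(O^c_ij | k)."""
    scores = {k: depth_term(depth_pmf, k, use_max)[0] * color_lik[k]
              for k in depth_pmf}
    return max(scores, key=scores.get)


# Hypothetical masses over three depth bins, ordered near to far.
depth_pmf = {"Fl": [0.8, 0.2, 0.0], "T": [0.0, 0.1, 0.9]}
color_lik = {"Fl": 0.5, "T": 0.5}
print(label_pixel(depth_pmf, color_lik))  # Fl
```

With equal color likelihoods, the torso scores zero because the forearm never lies behind it in this toy example, so the pixel goes to the left forearm, the occluding component.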

Results
[Figures: color segmentation; upper body segmentation and tracking.]

Conclusion and Future Work
Contributions: an articulated upper body model built from dense disparity maps; a linear pdf for the forearms and upper arms; hands-free initialization of the system from the optimal joint configuration; upper body tracking, treated as a particular case of the initialization.
Future work: improvements to the background segmentation; learning the anthropological measures; integration with other HCI systems (gesture recognition, face recognition, speech recognition, speaker identification, etc.).