WEARABLE SENSING ANALYSIS
IDENTIFYING ALCOHOL DRINKING FROM DAILY PHYSIOLOGICAL DATA
Thesis Defense for the Master's Degree
Chen Zhang
Advisor: Dr. Yi Shang

Contents
  Introduction
    Problem
    Contribution
    Motivation
  Related Work
  System Improvement
  Data Analysis Pipeline
  Conclusion and Future Work

Introduction
  A practical psychology study of alcohol craving
    In cooperation with Psychological Sciences
    Based on questionnaires and physiological readings from the body
  A mobile ambulatory assessment system
    The data collection system links wearable sensors with smartphones and servers
    It dispatches and collects surveys and physiological sensor measurements
  Figure: Sensor Suites, Data Logging, Data Stored

Problem
  During implementation
    Extensibility issue (Craving, Emotion, HIV, Pain)
      Similar studies required migrating the system with few changes, especially the survey part
    Security issue
      Personal information and responses were transmitted over a public network
  After the data were collected and stored
    Analysis issue
      Only preliminary analysis had been applied since collection
      Methods for preprocessing the data, and evaluation of those methods, were missing

Contribution and Motivation
  Improvements to the existing application
    A truly randomized survey scheduler: meets the original system design
    An object-oriented (OOP) redesign of the survey component: a uniform structure that eases system migration
    A novel Digital Envelope approach for secure data-transfer encryption: enhanced security

  New pipeline for data analysis
    Data cleaning procedures: methods and rules are defined and tried for the first time, on real subjects
    Physiological feature extraction: useful features are extracted and tested
    Classification of drinking from daily readings: based on machine learning methods, with nearly 80% accuracy

Related Work
  Ganesan, Ramachandran, Mohan Gobi, and Kanniappan Vivekanandan. "A Novel Digital Envelope Approach for A Secure E-Commerce Channel." IJ Network Security 11.3 (2010): 121-127.
    Introduced the concept of the Digital Envelope
    Combines two encryption methods to obtain a data-encryption channel that is both fast and strong

Related Work (contd)
  Plarre, Kurt, et al. "Continuous inference of psychological stress from sensory measurements collected in the natural environment." Information Processing in Sensor Networks (IPSN), 2011 10th International Conference on. IEEE, 2011.
    Data preprocessing and cleaning are discussed in detail; rules and standards are given as references
    Many useful features are introduced
    Physiological responses, such as HRV and BR, can be useful in building a behavior-related classifier

Related Work (contd)
  Hossain, Syed Monowar, et al. "Identifying drug (cocaine) intake events from acute physiological response in the presence of free-living physical activity." Proceedings of the 13th international symposium on Information processing in sensor networks. IEEE Press, 2014.
    Acute stimulation is reflected immediately in physiological reactions; the dosage may change the effect
    The physiological response to an event can be examined through short-term and long-term changes
    A MACD (moving average convergence/divergence) line can measure and discriminate these changes

Contents
  Introduction
  Related Work
  System Improvement
    Random Scheduler
    OOP Redesign
    Digital Envelope
  Data Analysis Pipeline
  Conclusion and Future Work

System Improvement: Random Scheduler
  Random Survey (RS) requirements
    Appear at random times
    Six times a day
    Without overlapping
  Figure: timeline over the time of day with markers 1st RS to 6th RS
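A minimal sketch of a scheduler that meets the three Random Survey requirements (random times, six per day, no overlap), using the noon-to-midnight window and the 3x(6+1) equal portions named in the implementation notes. The slides do not say which portions are selected; drawing each survey from the middle portion of the first six groups of three is an assumption made here for illustration, and the function name is hypothetical.

```python
import random
from datetime import datetime, timedelta

SURVEYS_PER_DAY = 6
PORTIONS = 3 * (SURVEYS_PER_DAY + 1)  # 21 equal portions, as stated on the slide


def schedule_random_surveys(start, end, rng=random):
    """Return six random survey times between start and end.

    Assumption (not stated in the slides): portions are grouped in threes and
    each survey is drawn uniformly from the middle portion of one of the first
    six groups, so consecutive surveys are at least two portions apart.
    """
    portion = (end - start) / PORTIONS
    times = []
    for k in range(SURVEYS_PER_DAY):
        selected = 3 * k + 1  # index of the middle portion of group k
        times.append(start + (selected + rng.random()) * portion)
    return times


noon = datetime(2016, 3, 31, 12, 0)
midnight = noon + timedelta(hours=12)
for t in schedule_random_surveys(noon, midnight, random.Random(0)):
    print(t.strftime("%H:%M"))
```

With a 12-hour window each portion lasts about 34 minutes, so under this assumed selection rule two consecutive surveys are separated by roughly 68 minutes or more.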

  Figure annotations: a necessary safe gap between two consecutive surveys; a randomized schedule within each period of time
  Implementation
    Starts at noon, or at the time the Morning Report is finished
    Ends at midnight
    The window is divided into 3x(6+1) equal portions
    Necessary gaps are left between two consecutive surveys
    Each survey is scheduled at a random time within its selected portion

System Improvement: OOP Redesign
  UML graph generated from the code (Category, Question, Answer)
  Three layers
    Parents: abstract
    Children: semi-abstract
    Grandchildren: implemented
  Three objects within a Survey: Category, Question, Answer
  Advantages
    Readable and maintainable
    Highly reusable (within the project)
    Easy to extend (across projects)

System Improvement: Digital Envelope
  Encryption methods (between Sender and Receiver)
    Symmetric (AES): single-key cipher, shared secret, fast
    Asymmetric (RSA): key-pair cipher, public and private keys, slow

Digital Envelope (contd)
  AES: fast, but has a key-sharing issue
  RSA: slow, but uses a public/private key pair
  Combined as a Digital Envelope: both fast and secure

Contents
  Introduction
  Related Work
  System Improvement
  Data Analysis Pipeline
    Pipeline Overview
    Data Cleaning
    Feature Extraction
    Classification
  Conclusion and Future Work

Statistical Overview
  Two types of sensor: SEM and Hexoskin
  ECG (256 Hz), RIP (128 Hz), temperature, etc.
  Raw data (RR Int.)

  Figure: Days of available RR intervals per user, as of March 31, 2016 (series: Hexoskin, SEM; users 1001 to 1032)
    SEM: 101 days
    Hexoskin: 236 days
    1382.8 h with 22 users
    403 drinking surveys

Data Analysis Pipeline
  Data Collection (steps 1, 2): Sensor Suites, Data Logging
  Data Cleaning (steps 3-7): Trimming and Matching, Outliers Removal, Activities Removal, Gaps and Insufficient Data, Valid User Selection
  Feature Extraction (step 8)
  Classification (step 9)

Data Cleaning: Five Steps

  1. Trimming and Matching
  2. Outliers Removal
  3. Activities Removal
  4. Gaps and Insufficient Data
  5. Valid User Selection

1. Trimming and Matching
  Trimming
    Blank and zero values in the dataset
    Data points flagged with LOW confidence
    Extreme values outside the normal range: heart rate (40~200 bpm), breathing rate (10~100 breaths/min)
  Matching
    Unify the DateTime format
    Use the same time zone
    Align data from different sources by timestamp

2. Outliers Removal
  Assumption: RR intervals follow a normal distribution
  Outliers are defined as data more than 2 standard deviations away from the 1-minute moving average

3. Activity-Affected Data Removal
  Intensive activities are the main confounders
    They change physiological readings significantly
    Their effect is much stronger than that of alcohol
  Screening out the affected minutes
    Calculate the activity magnitude: standard deviation in a 10 s window
    Set a threshold to discriminate
    A majority vote decides the action
  Figure panel: RR Intervals
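The first three cleaning steps can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function names are hypothetical, the moving window is assumed to be centred on each point, the population standard deviation is assumed, the activity threshold is a free parameter because the slides give no value, and "majority" is read here as more than half of the 10 s windows in a minute.

```python
from statistics import mean, pstdev

HR_RANGE = (40, 200)  # normal heart-rate range given on the slide


def trim(samples, low=HR_RANGE[0], high=HR_RANGE[1]):
    """Drop blank (None), zero, and out-of-range values.

    samples: list of (timestamp_seconds, value) pairs.
    """
    return [(t, v) for t, v in samples
            if v is not None and v != 0 and low <= v <= high]


def remove_outliers(samples, window_s=60.0, n_std=2.0):
    """Drop points more than n_std standard deviations from the moving average.

    Assumption: the 1-minute window is centred on each point.
    """
    kept = []
    for t, v in samples:
        window = [x for s, x in samples if abs(s - t) <= window_s / 2]
        mu, sigma = mean(window), pstdev(window)
        if abs(v - mu) <= n_std * sigma:
            kept.append((t, v))
    return kept


def minute_is_active(acc_magnitude, rate_hz, threshold, window_s=10):
    """Majority vote over the 10 s windows of one minute of accelerometer data.

    threshold is hypothetical: the slides do not state its value.
    """
    n = int(rate_hz * window_s)
    windows = [acc_magnitude[i:i + n]
               for i in range(0, len(acc_magnitude) - n + 1, n)]
    active = sum(pstdev(w) > threshold for w in windows)
    return active > len(windows) / 2
```

A minute flagged by `minute_is_active` would be dropped from the RR-interval series before feature extraction. Note that with a 2-sigma rule a window needs at least six points before any single point can be rejected, which fits the later requirement of a minimum number of points per minute.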

  Figure panel: Acc Readings

4. Gaps and Insufficient Data Removal
  Fewer than 50 data points in a one-minute window
  A 10-minute span with no or few data points

5. Valid User Selection
  Some users, or some days of a user, contain too much invalid data
    Extreme values
    Blank values
    Noisy values
  Rules for screening out
    Less than 1.5 h in a day
    More than 75% invalid

Statistics After Cleaning
  Cleaned data: SEM 76 days, Hexoskin 140 days
  Figure legend: Good, Bad

  Figure: Valid data out of total, in minutes, per user (series: valid_mins, total_mins)
  Data valid rate
    SEM: 77.5% (7.8 h per day)
    Hexoskin: 63.1% (5.6 h per day)
    Overall, only 68% of the data are valid

              Users  Drink days  Valid days         Total days          Drink rate  Valid rate
    SEM       9      24          76 (35725 mins)    101 (46087 mins)    31.58%      77.52%
    Hexoskin  13     34          140 (47243 mins)   236 (74902 mins)    24.29%      63.07%
    Total     21     58          216 (82968 mins)   337 (120989 mins)   26.85%      68.57%

Contents
  Introduction
  Related Work
  System Improvement
  Data Analysis Pipeline
    Pipeline Overview
    Data Cleaning
    Feature Extraction
    Classification
  Conclusion and Future Work

Feature Extraction
  Columns: Category, Restriction, Features
  ECG (heart related)
    Non-HRV, 1 minute (6 features): HR, mean, median, 20th percentile, 80th percentile, quartile deviation
    HRV, 1 minute (11 features)
      Orig. RR: RMSSD, SDSD, NN50, pNN50, NN20, pNN20
      Norm. RR: Variance; LowBand (0.1-0.2 Hz), MB (0.2-0.3 Hz), HB (0.3-0.4 Hz), LB/HB
    HRV, 5 minutes (4 features)
      Norm. RR: SDANN; LowFrequency (0.04-0.15 Hz), HF (0.15-0.4 Hz), LF/HF
  RIP (Resp.)
    Basic: breath rate, minute ventilation, inspiration duration, expiration duration, respiration duration, I/E, stretch
    Aggregation: mean, median, 80th percentile, quartile deviation
  HRV-Resp.
    RSA (Respiratory Sinus Arrhythmia): per cycle of respiration

  ECG features (heart related)
    QD: quartile deviation, (Q3-Q1) / 2
    RMSSD: root mean square of successive differences
    SDSD: standard deviation of successive differences
    NN50: number of pairs of successive RR intervals that differ by more than 50 ms
    pNN50: NN50 divided by the total
    PSD method: total spectral power of the RR intervals in each predefined band
  RIP features (respiration related)
    Breathing cycle: one inhalation period and one exhalation period
    Minute ventilation: volume of air inhaled in a minute
    Stretch: volume of air breathed in the last breathing cycle
    I/E: ratio of inhalation duration to exhalation duration
  Other features
    RSA: difference of RR within each breathing cycle
    ECG-based features: durations of QT, PR, TH
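The time-domain definitions above translate directly into code. The sketch below is an illustration, not the thesis code: it assumes RR intervals in milliseconds, takes the "total" in pNN50 to be the number of successive pairs, uses the population standard deviation for SDSD, and relies on the default quartile method of Python's `statistics.quantiles`; other conventions give slightly different values.

```python
import math
from statistics import pstdev, quantiles


def hrv_time_features(rr_ms):
    """Time-domain HRV features for one window of RR intervals (milliseconds)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    nn50 = sum(abs(d) > 50 for d in diffs)
    nn20 = sum(abs(d) > 20 for d in diffs)
    return {
        "RMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "SDSD": pstdev(diffs),           # assumption: population standard deviation
        "NN50": nn50,
        "pNN50": nn50 / len(diffs),      # assumption: total = number of successive pairs
        "NN20": nn20,
        "pNN20": nn20 / len(diffs),
    }


def quartile_deviation(values):
    """QD = (Q3 - Q1) / 2, using the default method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2


features = hrv_time_features([800, 810, 790, 850, 845])
print(features["NN50"], features["pNN50"])  # 1 0.25
```

In the example window the successive differences are 10, -20, 60 and -5 ms, so only one pair differs by more than 50 ms.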

Feature Visualization
  SEM: 76 days with 24 drinking days
    Only ECG features: 1777 mins drinking
  Hexoskin: 140 days with 34 drinking days
    Only ECG features: 2125 mins drinking
    Both ECG and RIP: 1320 mins drinking
  Figures: plot for one day; whole datasets (Hexoskin: 23 features, Non-Drink / Drink)

Feature Analysis: Dosage Levels
  Histograms of heart rate and pNN20, shown as heat maps for different dosage levels
  The higher the dosage, the higher the HR and the lower the HRV

Feature Analysis: Pairwise Correlation
  The top-left-most position shows the feature least correlated with all others
  PCA: 95% of the information, from 25 to 10 features
  Ranks of features with least correlation in each category

    Category   1-5  6-10  11-15  16-20  21-23  Total
    Non-HRV    1    2     3      -      -      6
    1min HRV   1    1     2      5      2      11
    5min HRV   1    2     -      -      1      4
    BR         2    -     -      -      -      2

Contents
  Introduction
  Related Work
  System Improvement
  Data Analysis Pipeline
    Pipeline Overview
    Data Cleaning
    Feature Extraction
    Classification
  Conclusion and Future Work

Classification
  Class labeling: expansion to 1 hour
    Drinking surveys: ID, DF and RS
    Mark a duration of 1 hour: half an hour before and after each survey
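The one-hour label expansion can be sketched as below: every minute within half an hour of a drinking survey is labeled as drinking. The function name is hypothetical, and treating the 30-minute boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta

HALF_WINDOW = timedelta(minutes=30)  # half an hour before and after each survey


def label_minutes(minute_stamps, survey_times, half_window=HALF_WINDOW):
    """Return 1 (drink) or 0 (non-drink) for each minute timestamp."""
    return [
        int(any(abs(m - s) <= half_window for s in survey_times))
        for m in minute_stamps
    ]


survey = datetime(2016, 3, 31, 20, 0)
minutes = [survey + timedelta(minutes=k) for k in (-60, -30, 0, 30, 31)]
print(label_minutes(minutes, [survey]))  # [0, 1, 1, 1, 0]
```

Overlapping windows from nearby surveys simply merge, because a minute is labeled as drinking if it falls within the window of any survey.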

  Classifiers and evaluators
    Working with the WEKA machine learning toolbox
    J48 decision tree, AdaBoost J48, and SVM
    Confusion matrix, accuracy, kappa, and ROC
      kappa > 0.4 is acceptable, > 0.75 is better
  Purpose: to classify drinking and non-drinking from physiological features

Results on ECG Features: SEM Dataset
  SEM: ECG (1777 mins drink)
  Training and testing with 10-fold cross-validation
  Benchmark is 50%, randomly selecting from non-drink features
  AdaBoost J48 decision tree gives the highest performance: 82.3% accuracy
  Result with J48 decision tree, SEM (1777): accuracy 78.5%, kappa 0.5696, ROC 0.801

    Confusion matrix (J48 decision tree)
                0      1
    Non-drink   1377   423
    Drink       347    1430

  Comparison among three methods

    Method              Accuracy  Kappa   ROC
    J48 Decision Tree   78.5%     0.5696  0.785
    J48 with AdaBoost   82.3%     0.6467  0.823
    SVM (normalized)    66.8%     0.337   0.668

Results on ECG Features: Hexoskin Dataset
  Hexoskin: ECG (2125 mins drink)
  Benchmark 50%

    Method              Accuracy  Kappa   ROC    Confusion matrix (Non-drink: 0, 1 / Drink: 0, 1)
    J48 Decision Tree   66.8%     0.3357  0.701  1489, 711 / 725, 1400
    J48 with AdaBoost   68.6%     0.371   0.748  1508, 692 / 668, 1457
    SVM (normalized)    60.1%     0.2021  0.601  1262, 938 / 789, 1336

  Comparison
    Again, AdaBoost J48 gives the best performance
    The results are worse than those on SEM (82.3%)

Results on All Features: Hexoskin
  Hexoskin: ECG + RIP (1320 mins drink)
  AdaBoost J48 classifier
  Benchmark 50%
  Comparison
    ECG and RIP together perform the best (74.1%)
    ECG only reaches 70.9%; RIP only reaches 59.6%

    Feature set        Accuracy  Kappa   ROC    Confusion matrix (Non-drink: 0, 1 / Drink: 0, 1)
    ECG & RIP (1320)   74.1%     0.4821  0.819  1039, 361 / 343, 977
    ECG Only (1320)    70.9%     0.4179  0.771  1004, 396 / 395, 925
    RIP Only (1320)    59.6%     0.1855  0.628  1013, 387 / 712, 608
    ECG & BR (1320)    71.2%     0.423   0.785  1011, 389 / 395, 925
    ECG & MV (1320)    72.5%     0.4499  0.802  1043, 357 / 390, 930

  ECG with one of the RIP features
    With BR (71.2%), with MV (72.5%)
    Both give acceptable results (kappa > 0.4)

Case Study: Most Recent 5 Users
  Users {1032, 1031, 1030, 1029, 1028}; their data are expected to be more stable

    Features (323)   Classifier  Accuracy  Kappa   ROC
    ECG and RIP      J48         81.3%     0.6255  0.827
                     J48 Ada     87.1%     0.7414  0.936
                     SVM         70.3%     0.4011  0.701
    ECG Only         J48         81.3%     0.6235  0.822
                     J48 Ada     85.9%     0.7144  0.918
                     SVM         66.9%     0.3331  0.667
    BR Only          J48         77.7%     0.5583  0.823
                     J48 Ada     78.8%     0.5761  0.845
                     SVM         76.6%     0.5295  0.766
    MV Only          J48         75.9%     0.5126  0.813
                     J48 Ada     75.8%     0.5067  0.816
                     SVM         55.3%     0       0.5

Conclusion and Future Work
  Conclusion
    In a field study, about 1/3 more data needs to be collected before preprocessing
    Physiological measurements can reflect the body's response to drinking
    AdaBoost J48 is the most suitable classifier in this thesis work
    ECG features alone can do the job, but the additional information from RIP features improves performance
  Future work
    Craving as another aspect of the alcohol drinking study
    Probabilistic models that take time relations into account
    Advanced machine learning methods with spectrograms

Thank You! Questions?
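Backup: recomputing accuracy and kappa from a confusion matrix. The sketch below applies the standard Cohen's kappa formula to the SEM confusion matrix reported in the results (1377, 423 / 347, 1430) and reproduces the J48 figures of 78.5% accuracy and kappa 0.5696. The function name is hypothetical and this is a cross-check, not the WEKA implementation.

```python
def accuracy_and_kappa(matrix):
    """Accuracy and Cohen's kappa; matrix[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


sem_j48 = [[1377, 423], [347, 1430]]  # rows: Non-drink, Drink; columns: 0, 1
acc, kappa = accuracy_and_kappa(sem_j48)
print(f"accuracy={acc:.3f} kappa={kappa:.4f}")  # accuracy=0.785 kappa=0.5696
```

Because the classes are balanced to a 50% benchmark, the chance agreement is close to 0.5, so kappa is roughly twice the accuracy gain over that benchmark.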
