
Short overview of Weka

Weka Explorer: visualisation, attribute selection, association rules, clustering, classification.

Weka memory issues:
- Windows: edit the RunWeka.ini file in the Weka installation directory and change maxheap=128m to maxheap=1280m.
- Linux: launch Weka with the following command ($WEKAHOME is the Weka installation directory):
  java -Xmx1280m -jar $WEKAHOME/weka.jar

ISIDA ModelAnalyser

Features:
- Imports output files of general data-mining programs, e.g. Weka
- Visualizes chemical structures
- Computes statistics for classification models
- Builds consensus models by combining different individual models

Foreword

For time reasons, not all exercises will be performed during the session, nor will they be presented in full. The numbering of the exercises refers to their numbering in the textbook.

Ensemble Learning
Igor Baskin, Gilles Marcou and Alexandre Varnek

Hunting season: single hunter
Courtesy of Dr D. Fourches

Hunting season: many hunters

What is the probability that a wrong decision will be taken by majority voting, if each voter acts independently and takes a wrong decision with probability ε < 0.5?

[Chart: probability of a wrong majority decision vs. number of voters (1-19), for individual error rates ε = 0.1, 0.2, 0.3, 0.4; y-axis from 0% to 45%]
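The curves follow from the binomial distribution for an independent majority vote; a standard derivation not spelled out on the slide: for an odd number n of voters, each wrong with probability ε, the majority is wrong when more than half of the votes are wrong,

```latex
P_{\mathrm{wrong}}(n) \;=\; \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k}\, \varepsilon^{k} (1-\varepsilon)^{n-k}
```

which tends to 0 as n grows whenever ε < 0.5 (Condorcet's jury theorem). For example, with ε = 0.4 and n = 11 voters, P_wrong ≈ 0.25.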

The more voters, the less chance of taking a wrong decision!

The Goal of Ensemble Learning

Combine base-level models which are diverse in their decisions and complement each other. There are different ways to generate an ensemble of models from one and the same initial data set:
- Compounds: Bagging and Boosting
- Descriptors: Random Subspace
- Machine Learning Methods: Stacking

Principle of Ensemble Learning

[Diagram: the training set (compounds/descriptor matrix, compounds C1...Cn, descriptors D1...Dm) is perturbed into matrices 1...e; a learning algorithm is applied to each, yielding models M1...Me, which are combined into a consensus model]

Ensembles Generation: Bagging

- Compounds: Bagging and Boosting
- Descriptors: Random Subspace
- Machine Learning Methods: Stacking

Bagging

Bagging = Bootstrap Aggregation
- Introduced by Breiman in 1996
- Based on bootstrapping with replacement
- Useful for unstable algorithms (e.g. decision trees)

Leo Breiman (1928-2005)
Leo Breiman (1996). Bagging predictors. Machine Learning, 24(2):123-140.

Bootstrap Sample Si from Training Set S

[Diagram: training set S (compounds C1...Cn, descriptors D1...Dm) is resampled into a bootstrap sample Si; some compounds appear several times, others not at all]

- All compounds have the same probability to be selected
- Each compound can be selected several times or even not selected at all (i.e. compounds are sampled randomly with replacement)

Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
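For illustration, this resampling is available directly on Weka's Instances class; a minimal sketch using the Weka 3.x Java API (the ARFF file name is taken from the exercises later in the tutorial, the random seed is arbitrary):

```java
import java.util.Random;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BootstrapDemo {
    public static void main(String[] args) throws Exception {
        // Load the training set (file name from the classification exercises)
        Instances s = DataSource.read("train-ache-t3ABl2u3.arff");
        s.setClassIndex(s.numAttributes() - 1);

        // resample(Random) draws |S| compounds with replacement:
        // some compounds appear several times, others not at all
        Instances si = s.resample(new Random(1));
        System.out.println("Original: " + s.numInstances()
                + " compounds, bootstrap sample: " + si.numInstances());
    }
}
```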

Bagging: Data with Perturbed Sets of Compounds

[Diagram: bootstrap samples S1, S2, ..., Se of the training set each feed the learning algorithm, giving models M1, M2, ..., Me; predictions are combined into a consensus model by voting (classification) or averaging (regression)]

Classification - Descriptors

ISIDA descriptors: sequences, unlimited/restricted augmented atoms.
Nomenclature: txYYlluu, where x is the type of the fragmentation, YY the fragment content, and l,u the minimum and maximum numbers of constituent atoms.

Classification - Data

Acetylcholine esterase (AChE) inhibitors (27 actives, 1000 inactives).

Classification - Files

- train-ache.sdf/test-ache.sdf: molecular files for the training/test set
- train-ache-t3ABl2u3.arff/test-ache-t3ABl2u3.arff: descriptor and property values for the training/test set
- ache-t3ABl2u3.hdr: descriptors' identifiers
- AllSVM.txt: SVM predictions on the test set using multiple fragmentations

Regression - Descriptors

ISIDA descriptors: sequences, unlimited/restricted augmented atoms.
Nomenclature: txYYlluu, where x is the type of the fragmentation, YY the fragment content, and l,u the minimum and maximum numbers of constituent atoms.

Regression - Data

Log of solubility (818 compounds in the training set, 817 in the test set).

Regression - Files

- train-logs.sdf/test-logs.sdf: molecular files for the training/test set
- train-logs-t1ABl2u4.arff/test-logs-t1ABl2u4.arff: descriptor and property values for the training/test set
- logs-t1ABl2u4.hdr: descriptors' identifiers
- AllSVM.txt: SVM predictions on the test set using multiple fragmentations

Exercise 1

Development of one individual rule-based model (JRip method in WEKA); a scripted equivalent is sketched below:
- Load train-ache-t3ABl2u3.arff
- Load test-ache-t3ABl2u3.arff
- Set up one JRip model

Exercise 1: rules interpretation

187. (C*C),(C*C*C),(C*C-C),(C*N),(C*N*C),(C-C),(C-C-C),xC*
188. (C-N),(C-N-C),(C-N-C),(C-N-C),xC
189. (C*C),(C*C),(C*C*C),(C*C*C),(C*C*N),xC

Exercise 1: randomization

What happens if we randomize the data and rebuild a JRip model?
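A minimal sketch of the Exercise 1 workflow outside the Explorer, using the Weka 3.x Java API (file names from the exercise):

```java
import weka.classifiers.Evaluation;
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class JRipDemo {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train-ache-t3ABl2u3.arff");
        Instances test  = DataSource.read("test-ache-t3ABl2u3.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        JRip jrip = new JRip();      // RIPPER rule learner, default options
        jrip.buildClassifier(train);
        System.out.println(jrip);    // prints the induced rules

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(jrip, test);
        System.out.println(eval.toSummaryString());
    }
}
```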

Exercise 1: surprising result!

Changing the data ordering changes the rules.

Exercise 2a: Bagging

A scripted equivalent is sketched below:
- Reinitialize the dataset
- In the classifier tab, choose the meta classifier Bagging
- Set the base classifier as JRip
- Build an ensemble of 1 model
- Save the Result buffer as JRipBag1.out
- Re-build the bagging model using 3 and 8 iterations
- Save the corresponding Result buffers as JRipBag3.out and JRipBag8.out
- Build models using from 1 to 10 iterations
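A sketch of Exercise 2a via the Java API, looping over ensemble sizes 1-10 as the exercise asks (class names as in Weka 3.x; the class index of the active label is assumed here to be 0):

```java
import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggingDemo {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train-ache-t3ABl2u3.arff");
        Instances test  = DataSource.read("test-ache-t3ABl2u3.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        for (int iterations = 1; iterations <= 10; iterations++) {
            Bagging bagger = new Bagging();
            bagger.setClassifier(new JRip());    // base classifier
            bagger.setNumIterations(iterations); // ensemble size
            bagger.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(bagger, test);
            // AUC for the first class label (assumed to be the actives)
            System.out.println(iterations + " iterations, ROC AUC = "
                    + eval.areaUnderROC(0));
        }
    }
}
```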

[Chart: ROC AUC of the consensus model (AChE classification) as a function of the number of bagging iterations (0-10); y-axis from 0.68 to 0.88]

Bagging of Regression Models

Ensembles Generation: Boosting

- Compounds: Bagging and Boosting
- Descriptors: Random Subspace
- Machine Learning Methods: Stacking

Boosting

Boosting trains a set of classifiers sequentially and combines them for prediction, each later classifier focusing on the mistakes of the earlier ones.
- AdaBoost: classification (Yoav Freund, Robert Schapire)
- Regression boosting (Jerome Friedman)

Yoav Freund, Robert E. Schapire: Experiments with a new boosting algorithm. In: Thirteenth International Conference on Machine Learning, San Francisco, 148-156, 1996.
J.H. Friedman (1999). Stochastic Gradient Boosting. Computational Statistics and Data Analysis, 38:367-378.

Boosting for Classification: AdaBoost

[Diagram: each compound C1...Cn of the training set carries a weight w; weighted samples S1, S2, ..., Se are drawn sequentially, each feeding the learning algorithm to give models M1, M2, ..., Mb, with the weights of misclassified compounds increased after each round; the models are combined by weighted averaging and thresholding into a consensus model]

Developing Classification Model

- Load train-ache-t3ABl2u3.arff
- In the classification tab, load test-ache-t3ABl2u3.arff

Exercise 2b: Boosting

A scripted equivalent is sketched below:
- In the classifier tab, choose the meta classifier AdaBoostM1
- Set up an ensemble of one JRip model
- Save the Result buffer as JRipBoost1.out
- Re-build the boosting model using 3 and 8 iterations
- Save the corresponding Result buffers as JRipBoost3.out and JRipBoost8.out
- Build models using from 1 to 10 iterations
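The boosting counterpart of the previous bagging sketch; only the meta classifier changes (AdaBoostM1 instead of Bagging; again Weka 3.x class names, active class index assumed 0):

```java
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingDemo {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train-ache-t3ABl2u3.arff");
        Instances test  = DataSource.read("test-ache-t3ABl2u3.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new JRip()); // base classifier
        booster.setNumIterations(8);       // e.g. the JRipBoost8 model
        booster.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(booster, test);
        System.out.println("ROC AUC = " + eval.areaUnderROC(0));
    }
}
```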

[Chart: ROC AUC (AChE) as a function of the logarithm of the number of boosting iterations (0-10); y-axis from 0.74 to 0.83]

Bagging vs Boosting

[Charts: performance as a function of the number of iterations (log scale; 1-100 for base learner JRip, 1-1000 for base learner DecisionStump; y-axis 0.7-1.0), comparing bagging and boosting]

Conjecture: Bagging vs Boosting

- Bagging leverages unstable base learners that are weak because of overfitting (JRip, MLR)
- Boosting leverages stable base learners that are weak because of underfitting (DecisionStump, SLR)

Ensembles Generation: Random Subspace

- Compounds: Bagging and Boosting
- Descriptors: Random Subspace
- Machine Learning Methods: Stacking

Random Subspace Method

Tin Kam Ho

- Introduced by Ho in 1998
- Modification of the training data proceeds in the attribute (descriptor) space
- Useful for high-dimensional data

Tin Kam Ho (1998). The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832-844.

Random Subspace Method: Random Descriptor Selection

[Diagram: from the training set with the initial pool of descriptors D1...Dm, a random subset of descriptors is drawn, giving a training set with randomly selected descriptors]

- All descriptors have the same probability to be selected
- Each descriptor can be selected only once
- Only a certain fraction of the descriptors is selected in each run

Random Subspace Method

[Diagram: data sets S1, S2, ..., Se, each with randomly selected descriptors, feed the learning algorithm, giving models M1, M2, ..., Me; predictions are combined into a consensus model by voting (classification) or averaging (regression)]

Developing Regression Models

- Load train-logs-t1ABl2u4.arff
- In the classification tab, load test-logs-t1ABl2u4.arff

Exercise 7

A scripted equivalent is sketched below:
- Choose the meta method RandomSubSpace
- Base classifier: Multi-Linear Regression without descriptor selection
- Build an ensemble of 1 model, then build an ensemble of 10 models
- Compare the results obtained with 1 model and with 10 models
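A sketch of Exercise 7 with the Java API. The option -S 1 switches off LinearRegression's built-in descriptor selection, and the 50% subspace size (Weka's default) is stated explicitly:

```java
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.meta.RandomSubSpace;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomSubSpaceDemo {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train-logs-t1ABl2u4.arff");
        Instances test  = DataSource.read("test-logs-t1ABl2u4.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        LinearRegression mlr = new LinearRegression();
        mlr.setOptions(weka.core.Utils.splitOptions("-S 1")); // no descriptor selection

        RandomSubSpace rss = new RandomSubSpace();
        rss.setClassifier(mlr);
        rss.setNumIterations(10); // ensemble of 10 models
        rss.setSubSpaceSize(0.5); // each model sees 50% of the descriptors
        rss.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(rss, test);
        System.out.println("R = " + eval.correlationCoefficient()
                + ", RMSE = " + eval.rootMeanSquaredError());
    }
}
```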

Random Forest

Random Forest = Bagging + Random Subspace: a particular implementation of bagging where the base-level algorithm is a random tree.

Leo Breiman (1928-2005)
Leo Breiman (2001). Random Forests. Machine Learning, 45(1):5-32.
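Since the random tree is built in, a Random Forest needs no explicit base classifier; a minimal sketch with default parameters (Weka 3.x class names; file names reused from the classification exercises):

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestDemo {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train-ache-t3ABl2u3.arff");
        Instances test  = DataSource.read("test-ache-t3ABl2u3.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Bagged random trees: bootstrap sampling of the compounds plus
        // random descriptor selection at each tree node
        RandomForest forest = new RandomForest();
        forest.buildClassifier(train); // default number of trees

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(forest, test);
        System.out.println("ROC AUC = " + eval.areaUnderROC(0));
    }
}
```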

Ensembles Generation: Stacking

- Compounds: Bagging and Boosting
- Descriptors: Random Subspace
- Machine Learning Methods: Stacking

Stacking

David H. Wolpert
- Introduced by Wolpert in 1992
- Stacking combines base learners by means of a separate meta-learning method, using their predictions on held-out data obtained through cross-validation
- Stacking can be applied to models obtained using different learning algorithms

Wolpert, D. (1992). Stacked Generalization. Neural Networks, 5(2):241-259.
Breiman, L. (1996). Stacked Regression. Machine Learning, 24.

Stacking: the same data set, different algorithms

[Diagram: the same data set S is given to different learning algorithms L1, L2, ..., Le, producing models M1, M2, ..., Me; a machine-learning meta-method (e.g. MLR) combines their predictions into a consensus model]

Exercise 9

- Choose the meta method Stacking
- Delete the classifier ZeroR
- Add the PLS classifier (default parameters)

- Add the regression tree M5P (default parameters)
- Add Multi-Linear Regression without descriptor selection
- Select Multi-Linear Regression as the meta-method

Exercise 9

Rebuild the stacked model using (see the API sketch below):
- kNN (default parameters)
- Multi-Linear Regression without descriptor selection
- PLS classifier (default parameters)
- Regression tree M5P
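A sketch of the stacked model with the Java API. The PLS classifier is omitted here because its class name depends on the installed Weka version/package; kNN, MLR and M5P are combined with MLR as the meta-method:

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Stacking;
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StackingDemo {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train-logs-t1ABl2u4.arff");
        Instances test  = DataSource.read("test-logs-t1ABl2u4.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        LinearRegression mlr = new LinearRegression();
        mlr.setOptions(weka.core.Utils.splitOptions("-S 1")); // no descriptor selection

        // Base learners: 1-NN, MLR and the M5P regression tree;
        // their cross-validated predictions feed the meta-learner
        Stacking stack = new Stacking();
        stack.setClassifiers(new Classifier[] { new IBk(1), mlr, new M5P() });
        stack.setMetaClassifier(new LinearRegression()); // MLR as meta-method
        stack.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(stack, test);
        System.out.println("R = " + eval.correlationCoefficient()
                + ", RMSE = " + eval.rootMeanSquaredError());
    }
}
```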

Exercise 9 - Stacking: regression models for LogS

Learning algorithm                   R (correlation coefficient)   RMSE
MLR                                  0.8910                        1.0068
PLS                                  0.9171                        0.8461
M5P (regression trees)               0.9176                        0.8455
1-NN (one nearest neighbour)         0.8518                        1.1889
Stacking of MLR, PLS, M5P            0.9366                        0.7460
Stacking of MLR, PLS, M5P, 1-NN      0.9392                        0.7301

Conclusion

Ensemble modelling converts several weak models (classification/regression problems) into a strong one. There exist several ways to generate the individual models:
- Compounds
- Descriptors
- Machine Learning Methods

Thank you. Questions?
Ducks and hunters: thanks to D. Fourches.

Exercise 1 (recap)

Development of one individual rule-based model for classification (inhibition of AChE). One individual rule-based model is very unstable: the rules change as a function of the ordering of the compounds in the dataset.

[Diagram: ensemble modelling combining Model 1, Model 2, Model 3 and Model 4]
[Diagram: ensemble modelling combining MLR, NN, SVM and kNN]
