Autonomous Driving

Machine Learning and Safety
- ISO 26262 standard: Road vehicles, Functional safety
- Part 6: Product development at the software level

ISO 26262 Standard
- Automotive Safety Integrity Level D (ASIL D)
- ASIL D represents likely potential for severely life-threatening or fatal injury in the event of a malfunction and requires the highest level of assurance that the dependent safety goals are sufficient and have been achieved.
- Defines requirements for quality assurance

Testing of Machine Learning
[Diagram: Training Data, Learning Algorithm, Model, Test Data, Evaluation with ML Metrics]

Excerpt from ISO 26262
Quality assurance for ASIL D:
- Equivalence class analysis
- Boundary value analysis
- Modified Condition / Decision Coverage

Scientific Literature?

Contents
- Motivation
- Oracles and pseudo oracles
- Trivial Oracles

Software Testing!
[Diagram: Test Driver; Input: Test Case 1 to Test Case n; Test Object; Result 1 to Result n; Expectation 1 to Expectation n]

The Test Oracle
- Determines the expectation

The Oracle Problem

Testing with Pseudo Oracles
- Simulates a real oracle
- Useful to validate properties of algorithms
M. D. Davis and E. J. Weyuker, Pseudo-oracles for non-testable programs, in Proceedings of the ACM '81 Conference.

Pseudo Oracles for Machine Learning
- Approach 1: Metamorphic Testing
- Approach 2: Comparison of different implementations
C. Murphy, G. E. Kaiser, and M. Arias, An approach to software testing of machine learning applications, in SEKE, vol. 167, 2007.

Approach 1: Metamorphic Testing
How does the output change if I manipulate the input?

Metamorphic Testing
[Diagram: Training Data, Learning Algorithm, Original Model; Morphism, Morphed Training Data, Morphed Model; Comparison]

Metamorphic Relations
Naive Bayes Classifier
[Figure labels: Feature, Class Label]
Examples for relations:
- Permuting class labels does not affect the results
- Permuting feature ordering does not affect the results
- Adding constants to numeric features does not affect the results
- Adding new features with a constant value does not affect the results
X. Xie, J. W. Ho, C. Murphy, G. Kaiser, B. Xu, and T. Y. Chen, Testing and validating machine learning classifiers by metamorphic testing, Journal of Systems and Software, vol. 84, no. 4, pp. 544-558.

Approach 2: Testing through other implementations
Does my algorithm yield the same results as the competition?

Comparison of Implementations
[Diagram: Training Data; Implementation A, Model A; Implementation B, Model B; Comparison]
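
A minimal sketch of the second approach, assuming two independently written routines for simple linear regression. `fit_a`, `fit_b`, the tolerance, and the data are all made up for illustration and are not taken from any of the packages discussed in the talk.

```python
import math


def fit_a(xs, ys):
    """Implementation A: slope from centered sums (covariance / variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_b(xs, ys):
    """Implementation B: solve the 2x2 normal equations with raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n


def implementations_agree(xs, ys, tol=1e-9):
    """Pseudo oracle: each implementation serves as the expectation for the other."""
    model_a, model_b = fit_a(xs, ys), fit_b(xs, ys)
    return all(math.isclose(p, q, rel_tol=tol, abs_tol=tol)
               for p, q in zip(model_a, model_b))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]
agree = implementations_agree(xs, ys)
print(agree)
```

A disagreement only shows that at least one implementation deviates. It does not say which one is wrong, and agreement does not rule out that both share the same defect.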

Comparison of Linear Regression
B. D. McCullough, T. Mokfi, and M. Almaeenejad, On the accuracy of linear regression routines in some data mining packages, WIREs Data Mining Knowl Discov 2018. doi: 10.1002/widm.1279

Testing of Classification Algorithms
Pilot study:
- Simple metamorphic tests that can be applied to (almost) any classification algorithm
- Automated tests for basic functioning (smoke testing)
- Application of tests to state-of-the-art software
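
A metamorphic test of this kind needs no oracle: it trains once on the original data and once on morphed data and compares the predictions. The sketch below assumes a tiny nearest-centroid classifier as a hypothetical stand-in for the algorithm under test. All function names and the data are illustrative.

```python
def train_centroids(X, y):
    """Hypothetical learning algorithm: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, X):
    """Assign each row to the class with the nearest centroid."""
    def dist(row, centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return [min(sorted(model), key=lambda label: dist(row, model[label])) for row in X]


def add_one(X):
    """Morphism: add 1 to all numeric features."""
    return [[v + 1.0 for v in row] for row in X]


def swap_features(X):
    """Morphism: reverse the order of the features."""
    return [list(reversed(row)) for row in X]


def metamorphic_test(X_train, y_train, X_test, morph):
    """Pass if the morphed model predicts the same classes as the original model."""
    original = predict(train_centroids(X_train, y_train), X_test)
    morphed = predict(train_centroids(morph(X_train), y_train), morph(X_test))
    return original == morphed


X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y_train = ["a", "a", "a", "b", "b", "b"]
X_test = [[0.5, 0.5], [5.5, 5.5], [2.0, 1.0]]

print(metamorphic_test(X_train, y_train, X_test, add_one))
print(metamorphic_test(X_train, y_train, X_test, swap_features))
```

The same morphism is applied to training and test data, so a correct implementation of this particular classifier should be unaffected by either change.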

Six Metamorphic Tests
Same results if
- the data does not change
- 1 is added to all numeric features
- the order of the instances changes
- the order of the features changes
- the metadata changes
Opposite results if
- the class labels are inverted

Smoke Testing
Validate basic properties of the implementation:
- No crashes
- Return values exist and are not null
For machine learning:
- Models can be trained
- Predictions can be made
No oracle required!

Design of the Smoke Tests
What are good training/test data for smoke tests?
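
A smoke test in this sense only checks that training and prediction run and return something. The sketch below is not the prototype's implementation: `MajorityClassifier` is a hypothetical stand-in for the classifier under test, the `fit`/`predict` interface is an assumption, and the three data sets (values in [0,1), values near machine precision, all zeros, each with random classes) are merely plausible candidates.

```python
import random
import sys


class MajorityClassifier:
    """Hypothetical stand-in for the classifier under test."""

    def fit(self, X, y):
        # Remember the most frequent label; sorting makes ties deterministic.
        self.label_ = max(sorted(set(y)), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def make_smoke_data(kind, n=20, d=3, seed=0):
    """Build one small data set with random binary classes."""
    rng = random.Random(seed)
    if kind == "uniform":    # data in [0, 1)
        X = [[rng.random() for _ in range(d)] for _ in range(n)]
    elif kind == "tiny":     # features close to machine precision
        X = [[rng.random() * sys.float_info.epsilon for _ in range(d)] for _ in range(n)]
    elif kind == "zeros":    # all numeric values 0
        X = [[0.0] * d for _ in range(n)]
    else:
        raise ValueError(f"unknown kind: {kind}")
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def smoke_test(classifier_factory, kind):
    """Pass if training and prediction neither crash nor return missing values."""
    X, y = make_smoke_data(kind)
    try:
        model = classifier_factory().fit(X, y)
        predictions = model.predict(X)
    except Exception:
        return False
    return (predictions is not None
            and len(predictions) == len(X)
            and all(p is not None for p in predictions))


results = {kind: smoke_test(MajorityClassifier, kind)
           for kind in ("uniform", "tiny", "zeros")}
print(results)
```

No expected predictions appear anywhere in the test, so no oracle is needed. The price is that a passing smoke test says nothing about whether the predictions are any good.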

22 Smoke Tests for Classification
- Data in [0,1]
- Features close to machine precision
- Random classes
- All numeric values 0
- Only a single value in a class

Prototype: atoml

Results for Weka
- Only two algorithms passed all tests!
- Decision trees crash with data from [...]. Reason: stack overflow due to recursion
- +1 changes results
- Permutations of [...] change results
- No problem if the data does not change

Results for scikit-learn and Spark

Developer Feedback

Positive Feedback
- "This is definitely helpful!"
- "This sounds like a useful tool."
- "Thanks for sharing your analysis."
- "We really need more work in this direction."
- "Do you have any interest in applying your knowledge to industry?"

However
"However, I wanted to point out not all your expectations are warranted ... I suspect that ..."

Deviations not Always Wrong
- Minibatches and bagging
  - Instances/features are subdivided
  - Partitions depend on random seed and order of data
- Asymmetric initializations
- Order as tie breaker
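
The last point is easy to reproduce. The sketch below assumes a hypothetical 1-nearest-neighbour classifier on a single numeric feature that resolves ties in favour of the earliest training instance. Reordering the training data then changes a prediction although nothing is broken.

```python
def predict_1nn(train, query):
    """1-nearest-neighbour on one numeric feature; ties go to the earliest instance."""
    best_label, best_dist = None, float("inf")
    for value, label in train:
        dist = abs(value - query)
        if dist < best_dist:  # strict '<' keeps the first of several equally near instances
            best_label, best_dist = label, dist
    return best_label


train = [(0.0, "a"), (2.0, "b")]

# The query 1.0 is exactly halfway between both instances, so order decides.
original = predict_1nn(train, 1.0)
morphed = predict_1nn(list(reversed(train)), 1.0)
print(original, morphed)

# Away from the tie, the order of the instances no longer matters.
print(predict_1nn(train, 0.4), predict_1nn(list(reversed(train)), 0.4))
```

A metamorphic test on instance order would flag this classifier, which is why a reported deviation needs to be judged against what the algorithm is supposed to guarantee.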

Users vs. Devs
How should I know every detail?
"We could try and document all the cases where the result will not fulfill these invariances, but I think that might be too much. At some point we need the users to understand what's going on. If you look at the random forest algorithm and you fix the random state it's obvious that feature order matters."
- Scikit-Learn Core Developer
Weka's random forest fulfills this property

Deviation vs. Significance
"Was the final difference in accuracy statistically significant?"
- Creator of Apache Spark

Further Steps
1. Significance tests for differences
2. Algorithm-specific metamorphic tests
3. Other types of algorithms, e.g., clustering
4. Use combinatorial testing
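
One way to approach the significance question is a paired test on the predictions of the original and the morphed model. The talk does not name a specific test. The sketch below assumes an exact two-sided McNemar test, and `mcnemar_exact` and the data are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_original, pred_morphed):
    """Two-sided exact McNemar test on paired predictions; returns the p-value."""
    only_original = only_morphed = 0
    for truth, a, b in zip(y_true, pred_original, pred_morphed):
        if a == truth and b != truth:
            only_original += 1      # original model right, morphed model wrong
        elif a != truth and b == truth:
            only_morphed += 1       # morphed model right, original model wrong
    n = only_original + only_morphed
    if n == 0:
        return 1.0                  # the models never disagree on correctness
    k = min(only_original, only_morphed)
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


y_true = [1] * 10
pred_original = [1] * 10

p_small_change = mcnemar_exact(y_true, pred_original, [1] * 9 + [0])
p_large_change = mcnemar_exact(y_true, pred_original, [0] * 8 + [1] * 2)
print(p_small_change, p_large_change)
```

With a single differing prediction the p-value is 1.0, so the deviation would not count as significant. With eight differing predictions it drops to about 0.008.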

Hyperparameters and Testing
How do hyperparameters affect my tests?

Combinatorial Testing of Hyperparameters
J. Chandrasekaran, H. Feng, Y. Lei, D. R. Kuhn, and R. Kacker, Applying combinatorial testing to data mining algorithms, in 2017 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), March 2017, pp. 253-261.

Agenda
- Introduction and motivation
- The oracle problem and pseudo oracles
- Case study for classification problems
- Outlook and summary

The Future (?)
- Quality Assurance
- Machine Learning
- Joint conferences
- Tools and methods