# Learning BlackJack with ANN (Artificial Neural Network)

Ip Kei Sam
ID: 9012828100

## Goal

• Use a reinforcement learning algorithm to learn strategies in Blackjack.
• Train an MLP (multilayer perceptron) to play Blackjack without explicitly teaching it the rules of the game.
• Develop a better strategy with the ANN that beats the dealer's 17-point rule.

| Strategy | Win % | Tie % |
|---|---|---|
| Player's random moves | 31% | 8% |
| Dealer's 17-point rule | 61% | 8% |

## Blackjack

• Draw cards from a deck of 52 cards to a total value as close to 21 as possible.
• Blackjack is simplified to allow only hit or stand in each turn.

## Reinforcement Learning

• Map situations to actions such that the reward value is maximized.
• Decide which action (hit or stand) to take by finding, through trial and error, the action that yields the highest reward.
• Update the winning probability of the intermediate states after each game.
• The winning probability of each state converges as the learning parameter decreases after each game.

## Result table from learning

• Columns 1 to 5: the dealer's cards.
• Columns 6 to 10: the player's cards.
• Cards are sorted in ascending order.
• Column 11: the winning probability of the state.
• Columns 12 and 13: the action taken by the player, where [1 0] = hit, [0 1] = stand, and [1 1] = end state.

Example rows shown on the slide (the column layout did not survive extraction, so the values are listed in their extracted order):

2.0000 5.0000 0 0 2.0000 5.0000 0 0 2.0000 5.0000 10.0000 0 0 0 0 6.0000 6.0000 4.0000 6.0000 4.0000 6.0000 0 6.0000 6.0000 0 0 7.0000 0 0 0 0.3700 0.2500 0 1.0000 1.0000 0 0 1.0000 1.0000

## MLP and game flow

## MLP Configurations

• Feature vectors are normalized and scaled to the range -5 to 5.
• Maximum training epochs: 1000; epoch size: 64.
• Activation function (hidden layers): hyperbolic tangent.
• Activation function (output layer): sigmoidal.

| Network | First parameter | Second parameter | Configuration | Reported % |
|---|---|---|---|---|
| MLP1 | 0.1 | 0 | 4-10-10-10-2 | 89.5% |
| MLP2 | 0.1 | 0.8 | 5-10-10-10-2 | 91.1% |
| MLP3 | 0.8 | 0 | 5-10-10-10-2 | 92.5% |
| MLP4 | 0.1 | 0 | 6-12-12-12-2 | 90.2% |

The symbols of the two training parameters and the label of the percentage did not survive extraction.

## Experiment Results

When the dealer uses the 17-point rule:

| Strategy | Win % | Tie % |
|---|---|---|
| Player with MLP | 56.5% | 9% |

When the player uses random moves:

| Strategy | Win % | Tie % |
|---|---|---|
| Dealer with MLP | 68.2% | 3% |

When both dealer and player use an MLP:

| Strategy | Win % | Tie % |
|---|---|---|
| Player with MLP | 54% | 3% |
| Dealer with MLP | 43% | 3% |

## Conclusion

• An MLP network works best for highly random and dynamic games, where the game rules and strategies are hard to define and the game outcomes are hard to predict exactly.
• Strategy interpreted from reinforcement learning: hit if the hand total is less than 15, otherwise stand.
• As the number of games increases, the game strategies change over time.

## Future work

• The current hand depends on the previous hands.
• Use card memory in Blackjack.
• Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = 11) and to identify misclassified patterns.
• Train the ANN to play against different experts so that it can pick up various game strategies.
• Include game tricks and strategies in a table for the ANN to look up.
• Explore other learning methods.
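The reinforcement learning step described earlier (update the winning probability of the intermediate states after each game, with a learning parameter that decreases over time) can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the state is reduced to the player's hand total, cards come from an infinite deck, a tie counts as a non-win, and the names `train`, `play_game`, `alpha0`, and `epsilon` as well as the decay schedule are all assumptions made for this example.

```python
import random

def draw_card(rng):
    """Draw from an infinite deck (assumption): 2-9, four ten-valued ranks, ace as 11."""
    return rng.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11])

def hand_total(cards):
    """Sum the hand, counting aces as 1 instead of 11 while the hand would bust."""
    total, aces = sum(cards), cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total

def play_game(value, rng, epsilon):
    """Play one game; return the visited (state, action) pairs and the outcome (1 = win)."""
    player = [draw_card(rng), draw_card(rng)]
    dealer = [draw_card(rng), draw_card(rng)]
    visited = []
    while hand_total(player) <= 21:
        state = hand_total(player)
        if rng.random() < epsilon:  # trial and error: occasionally explore
            action = rng.choice(["hit", "stand"])
        else:  # otherwise take the action with the highest estimated win probability
            action = max(["hit", "stand"], key=lambda a: value.get((state, a), 0.5))
        visited.append((state, action))
        if action == "stand":
            break
        player.append(draw_card(rng))
    if hand_total(player) > 21:
        return visited, 0.0
    while hand_total(dealer) < 17:  # dealer follows the 17-point rule
        dealer.append(draw_card(rng))
    won = hand_total(dealer) > 21 or hand_total(player) > hand_total(dealer)
    return visited, 1.0 if won else 0.0

def train(num_games=20000, alpha0=0.5, epsilon=0.1, seed=0):
    """Estimate win probabilities with a learning parameter that shrinks each game."""
    rng = random.Random(seed)
    value = {}
    for game in range(num_games):
        alpha = alpha0 / (1.0 + game / 1000.0)  # hypothetical decay schedule
        visited, outcome = play_game(value, rng, epsilon)
        for key in visited:
            old = value.get(key, 0.5)
            value[key] = old + alpha * (outcome - old)  # move estimate toward outcome
    return value
```

Calling `train()` returns a dictionary keyed by `(hand_total, action)`. Because each update is a weighted average of the old estimate and a 0/1 outcome, every entry stays between 0 and 1, and the estimates settle as `alpha` shrinks.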
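The 13-column row layout of the result table (five dealer card slots, five player card slots, a winning probability, and two action bits) could be built along these lines. The function name `encode_state` is hypothetical, and padding unused slots with trailing zeros is an assumption read off the extracted example values.

```python
def encode_state(dealer_cards, player_cards, win_probability, action):
    """Build one 13-value row: 5 dealer slots, 5 player slots, probability, 2 action bits."""
    action_bits = {"hit": [1, 0], "stand": [0, 1], "end": [1, 1]}

    def pad(cards):
        # Sort ascending, then fill the unused slots with zeros (assumed layout).
        if len(cards) > 5:
            raise ValueError("at most 5 cards per hand fit in the row")
        ordered = sorted(cards)
        return ordered + [0] * (5 - len(ordered))

    return pad(dealer_cards) + pad(player_cards) + [win_probability] + action_bits[action]


row = encode_state([5, 2], [6, 4], 0.37, "hit")
print(row)
```

Sorting the cards means that permutations of the same hand, such as 4 + 7 and 7 + 4, map to the same row. Hands that only share a total, such as 4 + 7 and 5 + 6, still produce different rows, so merging those would need an encoding based on the total instead.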
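The MLP settings listed earlier (inputs scaled to the range -5 to 5, hyperbolic tangent hidden layers, a sigmoidal output layer, and a layout such as 5-10-10-10-2) correspond to a forward pass like the one below. This is an untrained sketch in plain Python: the weight initialization, the choice of five card values as inputs, and the reading of the two outputs as hit and stand scores are assumptions, since the slides do not say what the input features are.

```python
import math
import random

def scale_features(values, low, high):
    """Linearly map values from [low, high] to the range [-5, 5]."""
    return [-5.0 + 10.0 * (v - low) / (high - low) for v in values]

def init_mlp(layer_sizes, rng):
    """Create one (weights, biases) pair per layer with small random weights."""
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, inputs):
    """Propagate inputs: tanh on hidden layers, logistic sigmoid on the output layer."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            activations = [1.0 / (1.0 + math.exp(-s)) for s in sums]
        else:
            activations = [math.tanh(s) for s in sums]
    return activations


rng = random.Random(0)
mlp = init_mlp([5, 10, 10, 10, 2], rng)                  # the 5-10-10-10-2 layout
features = scale_features([2, 5, 4, 6, 0], low=0, high=11)  # hypothetical card inputs
hit_score, stand_score = forward(mlp, features)
```

With trained weights, the larger of the two outputs would indicate the preferred action under this reading of the output layer.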
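The strategy reported in the conclusion (hit if less than 15, otherwise stand) can be checked against a dealer who follows the 17-point rule with a small simulation. The sketch below makes several simplifying assumptions that are not stated in the slides: an infinite deck, aces counted as 11 or 1, and no special handling of a two-card blackjack. The names `play_hand` and `estimate_rates` are hypothetical, and the rates it produces are not expected to match the percentages reported above.

```python
import random

CARD_VALUES = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]  # ace listed as 11

def hand_total(cards):
    """Sum the hand, counting aces as 1 instead of 11 while the hand would bust."""
    total, aces = sum(cards), cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total

def play_hand(threshold, rng):
    """Return 'win', 'tie', or 'loss' for a player who hits while below `threshold`."""
    player = [rng.choice(CARD_VALUES), rng.choice(CARD_VALUES)]
    dealer = [rng.choice(CARD_VALUES), rng.choice(CARD_VALUES)]
    while hand_total(player) < threshold:
        player.append(rng.choice(CARD_VALUES))
    if hand_total(player) > 21:
        return "loss"
    while hand_total(dealer) < 17:  # dealer's 17-point rule
        dealer.append(rng.choice(CARD_VALUES))
    p, d = hand_total(player), hand_total(dealer)
    if d > 21 or p > d:
        return "win"
    return "tie" if p == d else "loss"

def estimate_rates(threshold, num_games=10000, seed=0):
    """Estimate win, tie, and loss rates for a threshold strategy by simulation."""
    rng = random.Random(seed)
    counts = {"win": 0, "tie": 0, "loss": 0}
    for _ in range(num_games):
        counts[play_hand(threshold, rng)] += 1
    return {outcome: n / num_games for outcome, n in counts.items()}


print(estimate_rates(threshold=15))
```

Running `estimate_rates` for several thresholds gives a quick way to compare the learned rule with nearby alternatives under the same simplified game.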
