PowerPoint presentation


Learning Strategic Features for General Games
Dennis Soemers
June 5, 2019
@DennisSoemers

Digital Ludeme Project
Computational study of traditional games throughout history
- Model games in a general game system (Ludii)
- Generate plausible reconstructions of rulesets
- Data-driven
- AI self-play to "play-test" generated rulesets

AI Requirements
Play approx. 1000 strategy games
- and many more variants
- Need General Game Players!
Ideally strong, human-level AI
- Do not need super-human AI
Automated strategy learning
- Learning from self-play
Interpretable strategies

General Game Playing (GGP)
Monte Carlo tree search (MCTS)
- Prevailing GGP approach
- Can be improved with learned policies
[Figure: the four MCTS steps (Selection, Expansion, Play-out, Backpropagation), repeated X times]
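The four MCTS steps named on this slide can be sketched as follows. This is a minimal UCT illustration, not the Ludii implementation: the toy game (Nim, take 1 to 3 stones, whoever takes the last stone wins), the `Node` class, and the names `uct_search` and `best_move` are all assumptions made for this example.

```python
import math
import random


def legal_moves(pile):
    """Toy Nim rules (assumed for illustration): take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


class Node:
    def __init__(self, pile, player, parent=None, move=None):
        self.pile = pile              # stones left in this state
        self.player = player          # player to move here (0 or 1)
        self.parent = parent
        self.move = move              # move that led from parent to this node
        self.children = []
        self.untried = legal_moves(pile)
        self.visits = 0
        self.wins = 0.0               # wins for the player who moved into this node


def uct_search(pile, iterations=2000, c=1.4, rng=None):
    rng = rng or random.Random(0)
    root = Node(pile, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.untried and node.children:
            node = max(
                node.children,
                key=lambda ch: ch.wins / ch.visits
                + c * math.sqrt(math.log(node.visits) / ch.visits),
            )
        # 2. Expansion: add one untried child, if any.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pile - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Play-out: uniformly random moves until the pile is empty.
        pile_left, to_move = node.pile, node.player
        while pile_left > 0:
            pile_left -= rng.choice(legal_moves(pile_left))
            to_move = 1 - to_move
        winner = 1 - to_move          # the player who took the last stone
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    return root


def best_move(root):
    """Final move choice: the most visited child of the root."""
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(best_move(uct_search(5)))
```

A learned policy could replace the uniform choice in step 3 or add a bias term in step 1, which is the improvement the slide refers to.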

Policy Learning from Self-play
State of the art: Deep Learning
- AlphaGo, AlphaGo Zero, AlphaZero

Downsides of Deep Learning
Start learning from scratch per game
- Difficult for 1000 games
Requires some domain knowledge
- One policy output node per action
- How many actions possible in ?
- Difficult for General Game Playing
Expensive

General Game Features
Binary features for state-action pairs
Local patterns
- Use underlying graph-representation
Widely applicable
- Single format, many games
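One way to picture binary state-action features over a graph representation is the sketch below. The 5-cell line board, the two pattern functions, and the linear softmax policy are assumptions chosen for illustration; the slides do not specify Ludii's actual feature format.

```python
import math

# Hypothetical board: 5 cells in a line, stored as a graph (cell -> neighbours).
NEIGHBOURS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def adjacent_to_friend(state, cell, player):
    """Binary feature: 1 if the target cell touches one of our own pieces."""
    return int(any(state[n] == player for n in NEIGHBOURS[cell]))


def adjacent_to_enemy(state, cell, player):
    """Binary feature: 1 if the target cell touches an opponent piece."""
    return int(any(state[n] not in (None, player) for n in NEIGHBOURS[cell]))


FEATURES = [adjacent_to_friend, adjacent_to_enemy]


def feature_vector(state, cell, player):
    """Binary feature vector for the state-action pair (state, place at cell)."""
    return [f(state, cell, player) for f in FEATURES]


def softmax_policy(state, player, weights):
    """Linear softmax over feature vectors: pi(a|s) is proportional to exp(w . phi(s, a))."""
    actions = [c for c, owner in enumerate(state) if owner is None]
    logits = [
        sum(w * x for w, x in zip(weights, feature_vector(state, a, player)))
        for a in actions
    ]
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return {a: e / z for a, e in zip(actions, exps)}


if __name__ == "__main__":
    board = ["X", None, None, "O", None]
    print(softmax_policy(board, "X", weights=[1.0, 0.0]))
```

Because the patterns only refer to neighbours in the graph, the same feature definitions apply to any game whose board is expressed in that format.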

General Game Features
Which features to use?
Learn features and weights simultaneously
Start with atomic features
- Simple patterns with a single test
Combine pairs of features
- Maximise correlation with policy's objective
- Minimise correlation with constituents

D. J. N. J. Soemers, É. Piette, C. Browne (2019). Biasing MCTS with Features for General Games. In 2019 IEEE Congress on Evolutionary Computation.
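The pair-combination idea can be illustrated roughly as below. The scoring rule (absolute correlation with an error signal, multiplied by one minus the absolute correlation with each constituent) is an assumption made for this sketch and may differ from the formula in the cited paper; `pearson` and `best_pair` are hypothetical names.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_pair(activations, errors):
    """Pick the feature pair whose conjunction looks most useful.

    activations: one row per sample, each row a list of 0/1 feature values.
    errors: one number per sample, standing in for the policy objective's
            error signal (assumed to be given).
    """
    columns = list(zip(*activations))
    best, best_score = None, -1.0
    for i, j in combinations(range(len(columns)), 2):
        # The combined feature is active only when both constituents are.
        conj = [a & b for a, b in zip(columns[i], columns[j])]
        score = (
            abs(pearson(conj, errors))                  # correlate with objective
            * (1 - abs(pearson(conj, columns[i])))      # differ from constituent i
            * (1 - abs(pearson(conj, columns[j])))      # differ from constituent j
        )
        if score > best_score:
            best, best_score = (i, j), score
    return best, best_score


if __name__ == "__main__":
    rows = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0],
            [1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
    errs = [r[0] & r[1] for r in rows]   # error driven by features 0 and 1 together
    print(best_pair(rows, errs))
```

The second and third factors penalise a conjunction that merely duplicates one of its constituents, since such a feature would add no new information.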

Self-play Policy Learning Objectives
Minimise cross-entropy between learned policy and MCTS visit counts
- AlphaGo Zero, AlphaZero, etc.
- MCTS is exploratory by design
- Trained policy also exploratory!
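The cross-entropy objective can be sketched for a linear softmax policy as below. The feature vectors, visit counts, learning rate, and the name `sgd_step` are illustrative assumptions; only the objective itself (matching the normalised MCTS visit counts) comes from the slide.

```python
import math


def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def cross_entropy(target, predicted):
    """H(target, predicted) = -sum_a target(a) * log predicted(a)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def sgd_step(weights, features, visit_counts, lr=0.5):
    """One gradient step towards the MCTS visit-count distribution.

    features: one binary feature vector per legal action.
    visit_counts: MCTS visit count per legal action, in the same order.
    Returns the updated weights and the loss measured before the update.
    """
    total = sum(visit_counts)
    target = [n / total for n in visit_counts]
    logits = [sum(w * x for w, x in zip(weights, phi)) for phi in features]
    policy = softmax(logits)
    loss = cross_entropy(target, policy)
    # For a linear softmax policy the gradient is sum_a (pi(a) - target(a)) * phi(a).
    grad = [
        sum((p - t) * phi[k] for p, t, phi in zip(policy, target, features))
        for k in range(len(weights))
    ]
    new_weights = [w - lr * g for w, g in zip(weights, grad)]
    return new_weights, loss


if __name__ == "__main__":
    phis = [[1, 0], [0, 1], [0, 0]]
    visits = [70, 20, 10]
    w = [0.0, 0.0]
    for _ in range(3):
        w, loss = sgd_step(w, phis, visits)
        print(round(loss, 4), [round(x, 4) for x in w])
```

Since the target includes visits that MCTS spent on exploration, a policy trained this way reproduces that exploration, which is the point the last bullet makes.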

Self-play Policy Learning Objectives
Do we want our trained policy to be exploratory?
- Bias MCTS Selection: Yes
- Bias MCTS Play-out: Maybe
- Interpret learned strategies: No
- Use strategy for game distance function: No

D. J. N. J. Soemers, É. Piette, M. Stephenson, C. Browne (2019). Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates. In 2019 IEEE Conference on Games.

Conclusion
Promising results on 10 games so far
- All two-player, full information, deterministic
Work in progress:
- Scaling up to more games
- Multi-player, hidden info, nondeterministic
- Speeding up features
- Interpreting learned strategies

Thank you!

[Figure: AI (UCT) without features in Gomoku]

[Figure: AI (Biased MCTS) with features in Gomoku]

[Figures: Learning Curves (CEC 2019); Learning Curves - Pruned (CEC 2019); Learning Curves (COG 2019); Policy Entropy (COG 2019)]
