apply reinforcement learning methods


Project 1 – Part 2 (CMPS 499/CSCE 572)

Sum21 game rules (the actual name of this game is Easy21)

The goal of this assignment is to apply reinforcement learning methods to a simple card game that we call Sum21. Here is a list of the rules to play the game:

  • The game is played with one dealer and one player.
  • The game is played with an infinite deck of cards (i.e. cards are sampled with replacement).

  • Each draw from the deck results in a value between 1 and 10 (uniformly distributed) with a color of red (probability 1/3) or black (probability 2/3).

  • There are no face cards in this game.
  • An ace always counts as one.
  • At the start of the game both the player and the dealer draw one black card (fully observed).

  • Each turn the player may either stand or hit.
  • If a player hits then the player draws another card from the deck.
  • If a player stands, the player receives no further cards.
  • The values of the player’s cards are added (black cards) or subtracted (red cards).
  • If a player’s sum exceeds 21, or becomes less than 1, then the player busts and loses the game (reward -1).

  • If the player stands then the dealer starts taking turns. The dealer always stands on any sum of 17 or greater, and hits otherwise. If the dealer goes bust, then the player wins; otherwise the outcome is decided by comparing sums: the player wins (reward +1) with the larger sum, loses (reward -1) with the smaller sum, and draws (reward 0) if the sums are equal.
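The rules above can be sketched as a small simulator. This is only one possible reading of the rules, not a reference implementation: the function names (draw_card, reset, step), the action encoding (HIT = 0, STAND = 1), the state layout (dealer's sum, player's sum), and the assumption that the dealer busts under the same limits as the player are all choices made for illustration.

```python
import random

HIT, STAND = 0, 1  # assumed action encoding


def draw_card(rng, force_black=False):
    """Return a signed card value: positive for black, negative for red."""
    value = rng.randint(1, 10)  # uniform over 1..10, no face cards
    if force_black or rng.random() < 2 / 3:
        return value   # black cards are added
    return -value      # red cards are subtracted


def is_bust(total):
    """A sum above 21 or below 1 is a bust."""
    return total > 21 or total < 1


def reset(rng):
    """Both sides draw one black card; the state is (dealer_sum, player_sum)."""
    return (draw_card(rng, force_black=True), draw_card(rng, force_black=True))


def step(state, action, rng):
    """Apply one player action and return (next_state, reward, done)."""
    dealer, player = state
    if action == HIT:
        player += draw_card(rng)
        if is_bust(player):
            return (dealer, player), -1, True
        return (dealer, player), 0, False
    # The player stands: the dealer hits until reaching 17 or more, or busting.
    while not is_bust(dealer) and dealer < 17:
        dealer += draw_card(rng)
    if is_bust(dealer) or player > dealer:
        reward = 1
    elif player < dealer:
        reward = -1
    else:
        reward = 0
    return (dealer, player), reward, True
```

Passing an explicit random.Random instance as rng keeps episodes reproducible, which makes it easier to compare learning curves across algorithms.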

Requirement: 

Please use all five temporal difference (TD) learning policy optimization methods introduced in class to find optimal playing strategies for the game Sum21.

Algorithms to be implemented

TD(0)

TD Sarsa: n-step Sarsa, forward-view Sarsa, backward-view Sarsa

Q-learning
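As a rough illustration of how two of these methods differ, the sketch below shows the one-step Sarsa and Q-learning updates on a tabular action-value function. The names (epsilon_greedy, sarsa_update, q_learning_update), the action encoding, and the default step size alpha = 0.1 and discount gamma = 1.0 are assumptions, not values given in the assignment.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # assumed encoding: 0 = hit, 1 = stand


def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=1.0):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=1.0):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


# Q maps (state, action) pairs to value estimates; unseen pairs start at 0.
Q = defaultdict(float)
```

The only difference between the two updates is the bootstrap term: Sarsa uses the value of the next action chosen by the behavior policy, while Q-learning uses the maximum over next actions.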

Submission: 

You should submit a single PDF or DOC document containing the learned optimal strategies (five of them), plots/learning curves (preferably all five learning curves in one plot), and discussion.

Learning curve for the optimal strategy

Optimal strategy table looks like this
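One way to produce such a table from learned action values is sketched below. The layout (rows are player sums from 21 down to 1, columns are the dealer's first card from 1 to 10), the action names, and the Q dictionary keyed by ((dealer_card, player_sum), action) are assumptions made for illustration; they may differ from the table format expected in the submission.

```python
ACTIONS = ("hit", "stand")  # assumed action names


def greedy_table(Q):
    """Map each (dealer_card, player_sum) state to its highest-valued action."""
    table = {}
    for player in range(1, 22):
        for dealer in range(1, 11):
            state = (dealer, player)
            # Unvisited pairs default to 0.0; ties resolve to the first action.
            table[state] = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
    return table


def format_table(table):
    """Render the strategy as text: H = hit, S = stand."""
    rows = ["P\\D " + " ".join(f"{d:>2}" for d in range(1, 11))]
    for player in range(21, 0, -1):
        cells = " ".join(
            f"{table[(dealer, player)][0].upper():>2}" for dealer in range(1, 11)
        )
        rows.append(f"{player:>3} " + cells)
    return "\n".join(rows)
```

Calling print(format_table(greedy_table(Q))) on each of the five learned value tables gives strategies that can be compared side by side.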

