AlphaZero Parallel Gomoku AI

Github : AlphaZero-Gomoku-MPI

Overview

This repo is based on junxiaosong/AlphaZero_Gomoku, for which I am sincerely grateful.

What I have done:

  • Implemented an asynchronous, parallel self-play training pipeline in the style of AlphaGo Zero
  • Written a root-parallel MCTS (the move is chosen by an ensemble vote)
  • Used a ResNet structure for the model and added a transfer learning API, so that a model for a larger board can be trained starting from a small-board model (a form of pre-training that saves time)
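
As a rough illustration of the ensemble vote in root-parallel MCTS, the sketch below sums the root visit counts reported by several independent searches and picks the most-visited move. The function name `vote_move`, the dict-of-visit-counts format and the tie-breaking rule are assumptions made for this example; the repo may combine worker results differently (for instance by majority vote), and here the workers are simulated with plain dicts rather than MPI processes.

```python
from collections import Counter


def vote_move(per_worker_visits):
    """Combine root visit counts from independent MCTS workers.

    per_worker_visits: list of dicts mapping move -> root visit count,
    one dict per worker (hypothetical format for this sketch).
    Returns (best_move, summed_counts).
    """
    total = Counter()
    for visits in per_worker_visits:
        total.update(visits)  # Counter.update adds counts per key
    # Highest summed count wins; ties go to the smallest move index.
    best_move = min(total, key=lambda m: (-total[m], m))
    return best_move, dict(total)


# Three simulated workers, each with its own search tree over board indices.
workers = [
    {60: 150, 61: 200, 49: 50},
    {60: 220, 61: 130, 49: 50},
    {60: 180, 61: 190, 72: 30},
]
move, totals = vote_move(workers)
print(move, totals[move])
```

Summing counts lets a move that is consistently second-best in every tree lose to one that is strongly preferred by most trees, which is the usual motivation for root parallelization.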

Strength

  • The current model is trained on an 11x11 board and uses 400 playouts per move when tested
  • Playing against this model, it always wins, whether as black or white
  • Against gomocup's AIs, it ranks around 20th-30th in some rough tests
  • When I play white, I cannot beat the AI. When I play black, most games end in a draw or a loss for me

References

  • Mastering the game of Go without human knowledge
  • A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
  • Parallel Monte-Carlo Tree Search

Blog

  • deepmind blog
  • mpi4py blog -- author: 自可乐

Installation Dependencies

  • Python3
  • tensorflow>=1.8.0
  • tensorlayer>=1.8.5
  • mpi4py (parallel train and play)
  • pygame (GUI)

How to Install

Install tensorflow, tensorlayer and pygame:

conda install tensorflow
conda install tensorlayer
conda install pygame

To install mpi4py, click here

For mpi4py on Windows, click here

How to Run

  • Play with AI
python human_play.py
  • Play with parallel AI (-np sets the number of processes; watch out for out-of-memory errors!)
mpiexec -np 3 python -u human_play_mpi.py
  • Train from scratch
python train.py
  • Train in parallel
mpiexec -np 43 python -u train_mpi.py

Algorithm

The algorithm is almost identical to AlphaGo Zero's, except that APV-MCTS is not implemented.
Slides (PPT) can be found in the directory demo/slides
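
For readers who have not seen the search rule that AlphaGo Zero style MCTS relies on, here is a minimal sketch of the PUCT selection step: each child is scored by its mean value Q plus an exploration bonus U proportional to its prior and inversely related to its visit count. The function name `puct_select` and the tuple layout are assumptions for illustration, not code from this repo; `c_puct=5.0` mirrors the value listed in the parameter table.

```python
import math


def puct_select(children, c_puct=5.0):
    """Pick the action maximising Q + U.

    children: dict mapping action -> (prior P, visit count N, mean value Q).
    U = c_puct * P * sqrt(sum of sibling visits) / (1 + N)
    """
    total_visits = sum(n for _, n, _ in children.values())

    def score(action):
        p, n, q = children[action]
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        return q + u

    return max(children, key=score)


children = {
    "a": (0.6, 10, 0.1),  # high prior, already well explored
    "b": (0.3, 2, 0.2),   # medium prior, barely explored
    "c": (0.1, 0, 0.0),   # low prior, unexplored
}
print(puct_select(children))
```

A larger `c_puct` pushes the search towards moves with high priors and few visits; with `c_puct=0` the rule degenerates to picking the best mean value.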

Details

Most settings are the same as in AlphaGo Zero. Details follow:

  • Network Structure
    • The current model uses 19 residual blocks. More blocks give more accurate predictions but slower inference
    • The number of filters in each convolutional layer is shown in the following picture
  • Feature Planes
    • In the AlphaGo Zero paper there are 17 feature planes: 8 for the current player's stones, 8 for the opponent's stones, and a final plane that represents the colour to play
    • Here I use only 4 planes for each player. This can easily be changed in game_board.py
  • Dirichlet Noise
    • I add Dirichlet noise at every node, unlike the paper, which adds noise only at the root node. I guess AlphaGo Zero discards the whole tree after each move and builds a new one, whereas here I keep the subtree under the chosen action, so the behaviour is slightly different
    • The weights between prior probabilities and noise are unchanged here (0.75/0.25), though I think 0.8/0.2 or even 0.9/0.1 may be better because noise is added at every node
  • Parameters in detail
    • I try to keep the original parameters from the AlphaGo Zero paper, so as to test how well they generalize. I also take training time and hardware into consideration.

      Parameter          Gomoku           AlphaGo Zero
      MPI num            43               -
      c_puct             5                5
      n_playout          400              1600
      blocks             19               19/39
      buffer size        500,000 (data)   500,000 (games)
      batch_size         512              2048
      lr                 0.001            annealed
      optimizer          Adam             SGD with momentum
      dirichlet noise    0.3              0.03
      weight of noise    0.25             0.25
      first n move       12               30
  • Training details
    • I trained the model for about 100,000 games, which took roughly 800 hours
    • Computer configuration : 2 CPUs and 2 1080 Ti GPUs
    • The gap in compute compared with DeepMind is obvious. People with more resources can take this work further
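
The Dirichlet noise mixing described in the details above can be sketched as follows, using the Gomoku column of the parameter table (Dirichlet parameter 0.3, noise weight 0.25). This is a standalone illustration built on the standard library, not the repo's code; `sample_dirichlet` and `mix_noise` are hypothetical helper names, and a real implementation would more likely call a NumPy routine.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k outcomes.

    Uses the standard construction: normalise k independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mix_noise(priors, alpha=0.3, noise_weight=0.25, rng=None):
    """Return (1 - noise_weight) * prior + noise_weight * Dirichlet noise."""
    if rng is None:
        rng = random.Random()
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1 - noise_weight) * p + noise_weight * n
            for p, n in zip(priors, noise)]


priors = [0.7, 0.2, 0.1]  # network policy over three legal moves
noisy = mix_noise(priors, rng=random.Random(0))
print(noisy, sum(noisy))
```

Because both the priors and the noise sum to 1, the mixed vector is still a probability distribution. Lowering `noise_weight` to 0.2 or 0.1, as suggested above, keeps each move closer to its network prior.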

Some Tips

  • Network
    • ZeroPadding with Input : Sometimes the AI is unaware of a threat at the edge of the board, even when I have three or four in a row there. Zero-padding the input data mitigates the problem
    • Put the network on GPU : If the network is shallow, it does not matter whether it runs on CPU or GPU. Otherwise, self-play is faster on GPU
  • Dirichlet Noise
    • Add Noise in Node : In junxiaosong/AlphaZero_Gomoku, noise is added outside the tree, much like DQN's \(\epsilon\)-greedy exploration. That works when I test on 6x6 and 8x8 boards, but problems appear on 11x11. After long training on 11x11, black always plays the first stone in the centre with policy probability equal to 1. That is a rational move for black, but it means white never sees kifu (game records) in which the first stone is placed anywhere else. So when I play black and open away from the centre, the AI plays very poorly because it has never seen such positions. Adding noise in the nodes mitigates the problem
    • Smaller Weight with Noise : As noted above, I think 0.8/0.2 or even 0.9/0.1 may be a better split between prior probabilities and noise, because noise is added at every node
  • Randomness
    • Dihedral Reflection or Rotation : When using the network to output probabilities/value, it is better to do as the paper says: The leaf node \(s_L\) is added to a queue for neural network evaluation, \((d_i(p),v)=f_{\theta}(d_i(s_L))\), where \(d_i\) is a dihedral reflection or rotation selected uniformly at random from \(i\) in \([1..8]\)
    • Add Randomness when Test : I also apply the dihedral reflection or rotation when playing against the AI, so that it does not play the same game every time
  • Tradeoffs
    • Network Depth : If the network is too shallow, the loss is higher. If it is too deep, training and testing are slow. (My network is still a little slow to play against; I think 9 blocks may be enough)
    • Buffer Size : If the buffer is small, the network fits it easily, but learning from so little data cannot guarantee good performance. If it is too large, training takes much longer and a deeper network is needed
    • Playout Number : With few playouts, a self-play game finishes quickly but the quality of the kifu cannot be guaranteed. With more playouts, the kifu are better but take longer to generate
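
The random dihedral evaluation from the Randomness tip can be sketched like this: pick one of the 8 symmetries of the square board, evaluate the transformed position, then map the returned policy back to the original orientation. The helper names are assumptions for this example, the network is a stand-in callable, and the symmetries are indexed 0..7 here rather than 1..8 as in the quoted formula.

```python
import random


def rot90(grid):
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def flip(grid):
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in grid]


def transform(grid, i):
    """Apply symmetry i in 0..7: optional flip, then i % 4 rotations."""
    out = flip(grid) if i >= 4 else grid
    for _ in range(i % 4):
        out = rot90(out)
    return out


def inverse_transform(grid, i):
    """Undo transform(grid, i): rotate back, then undo the flip."""
    out = grid
    for _ in range((4 - i % 4) % 4):
        out = rot90(out)
    return flip(out) if i >= 4 else out


def evaluate_with_symmetry(board, net, rng=random):
    """Evaluate net on a random symmetry of board; return policy in board's frame."""
    i = rng.randrange(8)
    policy, value = net(transform(board, i))
    return inverse_transform(policy, i), value


board = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Stand-in "network": echoes its input as the policy, with a fixed value.
policy, value = evaluate_with_symmetry(board, lambda g: (g, 0.5), random.Random(3))
print(policy, value)
```

With the echo network the recovered policy equals the original board whichever symmetry is drawn, which is a convenient check that the inverse mapping is right.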

Future Work

  • Continue to train (a larger board) and increase the playout number
  • Try some other parameters for better performance
  • Alter network structure
  • Alter feature planes
  • Implement APV-MCTS
  • Train on standard/renju rule
