Zero. Motivation

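There are many ways to search a given parameter range for an approximately optimal parameter setting; grid search is one of them. But grid search is brute force and often costs too much. This post introduces a new parameter optimization method: Monte Carlo tree search (MCTS).
Almost everything online about Monte Carlo methods introduces Buffon's needle experiment and uses it to estimate some quantity. Here we instead use a Monte Carlo tree for parameter optimization.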

One. Model principles

The following blog posts already explain the method well:

①https://blog.csdn.net/ljyt2/article/details/78332802
②https://www.jianshu.com/p/a34f06885ef8
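
For orientation, the selection rule applied by the Matlab `best_child` function in the implementation section is the UCT score. The symbols are introduced here for illustration only: Q_i and N_i are the accumulated quality and visit count of child i, and N is the visit count of its parent. With that notation, the code computes:

```latex
\mathrm{score}_i = \frac{Q_i}{N_i} + C\sqrt{\frac{2\ln N}{N_i}}, \qquad C = \frac{1}{\sqrt{2}}
```

The first term favors children with a high average reward. The second term is larger for rarely visited children, so they still get explored.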

Two. Implementation

Version One: Python
https://www.jianshu.com/p/a34f06885ef8

Version Two: Matlab
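Driven by practical needs, I implemented a Matlab version on top of the Python version. It relies on Matlab's object-oriented programming. Readers can take whichever parts they need.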

State.m file

classdef State < handle
    properties
        value
        round
        choices
        PATH
        x2      % we assume MCTS is only used to find the best parameters of the third iteration step
        y
        sigma
        im
    end
    methods
        function self = State(x2, y, sigma, im)
            % Initialization
            self.value = 0;
            self.round = 0;
            self.choices = [];
            self.PATH = 0.1:0.2:3;
            self.x2 = x2;
            self.y = y;
            self.sigma = sigma;
            self.im = im;
        end
        function state = new_state(self)
            idx = randperm(numel(self.PATH));
            choice = self.PATH(idx(1));   % random sample from the 1-D array
            state = State(self.x2, self.y, self.sigma, self.im);
            % For the color peppers image, the two default parameters of the
            % third iteration step are 0.7 and 0.8.
            % step is the iteration routine, defined elsewhere (not shown in this post).
            if numel(self.choices) == 1       % currently choosing the second parameter
                % compute the potential value
                x3 = step(self.x2, self.y, self.sigma^2, 15, 7, self.choices(1), choice);
                value_ = -mean((x3(:) - self.im(:)).^2);   % negated MSE: lower error, higher value
            elseif numel(self.choices) == 0   % currently choosing the first parameter
                % compute the potential value
                x3 = step(self.x2, self.y, self.sigma^2, 15, 7, choice, 0.8);
                value_ = -mean((x3(:) - self.im(:)).^2);   % negated MSE: lower error, higher value
            else
                value_ = 0;
            end
            % Record the result of choosing one parameter
            state.value = self.value + value_;   % the value function still needs revising
            state.round = self.round + 1;
            state.choices = [self.choices, choice];   % extend the current choices
        end
        function display(self)
            fprintf(1, 'class State:\n');   % 1 means print to the terminal
            fprintf(1, 'value = %f\n', self.value);
            fprintf(1, 'round = %d\n', self.round);
            fprintf(1, 'ready to show the choice array:\n');
            for i = 1:numel(self.choices)
                if i == 1
                    fprintf(1, '[');
                end
                fprintf(1, '%g,', self.choices(i));   % %g because the choices are not integers
                if i == numel(self.choices)
                    fprintf(1, ']\n');
                end
            end
        end
    end
end
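
The value that `new_state` adds at each round is the negative mean squared error between the reconstructed image and the reference image. A minimal Python sketch of that reward follows. The name `negative_mse` is chosen here for illustration, and the flat pixel lists are made-up data standing in for the output of the `step` routine, which this post does not show.

```python
def negative_mse(estimate, reference):
    """Return -MSE of two equal-length flat pixel sequences; higher is better."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    squared_error = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return -squared_error / len(reference)


# Illustrative data only: 12 pixel values (e.g. a flattened 2x2 RGB image)
# and an estimate that is off by 0.5 everywhere.
reference = [0.0] * 12
estimate = [0.5] * 12
reward = negative_mse(estimate, reference)
```

Negating the error lets the tree search maximize the reward while the underlying goal is to minimize reconstruction error.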

Node.m file

classdef Node < handle
    properties
        parent
        children
        quality
        visit
        state
        MAX_DEPTH = 2
        MAX_CHOICE = numel(0.1:0.2:3)   % upper bound on the length of the children array
    end
    methods
        function self = Node()
            self.quality = 0.0;
            self.visit = 0;
            % the remaining properties are left undefined
        end
        function add_child(self, node)
            self.children = [self.children, node];
            node.parent = self;
        end
        function display(self)
            fprintf(1, 'class Node:\n');   % 1 means print to the terminal
            fprintf(1, 'quality = %f\n', self.quality);
            fprintf(1, 'visit = %d\n', self.visit);
        end
        function child_node = expand(cnt_node)
            % Randomly expand a child that has not been expanded yet, i.e. one that is
            % not in the children list; the randomness comes from the draw in new_state.
            % Returns the newly expanded child node.
            % Collect the choices already taken by the existing children
            tried = zeros(1, numel(cnt_node.children));
            for i = 1:numel(cnt_node.children)
                tried(i) = cnt_node.children(i).state.choices(end);
            end
            state = new_state(cnt_node.state);
            % Compare choices rather than values: two different choices may share a value
            while ismember(state.choices(end), tried)
                state = new_state(cnt_node.state);
            end
            child_node = Node();
            child_node.state = state;
            add_child(cnt_node, child_node);
        end
        function best = best_child(node)
            % Return the child in the children list that is best suited for expansion
            best_score = -inf;
            best = [];   % stays empty if the node has no children
            C = 1/sqrt(2.0);
            for i = 1:numel(node.children)
                sub_node = node.children(i);
                left = sub_node.quality / sub_node.visit;   % denominator is the visit count
                right = 2.0*log(node.visit) / sub_node.visit;
                score = left + C*sqrt(right);
                if score > best_score
                    best = sub_node;
                    best_score = score;
                end
            end
        end
        function node = tree_policy(node)
            % Selection + expansion
            % If the current node still has children that are not in the children list
            % (not yet expanded), randomly pick one, expand it and return it.
            % If the current node is a leaf, return it directly.
            % If all children are already in the children list, pick the one with the
            % highest score and continue from there.
            while node.state.round < node.MAX_DEPTH
                if numel(node.children) < node.MAX_CHOICE
                    node = expand(node);
                    return
                else
                    node = best_child(node);
                end
            end
        end
        function expanded_value = default_policy(node)
            % Simulation: the reward of one random walk from the current node to a leaf
            now_state = node.state;
            while now_state.round < node.MAX_DEPTH
                now_state = new_state(now_state);
            end
            expanded_value = now_state.value;
        end
        function backup(node, reward)
            % Carry the reward from the current node back to the root, updating
            % visit and quality of every node on the path
            while ~isempty(node)
                node.visit = node.visit + 1;
                node.quality = node.quality + reward;
                node = node.parent;
            end
        end
        function best = mcts(node)
            % Run several expansion attempts, then return the child in the
            % children list with the best score
            % times = 5;   % why 5?
            times = 20;
            for i = 1:times
                leaf = tree_policy(node);        % select downward and expand one node
                reward = default_policy(leaf);   % reward of one random path from it to a leaf
                backup(leaf, reward);
            end
            best = best_child(node);
        end
        function cnt_node = main(self, x2, y, sigma, im)
            % State needs its four inputs, so main takes them and returns the final
            % node; cnt_node.state.choices then holds the selected parameters
            init_state = State(x2, y, sigma, im);
            init_node = Node();
            init_node.state = init_state;
            cnt_node = init_node;
            for i = 1:self.MAX_DEPTH
                cnt_node = mcts(cnt_node);
            end
        end
    end
end
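
The four phases that `tree_policy`, `default_policy`, `backup` and `mcts` implement can also be sketched compactly in Python. This is a simplified illustration under stated assumptions, not a port of the code above. `toy_reward` and its target point are hypothetical stand-ins for the image-based value. The reward is evaluated only at the end of a rollout, not accumulated per round. The final pick per level uses the average reward only (exploration constant set to 0), which is a design choice of this sketch.

```python
import math
import random

GRID = [round(0.1 + 0.2 * k, 1) for k in range(15)]  # 0.1, 0.3, ..., 2.9
MAX_DEPTH = 2                 # two parameters to choose
C = 1 / math.sqrt(2)          # exploration constant


def toy_reward(choices):
    """Hypothetical objective: negative squared distance to an assumed target."""
    target = (0.7, 0.9)
    return -sum((c - t) ** 2 for c, t in zip(choices, target))


class Node:
    def __init__(self, choices=(), parent=None):
        self.choices = tuple(choices)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.quality = 0.0

    def untried(self):
        """Grid values not yet used by any child of this node."""
        tried = {child.choices[-1] for child in self.children}
        return [g for g in GRID if g not in tried]


def best_child(node, c=C):
    """Child with the highest UCT score (average reward plus exploration bonus)."""
    def score(child):
        bonus = math.sqrt(2 * math.log(node.visits) / child.visits)
        return child.quality / child.visits + c * bonus
    return max(node.children, key=score)


def tree_policy(node, rng):
    """Selection and expansion: expand an untried child, else descend by UCT."""
    while len(node.choices) < MAX_DEPTH:
        untried = node.untried()
        if untried:
            child = Node(node.choices + (rng.choice(untried),), parent=node)
            node.children.append(child)
            return child
        node = best_child(node)
    return node


def default_policy(node, rng):
    """Simulation: fill the remaining parameters at random and score the result."""
    choices = list(node.choices)
    while len(choices) < MAX_DEPTH:
        choices.append(rng.choice(GRID))
    return toy_reward(choices)


def backup(node, reward):
    """Backpropagation: update visits and quality up to the root."""
    while node is not None:
        node.visits += 1
        node.quality += reward
        node = node.parent


def mcts(node, rng, iterations):
    for _ in range(iterations):
        leaf = tree_policy(node, rng)
        backup(leaf, default_policy(leaf, rng))
    return best_child(node, c=0.0)  # final pick by average reward only


def search(seed=0, iterations=200):
    """Choose one parameter per level and return the chosen tuple."""
    rng = random.Random(seed)
    node = Node()
    for _ in range(MAX_DEPTH):
        node = mcts(node, rng, iterations)
    return node.choices
```

Calling `search(seed=0)` returns a tuple of two grid values. Swapping `toy_reward` for a real objective is the only change needed to reuse the skeleton.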

Notice.

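In the Matlab version, note the two different ways of declaring a class. `classdef name < handle` declares a reference (handle) type; instances of such a class can be stored as properties of another class and are shared, not copied. `classdef name` declares a value type: instances are copied on assignment, so changes made inside a method are lost unless the object is returned, and using the class's own instances as a property (as Node does with parent and children) does not work as intended.
Both the Node class and the State class above are reference types.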
