Mixed Strategy Game

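Each player chooses among her strategies according to a probability distribution.

In some cases pure strategies are not adequate, for example in zero-sum games such as matching pennies (which has no pure-strategy Nash equilibrium) or in games with multiple Nash equilibria.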

A probability distribution for each player.
The distributions are mutual best responses to one another in the sense of expected payoffs.
It is a stochastic steady state.

Solving matching pennies


Player 1’s expected payoffs:
If Player 1 chooses Head, -q+(1-q)=1-2q
If Player 1 chooses Tail, q-(1-q)=2q-1
Player 1’s best response B1(q):
For q<0.5, Head (r=1)
For q>0.5, Tail (r=0)
For q=0.5, indifferent (0≤r≤1)

Player 2’s expected payoffs:
If Player 2 chooses Head, r-(1-r)=2r-1
If Player 2 chooses Tail, -r+(1-r)=1-2r
Player 2’s best response B2(r):
For r<0.5, Tail (q=0)
For r>0.5, Head (q=1)
For r=0.5, indifferent (0≤q≤1)
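
The best responses above can be checked with a short sketch. The payoff matrices are read off the expected-payoff formulas (Player 1 gets -1 when both play Head), and the function names and the interval representation of a best response are illustrative choices, not part of the original notes.

```python
from fractions import Fraction

# Payoffs implied by the formulas above.
# Rows: Player 1 (Head, Tail); columns: Player 2 (Head, Tail).
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]

def best_response_1(q):
    """Interval (lo, hi) of optimal r for Player 1 when Player 2 plays Head w.p. q."""
    head = q * U1[0][0] + (1 - q) * U1[0][1]   # 1 - 2q
    tail = q * U1[1][0] + (1 - q) * U1[1][1]   # 2q - 1
    if head > tail:
        return (1, 1)   # Head for sure, r = 1
    if head < tail:
        return (0, 0)   # Tail for sure, r = 0
    return (0, 1)       # indifferent, any r in [0, 1]

def best_response_2(r):
    """Interval (lo, hi) of optimal q for Player 2 when Player 1 plays Head w.p. r."""
    head = r * U2[0][0] + (1 - r) * U2[1][0]   # 2r - 1
    tail = r * U2[0][1] + (1 - r) * U2[1][1]   # 1 - 2r
    if head > tail:
        return (1, 1)
    if head < tail:
        return (0, 0)
    return (0, 1)

half = Fraction(1, 2)
lo_r, hi_r = best_response_1(half)
lo_q, hi_q = best_response_2(half)
# (1/2, 1/2) is a mutual best response: each value lies in the other's interval.
print(lo_r <= half <= hi_r and lo_q <= half <= hi_q)
```

The two best-response correspondences intersect only at r = q = 1/2, which is the mixed strategy Nash equilibrium of matching pennies.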

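This is a Nash equilibrium in "probabilities". A player's decision depends not only on the opponent's strategies but also on the probability attached to each of them.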

Example

Expected payoffs: 2 players each with two pure strategies.

Player 1 plays a mixed strategy (r, 1- r ). Player 2 plays a mixed strategy (q, 1- q).

Player 1’s expected payoff of playing s11: EU1(s11, (q, 1-q))=q×u1(s11, s21)+(1-q)×u1(s11, s22)
Player 1’s expected payoff of playing s12: EU1(s12, (q, 1-q))= q×u1(s12, s21)+(1-q)×u1(s12, s22)
Player 1’s expected payoff from her mixed strategy: v1((r, 1-r), (q, 1-q))=r×EU1(s11, (q, 1-q))+(1-r)×EU1(s12, (q, 1-q))

Player 2’s expected payoff of playing s21: EU2(s21, (r, 1-r))=r×u2(s11, s21)+(1-r)×u2(s12, s21)
Player 2’s expected payoff of playing s22: EU2(s22, (r, 1-r))= r×u2(s11, s22)+(1-r)×u2(s12, s22)
Player 2’s expected payoff from her mixed strategy: v2((r, 1-r),(q, 1-q))=q×EU2(s21, (r, 1-r))+(1-q)×EU2(s22, (r, 1-r))
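
The formulas above translate directly into code. This is a minimal sketch: the function names mirror EU1, EU2, v1, v2, index 0 stands for the first pure strategy of each player, and the payoff numbers are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

def eu1(i, q, u1):
    """Player 1's expected payoff from her pure strategy i against (q, 1-q)."""
    return q * u1[i][0] + (1 - q) * u1[i][1]

def eu2(j, r, u2):
    """Player 2's expected payoff from her pure strategy j against (r, 1-r)."""
    return r * u2[0][j] + (1 - r) * u2[1][j]

def v1(r, q, u1):
    """Player 1's expected payoff from the mixed strategy (r, 1-r)."""
    return r * eu1(0, q, u1) + (1 - r) * eu1(1, q, u1)

def v2(r, q, u2):
    """Player 2's expected payoff from the mixed strategy (q, 1-q)."""
    return q * eu2(0, r, u2) + (1 - q) * eu2(1, r, u2)

# Hypothetical payoffs: u1[i][j] and u2[i][j] are the payoffs when
# Player 1 plays her strategy i and Player 2 plays her strategy j.
u1 = [[2, 0], [0, 1]]
u2 = [[1, 0], [0, 2]]

r, q = Fraction(2, 3), Fraction(1, 3)
print(v1(r, q, u1), v2(r, q, u2))
```

Using `Fraction` keeps the arithmetic exact, which makes equality checks between expected payoffs reliable.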

Mixed strategy Nash equilibrium:
A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if (r*,1-r*) is a best response to (q*, 1-q*), and (q*, 1-q*) is a best response to (r*,1-r*). That is,
v1((r*, 1-r*), (q*, 1-q*)) ≥ v1((r, 1-r), (q*, 1-q*)), for all 0≤ r ≤1
v2((r*, 1-r*), (q*, 1-q*)) ≥ v2((r*, 1-r*), (q, 1-q)), for all 0≤ q ≤1

Theorem

Theorem 1

A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if and only if
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s11, (q*, 1-q*))
v1((r*, 1-r*), (q*, 1-q*)) ≥ EU1(s12, (q*, 1-q*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s21, (r*, 1-r*))
v2((r*, 1-r*), (q*, 1-q*)) ≥ EU2(s22, (r*, 1-r*))

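When the opponent plays her equilibrium mixed strategy, a player's equilibrium mixed strategy yields an expected payoff at least as high as any single pure strategy. A pure strategy is a special case of a mixed strategy (one strategy chosen with probability 1), so optimizing over all mixed strategies can never do worse than optimizing over pure strategies alone. Conversely, the payoff of a mixed strategy is a weighted average of pure-strategy payoffs and cannot exceed the best of them, which is why it suffices to check deviations to pure strategies.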
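
Theorem 1 reduces the equilibrium test to four comparisons. The sketch below applies it to a 2×2 game; `is_nash` is a hypothetical helper name, and the matching pennies payoffs are taken from the earlier section.

```python
from fractions import Fraction

def is_nash(r, q, u1, u2):
    """Check Theorem 1: no pure-strategy deviation beats the mixed profile."""
    # Expected payoff of each pure strategy against the opponent's mix.
    eu1 = [q * u1[i][0] + (1 - q) * u1[i][1] for i in range(2)]
    eu2 = [r * u2[0][j] + (1 - r) * u2[1][j] for j in range(2)]
    # Expected payoff of each player's own mixed strategy.
    v1 = r * eu1[0] + (1 - r) * eu1[1]
    v2 = q * eu2[0] + (1 - q) * eu2[1]
    return v1 >= max(eu1) and v2 >= max(eu2)

# Matching pennies: rows Player 1 (Head, Tail), columns Player 2 (Head, Tail).
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]

half = Fraction(1, 2)
print(is_nash(half, half, U1, U2))  # the mixed equilibrium passes
print(is_nash(1, 1, U1, U2))        # (Head, Head) fails: Player 1 prefers Tail
```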

Theorem 2

Let ((r*, 1-r*), (q*, 1-q*)) be a pair of mixed strategies, where 0 <r*<1, 0<q*<1. Then ((r*, 1-r*), (q*, 1-q*)) is a mixed strategy Nash equilibrium if and only if
EU1(s11, (q*, 1-q*)) = EU1(s12, (q*, 1-q*))
EU2(s21, (r*, 1-r*)) = EU2(s22, (r*, 1-r*))
That is, each player is indifferent between her two strategies.
Significance: it gives conditions for a mixed strategy NE in terms of each player’s expected payoffs only to her pure strategies.
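
Solving the two indifference conditions of Theorem 2 for a general 2×2 game gives closed forms for q* and r*. The sketch below assumes integer payoffs and only reports interior solutions; `interior_equilibrium` is an illustrative name, and the degenerate case of a zero denominator is simply reported as `None` rather than analyzed.

```python
from fractions import Fraction

def interior_equilibrium(u1, u2):
    """Return (r*, q*) with 0 < r*, q* < 1 solving the indifference conditions, else None.

    u1[i][j], u2[i][j]: integer payoffs when Player 1 plays i and Player 2 plays j.
    """
    # EU1(s11) = EU1(s12)  <=>  q * den_q = u1[1][1] - u1[0][1]
    den_q = u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1]
    # EU2(s21) = EU2(s22)  <=>  r * den_r = u2[1][1] - u2[1][0]
    den_r = u2[0][0] - u2[0][1] - u2[1][0] + u2[1][1]
    if den_q == 0 or den_r == 0:
        return None  # degenerate case, not handled in this sketch
    q = Fraction(u1[1][1] - u1[0][1], den_q)
    r = Fraction(u2[1][1] - u2[1][0], den_r)
    if 0 < r < 1 and 0 < q < 1:
        return r, q
    return None

# Matching pennies from the earlier section.
U1 = [[-1, 1], [1, -1]]
U2 = [[1, -1], [-1, 1]]
print(interior_equilibrium(U1, U2))
```

For matching pennies this returns r* = q* = 1/2, in agreement with the best-response analysis.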

Mixed Strategy Nash Equilibrium

Mixed Strategy:
A mixed strategy of a player is a probability distribution over the player’s strategies.

Mixed strategy Nash equilibrium:
A probability distribution for each player
The distributions are mutual best responses to one another in the sense of expected payoffs

Employee Monitoring


Employee’s expected payoff of playing “work”
EU1(Work, (q, 1–q)) = q×50 + (1–q)×50=50

Employee’s expected payoff of playing “shirk”
EU1(Shirk, (q, 1–q)) = q×0 + (1–q)×100=100(1–q)

Employee is indifferent between playing Work and Shirk.
50=100(1–q)
q=1/2

Manager’s expected payoff of playing “Monitor”
EU2(Monitor, (r, 1–r)) = r×90+(1–r)×(-10) =100r–10

Manager’s expected payoff of playing “Not”
EU2(Not, (r, 1–r)) = r×100+(1–r)×(-100) =200r–100

Manager is indifferent between playing Monitor and Not
100r–10 =200r–100 implies that r=0.9.

Hence, ((0.9, 0.1), (0.5, 0.5)) is a mixed strategy Nash equilibrium by Theorem 2.

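The idea is to confuse the opponent as much as possible: do not let her infer your preference, so that she has no single strictly best reply.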
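
The employee monitoring result can be verified numerically. The payoff table below is reconstructed from the expected-payoff equations above (it is not shown in the notes themselves), so treat it as an assumption; the helper names are illustrative.

```python
from fractions import Fraction

# Payoffs reconstructed from the equations above (an assumption).
# Rows: employee (Work, Shirk); columns: manager (Monitor, Not).
EMPLOYEE = [[50, 50], [0, 100]]
MANAGER = [[90, 100], [-10, -100]]

def employee_payoffs(q):
    """Employee's expected payoffs (Work, Shirk) when the manager monitors w.p. q."""
    return [q * EMPLOYEE[i][0] + (1 - q) * EMPLOYEE[i][1] for i in range(2)]

def manager_payoffs(r):
    """Manager's expected payoffs (Monitor, Not) when the employee works w.p. r."""
    return [r * MANAGER[0][j] + (1 - r) * MANAGER[1][j] for j in range(2)]

r_star, q_star = Fraction(9, 10), Fraction(1, 2)
work, shirk = employee_payoffs(q_star)
monitor, not_monitor = manager_payoffs(r_star)
print(work, shirk)           # equal, so the employee is indifferent
print(monitor, not_monitor)  # equal, so the manager is indifferent

# If the manager monitored less often, shirking would become strictly better.
print(employee_payoffs(Fraction(1, 4)))
```

At the equilibrium each side is exactly indifferent, so neither can gain by shifting probability between her two strategies.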

Prisoners’ Dilemma

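Here we treat the Prisoners’ Dilemma as a mixed strategy game: each prisoner chooses between mum (m) and confess (c) with some probability. Let q be the probability that prisoner 2 plays mum and r the probability that prisoner 1 plays mum.

prisoner 1:
U1(m, q*) = U1(c, q*)
By Theorem 2, in an equilibrium where both players mix, prisoner 1 must get the same payoff from playing m alone and c alone (prisoner 2 chooses q so that prisoner 1 cannot exploit her preference).
U1(m, q*) = q*×(-1)+(1-q*)×(-9) = 8q*-9
U1(c, q*) = q*×0+(1-q*)×(-6) = 6q*-6
=> 8q*-9 = 6q*-6
=> q* = 3/2
By the same argument, r* = 3/2.
Probabilities must satisfy 0≤ q ≤1 and 0≤ r ≤1, so the indifference conditions have no valid solution and the Prisoners’ Dilemma has no Nash equilibrium in which the players genuinely mix. Confess strictly dominates mum (6q-6 > 8q-9 for every q in [0, 1]), so the only Nash equilibrium is the pure profile (confess, confess), which is a mixed strategy with all probability on one action.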
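
A short check of the computation above, using prisoner 1's payoffs as they appear in the equations. The variable names are illustrative, and q is the probability that prisoner 2 stays mum.

```python
from fractions import Fraction

# Prisoner 1's payoffs. Rows: prisoner 1 (mum, confess);
# columns: prisoner 2 (mum, confess).
U1 = [[-1, -9], [0, -6]]

def payoff_1(action, q):
    """Prisoner 1's expected payoff from a pure action when prisoner 2 is mum w.p. q."""
    return q * U1[action][0] + (1 - q) * U1[action][1]

# Indifference: payoff_1(mum, q) == payoff_1(confess, q).
numerator = U1[1][1] - U1[0][1]                           # -6 - (-9) = 3
denominator = U1[0][0] - U1[0][1] - U1[1][0] + U1[1][1]   # -1 + 9 - 0 - 6 = 2
q_star = Fraction(numerator, denominator)
print(q_star, 0 <= q_star <= 1)  # 3/2 is not a valid probability

# Confess is strictly better than mum at every q on a grid over [0, 1].
grid = [Fraction(k, 10) for k in range(11)]
print(all(payoff_1(1, q) > payoff_1(0, q) for q in grid))
```

The grid check illustrates strict dominance: the gap between confess and mum is 3 - 2q, which stays positive on the whole interval [0, 1].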

Existence of NE

Any finite game has a (mixed-strategy) NE.

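A strategy profile x* ∈ X is a NE if and only if any one of the following equivalent conditions holds:
1. Inequality constraints
Ui(xi*, x-i*) ≥ Ui(xi, x-i*) for all xi ∈ Xi, all i ∈ N
No player has an incentive to change her strategy unilaterally.
2. Solution to a multivariate maximization problem
Ui(xi*, x-i*) = max Ui(xi, x-i*) over xi ∈ Xi, for all i ∈ N
Each player's strategy gives her the best payoff given the others' strategies.
3. Fixed point of the best response function
xi* ∈ BRi(x-i*), where BRi(x-i*) = argmax Ui(xi, x-i*) over xi ∈ Xi
This is the form to which a fixed-point theorem applies.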
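
For pure strategies, the inequality characterization can be checked by brute force. The sketch below enumerates all pure profiles of a two-player game; `pure_nash_equilibria` is a hypothetical helper, and prisoner 2's payoffs in the Prisoners' Dilemma are assumed to mirror prisoner 1's by symmetry.

```python
def pure_nash_equilibria(U1, U2):
    """Return all pure profiles (i, j) where neither player gains by deviating."""
    rows, cols = len(U1), len(U1[0])
    result = []
    for i in range(rows):
        for j in range(cols):
            # Player 1 cannot do better by switching rows against column j.
            best_1 = all(U1[i][j] >= U1[k][j] for k in range(rows))
            # Player 2 cannot do better by switching columns against row i.
            best_2 = all(U2[i][j] >= U2[i][k] for k in range(cols))
            if best_1 and best_2:
                result.append((i, j))
    return result

# Matching pennies (0 = Head, 1 = Tail).
pennies_1 = [[-1, 1], [1, -1]]
pennies_2 = [[1, -1], [-1, 1]]

# Prisoners' Dilemma (0 = mum, 1 = confess); pd_2 assumed symmetric to pd_1.
pd_1 = [[-1, -9], [0, -6]]
pd_2 = [[-1, 0], [-9, -6]]

print(pure_nash_equilibria(pennies_1, pennies_2))  # no pure equilibrium
print(pure_nash_equilibria(pd_1, pd_2))            # only (confess, confess)
```

Matching pennies has no pure equilibrium, which is why the existence result has to be stated for mixed strategies.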

Fixed-point theorem

Brouwer fixed-point theorem: Let S ⊂ R^n be convex and compact. If T: S -> S is continuous, then there exists a fixed point, that is, there exists x* ∈ S such that x* = T(x*).
S is convex and compact means: x ∈ S, y ∈ S, 0≤α≤1 => αx + (1-α)y ∈ S (convex), and S is closed and bounded (compact).
So a fixed point of the best response function means xa* = BRa(BRb(xa*)).
Player a picks, within her own decision space, a best response to b's strategy, and b does the same with respect to a.

Proof

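We define a function f from the space of mixed strategy profiles Δ to itself. We will argue that Δ is compact and convex and that f is continuous, so by the Brouwer fixed-point theorem f has a fixed point. We will also argue that every fixed point of f must be a NE.
Δ is clearly compact and convex, since it is a product of simplices: Δ = ∏ Δi over i ∈ N, where Δi = {δi: δij ≥ 0 for all j ∈ Si, ∑j δij = 1}.
It remains to show: δ = f(δ) => δ is a NE.
The expected utility of player i if she were to play a particular pure strategy j ∈ Si instead of her mixed strategy δi would be
Ui(j, δ-i) = ∑ over s-i ∈ S-i of Pr(s-i) × Ui(j, s-i), where Pr(s-i) is the product of the probabilities that δ-i assigns to the components of s-i;
Given a mixed strategy profile δ ∈ Δ, the expected utility of player i is
Ui(δ) = ∑ over j ∈ Si of δij × Ui(j, δ-i);
Define Pi(j, δ) = Ui(j, δ-i) - Ui(δ);
We define f componentwise by f(δ)ij = (δij + max(Pi(j, δ), 0)) / (1 + ∑k max(Pi(k, δ), 0)), where k ranges over Si. (If a pure strategy earns more than the current average payoff, its probability is increased to raise the average.) f is continuous because Pi and max are continuous and the denominator is at least 1.
At a fixed point, δij = (δij + max(Pi(j, δ), 0)) / (1 + ∑k max(Pi(k, δ), 0)) for every j ∈ Si
=> δij × ∑k max(Pi(k, δ), 0) = max(Pi(j, δ), 0)
Since Ui(δ) is a weighted average of the Ui(j, δ-i), some j with δij > 0 has Pi(j, δ) ≤ 0. For that j the right-hand side is 0, so ∑k max(Pi(k, δ), 0) = 0.
So at any fixed point δ of f, max(Pi(j, δ), 0) = 0 for every i and every j ∈ Si
=> Ui(j, δ-i) - Ui(δ) ≤ 0, that is, Ui(j, δ-i) ≤ Ui(δ), which is the definition of a NE.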
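
The map f from the proof can be written out for a two-player game. This is a sketch under the assumption of two players with payoff matrices A and B; `nash_map` is an illustrative name. The proof only uses the existence of a fixed point, so nothing here claims that iterating f converges.

```python
from fractions import Fraction

def nash_map(x, y, A, B):
    """Apply f once to the mixed profile (x, y).

    x, y: probability vectors of players 1 and 2.
    A[i][j], B[i][j]: payoffs to players 1 and 2 at the pure profile (i, j).
    """
    m, n = len(x), len(y)
    # Expected payoff of each pure strategy against the opponent's mix.
    u1_pure = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    u2_pure = [sum(B[i][j] * x[i] for i in range(m)) for j in range(n)]
    # Expected payoff of the current mixed profile.
    u1 = sum(x[i] * u1_pure[i] for i in range(m))
    u2 = sum(y[j] * u2_pure[j] for j in range(n))
    # Positive part of the gain from switching to each pure strategy.
    g1 = [max(u - u1, 0) for u in u1_pure]
    g2 = [max(u - u2, 0) for u in u2_pure]
    new_x = [(x[i] + g1[i]) / (1 + sum(g1)) for i in range(m)]
    new_y = [(y[j] + g2[j]) / (1 + sum(g2)) for j in range(n)]
    return new_x, new_y

# Matching pennies: rows Player 1 (Head, Tail), columns Player 2 (Head, Tail).
A = [[-1, 1], [1, -1]]
B = [[1, -1], [-1, 1]]

half = Fraction(1, 2)
equilibrium = ([half, half], [half, half])
print(nash_map(*equilibrium, A, B) == equilibrium)  # the NE is a fixed point

# At (Head, Head) Player 1 gains by switching to Tail, so f moves her mix.
both_head = ([Fraction(1), Fraction(0)], [Fraction(1), Fraction(0)])
print(nash_map(*both_head, A, B))
```

At the equilibrium every gain term is zero, so the profile is returned unchanged; at a non-equilibrium profile the probability of a profitable pure strategy increases.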
