October 25, 2018
MLNC – Machine Learning & Neural Computation – Dr Aldo Faisal
Coursework 1 - Grid World
To be returned via Blackboard as indicated online.
Your coursework should contain your name, your CID and your degree course at the top of the first
page. Your text should provide brief analytical derivations and calculations in-line as necessary, so that
the markers can understand what you did. Please keep your answers succinct. Your final
document should be submitted as a single .zip file containing one PDF file, named in the format
CID FirstnameLastname.pdf (example: 012345678 JaneXu.pdf), and one .m file, also named in the
format CID FirstnameLastname.m. Note that, therefore, all code that you have written or modified must
be within that one Matlab file. Do not submit multiple Matlab files, and do not modify other Matlab files.
Your Matlab script should contain a function called RunCoursework() that takes no arguments and
produces all the Matlab-based results of your coursework (as clearly labelled text and/or figure
output). This function should be able to run on its own in a clean Matlab installation, in a directory
containing only the code we provided for the coursework.
Please additionally paste the same fully-commented Matlab source code in the appendix of your PDF
submission. You are allowed to use all built-in Matlab functions and any Matlab functions supplied by
the course or written by you.
The markers may subtract points for badly commented code, code that does not run and code that
does not follow the specifications. Figures should be clearly readable, labelled and visible; poor-quality
or difficult-to-understand figures may result in a loss of points.
Your coursework should not be longer than 4 single-sided pages with 2 centimetre margins all around
and 12pt font. You are encouraged to discuss the coursework with other students, but your answers should be your own, i.e.,
written by you, in your own words, showing your own understanding. You have to produce your own
code. If you have questions about the coursework, please make use of the labs or Piazza, but note that GTAs
cannot provide you with answers that directly solve the coursework.
Marks are shown next to each question. Note that the marks are only indicative.
Figure 1: Grid World
This coursework uses the simple Grid World shown in Figure 1. There are 14 states, corresponding to locations
on a grid – two cells (marked in grey) are walls and therefore cannot be occupied. This Grid World has two
terminal states, s2 (the Goal state, in green) and s3 (the Penalty state, in red).
The starting state can vary. In each simulation there is an equal probability of starting from one of the
states s11, s12, s13, s14 (i.e. there is a 1/4 probability of starting from any of these states).
Possible actions in this world are N, E, S and W (North, East, South, West), which correspond to
moving in the four cardinal directions of the compass.
The effects of actions are not deterministic: an action only succeeds in moving in the desired direction with
probability p. Otherwise, the agent moves perpendicular to its desired direction, in either of the two adjacent
directions, with probability $(1-p)/2$ each. Once the movement direction is determined, if a wall blocks the agent's path the agent stays
where it is; otherwise it moves to the corresponding adjacent state. For example, in a grid world with
p = 0.8, an agent in state s5 which chooses to move north will move north to state s1 with probability
0.8; will move east to state s6 with probability 0.1; or will move west with probability
0.1, in which case it bangs into the wall and comes to rest in state s5.
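The action model just described can be sketched as follows. This is a minimal Python illustration rather than Matlab submission code; the function name `successor_distribution` and the `neighbours` table format are hypothetical, and only the three moves from s5 stated in the worked example are filled in (any direction missing from the table is treated as a wall).

```python
from collections import defaultdict

# The two perpendicular ("slip") directions for each intended action.
PERPENDICULAR = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}

def successor_distribution(state, action, p, neighbours):
    """Return {next_state: probability} for taking `action` in `state`.

    neighbours[state][direction] is the adjacent state; a value of None, or a
    missing direction, is treated as a wall, so the agent stays in `state`.
    """
    slip = (1.0 - p) / 2.0
    moves = [(action, p)] + [(d, slip) for d in PERPENDICULAR[action]]
    dist = defaultdict(float)
    for direction, prob in moves:
        target = neighbours[state].get(direction)
        dist[state if target is None else target] += prob
    return dict(dist)

# Only the three moves stated in the worked example for s5 (west is a wall).
neighbours = {"s5": {"N": "s1", "E": "s6", "W": None}}
print(successor_distribution("s5", "N", 0.8, neighbours))
```

This reproduces the worked example (s1 with probability 0.8, s6 and s5 with probability 0.1 each, up to floating-point rounding). Probabilities are accumulated per successor state because several directions can lead to the same state, for example two blocked moves that both leave the agent in place.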
The agent receives a reward of -1 for every transition (i.e. a movement cost), except for transitions
ending in state s3 (marked P for Penalty) or state s2 (marked G for Goal). Transitioning to s3
incurs a penalty of 10 (a reward of -10); transitioning to s2 gives a reward of 0.
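The reward rule depends only on the state a transition ends in. A minimal Python sketch under that reading, with a hypothetical helper name and a hypothetical path:

```python
def transition_reward(next_state):
    """Reward for a transition, determined by the state it ends in."""
    if next_state == "s3":   # Penalty state
        return -10
    if next_state == "s2":   # Goal state
        return 0
    return -1                # ordinary movement cost

# Rewards collected along a hypothetical path s9 -> s5 -> s1 -> s2.
path = ["s9", "s5", "s1", "s2"]
rewards = [transition_reward(s_next) for s_next in path[1:]]
print(rewards)  # → [-1, -1, 0]
```

Under this reading, a move that bumps into a wall still costs -1, since it ends in a non-terminal state.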
We provide the code PersonalisedGridWorld.p. It contains the function that sets up the Grid World
with probability p of a successful transition and returns the full MDP information. Note that .p files are
similar to normal Matlab functions/scripts, but are not human-readable (i.e. you cannot and should not edit
them).
>> [NumStates, NumActions, TransitionMatrix, ...
RewardMatrix, StateNames, ActionNames, AbsorbingStates] ...
= PersonalisedGridWorld(p);
Here NumStates is the number of states in the Grid World, and NumActions is the number of actions
the agent can take. TransitionMatrix is a NumStates × NumStates × NumActions
array of transition probabilities indexed by (first dimension) successor state, (second dimension)
prior state and (third dimension) action. RewardMatrix is the NumStates × NumStates ×
NumActions array of reward values, indexed in the same way. StateNames is a NumStates × 1 matrix containing the name of
each state, ActionNames is a NumActions × 1 matrix containing the name of each action, and
AbsorbingStates is a NumStates × 1 matrix specifying which states are terminal.
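Because the successor state is the first index, the expected immediate reward of taking action a in state s is $r(s,a) = \sum_{s'} T(s', s, a)\, R(s', s, a)$, and each slice over s' for fixed (s, a) should sum to 1. A Python sketch using nested lists on a hypothetical two-state, one-action toy model (indices are 0-based here, unlike Matlab):

```python
def expected_reward(T, R, s, a):
    """r(s, a) = sum over s' of T[s'][s][a] * R[s'][s][a] (successor index first)."""
    return sum(T[s_next][s][a] * R[s_next][s][a] for s_next in range(len(T)))

def is_stochastic(T, tol=1e-9):
    """True if, for every prior state s and action a, probabilities over s' sum to 1."""
    n_states, n_actions = len(T), len(T[0][0])
    return all(
        abs(sum(T[s_next][s][a] for s_next in range(n_states)) - 1.0) < tol
        for s in range(n_states)
        for a in range(n_actions)
    )

# Hypothetical toy model: state 0 is non-terminal, state 1 is absorbing, one action.
# From state 0 the agent stays (reward -1) or reaches state 1 (reward 0), prob 0.5 each.
T = [[[0.5], [0.0]],   # T[0][s][a]: probability of landing in state 0
     [[0.5], [1.0]]]   # T[1][s][a]: probability of landing in state 1
R = [[[-1.0], [0.0]],
     [[0.0], [0.0]]]

print(expected_reward(T, R, 0, 0), is_stochastic(T))
```

In Matlab, the same expectation for given s and a could be written as sum(TransitionMatrix(:, s, a) .* RewardMatrix(:, s, a)).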
The coursework is personalised by your CID number. Throughout the exercise we set $p = 0.5 + 0.5\,\frac{x}{10}$
and $\gamma = 0.2 + 0.5\,\frac{y}{10}$, where x is the penultimate digit of your College ID (CID) and y is the last digit of your
CID. For example, if your CID is 876543210, then x = 1 and y = 0, giving $p = 0.55$ and $\gamma = 0.2$.
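A minimal Python sketch of the personalisation formulas, computing p and γ from the last two digits of a CID (the helper name is hypothetical). Taking the CID as a string preserves leading zeros, as in 012345678.

```python
def personalised_parameters(cid):
    """Return (p, gamma) from a CID given as a string of digits."""
    x = int(cid[-2])  # penultimate digit
    y = int(cid[-1])  # last digit
    p = 0.5 + 0.5 * x / 10
    gamma = 0.2 + 0.5 * y / 10
    return p, gamma

print(personalised_parameters("876543210"))
```

For the example CID this gives p = 0.55 and γ = 0.2, up to floating-point rounding.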
Questions
Points per question are indicative only. Questions become progressively more challenging.
1. (1 point) State your CID and your personalised p and $\gamma$ (no need to show the derivation).
2. (15 points) Assume the MDP is operating under an unbiased policy u. Compute the value function
$V^{u}(s)$ for every non-terminal state (s1, s4, ..., s14) using any dynamic programming method of your
choice. Report your result in the following format:
State  s1  s4    s5   s6  s7  s8  s9  s10  s11  s12  s13  s14
Value  2   2.54  1/3  ...
3. (25 points) Assume you are observing the following state sequences from the above MDP: {s14, s10, s8, s4, s3},
{s11, s9, s5, s6, s6, s2}, {s12, s11, s11, s9, s5, s9, s5, s1, s2}.
(a) What is the likelihood that the 3 observed sequences above were generated by the unbiased policy u?
Report the value of the likelihood.
(b) Find a policy $\pi_M$ for the 3 observed sequences that has a higher likelihood than u of having
generated these sequences. Note that, as not all states are visited by these 3 sequences, you only have
to report the policy for visited, non-terminal states. Report your result using the following format:
State s1 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14
Action N S W ...
4. (39 points)
(a) Assume the unbiased policy u in this MDP. Generate 10 traces from this MDP and write them out,
using one line for each trace, the symbols S1, S4, ..., S14 for states, N, E, S, W for actions,
and the rewards in the following format. Please make sure we can easily copy and paste these
values from the PDF in one go into our automatic testing software, e.g.:
S12,W,-1,S11,N,-1,S9,N,-1,S5,N,-1,S1,N,-1,S1,E,0
S14,E,-1,S10,E,-1,S8,W,-1,S7,S,-1,S6,N,0
(b) Apply First-Visit Batch Monte-Carlo Policy Evaluation to estimate the value function from
these 10 traces alone. Report the value function for every non-terminal state (s1, s4, ..., s14) using
the format specified in Question 2.
(c) Quantify the difference between the value function obtained in Q4.b and the value function $V^{u}$
obtained in Q2 by defining a measure that reports in a single number how similar these two value
functions are. Justify your choice of measure. Then plot the value of the proposed similarity measure
against the number of traces used: start with the first trace only, then the first and second traces,
and so forth. Comment on how increasing the number of traces affects the similarity measure.
5. (20 points)
(a) Implement $\varepsilon$-greedy first-visit Monte Carlo control. Evaluate the learning and control for two settings
of $\varepsilon$, namely 0.1 and 0.75.
For each setting of $\varepsilon$, plot two types of learning curves:
i. Plot reward against episodes.
ii. Plot trace length per episode against episodes.
Note: An episode is one complete trace. A trial is many episodes starting from an initialisation of
the agent. The learning curves are stochastic quantities, so you may need to run a good number of
repeated learning experiments to average out the variability across trials. Specify the number of
trials and plot the mean ± standard deviation of your learning curves.
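For Question 2, one dynamic programming option is iterative policy evaluation of the unbiased policy, read here as choosing each action with equal probability. The Python sketch below runs on a hypothetical two-state toy model rather than the Grid World (whose layout is only given in Figure 1); the model format `model[s][a] = [(probability, next_state, reward), ...]` is an assumption, and γ is assumed to be the discount factor.

```python
def evaluate_uniform_policy(model, terminal, gamma, tol=1e-10):
    """Iterative policy evaluation for the equiprobable (unbiased) policy.

    model[s][a] is a list of (probability, next_state, reward) tuples;
    states in `terminal` keep value 0. Updates are done in place.
    """
    V = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s in model:
            if s in terminal:
                continue
            actions = model[s]
            pi = 1.0 / len(actions)  # each action equally likely
            new_v = sum(
                pi * prob * (reward + gamma * V[s_next])
                for a in actions
                for prob, s_next, reward in actions[a]
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

# Hypothetical toy MDP: from A, "go" reaches terminal G (reward 0) or stays in A
# (reward -1) with probability 0.5 each; "stay" always stays in A (reward -1).
toy_model = {
    "A": {"go": [(0.5, "G", 0.0), (0.5, "A", -1.0)], "stay": [(1.0, "A", -1.0)]},
    "G": {},
}
V = evaluate_uniform_policy(toy_model, terminal={"G"}, gamma=0.5)
print(V)
```

For the toy model the Bellman equation reads V(A) = -0.75 + 0.375 V(A), so V(A) = -1.2, which the iteration converges to. Updating in place is a design choice; sweeping from a separate copy of V also converges for γ < 1.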
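For Question 3, actions are not observed, so each step's probability marginalises over actions, $P(s_{t+1} \mid s_t) = \sum_a \pi(a \mid s_t)\, P(s_{t+1} \mid s_t, a)$; the likelihood of a sequence is the product of these terms (times the probability of its start state, if included), and independent sequences multiply. A Python sketch on a hypothetical toy model; all names and data structures are assumptions.

```python
def sequence_likelihood(states, policy, P, start_prob=1.0):
    """Likelihood of an observed state sequence when actions are hidden.

    policy[s][a]    : probability of choosing action a in state s
    P[s][a][s_next] : transition probability (missing entries count as 0)
    start_prob      : probability of the first state, e.g. 1/4 for s11..s14
    """
    likelihood = start_prob
    for s, s_next in zip(states[:-1], states[1:]):
        # Marginalise over the unobserved action taken in state s.
        likelihood *= sum(pi_a * P[s][a].get(s_next, 0.0) for a, pi_a in policy[s].items())
    return likelihood

# Hypothetical toy model: from A, "go" reaches G or stays in A (0.5 each); "stay" stays.
P = {"A": {"go": {"G": 0.5, "A": 0.5}, "stay": {"A": 1.0}}}
uniform = {"A": {"go": 0.5, "stay": 0.5}}
always_go = {"A": {"go": 1.0}}

observed = ["A", "A", "G"]
print(sequence_likelihood(observed, uniform, P))    # → 0.1875
print(sequence_likelihood(observed, always_go, P))  # → 0.25
```

In this toy case the deterministic policy explains the sequence better than the uniform one. For longer sequences, summing log-probabilities avoids numerical underflow.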
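For Question 4(a), a trace can be kept as a list of (state, action, reward) steps and printed in the required comma-separated format. A minimal Python sketch; the function name is hypothetical, and the example reproduces the second sample trace given in the question.

```python
def format_trace(steps):
    """Join (state, action, reward) steps into one comma-separated line."""
    # The :g format prints -1.0 as -1 and 0.0 as 0, matching the sample traces.
    return ",".join(f"{state},{action},{reward:g}" for state, action, reward in steps)

steps = [("S14", "E", -1), ("S10", "E", -1), ("S8", "W", -1), ("S7", "S", -1), ("S6", "N", 0)]
line = format_trace(steps)
print(line)  # → S14,E,-1,S10,E,-1,S8,W,-1,S7,S,-1,S6,N,0
```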
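For Question 4(b), first-visit batch Monte Carlo evaluation averages, for each state, the discounted return following its first occurrence in each trace. A Python sketch, assuming each step stores the reward received after acting in that state (as in the trace format of Question 4(a)) and that γ is the discount factor; the traces are hypothetical.

```python
from collections import defaultdict

def first_visit_mc(traces, gamma):
    """First-visit batch Monte Carlo policy evaluation.

    Each trace is a list of (state, action, reward) steps, where `reward` is
    received after taking `action` in `state`. Returns {state: mean return}.
    """
    returns = defaultdict(list)
    for trace in traces:
        # Discounted return G_t for every step, accumulated backwards.
        g = 0.0
        g_at = [0.0] * len(trace)
        for t in range(len(trace) - 1, -1, -1):
            g = trace[t][2] + gamma * g
            g_at[t] = g
        # Record the return only at each state's first occurrence in this trace.
        seen = set()
        for t, (state, _, _) in enumerate(trace):
            if state not in seen:
                seen.add(state)
                returns[state].append(g_at[t])
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Hypothetical traces over two non-terminal states A and B.
traces = [
    [("A", "E", -1), ("A", "N", 0)],
    [("A", "N", 0)],
    [("B", "W", -1), ("A", "E", -1), ("A", "N", 0)],
]
V = first_visit_mc(traces, gamma=0.5)
print(V)
```

States that never occur in any trace receive no estimate, which matters when comparing against the dynamic-programming values.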
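For Question 4(c), one possible single-number measure (an illustration, not the required answer) is the root-mean-square error over the non-terminal states: it is in the same units as the values and penalises large deviations more heavily. A Python sketch with hypothetical values:

```python
import math

def rmse(v_estimate, v_reference, states):
    """Root-mean-square difference between two value functions over `states`."""
    squared = [(v_estimate[s] - v_reference[s]) ** 2 for s in states]
    return math.sqrt(sum(squared) / len(squared))

v_mc = {"s1": -1.0, "s4": -3.0}   # hypothetical Monte Carlo estimates
v_dp = {"s1": 0.0, "s4": 0.0}     # hypothetical dynamic-programming values
print(rmse(v_mc, v_dp, ["s1", "s4"]))
```

For the requested plot, the estimate would be recomputed from the first n traces for n = 1, ..., 10 and the measure recorded each time; states not yet visited by the first n traces need a stated convention, for example excluding them.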
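For Question 5, the core of ε-greedy control is its behaviour policy: every action receives probability ε/|A| and the currently greedy action receives an extra 1 - ε. A Python sketch of that step only (the episode loop and the first-visit Q updates are not shown); the Q-values are hypothetical, and ties are broken by taking the first maximal action.

```python
import random

def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy in one state.

    q_values maps action -> estimated Q(s, a). Every action gets epsilon/|A|;
    the greedy action (first maximum in dict order) gets an extra 1 - epsilon.
    """
    n = len(q_values)
    greedy = max(q_values, key=q_values.get)
    return {a: epsilon / n + (1.0 - epsilon if a == greedy else 0.0) for a in q_values}

def sample_action(q_values, epsilon, rng):
    """Draw one action from the epsilon-greedy distribution."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

q = {"N": -3.0, "E": -1.0, "S": -2.0, "W": -5.0}   # hypothetical Q(s, .) values
print(epsilon_greedy_probs(q, 0.1))
print(sample_action(q, 0.75, random.Random(0)))
```

With four actions, the greedy action is chosen with probability 0.925 for ε = 0.1 but only 0.4375 for ε = 0.75, so the larger setting explores far more.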
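Plotting the mean ± standard deviation across trials reduces to per-episode statistics over a trials × episodes array. A Python sketch with hypothetical curve data, using the sample standard deviation (a choice; `statistics.pstdev` gives the population version, and `statistics.stdev` needs at least two trials):

```python
import statistics

def mean_and_std(curves):
    """curves[trial][episode] -> per-episode (means, sample standard deviations)."""
    per_episode = list(zip(*curves))   # regroup values by episode
    means = [statistics.mean(values) for values in per_episode]
    stds = [statistics.stdev(values) for values in per_episode]
    return means, stds

# Hypothetical reward curves: 3 trials x 4 episodes.
curves = [[-10, -6, -4, -3],
          [-12, -7, -5, -3],
          [-8, -5, -3, -3]]
means, stds = mean_and_std(curves)
print(means, stds)
```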
