Stock Trading Strategy Utilizing the Distribution of Predicted Returns

A. Introduction

In this article we explain one of the three novel contributions of the paper “High-performance stock index trading using neural networks and trees”, which achieves very strong results on major stock market indices such as the SP500 and the Dow Jones. The paper introduces a trading strategy that uses the distribution of predicted returns to make better-informed trading decisions.

The purpose of this article is to convey the spirit of the trading strategy. For the actual implementation, the various design choices and the strong results, we refer the curious reader to the paper. Moreover, given the financial implications, the code linked with the paper is unfortunately not available. However, the process is straightforward to replicate, and what matters even more is understanding the idea behind it.

We begin by stating the motivation for the problem and then describe the distribution of predicted returns, the key principle behind our trading strategy.

B. Problem Motivation

Let’s first consider the problem setup. Assume we have a prediction model f, which could be anything from linear regression to a deep neural network, that produces a prediction y_hat given input data X, i.e. f(X) = y_hat. Common practice and wisdom is to act solely on that prediction. For example, suppose we are trying to predict the closing price of the SP500 (daily, monthly or at any other time interval), the current price is $1300, and a finely tuned model predicts $1350, i.e. a return of 3.85%. A rational investor would buy the underlying instrument (for example an ETF). If instead the current price were $1400, the same $1350 prediction would imply a return of -3.57%, and the investor would sell short. Hence, the investor acts solely on the information in that prediction.
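
Every scenario above comes down to one piece of arithmetic, the simple return implied by the forecast, r_hat = predicted / current - 1, followed by a sign check. Below is a minimal Python sketch of this naive “act on the point prediction” rule. The function names are our own and are not code from the paper.

```python
def predicted_return(current_price: float, predicted_price: float) -> float:
    """Simple return implied by a price forecast, as a fraction (0.0385 means 3.85%)."""
    return predicted_price / current_price - 1.0


def naive_signal(current_price: float, predicted_price: float) -> str:
    """Act on the point prediction alone: long if it implies a rise, short if a fall."""
    r_hat = predicted_return(current_price, predicted_price)
    if r_hat > 0:
        return "buy"
    if r_hat < 0:
        return "sell short"
    return "hold"


# Price scenarios from the text: (current SP500 level, model's predicted close).
for current, predicted in [(1300, 1350), (1400, 1350), (1300, 1310), (1300, 1400)]:
    r_hat = predicted_return(current, predicted)
    print(f"{current} -> {predicted}: r_hat = {r_hat:+.2%}, {naive_signal(current, predicted)}")
```

All four forecasts produce a trade. Beyond the sign, the rule cannot tell a 0.77% forecast from a 7.69% one.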

Imagine now two other scenarios: the current price is $1300 and the model predicts either $1310 (0.769%) or $1400 (7.69%). By the same principle, the investor would probably buy in both cases without much consideration. In all three cases with a current price of $1300 ($1310, $1350 and $1400), the predicted return is positive, implying an increase. However, how confident are we in each of our model’s predictions? That is, assuming daily predictions, how often did our training data contain returns similar to 0.769%, and how often returns similar to 7.69%, for our model to learn from? Clearly, at a daily frequency, small changes occur far more often.

The most common way to train a model, and to judge how good it is, relies on some error function (such as the mean absolute or mean squared error). That is not the key here, since averaging suppresses a lot of information. By treating all three predictions the same, we implicitly assume that our model has the same “ability” in each scenario. This leads to the question: how “successful” is the model each time it predicts a return of 0.77%, 3.85% or 7.69%? “Success” needs to be defined, for example as high returns, as a low standard deviation, or as the ratio of the two (loosely speaking, the Sharpe ratio). To answer this question we turn to the historical distribution of predicted returns, which contains significant information about our model’s ability. The objective is to find where our model does best and where it does worst, and to act accordingly.
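
One way to make “success” concrete is to compute the candidate measures named above on the realized returns that follow a given kind of prediction. The sketch below is a minimal illustration under our own assumptions. Returns are simple per-period fractions, and the Sharpe-like ratio is un-annualized with a zero risk-free rate.

```python
import statistics


def success_metrics(returns):
    """Candidate definitions of "success" for a list of realized returns.

    `returns` are per-period simple returns as fractions (0.01 = 1%).
    `sharpe_like` is a loose, un-annualized mean/std ratio with a zero risk-free
    rate; that choice is an assumption for illustration, not the paper's definition.
    """
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r                      # compound period by period
    mean = statistics.mean(returns)
    std = statistics.stdev(returns) if len(returns) > 1 else 0.0
    return {
        "cumulative": growth - 1.0,            # profitability
        "mean": mean,
        "median": statistics.median(returns),
        "std": std,                            # lower is "better" under a risk view
        "sharpe_like": mean / std if std > 0 else float("nan"),
    }


print(success_metrics([0.01, -0.02, 0.03]))
```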

C. Distribution of Predicted Returns

The distribution of predicted returns is nothing more than the collection of all available predicted returns over as many time steps as possible. The next step is to extract the information we need from it. One way to do so is to “slice” the distribution into bins (e.g. deciles, quintiles, etc.) and consider the “success” of each bin separately. For the moment, assume the histogram in Fig 1 represents daily predicted returns (1000 random draws from a normal distribution, N(0, 2)). Let’s “slice” the distribution into 10 bins, or deciles. This effectively bundles together returns that fall within specific ranges.
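
The slicing itself takes only a few lines. The sketch below makes several assumptions of its own: N(0, 2) is read as mean 0 and standard deviation 2 (in percent), and the random seed is arbitrary. It computes two kinds of cut points so the difference is visible. Decile cut points come from Python's `statistics.quantiles` and put the same number of predictions in every bin. Equal-width ranges are what a plain 10-bar histogram would use. The sketch will not reproduce Table 1 exactly.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible; any seed works
# Hypothetical stand-in for 1000 daily predicted returns, in percent.
# Assumption: N(0, 2) means mean 0 and standard deviation 2.
preds = [random.gauss(0.0, 2.0) for _ in range(1000)]

# Decile cut points: 9 interior edges splitting the sample into 10 equal-count bins.
decile_edges = statistics.quantiles(preds, n=10)

# Equal-width alternative: 10 ranges of identical width between min and max.
lo, hi = min(preds), max(preds)
width = (hi - lo) / 10
width_edges = [lo + k * width for k in range(1, 10)]


def bin_counts(values, edges):
    """Count values per bin, using (lower, upper] bins defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        k = sum(v > e for e in edges)  # number of edges strictly below v -> 0-based bin
        counts[k] += 1
    return counts


print("decile bins:     ", bin_counts(preds, decile_edges))
print("equal-width bins:", bin_counts(preds, width_edges))
```

By construction, each decile bin holds about 100 draws. The equal-width bins at the tails hold only a handful. The choice of bin type therefore changes how many observations back each bin’s statistics.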

For this example the ranges are as follows (Table 1)

Table 1. Bins of the predicted return distribution, with the number of predictions in each bin.

For example, a predicted return of -1.57% belongs to bin 5, and a predicted return of 3% belongs to bin 8.
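
Assigning a prediction to a bin is a binary search over the sorted cut points. The sketch below uses Python's `bisect` module with the convention from the worked example: lower bound exclusive, upper bound inclusive. Table 1’s actual cut points are not reproduced here. Apart from 1.59 and 2.94, the values in the hypothetical `EDGES` list are placeholders chosen only to agree with the examples in the text.

```python
import bisect


def assign_bin(r_hat, edges):
    """1-based bin of a predicted return, given sorted interior cut points.

    With 9 cut points there are 10 bins; bin k covers edges[k-2] < r_hat <= edges[k-1]
    (lower bound exclusive, upper inclusive, as in "1.59% < 1.83% <= 2.94%").
    """
    # bisect_left counts how many cut points lie strictly below r_hat.
    return bisect.bisect_left(edges, r_hat) + 1


# Hypothetical cut points (percent). Only 1.59 and 2.94 (the bounds of bin 7)
# appear in the text; the rest are placeholders consistent with its examples.
EDGES = [-4.0, -3.0, -2.0, -1.6, 0.5, 1.59, 2.94, 4.0, 5.0]

for r in (-1.57, 3.0, 1.83):
    print(r, "-> bin", assign_bin(r, EDGES))
```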

OK, so what is the big deal?

Well, instead of computing statistics on the entire distribution and losing information by averaging, we now have 10 bins on which we can compute statistics and actually learn useful information.

To begin with, we can see that a prediction as high as 7% occurred only twice out of 1000 predictions (in our sample data). Hence, when the model predicts a 7% increase, how confident can we be that it is correct, compared with a prediction of 0.79%, which occurs far more often? Our model has been exposed to far more observations in bin 6 than in bin 10. Or what if two bins are really “successful” (for example in terms of accuracy or profitability) and three are not? In that case we would like to avoid the bad bins and exploit the good ones to the maximum extent. For example, we might do nothing whenever the model predicts a return that falls in a bad bin, and “go all in” when it falls in a good bin. Treating all predictions the same loses this useful information, and that ultimately hurts overall performance.

The important takeaway is that we have effectively divided our prediction model into ten different models: one whose predictions fall in bin 1, a second whose predictions fall in bin 2, and so on. By computing a metric for each bin, we harness more information than by treating all predictions the same.
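
Below is a minimal sketch of this decomposition, with mean absolute error standing in for “any metric”. The same (prediction, realized) pairs are scored once as a whole and once per bin. The five data points are made up for illustration and are not from the article.

```python
import statistics


def mae(pairs):
    """Mean absolute error of (predicted, realized) pairs."""
    return statistics.mean(abs(p - r) for p, r in pairs)


def mae_by_bin(preds, realized, bins):
    """Score each bin as if it were a separate model."""
    groups = {}
    for p, r, b in zip(preds, realized, bins):
        groups.setdefault(b, []).append((p, r))
    return {b: mae(pairs) for b, pairs in sorted(groups.items())}


# Made-up toy sample (percent): three modest predictions in bin 6, two extreme ones in bin 10.
preds = [0.5, 0.6, 0.4, 7.0, 6.5]
realized = [0.4, 0.7, 0.5, -1.0, 0.5]
bins = [6, 6, 6, 10, 10]

print("overall MAE:", mae(zip(preds, realized)))
print("MAE per bin:", mae_by_bin(preds, realized, bins))
```

The single averaged error (about 2.86) hides the fact that, in this toy sample, the model is close in bin 6 (about 0.1) and far off in bin 10 (7.0).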

D. How does it actually work?

The process is simple and goes as follows:

1. Slice the distribution of predicted returns into bins.
2. Calculate a metric of “success” (e.g. profitability) for each bin.
3. Locate the bin that the new prediction falls into.
4. If that bin is “successful”, act/trade; otherwise, do nothing.
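
The four steps can be combined into a decision for a single new prediction. The minimal sketch below is an illustration under stated assumptions, not the paper’s implementation. Bins are equal-count quantiles of past predictions. A bin’s “success” is the cumulative realized return of a long position on the days whose predictions fell in that bin. `threshold` is a hypothetical cut-off for calling a bin successful.

```python
import bisect
import statistics


def decide(past_preds, past_realized, new_pred, n_bins=10, threshold=0.0):
    """Steps 1-4 applied to a history of predicted and realized returns (percent).

    Returns (bin number, bin's cumulative return in percent, trade?).
    """
    # Step 1: slice past predictions into n_bins bins via quantile cut points.
    edges = statistics.quantiles(past_preds, n=n_bins)
    # Step 2: compound the realized returns whose predictions fell into each bin.
    growth = [1.0] * n_bins
    for p, r in zip(past_preds, past_realized):
        growth[bisect.bisect_left(edges, p)] *= 1.0 + r / 100.0
    # Step 3: locate the bin of the new prediction.
    k = bisect.bisect_left(edges, new_pred)
    score = 100.0 * (growth[k] - 1.0)
    # Step 4: act only if that bin has been successful.
    return k + 1, score, score > threshold
```

In this design a bin is judged only by what happened after its own past predictions. A single run of bad days in a sparsely populated bin is therefore enough to switch it off.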

To apply the above procedure we need some initial “burn-in” data. When we make our first prediction we have only one value, so there is not really a distribution. As we accumulate more predictions, the distribution starts to take shape. How long we should wait until we have a proper distribution depends on the prediction task (daily vs. weekly vs. monthly), among other factors. Moreover, nothing requires fixed cut-off points (i.e. deciles); they were simply the most reasonable place to begin. Future research should look into variable cut-off points.
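
The burn-in requirement amounts to a guard before the cut points are computed. Below is a minimal sketch, assuming fixed decile cut-offs. The default `burn_in=250` is a hypothetical value, since the right length depends on the prediction task.

```python
import statistics


def decile_edges_or_none(past_preds, burn_in=250):
    """Decile cut points from the predictions seen so far, or None during burn-in.

    `burn_in` is a hypothetical minimum history length; the text only says it
    depends on the task (daily vs. weekly vs. monthly), among other factors.
    """
    if len(past_preds) < burn_in:
        return None                                 # distribution not formed yet: no trading
    return statistics.quantiles(past_preds, n=10)   # fixed cut-offs: deciles
```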

Example Walkthrough

Let’s walk through a made-up example. Assume we are at time t and want to compute our next-day return prediction. Since we use the same data as before, step (1) has already been done (Table 1); once we have the distribution, it is straightforward to calculate any quantile. For step (2) we define our metric of success as profitability, measured by cumulative return.

How do we compute it for each bin? Each time a new prediction comes in, we find the bin it belongs to. We repeat this for each of the 1000 predicted returns as they become available, assigning each one to a bin. An example is given in Table 2. Once we compute predicted return 501, we use the previous 500 predicted returns to compute the deciles and assign a bin to the new observation. The next day we record the realized return. Hopefully the realized return agrees with the predicted return, at least in direction!

r_hat is the predicted return and r is the realized return.
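
The walk-forward assignment can be sketched as a loop in which each prediction is binned using only the predictions that came before it, so no future information leaks into the cut points. This is a minimal sketch. The function name and row layout are our own. The text does not say whether the window keeps growing or rolls over the last 500 predictions, and the sketch uses a growing window.

```python
import bisect
import statistics


def walk_forward_bins(preds, realized, burn_in=500):
    """Assign each new prediction to a decile bin using only earlier predictions.

    Returns (t, r_hat, bin, r) rows for t >= burn_in. Indices are 0-based, so
    t = 500 is the article's observation 501. The window is expanding (all
    earlier predictions); preds[t - burn_in:t] would give a rolling window instead.
    """
    rows = []
    for t in range(burn_in, len(preds)):
        edges = statistics.quantiles(preds[:t], n=10)   # past only: no look-ahead
        b = bisect.bisect_left(edges, preds[t]) + 1
        rows.append((t, preds[t], b, realized[t]))      # r becomes known the next day
    return rows
```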

So now each bin has realized returns associated with it, and we simply compute the cumulative return of each bin. This is equivalent to treating the bins as ten separate portfolios. For example, after observation 503, bin 5 has two returns (plus any returns associated with bin 5 in the first 500 observations). If bin 5 has 73 returns in total, we compute its cumulative return from those 73 returns. As already mentioned, any other measure of “success” could be used, e.g. the mean or median return, the standard deviation, or the Sharpe ratio.
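
In code, the “ten portfolios” view is a group-by followed by compounding. Below is a minimal sketch, assuming rows shaped like (t, r_hat, bin, r) with returns in percent. The sample rows are invented and are not the contents of Table 2.

```python
def cumulative_return_by_bin(rows, n_bins=10):
    """Treat each bin as a separate portfolio and compound its realized returns.

    `rows` are (t, r_hat, bin, r) tuples with returns in percent. Returns a dict
    bin -> (cumulative return in percent, number of returns); empty bins give (0.0, 0).
    """
    growth = {b: 1.0 for b in range(1, n_bins + 1)}
    count = {b: 0 for b in range(1, n_bins + 1)}
    for _t, _r_hat, b, r in rows:
        growth[b] *= 1.0 + r / 100.0
        count[b] += 1
    return {b: (100.0 * (growth[b] - 1.0), count[b]) for b in growth}


# Invented rows for illustration only.
rows = [(500, 0.3, 5, 1.0), (501, 2.0, 7, 2.0), (502, 0.1, 5, -0.5)]
print(cumulative_return_by_bin(rows))
```

To get the other measures of “success”, replace the compounding with a mean, median, standard deviation or Sharpe-like ratio.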

Why is this important? By measuring the success of each bin, we evaluate how good our model is within that bin. Inspecting the distribution this way decomposes the model’s overall ability into a more fine-grained view.

Back to our example: following the above process, we calculate the profitability of each bin, given in Table 3.

Table 3. Profitability for each bin

After steps (1) and (2) we have the profitability of each bin. For step (3) we simply locate the bin of our new prediction. Assume our next-day return prediction is 1.83%, which belongs to bin 7. For step (4) we check how profitable that bin is: 5%.

To summarize: our predicted return is 1.83%, which belongs to bin 7 (1.59% < 1.83% <= 2.94%). That bin’s profitability is 5%, which triggers a buy signal. Had the prediction fallen in bin 6, we would not trade, given that bin’s negative return.
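
The final decision for the worked example can be sketched as a lookup. Only the bin-7 bounds and its 5% profitability come from the text. The other cut points are hypothetical, and bin 6’s -2.0 merely stands in for “negative”. Trading in the direction of the prediction is our own assumption.

```python
import bisect

# Only bin 7's bounds (1.59%, 2.94%] and its 5% profitability come from the text.
# The other cut points are hypothetical, and -2.0 for bin 6 is a placeholder for
# "negative"; bins missing from the dict are treated as unknown, i.e. no trade.
EDGES = [-4.0, -3.0, -2.0, -1.6, 0.5, 1.59, 2.94, 4.0, 5.0]
PROFITABILITY = {6: -2.0, 7: 5.0}   # percent, partial stand-in for Table 3


def signal(r_hat):
    """Steps (3) and (4): locate the bin of r_hat, then trade only if it was profitable."""
    b = bisect.bisect_left(EDGES, r_hat) + 1
    profit = PROFITABILITY.get(b)
    if profit is None or profit <= 0:
        return b, "no trade"
    # Assumption: trade in the direction of the prediction.
    return b, "buy" if r_hat > 0 else "sell short"


print(signal(1.83))  # (7, 'buy')
print(signal(0.79))  # (6, 'no trade')
```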

Conclusion

I hope the above is sufficient to convey the basic principle of why one should use the (historical) distribution of predicted returns. That being said, there are a lot of moving parts that require further research. For example, the bins are currently evaluated on historical data only. Future work could add some predictive ability for each bin, explore other ways to slice the distribution (why pre-fixed cut-off points?), or use other measures of success for each bin.

Original article: https://medium.com/swlh/stock-trading-strategy-utilizing-the-distribution-of-predicted-returns-ff5051f13bec
