Source: “Energy-aware scheduling on asymmetric systems”, LWN.net, https://lwn.net/Articles/749900/

Energy-aware scheduling (running a system’s workload in a way that minimizes the amount of energy consumed) has been a topic of active discussion and development for some time; LWN first covered the issue at the beginning of 2012. Many approaches have been tried during the intervening years, but little in the way of generalized energy-aware scheduling work has made it into the mainline. Recently, Dietmar Eggemann posted a new patch set that tries to address only one aspect of the problem; perhaps the problem domain has now been simplified enough that this support can finally be merged.
In the end, the scheduler can most effectively reduce power consumption by keeping the system’s CPUs in the lowest possible power states for the longest time, with “sleeping” being preferred over all of the other states. There is a tradeoff, though: users tend not to appreciate saved power if their systems are not responsive, so any energy-aware scheduling solution must also be aware of throughput and latency concerns. A failure to balance all of these objectives across the wide range of machines that run Linux has been the bane of many patches over the years.
There have, of course, been a number of clever ideas attempted. Small-task packing tries to group small, sporadic processes onto a small number of CPUs to prevent them from waking the others. Other patch sets have used a spreading technique in an attempt to evacuate CPUs with relatively low loads. There has been talk of a separate power scheduler whose job is to run each CPU at the optimal power level for the current workload. The energy cost model created a data structure to track the performance and energy cost of each processor state and used it to inform scheduling decisions. The SchedTune CPU-frequency governor allows some tasks to be designated as “important”, with the less-important ones being relegated to low amounts of CPU power. Some of these ideas have influenced the mainline scheduler but, as a whole, they remain outside.
Saving energy is valuable in almost every setting, from tiny embedded systems to supercomputer installations. But the pressure tends to be felt most acutely in the area of mobile systems: the less power a device uses, the longer it can run before exhausting its battery. It is thus not surprising that most of the energy-aware scheduling work has been driven by the mobile market. The Android Open Source Project’s kernel includes a version of the energy-aware scheduler patches; those have been shipping on handsets for some time. As a result, scheduling is one of the areas where the Android and mainline kernels differ the most.
Eggemann’s patch set is intended to reduce that difference by proposing a simplified version of the Android scheduler. To that end, it addresses the problem only for asymmetric systems: those with CPUs that have varying power characteristics, such as the ARM big.LITTLE processors. Since the “little” processors are much more energy-efficient (but much slower) than the “big” ones, the placement of processes in the system can have a significant effect on both energy consumption and performance. Improving task placement on big.LITTLE systems under mainline kernels is arguably the most urgent problem in the energy-aware scheduling area.
To get there, the patch set adds a simplified version of the energy-cost model used in the Android scheduler. It is defined entirely with these two structures:
```c
struct capacity_state {
	unsigned long cap;   /* compute capacity */
	unsigned long power; /* power consumption at this compute capacity */
};

struct sched_energy_model {
	int nr_cap_states;
	struct capacity_state *cap_states;
};
```
The units of cap and power are not really defined, but they do not need to be, as long as they are used consistently across the CPUs of the system. There is one capacity_state structure for each power state of each CPU, so the scheduler can immediately determine what the cost (or benefit) of changing a given CPU’s state would be. Each CPU has a sched_energy_model structure containing the data for all of its available power states.
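To make the table lookup concrete, here is a minimal C sketch built on the two structures above. The helper names find_cap_state() and power_delta(), the assumption that cap_states is sorted by ascending capacity, and all of the numbers are hypothetical illustrations, not code or data from the patch set; since the units are undefined, only the relative values matter.

```c
#include <assert.h>
#include <stddef.h>

struct capacity_state {
	unsigned long cap;   /* compute capacity */
	unsigned long power; /* power consumption at this compute capacity */
};

struct sched_energy_model {
	int nr_cap_states;
	struct capacity_state *cap_states; /* assumed sorted by ascending cap */
};

/*
 * Hypothetical helper: index of the lowest capacity state able to sustain
 * `util`, or the highest state if none is large enough.
 */
static int find_cap_state(const struct sched_energy_model *em,
			  unsigned long util)
{
	assert(em != NULL && em->nr_cap_states > 0);
	for (int i = 0; i < em->nr_cap_states; i++) {
		if (em->cap_states[i].cap >= util)
			return i;
	}
	return em->nr_cap_states - 1;
}

/*
 * Hypothetical helper: extra power a CPU needs when its utilization grows
 * from `old_util` to `new_util`, read straight from the table.
 */
static unsigned long power_delta(const struct sched_energy_model *em,
				 unsigned long old_util, unsigned long new_util)
{
	unsigned long before = em->cap_states[find_cap_state(em, old_util)].power;
	unsigned long after = em->cap_states[find_cap_state(em, new_util)].power;

	return after > before ? after - before : 0;
}

/* Made-up table for a "little" CPU with three power states. */
static struct capacity_state little_states[] = {
	{ .cap = 150, .power = 50 },
	{ .cap = 300, .power = 120 },
	{ .cap = 450, .power = 250 },
};

static struct sched_energy_model little_em = {
	.nr_cap_states = 3,
	.cap_states = little_states,
};
```

With these made-up numbers, raising the little CPU’s utilization from 100 to 200 pushes it from the 150-capacity state into the 300-capacity state, so power_delta() reports 120 - 50 = 70 power units; a change that stays within one state costs nothing extra.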
As it turns out, this information is already available on at least some systems, since the thermal subsystem uses it to help keep the system from overheating. That is a useful attribute: it means that a scheduler with these patches could run on existing hardware without the need to provide more information (through device-tree entries, for example).
The scheduler already performs load tracking, which allows it to estimate how much load each process will put on a CPU when it runs there. That load estimate is used along with the energy model to determine where a task should run when it wakes up. This is done by looking at each CPU in the scheduling domain where the process last ran and determining what the energy cost of placing the process on that CPU would be. Essentially, if the CPU would have to go to a higher power state to run the added load in a timely manner, the cost would be the additional energy needed to sustain that higher state. In the end, the CPU with the lowest added cost is the one that will run the new process.
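The wakeup-time search described above can be sketched as a loop over the CPUs of the domain that keeps the one whose estimated energy grows least. This is a simplified illustration, not the patch set’s code: struct cpu_info, cpu_energy(), find_energy_efficient_cpu() and the example tables are hypothetical, and the energy estimate (the power of the lowest state that can sustain the load, weighted by the busy fraction util / cap) is an assumption chosen for clarity. CPUs that could not absorb the task even at their highest state are skipped.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct capacity_state {
	unsigned long cap;   /* compute capacity */
	unsigned long power; /* power consumption at this compute capacity */
};

struct sched_energy_model {
	int nr_cap_states;
	struct capacity_state *cap_states; /* assumed sorted by ascending cap */
};

/* Hypothetical per-CPU view: its energy model and current utilization. */
struct cpu_info {
	const struct sched_energy_model *em;
	unsigned long util;
};

/*
 * Assumed estimate: use the lowest state that can sustain `util` (or the
 * highest state) and weight its power by the busy fraction util / cap.
 */
static unsigned long cpu_energy(const struct sched_energy_model *em,
				unsigned long util)
{
	assert(em != NULL && em->nr_cap_states > 0);
	const struct capacity_state *cs = &em->cap_states[em->nr_cap_states - 1];

	for (int i = 0; i < em->nr_cap_states; i++) {
		if (em->cap_states[i].cap >= util) {
			cs = &em->cap_states[i];
			break;
		}
	}
	return cs->power * util / cs->cap;
}

/*
 * Return the CPU whose estimated energy grows least when a task of
 * utilization `task_util` is added, or -1 if it fits on no CPU.
 */
static int find_energy_efficient_cpu(const struct cpu_info *cpus, int nr_cpus,
				     unsigned long task_util)
{
	int best_cpu = -1;
	unsigned long best_delta = ULONG_MAX;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		const struct sched_energy_model *em = cpus[cpu].em;
		unsigned long max_cap = em->cap_states[em->nr_cap_states - 1].cap;
		unsigned long new_util = cpus[cpu].util + task_util;

		if (new_util > max_cap)
			continue; /* even the highest state could not keep up */

		unsigned long before = cpu_energy(em, cpus[cpu].util);
		unsigned long after = cpu_energy(em, new_util);
		unsigned long delta = after > before ? after - before : 0;

		if (delta < best_delta) {
			best_delta = delta;
			best_cpu = cpu;
		}
	}
	return best_cpu;
}

/* Made-up example: two "little" CPUs followed by two "big" ones. */
static struct capacity_state little_states[] = {
	{ 150, 50 }, { 300, 120 }, { 450, 250 },
};
static struct capacity_state big_states[] = {
	{ 400, 300 }, { 700, 700 }, { 1024, 1400 },
};
static const struct sched_energy_model little_em = { 3, little_states };
static const struct sched_energy_model big_em = { 3, big_states };
static const struct cpu_info example_cpus[] = {
	{ &little_em, 100 }, { &little_em, 250 }, { &big_em, 0 }, { &big_em, 300 },
};
```

With these example tables, a task of utilization 100 lands on CPU 0: its estimated energy rises from 33 to 80 units (a delta of 47), which is cheaper than the 94 units it would add on CPU 1 or the 75 units on either big CPU. A task of utilization 500 no longer fits on the little CPUs and goes to the idle big CPU 2.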
The process wakeup path is rather performance-critical, so the above algorithm raises some red flags: iterating over every CPU in the system (or even just the subset in a given domain) could become quite expensive on a system with a lot of CPUs. The algorithm is enabled only on asymmetric systems, which minimizes that cost because such systems (currently) have a maximum of eight CPUs. Those are also the systems that benefit most from this sort of energy-use calculation. Data-center systems with large numbers of identical CPUs would see little improvement from this approach, so it is not enabled there.
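The restriction to asymmetric systems can be pictured as a simple gate in front of the per-CPU energy walk. In this sketch, domain_is_asymmetric() and the example capacity arrays are hypothetical, and a domain is assumed to count as asymmetric when its CPUs do not all report the same maximum capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical gate: treat a domain as asymmetric when its CPUs do not all
 * report the same maximum compute capacity. Only then is the O(nr_cpus)
 * energy walk considered worth running on the wakeup path.
 */
static bool domain_is_asymmetric(const unsigned long *max_cap, int nr_cpus)
{
	assert(max_cap != NULL && nr_cpus > 0);
	for (int i = 1; i < nr_cpus; i++) {
		if (max_cap[i] != max_cap[0])
			return true;
	}
	return false;
}

/* Made-up capacities: an eight-CPU big.LITTLE part and a homogeneous box. */
static const unsigned long big_little_caps[] = {
	512, 512, 512, 512, 1024, 1024, 1024, 1024,
};
static const unsigned long smp_caps[] = { 1024, 1024, 1024, 1024 };
```

On the homogeneous machine the gate returns false, so the wakeup path never pays for the energy evaluation; on the big.LITTLE part it returns true and the search runs over at most eight CPUs.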
Even on asymmetric systems, though, this algorithm will not help if the system is already running near its capacity; in that case, the CPUs will already be running at a high power point and there is little value to be had from looking at power costs. If the scheduling domain where the process last ran is determined to be “overutilized”, defined as running at 80% of its maximum capacity or higher, then the current wakeup path (which tries to find the most lightly loaded CPU) is used instead.
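The overutilization escape hatch might look like the following sketch. The 80% threshold comes from the text; everything else is assumed for illustration: struct cpu_load and the function names are hypothetical, the domain is treated as overutilized when its summed utilization reaches 80% of its summed capacity, and the fallback simply picks the CPU with the lowest utilization. The comparison util * 100 >= cap * 80 stays in integer arithmetic, as kernel code generally does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view: current utilization and maximum capacity. */
struct cpu_load {
	unsigned long util;
	unsigned long max_cap;
};

/* Assumed rule: overutilized when total util >= 80% of total capacity. */
static bool domain_overutilized(const struct cpu_load *cpus, int nr_cpus)
{
	unsigned long util = 0, cap = 0;

	assert(cpus != NULL && nr_cpus > 0);
	for (int i = 0; i < nr_cpus; i++) {
		util += cpus[i].util;
		cap += cpus[i].max_cap;
	}
	return util * 100 >= cap * 80;
}

/* Fallback: the CPU with the lowest current utilization. */
static int find_least_loaded_cpu(const struct cpu_load *cpus, int nr_cpus)
{
	int best = 0;

	for (int i = 1; i < nr_cpus; i++) {
		if (cpus[i].util < cpus[best].util)
			best = i;
	}
	return best;
}

/*
 * Returns the fallback CPU when the domain is overutilized, or -1 to signal
 * that the energy-aware search should run instead (hypothetical convention).
 */
static int overutilized_fallback_cpu(const struct cpu_load *cpus, int nr_cpus)
{
	if (!domain_overutilized(cpus, nr_cpus))
		return -1;
	return find_least_loaded_cpu(cpus, nr_cpus);
}

/* Made-up domains: two little CPUs (capacity 512) and two big ones (1024). */
static const struct cpu_load busy_domain[] = {
	{ 450, 512 }, { 480, 512 }, { 900, 1024 }, { 700, 1024 },
};
static const struct cpu_load light_domain[] = {
	{ 100, 512 }, { 50, 512 }, { 200, 1024 }, { 0, 1024 },
};
```

For busy_domain the total utilization is 2530 of 3072 units (about 82%), so overutilized_fallback_cpu() skips the energy search and returns CPU 0, the least-loaded CPU; for light_domain it returns -1, meaning the energy-aware path should run.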
Some benchmarks posted with the patch set show significant energy-use improvements with the patches applied: up to 33% in one case. That improvement comes with a small cost in throughput (up to about 2% in one test, but usually much lower), a cost that most mobile users are likely to be willing to pay for that kind of battery-life improvement.
Discussion of the patch set has mostly focused on implementation details so far, and there has not yet been input from the core scheduler maintainers. So there is no way to really know whether this approach has a better chance of getting over the acceptance hurdle than its predecessors. Given that it is relatively simple, though, and that its costs are paid only on systems that benefit from the algorithm, one might expect its chances to be relatively good. Acceptance would not unify the mainline and Android schedulers, but it would be a big step in the right direction.
