Reposted from: http://software.intel.com/en-us/forums/topic/276989

TITLE: Front End Bound

ISSUE_NAME: Frontend

DESCRIPTION:

This category reflects slots where the Frontend of the processor undersupplies its Backend. Frontend denotes the first portion of the pipeline, where the branch predictor predicts the next address to fetch, and cache lines are fetched, parsed into instructions, and decoded into micro-ops (uops) that can be executed later by the Backend. The purpose of the Frontend cluster is to deliver uops to the Backend whenever the latter can accept them. The IDQ (decoded uops queue) queues the uops delivered by the Frontend to the Backend. Stalls due to instruction-cache misses are one example of stalls that should be counted in the Frontend Bound bucket.

To calculate this bucket, we use a newly designated counter for non-delivered uops (stalled allocation pipeline slots) when such uops could otherwise have been accepted; that is, when there was no Backend Stall:

IDQ_UOPS_NOT_DELIVERED.CORE / (4*CPU_CLK_UNHALTED.THREAD)
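As a rough illustration, the ratio above can be computed from two raw counter readings as in the sketch below. The function name `front_end_bound` and the sample counter values are hypothetical and not taken from any tool; the default of 4 slots per cycle comes from the formula itself.

```python
def front_end_bound(idq_uops_not_delivered_core, cpu_clk_unhalted_thread,
                    slots_per_cycle=4):
    """Return the fraction of allocation slots that the Frontend left empty.

    Arguments are raw event counts collected over the same interval.
    slots_per_cycle is the allocation width used in the formula (4).
    """
    total_slots = slots_per_cycle * cpu_clk_unhalted_thread
    if total_slots == 0:
        raise ValueError("no unhalted cycles were counted")
    return idq_uops_not_delivered_core / total_slots


# Hypothetical counter readings, for illustration only.
ratio = front_end_bound(idq_uops_not_delivered_core=1_200_000,
                        cpu_clk_unhalted_thread=1_000_000)
print(f"Front End Bound: {ratio:.1%}")
```

The result is a fraction between 0 and 1, so it can be compared directly against the other Top Level buckets.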

The qualification "no Backend stall" is very important here, because it lets us correctly identify the slots in which the Frontend was the limiter. Accounting at slot granularity is also important, because it catches slight inefficiencies where fewer than the optimal number of uops were delivered in a cycle. IDQ_UOPS_NOT_DELIVERED.CORE is a notable counter introduced for SandyBridge and defined with a Top-Down mindset. Prior to SandyBridge it was difficult to get an accurate estimate of Frontend penalties, especially in client workloads. Such workloads typically do not suffer from long-latency stalls such as icache or iTLB misses, but may suffer from issues like limited instruction decoding bandwidth, which carry smaller penalties and often show up as fewer than the optimal 4 uops delivered per cycle (1, 2 or 3 uops). Such cycles were traditionally counted as good because some allocation did occur, so Frontend issues were underestimated.
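To make the slot-granularity point concrete, the sketch below compares slot-level accounting with a coarser scheme that only flags cycles in which nothing was delivered. The per-cycle trace is invented for illustration, and it assumes that the Backend could accept uops in every one of those cycles; the helper names are hypothetical.

```python
SLOTS_PER_CYCLE = 4  # allocation width used in the Front End Bound formula


def slot_level_loss(delivered_per_cycle):
    """Fraction of slots left empty, counting every undelivered uop slot."""
    undelivered = sum(SLOTS_PER_CYCLE - d for d in delivered_per_cycle)
    return undelivered / (SLOTS_PER_CYCLE * len(delivered_per_cycle))


def cycle_level_loss(delivered_per_cycle):
    """Fraction of cycles flagged as bad when only zero-delivery cycles count."""
    empty_cycles = sum(1 for d in delivered_per_cycle if d == 0)
    return empty_cycles / len(delivered_per_cycle)


# Invented trace: uops delivered in each of six cycles with no Backend stall.
trace = [4, 2, 3, 0, 4, 1]
print(f"slot-level loss:  {slot_level_loss(trace):.1%}")
print(f"cycle-level loss: {cycle_level_loss(trace):.1%}")
```

In this trace, 10 of 24 slots go unused, while only 1 of 6 cycles delivers nothing, so the cycle-level view reports a much smaller Frontend loss than the slot-level view.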

When HyperThreading (HT) is enabled, allocation alternates between the two threads. IDQ_UOPS_NOT_DELIVERED.CORE is designed so that it accounts only for the thread that is currently allocating. This attributes allocation slots accurately to each thread, which enables the Top Level breakdown under HT.

RELEVANCE:

This metric can be used to determine, at a high level, whether the CPU is bound by front end issues.

EXAMPLE:

I-cache misses, iTLB misses, Frontend penalties after misprediction clears, LCP stalls, DSB to MITE switches, decoder inefficiency and various other front end issues can cause this metric to be high.

SOLUTION:

Drill down into the lower level front end metrics to find the specific performance issue.

RELATED_SOURCES:

NOTES:

This metric is measured by specifically counting instances where the backend of the machine is requesting uops and the front end is unable to fill all pipeline slots.

EQUATION: IDQ_UOPS_NOT_DELIVERED.CORE / (4*CPU_CLK_UNHALTED.THREAD)
