Original: http://aras-p.info/texts/D3D9GPUHacks.html

If you know more hacks or more details, please let me know in the comments!

Most hacks are exposed as custom ("FOURCC") formats, so you check for them with CheckDeviceFormat. Here's the list (Usage column codes: DS=DepthStencil, RT=RenderTarget; Resource column codes: tex=texture, surf=surface). Each vendor column gives the year and the first GPU family with support; an empty cell means no support is listed.

| Format | Usage | Resource | Description | NVIDIA GeForce | AMD Radeon | Intel |
|---|---|---|---|---|---|---|
| Shadow mapping | | | | | | |
| D3DFMT_D16 | DS | tex | Sample depth buffer directly as shadow map. | 2001 (GF3) | 2006 (HD2xxx) | 2006 (965) |
| D3DFMT_D24X8 | DS | tex | Sample depth buffer directly as shadow map. | 2001 (GF3) | 2006 (HD2xxx) | 2006 (965) |
| Depth Buffer As Texture | | | | | | |
| DF16 | DS | tex | Read depth buffer as texture. | | 2002 (9500) | 2008 (G45) |
| DF24 | DS | tex | Read depth buffer as texture. | | 2005 (X1300) | 2011 (Gen6) |
| INTZ | DS | tex | Read depth buffer as texture. | 2006 (GF8) | 2008 (HD4xxx) | 2008 (G45) |
| RAWZ | DS | tex | Read depth buffer as texture. | Only GF6&7 | | |
| Anti-Aliasing related | | | | | | |
| RESZ | RT | surf | Resolve MSAA'd depth stencil surface into non-MSAA'd depth texture. | | 2008 (HD4xxx) | 2008 (G45) |
| ATOC | 0 | surf | Transparency anti-aliasing. | 2005 (GF7) | | 2011 (Gen6) |
| SSAA | 0 | surf | Transparency anti-aliasing. | 2005 (GF7) | | |
| All AMD DX9+ hardware | | | Transparency anti-aliasing. | | 2002 (9500) | |
| n/a | | | Coverage Sampled Anti-Aliasing [5] | 2006 (GF8) | | |
| Texturing | | | | | | |
| ATI1 | 0 | tex | ATI1n & ATI2n texture compression formats. | 2006 (GF8) | 2005 (X1300) | 2008 (G45) |
| ATI2 | 0 | tex | ATI1n & ATI2n texture compression formats. | 2004 (GF6) | 2002 (9500) | 2008 (G45) |
| DF24 | DS | tex | Fetch 4: when sampling 1 channel texture, return four touched texel values [1]. Check for DF24 support. | | 2005 (X1300) | 2011 (Gen6) |
| Misc | | | | | | |
| NULL | RT | surf | Dummy render target surface that does not consume video memory. | 2004 (GF6) | 2008 (HD4xxx) | 2010 (Gen5) |
| NVDB | 0 | surf | Depth Bounds Test. | 2004 (GF6) | | |
| R2VB | 0 | surf | Render into vertex buffer. | Only GF6&7 | 2002 (9500) | |
| INST | 0 | surf | Geometry Instancing on pre-SM3.0 hardware. | | 2002 (9500) | |
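The format names in the table are four-character codes packed into a 32-bit value. Here is a minimal sketch of how such codes can be built without the Windows SDK, mirroring what the MAKEFOURCC macro does. The helper `fourcc` and the constant names other than `kFourccINST` are illustrative names, not part of any API.

```cpp
#include <cstdint>

// Packs four ASCII characters into a 32-bit FOURCC code, first character
// in the lowest byte (the same layout the MAKEFOURCC macro produces).
// `fourcc` is a hypothetical helper name used only for this sketch.
constexpr std::uint32_t fourcc(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// A few of the codes from the table (constant names are illustrative).
constexpr std::uint32_t kFourccINTZ = fourcc('I', 'N', 'T', 'Z');
constexpr std::uint32_t kFourccRESZ = fourcc('R', 'E', 'S', 'Z');
constexpr std::uint32_t kFourccNULL = fourcc('N', 'U', 'L', 'L');
constexpr std::uint32_t kFourccINST = fourcc('I', 'N', 'S', 'T');
```

With a real IDirect3D9 object, a support check for INTZ would then pass `(D3DFORMAT)kFourccINTZ` as the CheckFormat argument of CheckDeviceFormat, with D3DUSAGE_DEPTHSTENCIL as the usage and D3DRTYPE_TEXTURE as the resource type, matching the Usage and Resource columns above.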

Native Shadow Mapping

Native support for shadow map sampling & filtering was introduced ages ago (GeForce 3) by NVIDIA. Turns out AMD also implemented the same feature for DX10 level cards. Intel also supports it on Intel 965 (aka GMA X3100, the shader model 3 card) and later (G45/X4500/HD) cards.

The usage is quite simple; just create a texture with regular depth/stencil format and render into it. When reading from the texture, one extra component in texture coordinates will be the depth to compare with. Compared & filtered result will be returned.
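To make "compared & filtered" concrete, here is a CPU-side sketch of what such a lookup conceptually computes: compare the four nearest depth texels against the reference depth, then blend the four pass/fail results bilinearly. This is only an illustration. The less-or-equal comparison, clamp-to-edge addressing, and all names (`DepthMap`, `sampleShadow`, and so on) are assumptions of this sketch, and actual hardware filtering may differ in detail.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical CPU stand-in for a depth texture (row-major floats).
struct DepthMap {
    int width;
    int height;
    std::vector<float> depth;
};

// Clamp-to-edge texel fetch (an assumption of this sketch).
inline float fetchDepth(const DepthMap& m, int x, int y) {
    x = std::min(std::max(x, 0), m.width - 1);
    y = std::min(std::max(y, 0), m.height - 1);
    return m.depth[static_cast<std::size_t>(y) * m.width + x];
}

// 1.0 if the reference depth is not farther than the stored depth, else 0.0.
inline float depthCompare(float stored, float ref) {
    return ref <= stored ? 1.0f : 0.0f;
}

// Compares the 2x2 texel neighborhood against refDepth, then bilinearly
// filters the comparison results (not the depths themselves).
float sampleShadow(const DepthMap& m, float u, float v, float refDepth) {
    const float x = u * m.width - 0.5f;   // texel centers sit at half-integers
    const float y = v * m.height - 0.5f;
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;
    const float fy = y - y0;

    const float c00 = depthCompare(fetchDepth(m, x0,     y0),     refDepth);
    const float c10 = depthCompare(fetchDepth(m, x0 + 1, y0),     refDepth);
    const float c01 = depthCompare(fetchDepth(m, x0,     y0 + 1), refDepth);
    const float c11 = depthCompare(fetchDepth(m, x0 + 1, y0 + 1), refDepth);

    const float top    = c00 + (c10 - c00) * fx;
    const float bottom = c01 + (c11 - c01) * fx;
    return top + (bottom - top) * fy;
}
```

The key point is that the comparison happens before filtering, which is why the result is a smooth shadow factor in [0, 1] rather than a blended depth value.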

Also useful:

  • Creating ‘NULL’ color surface to keep D3D runtime happy and save on video memory.

Depth Buffer as Texture

For some rendering schemes (anything with “deferred”) or some effects (SSAO, depth of field, volumetric fog, …) having access to a depth buffer is needed. If native depth buffer can be read as a texture, this saves both memory and a rendering pass or extra output for MRTs.

Depending on hardware, this can be achieved via INTZ, RAWZ, DF16 or DF24 formats:

  • INTZ is for recent (DX10+) hardware. With recent drivers, all three major IHVs expose this. According to AMD [1], it also allows using stencil buffer while rendering. Also allows reading from depth texture while it’s still being used for depth testing (but not depth writing). Looks like this applies to NV & Intel parts as well.
  • RAWZ is for GeForce 6 & 7 series only. Depth is specially encoded into four channels of returned value.
  • DF16 and DF24 are for AMD and Intel cards, including older cards that don’t support INTZ. Unlike INTZ, these do not allow using the stencil buffer or using the surface for both sampling & depth testing at the same time.

Also useful when using depth textures:

  • Creating NULL color surface to keep D3D runtime happy and save on video memory.
  • RESZ allows resolving multisampled depth surfaces into non-multisampled depth textures (result will be sample zero for each pixel).

Caveats:

  • Using INTZ for both depth/stencil testing and sampling at the same time seems to have performance problems on AMD cards (checked Radeon HD 3xxx to 5xxx, Catalyst 9.10 to 10.5). A workaround is to render to INTZ depth/stencil first, then use RESZ to “blit” it into another surface. Then do sampling from one surface, and depth testing on another.

Depth Bounds Test

Direct equivalent of GL_EXT_depth_bounds_test OpenGL extension. See [3] for more information.

Transparency Anti-Aliasing

NVIDIA exposes two controls: transparency multisampling (ATOC) and transparency supersampling (SSAA) [4]. The whitepaper does not explicitly say it, but in order for ATOC render state (D3DRS_ADAPTIVETESS_Y set to ATOC) to actually work, D3DRS_ALPHATESTENABLE state must be also set to TRUE.

AMD says that all Radeons since 9500 support “alpha to coverage” [1].

Intel supports ATOC (same as NVIDIA) with SandyBridge (GMA HD 2000/3000) GPUs.

Render Into Vertex Buffer

Similar to “stream out” or “memexport” in other APIs/platforms. See [2] for more information. Apparently some NVIDIA GPUs (or drivers?) support this as well.

Geometry Instancing

Instancing is supported on all Shader Model 3.0 hardware by Direct3D 9.0c, so there’s no extra hacks necessary there. AMD has exposed a capability to enable instancing on their Shader Model 2.0 hardware as well. Check for “INST” support, and do dev->SetRenderState (D3DRS_POINTSIZE, kFourccINST); at startup to enable instancing.

I can’t find any document on instancing from AMD now. Other references: [6] and [7].

ATI1n & ATI2n Compressed Texture Formats

Compressed texture formats:

  • ATI1n, also called 3Dc+, or BC4 in DirectX 10 and later. This is single channel, 4 bits per pixel; basically DXT5/BC3 alpha block.
  • ATI2n, also called 3Dc, and almost BC5 (see below) in DirectX 10 and later. This is two channel, 8 bits per pixel; basically two DXT5/BC3 alpha blocks right after each other.

Since they are more or less just DX10 formats, support is quite widespread, with NVIDIA exposing it a while ago and Intel exposing it recently (drivers 15.17 or higher, since 2011 or so).

“Almost BC5” part: ATI2n/3Dc has the red & green channels swapped compared to BC5. This is seemingly not clearly documented anywhere, but ends up working like that. ATI Compressonator source code seems to agree (for ATI2N format, it puts X channel data after Y), even if the header comment says that BC5 is identical to ATI2N :)

Compression tools like Compressonator have something called “A2XY” (CMP_FORMAT_ATI2N_XY there), which actually matches BC5 layout. However, neither NVIDIA nor AMD drivers (as of mid-2016) expose this FOURCC format at runtime. So if you want your DX9 runtime to match what DX11/GL/Metal is doing with BC5, you’ll have to use ATI2n format and swizzle the texture data yourself at upload time (for each 16 bytes, swap the 8-byte parts).
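The upload-time swizzle described above (swap the two 8-byte halves of every 16-byte block) can be sketched as follows. The function name `swapAti2nBlockHalves` is hypothetical, and the sketch assumes the buffer holds tightly packed 16-byte blocks; any trailing partial block is left untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Swaps the two 8-byte halves of every 16-byte block in place, converting
// between BC5-ordered data and ATI2n-ordered data (the operation is its
// own inverse). Bytes past the last full 16-byte block are not modified.
void swapAti2nBlockHalves(std::uint8_t* data, std::size_t sizeInBytes) {
    constexpr std::size_t kBlockBytes = 16;
    constexpr std::size_t kHalfBytes = 8;
    for (std::size_t offset = 0; offset + kBlockBytes <= sizeInBytes;
         offset += kBlockBytes) {
        std::uint8_t* block = data + offset;
        std::swap_ranges(block, block + kHalfBytes, block + kHalfBytes);
    }
}
```

Because the swap is symmetric, calling it twice restores the original data, so the same routine works in either direction.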

Caveat: when DX9 allocates the mip chain, it checks whether the format is a known compressed format and allocates the appropriate space for the smallest mip levels. For example, a 1x1 DXT1 compressed level actually takes up 8 bytes, as the block size is fixed at 4x4 texels. This is true for all block compressed formats. With the hacked formats, DX9 doesn’t know it’s a block compression format and will only allocate the number of bytes the mip would have taken if it weren’t compressed. For example, a 1x1 ATI1n level will only have 1 byte allocated. You need to stop the mip chain before either dimension shrinks below the block dimensions; otherwise you risk memory corruption.
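A small sketch of that rule: count only the mip levels whose width and height are both at least one 4x4 block, and compare against what a block-compressed level really needs. Both function names are hypothetical helpers for illustration.

```cpp
#include <cstddef>

constexpr unsigned kBlockDim = 4;  // block compressed formats use 4x4 texel blocks

// Number of mip levels (including the base level) for which both dimensions
// are still at least one block. Returns 0 if the base level itself is
// smaller than a block, in which case the caller needs another strategy.
unsigned safeMipCount(unsigned width, unsigned height) {
    unsigned levels = 0;
    while (width >= kBlockDim && height >= kBlockDim) {
        ++levels;
        width /= 2;
        height /= 2;
    }
    return levels;
}

// Bytes a block-compressed mip level really occupies: dimensions are rounded
// up to whole blocks. bytesPerBlock is 8 for DXT1 and ATI1n, 16 for ATI2n.
std::size_t blockCompressedSize(unsigned width, unsigned height,
                                std::size_t bytesPerBlock) {
    const std::size_t blocksX = (width + kBlockDim - 1) / kBlockDim;
    const std::size_t blocksY = (height + kBlockDim - 1) / kBlockDim;
    return blocksX * blocksY * bytesPerBlock;
}
```

For a 256x256 ATI1n texture this gives 7 levels (down to 4x4), and that count would be passed as the Levels argument when creating the texture instead of letting the runtime generate the full chain.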

Another thing to keep in mind: on the Vista+ (WDDM) driver model, textures in these formats will still consume application address space. Most regular textures like DXT5 don’t take up additional address space in WDDM. For some reason ATI1n and ATI2n textures on D3D9 are deemed lockable.

References

All this information gathered mostly from:

  1. Advanced DX9 Capabilities for ATI Radeon Cards (pdf)
  2. ATI R2VB Programming (pdf)
  3. NVIDIA GPU Programming Guide (pdf)
  4. NVIDIA Transparency AA
  5. NVIDIA Coverage Sampled AA
  6. Humus' Instancing Demo
  7. Arseny's article on particles
