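[Translation] Inside Geometry Instancing (Part 2) - Game Creation Discussion / Engine Study - 游艺网 | GAME798 | - game art, game training, game art factory, game development discussion forum!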

Posted by 暴米花 on 2006-12-7 10:32:00


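Inside Geometry Instancing (Part 2)

The copyright of this tutorial belongs to me. It is for personal study only; please do not repost it or use it for any commercial purpose. For commercial use, please contact me. My skills are limited, so errors are inevitable; where anything is unclear, please treat the original book as authoritative. Discussion is always welcome. Some of the images come from the Internet; I have tried to keep them consistent with the illustrations in the original book. Special thanks to mtt for recreating the flowcharts in the article ^_^

Translation: clayman
Blog: http://blog.csdn.net/soilwork
clayman_joe@yahoo.com.cn

3.3.3 Vertex Constants Instancing

In the vertex constants instancing method, we use vertex constants to store the instance attributes. In terms of rendering performance, vertex constants batching is very fast, and it supports instances that move from frame to frame, but these advantages come at the cost of flexibility. The main limitations of this method are:
* The number of instances per batch is limited by the number of available constants; usually no more than 50 to 100 instances fit into a single draw call. However, this is enough to greatly reduce the CPU cost of draw calls.
* Skinning is not supported, because all the vertex constants are used to store instance attributes.
* It requires hardware that supports vertex shaders.

First, we prepare a static vertex buffer (and likewise an index buffer) that stores multiple copies of the same geometry packet. Each copy is stored in model space and corresponds to one instance in the batch.
<img alt="" src="http://blog.csdn.net/images/blog_csdn_net/soilwork/VBL.jpg" border="0"/>
The original vertex format must be extended with an integer index for each vertex. This value is constant across all vertices of an instance and identifies which instance a given geometry packet belongs to. This is similar to palette skinning, where each vertex contains indices pointing to the one or more bones that affect it.
The updated vertex format is:

struct InstanceVertex
{
    D3DXVECTOR3 mPosition;
    // other properties...
    WORD        mInstanceIndex[4]; // Direct3D requires SHORT4
};

Once all instance data has been added to the geometry batch, the Commit() method prepares the vertex buffer with the correct layout.

Next, the attributes of each instance to be rendered are loaded. We assume the attributes consist only of the model matrix, which describes the instance's position and orientation, and the instance color.

DirectX 9 class GPUs can use up to 256 vertex constants; we use 200 of them to store instance attributes. In our example each instance needs 4 constants for the model matrix and 1 for the color, 5 constants per instance in total, so a batch can contain at most 200 / 5 = 40 instances.

Here is the Update() method. The actual instancing is performed in the vertex shader.

// 200 constants reserved for instance data: at most 40 instances x 5 constants,
// so GetInstancesCount() is assumed to be at most 40
D3DXVECTOR4 instancesData[200];
unsigned int count = 0;
for (unsigned int i = 0; i < GetInstancesCount(); ++i)
{
    // write model matrix, one row per constant
    instancesData[count++] = *(D3DXVECTOR4*) &mInstances[i].mModelMatrix.m11;
    instancesData[count++] = *(D3DXVECTOR4*) &mInstances[i].mModelMatrix.m21;
    instancesData[count++] = *(D3DXVECTOR4*) &mInstances[i].mModelMatrix.m31;
    instancesData[count++] = *(D3DXVECTOR4*) &mInstances[i].mModelMatrix.m41;
    // write instance color
    instancesData[count++] = ConvertColorToVec4(mInstances[i].mColor);
}
// upload `count` float4 constants starting at register INSTANCES_DATA_FIRST_CONSTANT
lpDevice->SetVertexShaderConstantF(INSTANCES_DATA_FIRST_CONSTANT, (const float*)instancesData, count);

Here is the vertex shader:

// instance attributes (40 instances x 5 constants) and camera matrix
float4   InstanceData[200];
float4x4 ViewProjectionMatrix;

// assumed output layout
struct vsOutput
{
    float4 position : POSITION;
    float3 normal   : TEXCOORD0;
    float4 color    : COLOR0;
};

// vertex input declaration
struct vsInput
{
    float4 position       : POSITION;
    float3 normal         : NORMAL;
    // other vertex data
    int4   instance_index : BLENDINDICES;
};

vsOutput VertexConstantsInstancingVS(in vsInput input)
{
    vsOutput output;
    // get the instance index; the index is premultiplied by 5 to take account
    // of the number of constants used by each instance
    int instanceIndex = input.instance_index.x;
    // access each row of the instance model matrix
    float4 m0 = InstanceData[instanceIndex + 0];
    float4 m1 = InstanceData[instanceIndex + 1];
    float4 m2 = InstanceData[instanceIndex + 2];
    float4 m3 = InstanceData[instanceIndex + 3];
    // construct the model matrix
    float4x4 modelMatrix = {m0, m1, m2, m3};
    // get the instance color
    float4 instanceColor = InstanceData[instanceIndex + 4];
    // transform input position and normal to world space with the instance model matrix
    float4 worldPosition = mul(input.position, modelMatrix);
    float3 worldNormal   = mul(input.normal, (float3x3)modelMatrix);
    // output position, normal and color
    output.position = mul(worldPosition, ViewProjectionMatrix);
    output.normal   = worldNormal; // world-space normal
    output.color    = instanceColor;
    // output other vertex data
    return output;
}

The Render() method sets the view and projection matrices and submits all instances with a single DrawIndexedPrimitive() call.

In real code, the rotational part of the model matrix can be stored as a quaternion, saving 2 constants per instance and raising the maximum number of instances per batch to around 70 (200 / 3 ≈ 66). The matrix is then rebuilt in the vertex shader, which of course adds coding complexity and execution time.

3.3.4 Batching with the Geometry Instancing API

The last technique presented is batching with the geometry instancing API, which was introduced in DirectX 9 and is fully implemented in hardware by GeForce 6 Series GPUs. As more and more hardware supports the geometry instancing API, this technique will become increasingly interesting: it has a very small memory footprint and needs little CPU intervention. Its only drawback is that it can only handle instances of the same geometry packet.

DirectX 9 provides the following function to access the geometry instancing API:

HRESULT SetStreamSourceFreq(UINT StreamNumber, UINT FrequencyParameter);

StreamNumber is the index of the target data stream. FrequencyParameter combines a flag with a count: for the geometry stream it is D3DSTREAMSOURCE_INDEXEDDATA combined with the number of instances to draw, and for the instance-data stream it is D3DSTREAMSOURCE_INSTANCEDATA combined with the number of instances each element of instance data applies to (usually 1).

We first create two vertex buffers: a static buffer storing the single geometry packet that will be instanced many times, and a dynamic buffer storing the instance data. The two data streams are shown below:
<img alt="" src="http://blog.csdn.net/images/blog_csdn_net/soilwork/GIAPI.jpg" border="0"/>
Commit() must make sure that all instances use the same geometry packet, and copies the geometry data into the static buffer.

Update() simply copies all instance attributes into the dynamic buffer. Although it is similar to the Update() method of dynamic batching, it minimizes CPU intervention and graphics bus (AGP or PCI Express) bandwidth. Moreover, we can allocate a vertex buffer large enough to hold the attributes of all instances without worrying about video memory consumption, because each instance's attributes take only a small fraction of the memory used by the whole geometry packet.

The Render() method sets up both streams with the correct stream frequencies and then calls DrawIndexedPrimitive() once to render all instances in the batch. The code is as follows:

unsigned int instancesCount = GetInstancesCount();

// set up the stream source frequency for the first stream to render instancesCount instances;
// D3DSTREAMSOURCE_INDEXEDDATA tells Direct3D we'll use indexed geometry for instancing
lpDevice->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | instancesCount);
// set up the first stream source with the vertex buffer containing the geometry packet
// (the last parameter is the vertex stride in bytes; the stride member names are illustrative)
lpDevice->SetStreamSource(0, mGeometryInstancingVB[0], 0, mGeometryVertexStride);

// set up the stream source frequency for the second stream;
// each set of instance attributes describes one instance to be rendered
lpDevice->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1);
// set up the second stream source with the vertex buffer containing all instances' attributes
lpDevice->SetStreamSource(1, mGeometryInstancingVB[1], 0, mInstanceDataStride);

// one vertex declaration covering both streams is set with SetVertexDeclaration();
// after DrawIndexedPrimitive(), restore the default (non-instanced) frequencies
lpDevice->SetStreamSourceFreq(0, 1);
lpDevice->SetStreamSourceFreq(1, 1);

The GPU takes care of virtually duplicating the vertices of the first stream and packing them together with the instance data of the second stream. The vertex shader's inputs are the vertex position in model space plus the additional instance attributes (the model matrix) used to transform it into world space. The code is as follows:

// vertex input declaration
struct vsInput
{
    // stream 0
    float4 position       : POSITION;
    float3 normal         : NORMAL;
    // stream 1
    float4 model_matrix0  : TEXCOORD0;
    float4 model_matrix1  : TEXCOORD1;
    float4 model_matrix2  : TEXCOORD2;
    float4 model_matrix3  : TEXCOORD3;
    float4 instance_color : COLOR0; // stored as a D3DCOLOR in the vertex declaration
};

vsOutput geometryInstancingVS(in vsInput input)
{
    vsOutput output;
    // construct the model matrix
    float4x4 modelMatrix =
    {
        input.model_matrix0,
        input.model_matrix1,
        input.model_matrix2,
        input.model_matrix3
    };
    // transform input position and normal to world space with the instance model matrix
    float4 worldPosition = mul(input.position, modelMatrix);
    float3 worldNormal   = mul(input.normal, (float3x3)modelMatrix);
    // output position, normal, and color
    output.position = mul(worldPosition, ViewProjectionMatrix);
    output.normal   = worldNormal; // world-space normal
    output.color    = input.instance_color;
    // output other vertex data...
    return output;
}

Because it minimizes CPU load and memory usage, this technique can efficiently render a large number of copies of the same geometry, which makes it the ideal solution for games. Its drawbacks are that it requires hardware support and that skinning cannot be implemented easily.

If skinning is needed, one option is to store all the bones of all instances in a texture and select the correct bones for each instance, which requires the vertex texture fetch feature of Shader Model 3.0. If you use this technique, the performance cost of vertex texture access is hard to predict and should be measured beforehand.

3.4 Conclusion

This article described the concept of geometry instancing and presented four different techniques for efficiently rendering the same geometry many times. Each technique has its advantages and drawbacks, and no single method perfectly solves every problem that may arise in a game scene. The method should be chosen according to the type of application and the kinds of objects being rendered.

Here are the methods suggested for some scenarios:
* For indoor scenes containing many static instances of the same geometry, which rarely move, static batching is the best choice.
* For outdoor scenes containing many animated instances, such as a real-time strategy game with hundreds of soldiers, dynamic batching may be the best choice.
* For outdoor scenes containing lots of vegetation and trees, whose attributes often need to be modified (for example, to make them sway in the wind), as well as some particle systems, batching with the geometry instancing API may be the best choice.

Often the same application uses more than one of these methods. In that case, hiding the specific implementations behind an abstract geometry batch interface makes the engine easier to modularize and manage, and greatly reduces the work of implementing geometry instancing across the whole program.
<img alt="" src="http://blog.csdn.net/images/blog_csdn_net/soilwork/20050302_GreekCity_LumberMi.jpg" border="0"/>
(In the figure, the static buildings use static batching, while the trees use the geometry instancing API.)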
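The constant budget of section 3.3.3 can be checked with a small CPU-side sketch. It assumes, as the text does, that 200 vertex constants are reserved for instance data and that each instance uses 4 matrix rows plus 1 color; the types Vec4 and Instance and the functions below are hypothetical stand-ins for the D3DX types and the Update() loop, so the sketch builds without the DirectX SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for D3DXVECTOR4 / the instance record.
struct Vec4 { float x, y, z, w; };

struct Instance {
    float model[4][4];  // row-major model matrix: one row per float4 constant
    Vec4 color;         // instance color as RGBA
};

// Budget from the text: 200 of the 256 vertex constants hold instance data.
constexpr std::size_t kInstanceConstants = 200;
constexpr std::size_t kConstantsPerInstance = 5;  // 4 matrix rows + 1 color
constexpr std::size_t kMaxInstancesPerBatch = kInstanceConstants / kConstantsPerInstance;  // 40
// Quaternion variant from the text: quaternion + translation + color = 3 constants.
constexpr std::size_t kMaxInstancesWithQuaternion = kInstanceConstants / 3;  // 66

// Value written into InstanceVertex::mInstanceIndex for every vertex of an
// instance: the instance slot premultiplied by 5 (its first constant).
int PremultipliedIndex(std::size_t slot) {
    assert(slot < kMaxInstancesPerBatch);
    return static_cast<int>(slot * kConstantsPerInstance);
}

// Flattens up to kMaxInstancesPerBatch instances into the float4 layout the
// vertex shader reads: model matrix rows 0..3, then the color.
std::vector<Vec4> PackInstanceConstants(const std::vector<Instance>& instances) {
    const std::size_t n = std::min(instances.size(), kMaxInstancesPerBatch);
    std::vector<Vec4> constants;
    constants.reserve(n * kConstantsPerInstance);
    for (std::size_t i = 0; i < n; ++i) {
        const Instance& inst = instances[i];
        for (int row = 0; row < 4; ++row) {
            constants.push_back({inst.model[row][0], inst.model[row][1],
                                 inst.model[row][2], inst.model[row][3]});
        }
        constants.push_back(inst.color);
    }
    return constants;
}
```

On Direct3D 9 the packed array would then be uploaded with IDirect3DDevice9::SetVertexShaderConstantF, passing a pointer to the first float and the number of float4 registers. Instances beyond the first 40 are simply left for the next batch in this sketch.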
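To make the "virtual duplication" of section 3.3.4 concrete, here is a CPU model, under stated assumptions, of what one instanced DrawIndexedPrimitive() call computes when stream 0 uses D3DSTREAMSOURCE_INDEXEDDATA | instanceCount and stream 1 uses D3DSTREAMSOURCE_INSTANCEDATA | divisor. It is not how a driver works internally; Float4, GeometryVertex, InstanceData and EmulateInstancedDraw are hypothetical names, and only the world-space position from geometryInstancingVS is reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float4 { float x, y, z, w; };

// Stream 0 element: one vertex of the geometry packet, in model space.
struct GeometryVertex { Float4 position; };

// Stream 1 element: the attributes of one instance (model matrix rows + color).
struct InstanceData { Float4 row[4]; Float4 color; };

// Row vector times matrix, the same convention as HLSL mul(position, modelMatrix).
Float4 Mul(const Float4& v, const Float4 (&m)[4]) {
    return {v.x * m[0].x + v.y * m[1].x + v.z * m[2].x + v.w * m[3].x,
            v.x * m[0].y + v.y * m[1].y + v.z * m[2].y + v.w * m[3].y,
            v.x * m[0].z + v.y * m[1].z + v.z * m[2].z + v.w * m[3].z,
            v.x * m[0].w + v.y * m[1].w + v.z * m[2].w + v.w * m[3].w};
}

// The index list is walked once per instance: stream 0 is fetched through
// the indices, while stream 1 advances by one element every `divisor`
// instances. Returns world-space positions, instance by instance.
std::vector<Float4> EmulateInstancedDraw(const std::vector<GeometryVertex>& stream0,
                                         const std::vector<std::uint16_t>& indices,
                                         const std::vector<InstanceData>& stream1,
                                         std::size_t instanceCount,
                                         std::size_t divisor = 1) {
    assert(divisor > 0);
    std::vector<Float4> worldPositions;
    worldPositions.reserve(instanceCount * indices.size());
    for (std::size_t instance = 0; instance < instanceCount; ++instance) {
        assert(instance / divisor < stream1.size());
        const InstanceData& data = stream1[instance / divisor];
        for (std::uint16_t index : indices) {
            assert(index < stream0.size());
            worldPositions.push_back(Mul(stream0[index].position, data.row));
        }
    }
    return worldPositions;
}
```

The geometry packet is stored once and its indices are reused for every instance; only the small per-instance record changes, which is why the dynamic buffer stays cheap to update.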
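The conclusion recommends hiding the techniques behind an abstract geometry batch interface. A minimal sketch of that idea follows, assuming an interface whose Commit/Update/Render names mirror the methods used throughout the article; GeometryBatch, VertexConstantsBatch, InstancingApiBatch and MakeInstancedBatch are hypothetical, and Render() only counts draw calls instead of talking to Direct3D.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical abstract batch interface.
class GeometryBatch {
public:
    virtual ~GeometryBatch() = default;
    virtual void Commit(std::size_t instanceCount) = 0;  // build static buffers once
    virtual void Update() = 0;                           // refresh per-instance attributes
    virtual std::size_t Render() = 0;                    // returns draw calls issued
};

// Vertex constants instancing: at most maxPerDraw instances per draw call.
class VertexConstantsBatch : public GeometryBatch {
public:
    explicit VertexConstantsBatch(std::size_t maxPerDraw = 40) : maxPerDraw_(maxPerDraw) {
        assert(maxPerDraw_ > 0);
    }
    void Commit(std::size_t instanceCount) override { instanceCount_ = instanceCount; }
    void Update() override { /* a real batch would upload vertex constants here */ }
    std::size_t Render() override {
        return (instanceCount_ + maxPerDraw_ - 1) / maxPerDraw_;  // ceil(n / maxPerDraw)
    }
private:
    std::size_t maxPerDraw_;
    std::size_t instanceCount_ = 0;
};

// Geometry instancing API: one DrawIndexedPrimitive() call for the whole batch.
class InstancingApiBatch : public GeometryBatch {
public:
    void Commit(std::size_t instanceCount) override { instanceCount_ = instanceCount; }
    void Update() override { /* a real batch would refill the dynamic instance buffer here */ }
    std::size_t Render() override { return instanceCount_ > 0 ? 1 : 0; }
private:
    std::size_t instanceCount_ = 0;
};

// Hypothetical factory: fall back to vertex constants when the hardware
// does not support the instancing API.
std::unique_ptr<GeometryBatch> MakeInstancedBatch(bool hasInstancingApi) {
    if (hasInstancingApi) {
        return std::make_unique<InstancingApiBatch>();
    }
    return std::make_unique<VertexConstantsBatch>();
}
```

With 100 tree instances, the fallback issues three draw calls (40 + 40 + 20) while the instancing API path issues one; the rest of the engine only sees GeometryBatch.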
[This post was edited by the author on 2006-12-7 10:33:27]


