zhigu posted on 2007-6-1 22:12:00

A High-Efficiency 3D Graphics Math Library (1) ---- Vector Overview

I have been lurking for a long time, so it is about time I contributed something. I wrote this recently and am posting it here for everyone to pick apart: <p></p><p>&nbsp;&nbsp;&nbsp;&nbsp;I have been working with assembly a lot lately, and reading the assembly generated from my own C++ code was sheer torture. That pushed me to reimplement my entire math library with assembly instructions, including CPUID detection and use of the extended instruction sets. I benchmarked the results against the D3DX9 math functions and they are satisfying: apart from matrix multiplication, which is consistently about 7% slower than D3DXMatrixMultiply, everything is on par or far ahead (maybe I have gone mad; newcomers are welcome to measure it themselves). My skills are limited and my benchmarking method is crude, so corrections from experts are welcome!<br/>
The first step is to introduce my Vector class. Here is the declaration:</p><p>struct __declspec(dllexport) Vector<br/>
{</p><p>/****************** Members ********************/</p><p>float x, y, z, w;</p><p>/****************** Constructors *******************/</p><p>// Default constructor (leaves the members uninitialized)<br/>
Vector() {}<br/>
// Construct from an array of four floats<br/>
Vector(const float* v);<br/>
// Construct from four components<br/>
Vector(float _x, float _y, float _z, float _w);</p><p>/****************** Methods *******************/</p><p>// Set the vector from an array<br/>
void SetVector(const float* v);<br/>
// Set the vector from four components<br/>
void SetVector(float _x, float _y, float _z, float _w);<br/>
// Subtraction: this = *pDest - *pSrc<br/>
void Difference(const Vector* pSrc, const Vector* pDest);<br/>
// Negate the vector<br/>
void Inverse();<br/>
// Normalize the vector<br/>
void Normalize();<br/>
// Is this a unit vector?<br/>
bool IsNormalized();<br/>
// Vector length (slow)<br/>
float GetLength();<br/>
// Squared vector length (fast)<br/>
float GetLengthSq();<br/>
// Cross product of two vectors; the result is stored in this vector<br/>
void Cross(const Vector* pU, const Vector* pV);<br/>
// Angle between this vector and v<br/>
float AngleWith(Vector&amp; v);</p><p>/************* Operator overloads *****************/</p><p>// Component-wise add (x, y, z)<br/>
void operator += (Vector&amp; v);<br/>
// Component-wise subtract (x, y, z)<br/>
void operator -= (Vector&amp; v);<br/>
// Scale by a scalar<br/>
void operator *= (float v);<br/>
// Divide by a scalar<br/>
void operator /= (float v);<br/>
// Sum of two vectors<br/>
Vector operator + (Vector&amp; v) const;<br/>
// Difference of two vectors<br/>
Vector operator - (Vector&amp; v) const;<br/>
// Dot product<br/>
float
operator * (Vector&amp; v) const;<br/>
// Transform by a matrix<br/>
void operator *= (Matrix&amp; m);<br/>
// Product with a scalar<br/>
Vector operator * (float f) const;<br/>
// Equality within FLOAT_EPS<br/>
bool operator ==(Vector&amp; v);<br/>
// Inequality within FLOAT_EPS<br/>
bool operator !=(Vector&amp; v);<br/>
// Assignment (disabled)<br/>
//void operator = (Vector&amp; v);<br/>
};</p><p>Next come the simple inline functions:</p><p>// Construct from an array of four floats<br/>
inline Vector::Vector(const float* v)<br/>
: x(v[0])<br/>
, y(v[1])<br/>
, z(v[2])<br/>
, w(v[3])<br/>
{<br/>
}</p><p>// Construct from four components<br/>
inline Vector::Vector(float _x, float _y, float _z, float _w)<br/>
: x(_x)<br/>
, y(_y)<br/>
, z(_z)<br/>
, w(_w)<br/>
{<br/>
}</p><p>// Set the vector from an array (w is left unchanged)<br/>
inline void Vector::SetVector(const float* v)<br/>
{<br/>
x = v[0]; y = v[1]; z = v[2];<br/>
}</p><p>// Set the vector from four components<br/>
inline void Vector::SetVector(float _x, float _y, float _z, float _w)<br/>
{<br/>
x = _x; y = _y; z = _z; w = _w;<br/>
}</p><p>// Subtraction: this = *pDest - *pSrc<br/>
inline void Vector::Difference(const Vector* pSrc, const Vector* pDest)<br/>
{<br/>
x = pDest-&gt;x - pSrc-&gt;x;<br/>
y = pDest-&gt;y - pSrc-&gt;y;<br/>
z = pDest-&gt;z - pSrc-&gt;z;<br/>
}</p><p>// Negate the vector<br/>
inline void Vector::Inverse()<br/>
{<br/>
x = -x; y = -y; z = -z;<br/>
}</p><p>// Is this a unit vector?<br/>
inline bool Vector::IsNormalized()<br/>
{<br/>
return CmpFloatSame(x*x+y*y+z*z, 1.0f);<br/>
}</p><p>// Component-wise add (x, y, z)<br/>
inline void Vector::operator += (Vector&amp; v)<br/>
{<br/>
x += v.x; y += v.y;
z += v.z;<br/>
}<br/>
// Component-wise subtract (x, y, z)<br/>
inline void Vector::operator -= (Vector&amp; v)<br/>
{<br/>
x -= v.x; y -= v.y; z -= v.z;<br/>
}<br/>
// Scale by a scalar<br/>
inline void Vector::operator *= (float f)<br/>
{<br/>
x *= f; y *= f; z *= f;<br/>
}<br/>
// Divide by a scalar: one division, then three multiplications<br/>
inline void Vector::operator /= (float f)<br/>
{<br/>
f = 1.0f/f;<br/>
x *= f; y *= f; z *= f;<br/>
}<br/>
// Sum of two vectors<br/>
inline Vector Vector::operator + (Vector&amp; v) const<br/>
{<br/>
return Vector(x+v.x, y+v.y, z+v.z, w);<br/>
}<br/>
// Difference of two vectors<br/>
inline Vector Vector::operator - (Vector&amp; v) const<br/>
{<br/>
return Vector(x-v.x, y-v.y, z-v.z, w);<br/>
}<br/>
// Dot product<br/>
inline float Vector::operator * (Vector&amp; v) const<br/>
{<br/>
return (x*v.x + y*v.y + z*v.z);<br/>
}<br/>
// Product with a scalar<br/>
inline Vector Vector::operator * (float f) const<br/>
{<br/>
return Vector(x*f, y*f, z*f, w);<br/>
}<br/>
// Equal when every component differs by less than FLOAT_EPS<br/>
inline bool Vector::operator ==(Vector&amp; v)<br/>
{<br/>
return (((x-v.x)&lt;FLOAT_EPS &amp;&amp; (x-v.x)&gt;-FLOAT_EPS) &amp;&amp; ((y-v.y)&lt;FLOAT_EPS &amp;&amp; (y-v.y)&gt;-FLOAT_EPS) &amp;&amp; ((z-v.z)&lt;FLOAT_EPS &amp;&amp; (z-v.z)&gt;-FLOAT_EPS));<br/>
}<br/>
// Not equal when any component differs by FLOAT_EPS or more<br/>
inline bool Vector::operator !=(Vector&amp; v)<br/>
{<br/>
return !(*this == v);<br/>
}</p><p>A few optimizations matter here. They also serve as general coding principles and are extremely important:</p><p>1. Use const wherever you can! The compiler will use it to optimize.<br/>
2. When returning a value, return it in the form of a constructor call whenever possible, for example:<br/>
return Vector(x+v.x, y+v.y, z+v.z, w);<br/>
3. When several values are divided by the same number, always write it the way Vector::operator /= (float f) does (compute the reciprocal once, then multiply).<br/>
4. Small functions like these must be inline!</p><p>Stick to these four points, otherwise the generated assembly is painful to look at and performance naturally falls off a cliff. Do not forget.</p><p>Next come Vector's more advanced functions:</p><p>// Squared vector length (fast)<br/>
float Vector::GetLengthSq() // potentially dangerous<br/>
{<br/>
_asm<br/>
{<br/>
fld dword ptr [ecx]; x (ecx == this)<br/>
fmul dword ptr [ecx]; x*x<br/>
fld dword ptr [ecx+4]; y<br/>
fmul dword ptr [ecx+4]; y*y<br/>
faddp st(1),st; x*x + y*y<br/>
fld dword ptr [ecx+8]; z<br/>
fmul dword ptr [ecx+8]; z*z<br/>
faddp st(1),st; the result stays in st(0) as the return value<br/>
}<br/>
//return x*x + y*y + z*z;<br/>
}</p><p>// Vector length (slow)<br/>
float Vector::GetLength()<br/>
{<br/>
float f;<br/>
if (g_bUseSSE2)<br/>
{<br/>
_asm<br/>
{<br/>
lea ecx, f;<br/>
mov eax, this;<br/>
mov dword ptr [eax+12], 0; // w = 0.0f;</p><p>movups xmm0, [eax];<br/>
mulps xmm0, xmm0;<br/>
movaps xmm1, xmm0;<br/>
shufps xmm1, xmm1, 4Eh; shuffle: swap the high and low pairs<br/>
addps xmm0, xmm1;<br/>
movaps xmm1, xmm0;<br/>
shufps xmm1, xmm1, 11h; shuffle<br/>
addss xmm0, xmm1;</p><p>sqrtss xmm0, xmm0; square root of the lowest element<br/>
movss dword ptr [ecx], xmm0; store the lowest element to the memory ecx points to</p><p>mov
dword ptr [eax+12], 3F800000h; // 3F800000h == 1.0f<br/>
}<br/>
}<br/>
else<br/>
{<br/>
f = (float)sqrt(x*x+y*y+z*z);<br/>
}<br/>
return f;<br/>
}</p><p>// Normalize the vector<br/>
void Vector::Normalize()<br/>
{<br/>
if (g_bUseSSE2)<br/>
{<br/>
_asm<br/>
{<br/>
mov eax, this;<br/>
mov dword ptr [eax+12], 0; // w = 0.0f</p><p>movups xmm0, [eax];<br/>
movaps xmm2, xmm0; // keep a copy of (x, y, z, 0)<br/>
mulps xmm0, xmm0;<br/>
movaps xmm1, xmm0;<br/>
shufps xmm1, xmm1, 4Eh;<br/>
addps xmm0, xmm1;<br/>
movaps xmm1, xmm0;<br/>
shufps xmm1, xmm1, 11h;<br/>
addps xmm0, xmm1; // every lane = x*x + y*y + z*z</p><p>rsqrtps xmm0, xmm0; // approximate 1/length in every lane<br/>
mulps xmm2, xmm0;<br/>
movups [eax], xmm2;</p><p>mov dword ptr [eax+12], 3F800000h; // w = 1.0f<br/>
}<br/>
}<br/>
else<br/>
{<br/>
float f = (float)sqrt(x*x+y*y+z*z);<br/>
if (f != 0.0f)<br/>
{<br/>
f = 1.0f/f;<br/>
x*=f; y*=f; z*=f;<br/>
}<br/>
}<br/>
}</p><p>// Cross product of two vectors; the result is stored in this vector<br/>
void Vector::Cross(const Vector* pU, const Vector* pV)<br/>
{<br/>
if (g_bUseSSE2)<br/>
{<br/>
_asm<br/>
{<br/>
mov eax, pU;<br/>
mov edx, pV;</p><p>movups xmm0, [eax]<br/>
movups xmm1, [edx]<br/>
movaps xmm2, xmm0<br/>
movaps xmm3, xmm1</p><p>shufps xmm0, xmm0, 0xc9 // (Uy, Uz, Ux, Uw)<br/>
shufps xmm1, xmm1, 0xd2 // (Vz, Vx, Vy, Vw)<br/>
mulps xmm0, xmm1</p><p>shufps xmm2, xmm2, 0xd2 // (Uz, Ux, Uy, Uw)<br/>
shufps xmm3, xmm3, 0xc9 // (Vy, Vz, Vx, Vw)<br/>
mulps xmm2, xmm3</p><p>subps xmm0, xmm2</p><p>mov eax, this<br/>
movups [eax], xmm0</p><p>mov
dword ptr [eax+12], 3F800000h; // w = 1.0f<br/>
}<br/>
}<br/>
else<br/>
{<br/>
x = pU-&gt;y * pV-&gt;z - pU-&gt;z * pV-&gt;y;<br/>
y = pU-&gt;z * pV-&gt;x - pU-&gt;x * pV-&gt;z;<br/>
z = pU-&gt;x * pV-&gt;y - pU-&gt;y * pV-&gt;x;<br/>
w = 1.0f;<br/>
}<br/>
}</p><p><br/>
// Transform by a matrix<br/>
void Vector::operator *= (Matrix&amp; m) // potentially dangerous<br/>
{<br/>
#ifdef _DEBUG<br/>
assert(w==1.0f || w==0.0f); // w must mark either a point (1) or a direction (0)<br/>
#endif</p><p>if (g_bUseSSE2)<br/>
{<br/>
_asm<br/>
{<br/>
mov ecx, this;<br/>
mov edx, m;<br/>
movss xmm0, [ecx];<br/>
//lea eax, vr;<br/>
shufps xmm0, xmm0, 0; // xmm0 = x,x,x,x</p><p>movss xmm1, [ecx+4];<br/>
mulps xmm0, [edx]; // x * row 1<br/>
shufps xmm1, xmm1, 0; // xmm1 = y,y,y,y</p><p>movss xmm2, [ecx+8];<br/>
mulps xmm1, [edx+16]; // y * row 2<br/>
shufps xmm2, xmm2, 0; // xmm2 = z,z,z,z</p><p>movss xmm3, [ecx+12];<br/>
mulps xmm2, [edx+32]; // z * row 3<br/>
shufps xmm3, xmm3, 0; // xmm3 = w,w,w,w</p><p>addps xmm0, xmm1;<br/>
mulps xmm3, [edx+48]; // w * row 4</p><p>addps xmm0, xmm2;<br/>
addps xmm0, xmm3; // xmm0 = result<br/>
movups [ecx], xmm0;<br/>
mov
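dword ptr [ecx+12], 3F800000h; // w = 1.0f<br/>
}</p><p>}<br/>
else<br/>
{<br/>
Vector vr;<br/>
vr.x = x*m._11 + y*m._21 + z*m._31 + w*m._41;<br/>
vr.y = x*m._12 + y*m._22 + z*m._32 + w*m._42;<br/>
vr.z = x*m._13 + y*m._23 + z*m._33 + w*m._43;<br/>
vr.w = x*m._14 + y*m._24 + z*m._34 + w*m._44;</p><p>x = vr.x;<br/>
y = vr.y;<br/>
z = vr.z;<br/>
w = 1.0f; // w is reset to 1 regardless of vr.w<br/>
}<br/>
}</p><p><br/>
// Angle between this vector and v: cos(angle) = (a . b) / (|a| |b|)<br/>
float Vector::AngleWith(Vector&amp; v)<br/>
{<br/>
return (float)acosf((*this * v)/(this-&gt;GetLength()*v.GetLength()));<br/>
}</p><p>Three of these functions need some explanation: GetLengthSq, *= and AngleWith.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;GetLengthSq is potentially dangerous because I wrote it against the Visual Studio .NET 2003 compiler. I know that ecx == this, and that a float return value is left on the floating-point register stack for the caller to fstp, so I wrote it this way and did not even write a return statement! You may well not be using the same compiler as I am, so once you understand the essence, implement your own math library with whatever approach is sound for your setup. The functions after it are all written in a compiler-independent way.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;The hazard in the *= overload is that Vector is 4D and can represent either a 3D direction vector or a point in 3D space. For a direction vector w == 0, so it is affected only by rotation and scaling. For a point w == 1, so it is affected by every kind of transform: translation, rotation and scaling. A direction vector cannot be translated, and for the sake of speed the function does not check which case it is handling, so callers of the math library have to take care of this themselves. Note also that the function writes w = 1 afterwards, so a direction vector comes back marked as a point.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;AngleWith is not inlined because I will optimize it further in a later article. Neither GetLength nor acosf is an inline function, so I will have to expand them, implement them in assembly and reorganize the code. The D3DX9 math library does not seem to have this function, so there is nothing to compare it against.</p><p>Compared with the D3DX library, the performance of these functions is roughly as follows:<br/>
&nbsp;&nbsp;&nbsp;&nbsp;GetLengthSq is slightly faster than D3DX.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;GetLength is more than twice as fast as D3DX, because the D3D library does not use SSE instructions.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;Normalize and Cross are so much faster than D3DX that it is almost absurd, again because the D3D library does not use SSE instructions.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;*= is about 7% slower than D3DXVec3Transform, so there is room for improvement. Experts, please take a look! The D3DX library uses 3DNow! for this, and it actually beats SSE! That is probably down to my AMD 3000+; on an Intel CPU the speeds should be about the same.<br/>
&nbsp;&nbsp;&nbsp;&nbsp;AngleWith could not be benchmarked because there is nothing to compare it with.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;Many of these routines had their instructions reordered by hand, and I found that instruction order has a very large effect on speed! Be careful when changing the order. It is best to keep a copy of the original, otherwise you will make yourself dizzy when reordering longer stretches of assembly ~o~<br/>
&nbsp;&nbsp;&nbsp;&nbsp;Incidentally, a few points that confuse many people:<br/>
&nbsp;&nbsp;&nbsp;&nbsp;1. Code like _mm_mov_ps() from the C++ library (the compiler's SSE intrinsics) is simply garbage! If you want speed, never use it; learn assembly properly and write the code by hand. The code those library functions produce is painful to look at!<br/>
&nbsp;&nbsp;&nbsp;&nbsp;2. The speed difference between movups and movaps is almost negligible! Do not declare a Vector or Matrix as __m128 just to gain a percent or so of speed; you will suffer for it later when you build arrays!<br/>
&nbsp;&nbsp;&nbsp;&nbsp;3. My benchmarking method is very crude: loop 10 million times and use timeGetTime() to get a rough figure, then run it a few times and take the average. As a result, once a function is inlined in Release mode its speed can no longer be measured this way. Anyone with the time is welcome to test it; my guess is that functions that can be inlined are already close to the efficiency limit and not really worth optimizing.<br/>
If you have doubts about my tests, copy the code and measure it yourself on several different CPUs. I welcome criticism from anyone!</p><p>&nbsp;&nbsp;&nbsp;&nbsp;Next time I will go into detail about my understanding of SSE and the floating-point instructions, along with the most useful matrix multiplication algorithm.<br/></p>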
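On the benchmarking problem in point 3: one common way to stop a Release build from optimizing away an inlined call is to vary the input on every iteration and feed every result into a volatile sink. What follows is only a minimal sketch under that assumption, not the harness used for the numbers in this post. Vec4, lengthSq, BenchResult and timeLoop are hypothetical names made up for illustration, and std::chrono::steady_clock stands in for timeGetTime().

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the post's Vector; only the four fields are taken from it.
struct Vec4 {
    float x, y, z, w;
};

// Scalar reference for GetLengthSq (the commented-out C++ line in the post).
inline float lengthSq(const Vec4& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

struct BenchResult {
    double seconds;  // wall-clock time for the whole loop
    float checksum;  // sum of all results, so the work stays observable
};

// Calls fn(v) `iterations` times and reports the elapsed time.
// Assumption: fn takes a const Vec4& and returns a float.
template <typename Fn>
BenchResult timeLoop(Fn fn, Vec4 v, std::size_t iterations) {
    assert(iterations > 0);
    volatile float sink = 0.0f;  // volatile access keeps each call from being removed
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        // Vary the input so the call cannot be hoisted out of the loop.
        v.x = static_cast<float>(i & 7);
        sink = sink + fn(v);
    }
    const auto stop = std::chrono::steady_clock::now();
    return BenchResult{ std::chrono::duration<double>(stop - start).count(), sink };
}
```

A call such as timeLoop(lengthSq, Vec4{0.0f, 1.0f, 2.0f, 0.0f}, 10000000) returns the elapsed seconds plus a checksum. The volatile accesses add a small fixed cost to every iteration, so the numbers are only meaningful for comparing two functions under the same harness, not as absolute timings.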