Posted on 2008-6-22 17:03:40
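Originally posted by lsyhzhou on 2008-6-22 01:00
So what exactly is causing ATI cards to lose so badly?
1. Is it because CUDA is programmed in C, so it executes more efficiently?
2. Four-fifths of the 320 stream processors are not full-featured; only 64 standard ones actually do the work, which makes the card roughly comparable to a 9600GT
Let's take the 3870's PPD as 1900, with the core at 775 MHz
The 9600GT (64 standard SPs) has a PPD of 3000, SP ...
People overseas have raised the same question. Could it be... that ATI has been fooling me all along....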
Hi All,
I read a debate going on in the forums as to why the ATI core is giving less PPD compared to the NVIDIA core. And I see many reasons given, including that the WUs assigned are different (definitely that will also have an impact) as well as that MAYBE the NVIDIA core is NOT DOING Science work, just Credit work (ONLY possible if someone has goofed up in a major way while programming - LESS LIKELY because I am sure a lot of testing has occurred within the FAH Stanford group before releasing the GPU beta to the world.)
For a long time (from the time AMD and NVIDIA came out with unified shaders) I have had a thought in my mind, which seems to be playing out now if the PPD results are valid (i.e. no programming error somewhere). So let me start =>
EVERYONE has to REALIZE the difference in the definition of SPs between ATI/AMD and NVIDIA. To be frank, I would say the way ATI named their single SP as an SP is a kind of JOKE (in some ways) compared to NVIDIA.
Based on what I have read about the NVIDIA and ATI GPU architectures on the Net (I haven't had the time nor an ATI card at hand to write code to verify this yet):
IN NVIDIA __EACH SP__ is capable of doing either FP32 or Integer (ALL ops) or Special_functions.
WHILE
IN AMD __FOR EACH Group of 5 SPs ONLY ONE SP__ can do FP32 or Integer_MUL or Special_functions, while the other 4 SPs can only do SIMPLE INTEGER operations.
So if the code is doing a lot of FP operations or special functions or Integer_MUL, IN EFFECT one gets only 1/5th the number of SPs in ATI compared to what ATI claims as SPs, i.e. an ATI chip with 320 SPs is in effect ONLY 64 SPs. So obviously ATI performance will be only 50% of NVIDIA's (64 full SPs vs 128 full SPs) in the WORST CASE.
However, in practice one would find that ATI GPUs give a bit more performance than the WORST CASE mentioned above, because IF one can mix the FP32/Integer_MUL/Special_functions with Simple_Integer ops then the other 4 SPs out of the 5 SP group in ATI can be utilized, thus improving over the WORST CASE which I mentioned.
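To put rough numbers on the worst case and the mixed case, here is a minimal back-of-the-envelope sketch in Python. It only encodes the model described in this post (1 full SP plus 4 simple-integer SPs per group of 5), which I have not verified on hardware. The function names, the `simple_fraction` parameter and the perfect-packing assumption are mine, purely for illustration, and clock speeds are ignored.

```python
def worst_case_full_sps(total_sps, group_size=5):
    """Full-featured SPs under the post's model: one per group of `group_size`."""
    return total_sps // group_size


def effective_sps(total_sps, simple_fraction, group_size=5):
    """Estimate how many SPs do useful work per cycle for a given op mix.

    ASSUMPTIONS (not from any vendor documentation):
    - each group has 1 full SP (any op) and group_size - 1 simple-integer SPs
    - simple_fraction of all ops are simple integer ops, the rest are
      FP32 / Integer_MUL / special-function ops that need the full SP
    - ops pack perfectly into groups (no dependencies, no stalls)
    """
    if not 0.0 <= simple_fraction <= 1.0:
        raise ValueError("simple_fraction must be between 0 and 1")
    groups = total_sps // group_size
    complex_fraction = 1.0 - simple_fraction
    # A group retires at most `group_size` ops per cycle, and at most one of
    # those can be a complex op, so the complex ops are the bottleneck until
    # they drop to 1/group_size of the mix.
    ops_per_group_per_cycle = 1.0 / max(complex_fraction, 1.0 / group_size)
    return groups * ops_per_group_per_cycle


if __name__ == "__main__":
    nvidia_full_sps = 128  # the 128 full SPs quoted in this post
    for s in (0.0, 0.5, 0.8):
        ati = effective_sps(320, s)
        print(f"simple ops {s:.0%}: ATI effective SPs = {ati:.0f}, "
              f"ratio vs NVIDIA = {ati / nvidia_full_sps:.2f}")
```

Under these assumptions, a pure FP32 workload leaves 64 SPs busy (the 50% worst case), and the full 320 are only reached when at least four-fifths of the ops are simple integer ops.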
SO BEFORE JUMPING TO CONCLUSIONS, maybe WE have to think about the reality in terms of what ATI's 320 SIMPLE SPs are worth compared to NVIDIA's 128 FULL SPs.
Note: One more thing which could aid ATI a bit is the better branch/thread (i.e. independent code) granularity in ATI compared to NVIDIA. But still, with 128 full SPs in NVIDIA vs 64 full SPs in AMD/ATI, this granularity may not help ATI much if the vector sizes on which the operations are occurring are large.
Just my thoughts. Happy to get constructive feedback, even to the extent of telling me that my thoughts are rubbish, provided it is backed technically. I am starting this thread mainly to understand the G80/G92 core VS R600 core architecture/programming advantages/disadvantages.
Keep
HanishKVC