Posted on 2009-5-26 15:06:27
http://www.gpugrid.net/forum_thread.php?id=219
FAQ: Accelerator processors. A note on credits

A conventional processor dedicates a relatively large fraction of its transistors to complex control logic in order to maximise the performance of serial code. The Cell processor contains 8 fast Synergistic Processing Elements (SPEs) designed to maximise arithmetic throughput. Graphics processing units (GPUs) have a very large number of slower cores that maximise parallel throughput.
All this computational power comes at the cost of a change in programming paradigm. An existing application would run on the Cell processor using only the PPE core, without any performance benefit. To obtain the maximum performance, it is therefore necessary to use all SPEs and to adapt the code to the underlying hardware architecture. This means addressing vectorization, memory alignment, and communication between main memory and the local stores. On GPUs, a good programming environment (CUDA) is now available, which helps dramatically in exploiting the full potential of these devices.
HOW DO WE ASSIGN CREDITS?
On a standard PC, BOINC assigns credits based on the average of the machine's floating-point and integer performance, as measured by a set of benchmarks that the client runs, regardless of the real performance of the application on that machine.
Credits = 0.5 * (million float ops/sec + million int ops/sec) / 864,000 * (CPU time in seconds)
(one unit of BOINC credit, the Cobblestone, corresponds to 864,000 million operations, that is, 864,000 MIPS-seconds)
where "float ops" are floating-point operations and "int ops" are integer operations. On the Cell processor these benchmarks are of course wrong, because they do not use the SPEs. The same applies to GPUs, which the benchmarks do not consider. In any case, as we said, these benchmarks only indicate the speed of the machine, not the speed of the application.
For instance, the BOINC client reports the following benchmark for this machine:
GenuineIntel Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz [Family 6 Model 15 Stepping 11]
Number of CPUs 2
Measured floating point speed 2281.82 million ops/sec
Measured integer speed 6348.82 million ops/sec
The average is therefore 4343 MIPS (million instructions per second), or equivalently 18.09 Cobblestones/hour, as assigned automatically by the BOINC system. So BOINC will assign this machine 18.09 Cobblestones for each CPU hour of calculation. Note that the ratio of integer to floating-point speed is approximately 3 (= 6348/2281).
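As a rough check of the arithmetic, here is a minimal Python sketch of the benchmark-based formula given earlier. The function and constant names are made up for illustration and are not part of BOINC. Note that the two speeds in the listing average to about 4315 MIPS (roughly 17.98 Cobblestones/hour), slightly below the quoted 4343 MIPS. The post does not explain the gap; one possible reason, and this is only an assumption, is that the quoted figure came from a different benchmark run.

```python
# Million operations per Cobblestone, taken from the formula in the post.
COBBLESTONE_MOPS = 864_000


def benchmark_credits(float_mops_per_sec, int_mops_per_sec, cpu_seconds):
    """Credits = 0.5 * (float + int) / 864,000 * CPU seconds."""
    average_mips = 0.5 * (float_mops_per_sec + int_mops_per_sec)
    return average_mips / COBBLESTONE_MOPS * cpu_seconds


SECONDS_PER_HOUR = 3600

# Using the two speeds shown in the benchmark listing.
per_hour_listed = benchmark_credits(2281.82, 6348.82, SECONDS_PER_HOUR)

# Using the 4343 MIPS average quoted in the text (same value for both terms).
per_hour_quoted = benchmark_credits(4343.0, 4343.0, SECONDS_PER_HOUR)

print(f"from listed speeds: {per_hour_listed:.2f} Cobblestones/hour")
print(f"from quoted average: {per_hour_quoted:.2f} Cobblestones/hour")
```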
The way we assign credits takes these facts into account.
First of all, we need to measure the floating-point performance of the application. We have built a performance model of our applications (CELLMD and ACEMD) by manually counting the number of flops per step. For a specific WU, we can compute how many floating-point operations are performed in total, on average, depending on the number of atoms, the number of steps and so on. For CELLMD it was also possible to verify that the estimated flops were within a few percent of the real value (multiplication, addition, subtraction, division and reciprocal square root are each counted as a single floating-point operation). On GPUs, we can also use the interpolating texture units instead of computing some expensive expressions. In that case, since the CPU has nothing similar, we count the floating-point operations of the equivalent expression. It is not easy to measure the number of integer operations, so we estimate the integer (MIPS) term to be 2 times the number of floating-point operations (really, we reckon that it would be correct to assign a factor of up to 3, as in the example above). Therefore,
Credits = 0.5 * (MFLOP per WU + approx. million int ops per WU) / 864,000
(MFLOP is millions of floating-point operations)
Finally, note that this method awards credits for the real performance of the application, not for a benchmark as the BOINC client does, so it is somewhat penalized.
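The per-WU formula above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the ratio parameter, and the example flop count are all hypothetical, and the post gives no actual flop count for any WU. With the integer term taken as 2 times the floating-point term, the formula reduces to 1.5 * MFLOP / 864,000.

```python
# Million operations per Cobblestone, taken from the formula in the post.
COBBLESTONE_MOPS = 864_000


def workunit_credits(mflop_per_wu, int_to_float_ratio=2.0):
    """Credits = 0.5 * (MFLOP + approx. million int ops) / 864,000.

    The integer term is not measured; it is estimated as
    int_to_float_ratio times the floating-point count (2 in the post).
    """
    approx_int_mops = int_to_float_ratio * mflop_per_wu
    return 0.5 * (mflop_per_wu + approx_int_mops) / COBBLESTONE_MOPS


# Hypothetical WU performing 10^15 floating-point operations (10^9 MFLOP).
example_mflop = 1.0e9

credits_ratio_2 = workunit_credits(example_mflop)       # factor used by the project
credits_ratio_3 = workunit_credits(example_mflop, 3.0)  # upper factor mentioned in the post

print(f"ratio 2: {credits_ratio_2:.1f} credits")
print(f"ratio 3: {credits_ratio_3:.1f} credits")
```

Raising the assumed ratio from 2 to 3 increases the credit by one third for the same WU, which shows how much the integer estimate influences the result.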
In molecular dynamics, speed is critical, and we put all our efforts into providing the most efficient molecular dynamics codes. To give you an idea, the development of these codes took literally years of work. Read more on the performance and efficiency of our applications: ACEMD, CELLMD