BOINC 4.x technical update: how CPU time is allocated across multiple projects

vmzy · posted 2004-9-2 00:00:00

Time-slicing
Starting with version 4.00, the BOINC core client does time-slicing. This means that the core client may switch back and forth between results of different projects. This is done in a way that allocates CPU time according to the 'resource shares' you have assigned to each project.

For example, suppose you participate in SETI@home with resource share 100 and Predictor@home with resource share 200. A single-processor machine might be scheduled as follows:

1:00 - 2:00: SETI@home
2:00 - 3:00: Predictor@home
3:00 - 4:00: Predictor@home
4:00 - 5:00: SETI@home
5:00 - 6:00: Predictor@home
6:00 - 7:00: Predictor@home
...

A two-processor machine might be scheduled as follows:
             CPU 0           CPU 1
1:00 - 2:00: Predictor@home  SETI@home
2:00 - 3:00: Predictor@home  SETI@home
3:00 - 4:00: Predictor@home  Predictor@home
4:00 - 5:00: Predictor@home  SETI@home
5:00 - 6:00: Predictor@home  SETI@home
6:00 - 7:00: Predictor@home  Predictor@home

In every 3 hour period, the two-processor machine spends 4 CPU hours on Predictor@home and 2 CPU hours on SETI@home, which is the desired 2:1 ratio.
This feature is necessary to handle projects like Climateprediction.net, whose work units take a long time (1 or 2 months) to complete on a typical computer. Without time-slicing, your computer would have to finish an entire work unit before it could start working on a different project.
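
The schedules above can be reproduced with a small sketch. This is only an illustration of sharing CPU slots in proportion to resource shares, not the BOINC client's actual algorithm; the function names and the "credit" bookkeeping are assumptions made for this example. Because ties are broken by dictionary order, the slot order may differ from the tables above while the totals stay the same.

```python
from collections import Counter
from fractions import Fraction


def time_slice_schedule(shares, n_cpus, n_slots):
    """Return one row per time slot listing the project run on each CPU.

    Hypothetical model: every slot, each project earns credit in
    proportion to its resource share; each CPU then runs the project
    with the most credit, which pays one unit of credit for the slot.
    """
    total = sum(shares.values())
    credit = {name: Fraction(0) for name in shares}
    schedule = []
    for _ in range(n_slots):
        for name, share in shares.items():
            credit[name] += Fraction(share * n_cpus, total)
        row = []
        for _cpu in range(n_cpus):
            chosen = max(credit, key=credit.get)  # ties: first in dict order
            credit[chosen] -= 1
            row.append(chosen)
        schedule.append(row)
    return schedule


def cpu_hours(schedule):
    """Count how many one-hour slots each project received."""
    return Counter(name for row in schedule for name in row)


shares = {"SETI@home": 100, "Predictor@home": 200}
for row in time_slice_schedule(shares, n_cpus=2, n_slots=3):
    print(row)
```

Over any 3 slots the two-CPU run gives Predictor@home 4 CPU hours and SETI@home 2, matching the 2:1 resource shares.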

Preemption
When BOINC switches from one application to another, the first application is said to be preempted. BOINC can do preemption in two different ways; you can select this as part of your General Preferences.

Don't leave the suspended applications in memory (default). Applications are preempted by killing them; they are later restarted, and resume from their last checkpoint. This saves virtual memory (swap space) but can waste CPU time, especially if applications checkpoint infrequently.
Leave suspended applications in memory. Applications are preempted by suspending them; they remain in virtual memory while preempted (they don't necessarily occupy physical memory).
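Translation by vmzy
Time-slicing
Starting with version 4.00, the BOINC core client supports time-slicing. This means the core client switches back and forth between different projects. It allocates CPU time according to the 'resource share' you have assigned to each project.

For example, suppose you participate in SETI@home with resource share 100 and Predictor@home with resource share 200. A single-processor machine might be scheduled as follows: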
1:00 - 2:00: SETI@home
2:00 - 3:00: Predictor@home
3:00 - 4:00: Predictor@home
4:00 - 5:00: SETI@home
5:00 - 6:00: Predictor@home
6:00 - 7:00: Predictor@home
...
A two-processor machine might be scheduled as follows:
             CPU 0           CPU 1
1:00 - 2:00: Predictor@home  SETI@home
2:00 - 3:00: Predictor@home  SETI@home
3:00 - 4:00: Predictor@home  Predictor@home
4:00 - 5:00: Predictor@home  SETI@home
5:00 - 6:00: Predictor@home  SETI@home
6:00 - 7:00: Predictor@home  Predictor@home

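Over every 3 hours, your computer spends 2 hours on Predictor@home for each hour on SETI@home, which is the intended ratio.
This feature is necessary for projects like Climateprediction.net, whose work units take a very long time (1 or 2 months) to finish on a typical computer. Without time-slicing, your computer would have to finish an entire work unit before it could start on another project.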

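Preemption
When BOINC switches from one application to another, the first application is preempted. BOINC can preempt in two different ways; you can choose between them with an option in your General Preferences.

Don't leave suspended applications in memory (default). Applications are simply shut down; later they restart from their last checkpoint. This saves virtual memory (swap space) but may waste CPU time, especially if applications checkpoint rarely.
Leave suspended applications in memory. Applications are suspended; they stay in virtual memory (they do not necessarily occupy physical memory).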

[This post was edited by its author on 2004-9-4 12:03:26]

[ Last edited by Youth on 2005-4-27 at 13:03 ]

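vmzy · posted 2004-9-4 00:00:00

Having looked into this carefully, I think everyone should go to the official site and change this setting!
The default option does save virtual memory, but it throws away computation and makes things very inefficient, for the following reason:
Right now essentially none of the projects checkpoint at intervals shorter than an hour (my machine is slow, around 1 GHz)! Yet each project exits after one hour and restarts from its checkpoint the next time it runs, so if no checkpoint was written during that hour, the whole hour of computation is wasted!
So I suggest everyone change it on the official site. My computing efficiency is now only about 1/10 of what it used to be, which is far too low. Practically nothing gets computed, so this has to be changed!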
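
The argument above can be put into numbers with a rough sketch. The model is an assumption made for illustration: an application is killed at the end of every time slice, keeps only the work up to its last checkpoint, and its checkpoint timer restarts on each resume. The function name and parameters are hypothetical, and the sketch does not try to reproduce the 1/10 figure quoted above.

```python
def useful_fraction(slice_minutes, checkpoint_minutes, leave_in_memory):
    """Fraction of each time slice that survives preemption.

    Assumed model: with "leave in memory" the application is only
    suspended, so nothing is lost. Otherwise it is killed and only
    the work up to the last completed checkpoint is kept.
    """
    if leave_in_memory:
        return 1.0
    saved = (slice_minutes // checkpoint_minutes) * checkpoint_minutes
    return saved / slice_minutes


# One-hour slices; an application that checkpoints every 90 minutes
# never reaches a checkpoint before it is killed.
print(useful_fraction(60, 90, leave_in_memory=False))
print(useful_fraction(60, 45, leave_in_memory=False))
print(useful_fraction(60, 90, leave_in_memory=True))
```

Under these assumptions an application whose checkpoint interval exceeds the time slice makes no progress at all unless it is left in memory.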

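Youth · posted 2005-3-6 12:07:44

Let me stress this point too, so nobody wastes computing time: just change your own settings on the official website. Here are two screenshots:

Youth · posted 2005-3-6 12:08:32

Then, on this page, change the fourth item to yes: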

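Youth · posted 2005-5-11 19:46:11

This scheduling algorithm has changed as of version 4.36; see http://boinc.berkeley.edu/sched.php for details. From a quick read it looks quite reasonable, although it mainly matters for computers attached to many projects, since its purpose is to keep results from missing their deadlines as far as possible. If a machine runs only one project, there is nothing to worry about :)

In scheduling terms, the old behavior is now called normal mode, and a new "panic" mode has been introduced. In panic mode the CPU gives priority to results that are about to expire, provided of course that there is still time to finish them. Switching between the two modes is automatic.

With panic mode in place, doesn't the resource allocation set by the user get thrown off? Projects with short deadlines may receive more computing resources. For that reason a new concept, debt, has also been introduced. As described above, when a short-deadline project has received more computing resources than the user configured, that project is said to have negative debt; the opposite is positive debt. With this debt value, the BOINC client can adjust the final resource allocation by restricting the network connections of projects with negative debt.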

[ Last edited by Youth on 2005-5-11 at 19:51 ]

Youth · posted 2005-5-11 19:48:14

I only skimmed it myself; the original text follows:

Client scheduling

Last modified 6:49 PM UTC, May 09 2005

This document describes the CPU scheduling policy and work-fetch policy used in the BOINC core client, starting with version 4.36.

Terminology

Debt

How much CPU time is 'owed' to a project in order to bring it into parity with other projects, based on the user's resource-share settings. Positive debt means that a project has not had enough CPU time to match its resource share. Negative debt means that a project has had more than its share. Long-term debt is tracked over the entire time the project is attached. Short-term debt is tracked only for projects that have work on the computer.
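
As a rough illustration of the debt definition above, the sketch below credits each project with the CPU time its resource share entitles it to and subtracts the time it actually received. This is an assumed bookkeeping rule written for this thread, not the formula used by the BOINC client; `update_debts` and its parameters are hypothetical names.

```python
def update_debts(debts, shares, cpu_seconds_used):
    """Return new debts after one accounting period.

    Assumed rule: owed time = total CPU time used in the period
    multiplied by the project's fraction of the resource shares;
    debt grows by (owed - actually received).
    """
    total_share = sum(shares.values())
    total_used = sum(cpu_seconds_used.values())
    new_debts = {}
    for name, share in shares.items():
        owed = total_used * share / total_share
        received = cpu_seconds_used.get(name, 0.0)
        new_debts[name] = debts.get(name, 0.0) + owed - received
    return new_debts


shares = {"SETI@home": 100, "Predictor@home": 200}
# Suppose SETI@home ran for a full hour while Predictor@home got nothing.
debts = update_debts({}, shares, {"SETI@home": 3600.0})
print(debts)
```

In this example SETI@home ends up with negative debt (it had more than its share) and Predictor@home with an equal positive debt.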

Deadlines

The 'deadline' of a result is the time by which it must be completed and reported. Deadlines are set by projects. Work that is returned after the deadline may or may not have any value to the project, and it may or may not be granted credit, even if it matches the results that were returned on time.

Goals

The goals of the CPU scheduler and work-fetch policies are:
To complete and report results by their deadline;
To honor resource shares;
To keep an interesting mix of work on the computer.

There may be times when fetching more work will result in missed deadlines.

The CPU scheduler has two modes, normal and panic.

In normal mode the CPU scheduler does round-robin scheduling among results and attempts to honor the resource shares.

In panic mode, the CPU scheduler runs results with the nearest deadline. This allows the client to meet deadlines that would otherwise be missed. Panic mode is entered if either a work unit has a deadline that is very near, or the sum of remaining calculation times is nearly as large as the remaining calculation time available. If the CPU scheduler is in panic mode, no new work is fetched.

The CPU scheduler decides which mode it is in when a result is completed, when the end of the user specified work period is reached, when new work is downloaded, or when the user takes some action through the UI.

The work-fetch policy has three modes: no download, download OK, and download required.

Download required means that there are not enough results to keep all of the CPUs busy or there is not enough work to get to the next time that you have indicated that you are likely to connect. Work should be retrieved from someplace even if it means that work is retrieved from a project with negative long term debt.

No downloads means that the CPU scheduler is in panic mode.

In the downloads OK mode, projects with high long term debt can download work, but projects with very low long term debt cannot. Very low long term debt projects have probably recently caused a panic mode, or they have been dominating the work on the computer in some other way.
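
The three work-fetch modes can be sketched as a small decision function. The inputs, the way "enough work" is measured, and the `low_debt_cutoff` value are all assumptions made for illustration; the quoted document gives no numeric threshold for "very low" long term debt.

```python
def work_fetch_mode(panic, queued_cpu_hours, n_results, n_cpus,
                    hours_until_next_connect):
    """Pick one of the three modes described above (hypothetical inputs)."""
    if panic:
        return "no download"
    not_enough_results = n_results < n_cpus
    # Assumption: each CPU needs one CPU hour of work per wall-clock hour.
    runs_dry = queued_cpu_hours < n_cpus * hours_until_next_connect
    if not_enough_results or runs_dry:
        return "download required"
    return "download OK"


def projects_allowed_to_fetch(mode, long_term_debts, low_debt_cutoff=-86400.0):
    """List projects that may fetch work, highest long term debt first.

    low_debt_cutoff is a made-up threshold for "very low" debt.
    """
    ranked = sorted(long_term_debts, key=long_term_debts.get, reverse=True)
    if mode == "no download":
        return []
    if mode == "download required":
        return ranked  # even projects with negative debt may be asked
    return [p for p in ranked if long_term_debts[p] > low_debt_cutoff]


debts = {"A": 5000.0, "B": -200000.0, "C": -100.0}
mode = work_fetch_mode(False, 100.0, 3, 2, 8.0)
print(mode, projects_allowed_to_fetch(mode, debts))
```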

BOINC work fetch and CPU policy design

Problem

The old work fetch policy and CPU scheduler policy can miss deadlines for a number of reasons: the computer is slow, too many projects are attached, a short deadline work unit is downloaded, or a work unit with a tight deadline is downloaded.

There is a difference between short deadlines and tight deadlines:

A short deadline is a deadline that would be missed because the debt did not increase to a level where the first time slice was given to the project before the work unit expired. For example the early work units from Pirates had a one hour deadline, and the CharMM work units from Protein Predictor have a 24 hour deadline.

A tight deadline is one where the time to crunch the work unit is a large fraction of the deadline. For example, on one of my machines a Sulfur Cycle work unit from Climate Prediction.Net is estimated to take 145 days and has a 180 day deadline, so it needs about 80% of the CPU's processing time for the duration of the work, well over half. In this case the deadline is not short, but it is tight.
With the current policies, the slower the computer, the lower the fraction of time that the computer is on, and the tighter the deadlines for the projects that are attached to that computer, the fewer projects that computer may successfully attach to. In the case of the slowest computers the number may be one, even though there are several which could be run successfully individually.

Design goals

In order to keep the work the computer is running as varied as possible, each computer should be able to attach to as many projects as the user desires if that computer is capable of running each of the projects in isolation. The combination of the work fetch policy and the CPU scheduler should not download too much work for the CPU to complete on time, and should attempt to complete all work that is downloaded on time. Faster computers will be able to keep work from more different projects on hand than slower computers.

Design of the CPU scheduler

The CPU scheduler has two modes, normal and panic. In normal mode, the CPU scheduler uses the current debt calculations to attempt to balance the resource share with the work on hand. Some users with just a few projects and balanced resource shares may never leave this mode. In panic mode, the CPU scheduler processes the results with the nearest deadlines. It is possible to switch into panic mode at any time, but the CPU scheduler will finish the current time segment processing the current result. It is only possible to switch out of panic mode when the CPUs would be rescheduled. Having the CPU scheduler in panic mode is one of the drivers of the work fetch policy.

Design of the work fetch policy

The goals of the work fetch policy are:

to not get too much work for the CPU to complete by the deadline.
to honor the resource shares that the user has specified.
to keep an interesting mix of work on the system.

The new work fetch policy limits how much work is on hand, and it maintains a debt even if a project does not have work on hand.

The work fetch will always be done in order of highest long term debt. Projects with negative long term debts will not be allowed to connect. This prevents a project with a tight deadline from dominating out of proportion to its resource share. If the user connects to two projects and one of them has a processing time to deadline ratio of 0.6 and the other has a processing time to deadline ratio of 0.1, the project with the deadline ratio above a half would tend to get a 0.6 fraction of the work because the CPU scheduler will occasionally give it several turns out of order to get it done by its deadline. If at that point, that project were allowed to download another work unit, then it would again have to have several turns out of order to meet this deadline as well.

The work fetch policy has several gates in order to prevent downloaded work from overloading the CPU.

The second trigger is to have a tight string of deadlines. Having the CPU scheduler in panic mode for a short deadline will not preclude the downloading of work. If the work unit is due today, but the work otherwise is not in time trouble, there is no reason not to download some more work.

The third trigger is to have the sum of the processing fractions greater than some fraction of the wall time. This gives long term work units a chance to finish slowly instead of all at the end. This will normally be invoked soon enough to prevent the CPU scheduler from entering panic mode because of tight deadlines.

Details

Short deadline

Result deadline is less than 24 hours or has already passed. This triggers the CPU scheduler into panic mode.

CPU queue overload

Sort the work units by deadline, earliest first. If at any point in this list, the sum of the remaining processing time is greater than 0.8 * up_frac * time to deadline, the CPU queue is overloaded. This both stops work requests and switches the CPU scheduler to earliest deadline first.
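
The overload test just described might look like the sketch below. It assumes a single CPU, that `up_frac` is the fraction of wall-clock time the machine is actually available for computing (the quoted text does not define it), and that each result is given as a hypothetical pair of remaining processing time and deadline in the same time units.

```python
def cpu_queue_overloaded(results, now, up_frac, safety=0.8):
    """Check the "CPU queue overload" condition quoted above.

    results: list of (remaining_processing_time, deadline) pairs,
    a made-up representation used only for this sketch.
    """
    cumulative = 0.0
    # Walk the results earliest deadline first, summing remaining work.
    for remaining, deadline in sorted(results, key=lambda r: r[1]):
        cumulative += remaining
        if cumulative > safety * up_frac * (deadline - now):
            return True
    return False


# Times in hours: two 10-hour results due in 20 and 24 hours.
print(cpu_queue_overloaded([(10, 20), (10, 24)], now=0, up_frac=1.0))
```

With both results queued, 20 hours of work must fit into 0.8 * 24 = 19.2 usable hours before the second deadline, so this queue counts as overloaded.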

CPU queue fully loaded

Sum the fraction that the remaining processing time is of the time to deadline for each work unit. If this is greater than 0.8 * up_frac, the CPU queue is fully loaded. This triggers no work fetch.
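
A matching sketch of the "fully loaded" test, under the same assumptions as the quoted rule allows: one CPU, `up_frac` taken to be the fraction of time the machine is available, and results given as hypothetical (remaining processing time, deadline) pairs. Treating an already passed deadline as fully loaded is a choice made here, not something the quoted text states.

```python
def cpu_queue_fully_loaded(results, now, up_frac, safety=0.8):
    """Check the "CPU queue fully loaded" condition quoted above."""
    load = 0.0
    for remaining, deadline in results:
        time_left = deadline - now
        if time_left <= 0:
            return True  # assumption: a passed deadline counts as fully loaded
        load += remaining / time_left
    return load > safety * up_frac


# The two projects from the work fetch discussion: ratios 0.6 and 0.1.
print(cpu_queue_fully_loaded([(60, 100), (10, 100)], now=0, up_frac=1.0))
# The 145-day work unit with a 180-day deadline, on its own.
print(cpu_queue_fully_loaded([(145, 180)], now=0, up_frac=1.0))
```

The first queue sums to about 0.7 and stays under the 0.8 limit; the second is about 0.81 on its own and would stop further work fetch.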

[ Last edited by Youth on 2005-5-11 at 19:54 ]

Youth · posted 2005-5-11 19:59:08

In BOINC Manager 4.37 you can already see the new scheduling algorithm at work:

11/05/2005 19:06:12||Computer is overcommitted
11/05/2005 19:06:12||Nearly overcommitted.
11/05/2005 19:06:12||New work fetch policy: no work fetch allowed.
11/05/2005 19:06:12||New CPU scheduler policy: earliest deadline first.
11/05/2005 19:06:12||earliest deadline: 1116070997.000000 H1_0972.5__0972.6_0.1_T01_Fin1_1

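DF3-CQB · posted 2005-5-15 03:58:28

It is still better not to run too many projects. One main project and one backup project is enough, and it keeps the scheduling simpler too...

bascacaler · posted 2005-5-18 20:46:07

Thanks, I learned something.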

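Youth · posted 2005-9-27 23:56:08

Bumping this thread as a reminder for newcomers and veterans alike :)

The general recommendation is to set "Leave applications in memory while preempted" to yes!

1. It avoids wasting computing time, for the reasons given in the earlier posts
2. It avoids computation errors to some extent; these tend to occur when several BOINC projects run on the same machine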

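小岩~stONE · posted 2008-5-13 10:52:55

Where do I set the resource share?

Youth · posted 2008-5-13 11:27:17

On each project's own official website: log in to your account page and set the project preferences.

Julian_Yuen · posted 2008-5-13 12:34:02

If you use boincstats, you can also set it on the boincstats website.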
