Original poster: vmzy

[Independent platform] [Life sciences] Folding@Home

OP | Posted 2010-4-25 15:02:16
April 24, 2010
Prepping for the GPU3 rollout: a new client is coming, and NVIDIA FAH GPU clients will (in the future) need CUDA 2.2 or later

As we've discussed in previous posts, due to its great computational abilities, our GPU client has had a great scientific impact so far.  In our most recent FAH paper (also see the movie), the GPU clients play a star role in allowing Folding@home to push to unprecedented levels, simulating protein folding on the millisecond timescale in an atomistic model.

We are prepping for the rollout of the next generation GPU client (GPU3).  As mentioned in previous posts, GPU3 will allow for greatly enhanced science (including more accurate models, new kinds of science, 2x faster execution, more stable simulations, OpenCL support for run-time optimizations, and greater flexibility for adding new scientific capability).  This is accomplished through the use of the OpenMM GPU library (which originally came from FAH GPU code, but has been significantly enhanced by Simbios staff).

We would like to give donors a heads up of what's coming.  We are doing internal testing now and will do closed beta testing hopefully soon.  With the rollout of the new GPU3/OpenMM-based core (core15) for NVIDIA GPU clients, we will need donors to do two software installs (please note that this is not required to be done immediately, since the new client is not openly available):

1) In order to get WUs using this new core, donors will need to make sure their CUDA level is at least CUDA 2.2, but ideally 2.3 or the most recent.  You can work out which version of CUDA you have from your driver version:

CUDA 2.0: 177.35+
CUDA 2.1: 180.60+
CUDA 2.2: 185.85+
CUDA 2.3: 190.38+
CUDA 3.0/OpenCL: 195.36+

2) A new client will be needed to access GPU3 WUs.  This new client will report the CUDA level to the assignment server, so it can assign around machines with less capable CUDA levels.  Note that "assigning around" the issue means that if your client can't do the work available, it won't be assigned a WU, so it's best to make sure your CUDA drivers are reasonably updated.  We feel this is better than giving a WU which will crash the core, etc.
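
The driver table in step 1 can be turned into a quick lookup. The following is only a sketch: the thresholds are copied from the list above, while the function names and the assumption that driver strings always look like "190.38" (major, dot, two-digit minor) are illustrative and not part of the FAH client.

```python
# Driver thresholds copied from the table in the post, newest first:
# (minimum driver as (major, minor), CUDA level that driver provides).
DRIVER_TO_CUDA = [
    ((195, 36), "3.0"),
    ((190, 38), "2.3"),
    ((185, 85), "2.2"),
    ((180, 60), "2.1"),
    ((177, 35), "2.0"),
]


def parse_driver(version):
    """Turn a driver string such as '190.38' into the tuple (190, 38)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def cuda_level(driver_version):
    """Return the highest CUDA level in the table that the driver reaches, or None."""
    parsed = parse_driver(driver_version)
    for minimum, level in DRIVER_TO_CUDA:
        if parsed >= minimum:
            return level
    return None


def meets_gpu3_minimum(driver_version):
    """True if the driver implies CUDA 2.2 or later (driver 185.85+)."""
    return parse_driver(driver_version) >= (185, 85)
```

For example, `cuda_level("182.50")` gives "2.1", so `meets_gpu3_minimum("182.50")` is False and that machine would need a driver update before GPU3 WUs could be assigned to it.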

While the new client has not been openly released yet, we wanted to give this heads up to donors so they have time to upgrade their drivers.  

Thanks to all of the GPU folders.  We have done some great work so far and the best results are yet to come!
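Summary:
Preparing for the GPU3 rollout
The GPU clients have made a huge contribution to FAH. See the paper and the movie for details.

GPU3 improves accuracy and stability, makes the code easier to optimize, and can run the science 2x faster. This is mainly because GPU3 uses OpenMM (which grew out of the FAH GPU code).

GPU3 is still in internal testing, and closed beta testing should start before long. Here is an early heads-up. The GPU3 client for NVIDIA will use the core15 computing core, so donors need to prepare two things.

1. The new core requires at least CUDA 2.2, but newer is better, so it is best to install the latest driver. Version information:
CUDA 2.0: 177.35+
CUDA 2.1: 180.60+
CUDA 2.2: 185.85+
CUDA 2.3: 190.38+
CUDA 3.0/OpenCL: 195.36+

2. Install the new client. The client reports your current CUDA version to the AS (assignment server) so that it receives suitable work. If your driver is too old, you will very likely receive no work.

Since GPU3 is not yet openly available, you have plenty of time to update your drivers; there is no rush.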

OP | Posted 2010-4-28 09:09:54
April 27, 2010
One physical server (and several virtual interfaces) down for maintenance
We needed to take one physical server down, but that takes down several interfaces, including

vsp07, vsp07v, vsp07b, vsp17, vsp17v, vsp22, vsp22v

The machine is fscking now, so it may take a few hours for it to come back.
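Summary:
One physical server has been taken down for maintenance, which also takes down the following virtual interfaces:
vsp07, vsp07v, vsp07b, vsp17, vsp17v, vsp22, vsp22v
The server is running a disk check (fsck), so it may take a few hours to come back.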

OP | Posted 2010-5-11 09:25:31
May 10, 2010
Update on WU shortage
We've been working on the WU shortage issue and have some positive items to report.

First, we have greatly improved the AS logic so it uses more information about the CPU.  This information is only available in the v6 client or later, so it is important to upgrade to v6 if you're not getting WUs.  The main gist is that we can now identify directly whether a machine has SSE or SSE2 support, so we can better assign work for cores that only support SSE or SSE2 (such as the protomol core, which currently only supports SSE2).  This should be a big help to Linux clients as well, which were not well handled by the AS before.
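
The idea of assigning by CPU capability can be sketched as a simple filter. This is not the AS code: the core names other than Protomol, the requirement table, and the function name are all made up for illustration; only the pairing of the Protomol core with SSE2 comes from the post.

```python
# Hypothetical requirement table. Only "protomol needs SSE2" is stated in
# the post; the other two entries are placeholders for illustration.
CORE_REQUIREMENTS = {
    "protomol": {"sse2"},
    "example_sse_core": {"sse"},
    "example_plain_core": set(),
}


def assignable_cores(client_features):
    """Return the cores whose required CPU features the client reports."""
    features = set(client_features)
    return sorted(
        core
        for core, needed in CORE_REQUIREMENTS.items()
        if needed <= features  # every required feature must be present
    )
```

A client reporting only `{"sse"}` would be matched to the two placeholder cores but not to Protomol, which is the "assign around" behavior the post describes.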

Second, there are a lot of available WUs for Protomol right now, but only for advanced methods clients.  If you would like to try that out, set your client for the "Advanced Methods" setting.  Note that the Protomol team looks to have fixed the checkpoint bug (which has kept this core in the Advanced Methods QA level) and we hope to roll out this core to all of FAH once again, with this issue fixed.

Finally, we have also identified a potential issue with the AS code which might make its logic fail in certain cases.  Basically, in the old days of FAH, we could get away with 32-bit floating point numbers for internal AS calculations, but now with so many servers and all of FAH's complexities, floating point roundoff for certain AS logic could be causing problems.  We will be working on a fix for this, but this is something we must do carefully (not just a global replace of float -> double) and so it will take some time to implement and test this.
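
The kind of roundoff described here is easy to reproduce. The sketch below does not come from the AS; it only shows, with Python's standard `struct` module, how a 32-bit float silently drops small increments once a running total gets large, while a 64-bit double does not. The helper names are illustrative.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Add values one by one, rounding to 32-bit precision after each step."""
    total = 0.0
    for value in values:
        total = to_float32(total + to_float32(value))
    return total


big = 16777216.0  # 2**24: above this, float32 cannot represent every integer

# In 32-bit arithmetic, adding 1.0 to 2**24 changes nothing.
lost_in_float32 = to_float32(big + 1.0) == big

# In 64-bit arithmetic, the same addition is exact.
kept_in_double = (big + 1.0) != big
```

Both `lost_in_float32` and `kept_in_double` are True. This also hints at why a blind float -> double replacement needs care: any logic that was tuned around the 32-bit behavior can change its results once the extra precision appears.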
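Summary:
Progress on the WU shortage
First, we have reworked the AS code so that it assigns work according to CPU capabilities (this needs the v6 client, so please upgrade soon). SSE2 work will no longer go to machines without SSE2 support. Linux clients, which the AS handled poorly before, also benefit.
Second, there are plenty of Protomol WUs right now, but only for Advanced Methods clients. To run them, enable the "Advanced Methods" setting. Note: the Protomol team appears to have fixed the checkpoint bug (the reason the core was kept at the Advanced Methods level), and if all goes well we will return Protomol to full release as soon as we can.
Finally, we have found a potential problem in the AS code. As FAH has grown, 32-bit floating point (float) no longer seems to be sufficient. We will move the code to double when we can, but this is not a simple global replacement, so we need to proceed carefully.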

Low on jobs for Pentium 2/Pentium 3 CPUs -- and a discussion of why that happens and how you can help
We're low on jobs for machines w/o SSE capabilities.  We are working to fix this.

By the way, I often get asked "how come FAH can get low on jobs?"  This is a good question, considering that since FAH studies temporal phenomena, when one Work Unit (WU) comes in, the work servers automatically build the next one.  So, it should be impossible (or at least very difficult) to run out of jobs, IF everyone plays by the rules.

But that's not the case.  Many people attempt to "cherry pick" WUs, i.e. they dump WUs until they get one which is most favorable for them points-wise.  This means that they take away WUs from other people, since our server waits until the WU times out before sending it to someone else.  This can take a long time on certain WUs.

We have several schemes implemented to fight cherry picking and keep WUs flowing to all the donors, but sometimes the cherry picking gets very aggressive and we run out of WUs, like today.  We are looking into addressing this issue short term (getting more jobs going) as well as long term (better solutions to cherry picking problems).  The FFF bonus scheme is one example of such a plan, and it seems to be working reasonably well.  We are looking into expanding it more broadly.

However, you can help us help other donors (and keep our research going).  Please do not cherry pick WUs.  This slows down FAH's progress, makes other donors unhappy, and (e.g. based on FFF schemes) will lead to lower points for those who do this in the future.
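Summary:
Shortage of work for Pentium 2/Pentium 3 CPUs (machines without SSE)
People often ask why FAH can run low on work. The FAH servers rarely lack WUs as such; most shortages are caused by the behavior of some donors. They "cherry pick": they dump WUs that are "not worth it" (low PPD) until they receive one they like. The dumped WUs then sit on the server until they time out, and only then are sent to someone else. The result is a server that has WUs it cannot hand out.
We are looking for ways to solve this. Bonus points (the FFF scheme) seem to work reasonably well so far, and we plan to apply them more broadly.
If you really want to contribute to the research, please do not cherry pick. It slows down FAH's progress and hurts other donors. In the future, cherry picking will lead to lower points.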

OP | Posted 2010-5-11 22:30:22
Tue May 11, 2010 6:19 am
assignment server problems this evening--fixing now
We have had a set of assignment server problems this afternoon and evening related to an AS code update; first there was a logic error that caused clients not to get any work. The fix for this problem then caused inappropriate assignments (e.g. SMP clients being given non-SMP work units, also machines being given core types they can't handle). I just reverted the assignment server to the older version of the code; we'll debug sometime during daytime hours (when we can have several pairs of eyes on the code). In the meantime, the AS should start behaving normally in a few minutes.

Our apologies for the inconvenience.
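Summary:
Assignment server (AS) problems
An AS code update caused the AS first to hand out no work and then to make inappropriate assignments. The code has now been rolled back to the older version.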

OP | Posted 2010-5-12 09:09:00
Last edited by vmzy on 2010-5-12 09:31

Tue May 11, 2010 6:17 pm
SMP announcements: be sure you have 6.29+ client
We have moved SMP2 from advanced methods to full release. SMP2 and the A3 core will become the workhorse of our multi-core effort as we phase out A1 and A2. At this point, most of our older A1 and A2 projects are being finished (we're getting ready to start migrating bigadv, but the others are mostly done).

One important note to SMP users: ***please be sure you have 6.29 or later clients***
The assignment server will not give SMP2 work to clients older than 6.29, so if you have an older client you may not be able to download SMP work.

On a related note, as we phase out SMP1 we will offer a Windows SMP client that doesn't need MPI. This should ease the installation process substantially. We'll release it when it's ready.  Hopefully soon, but we don't like to give target dates. The client has been built and is entering early testing.
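Summary:
Please upgrade your SMP client to 6.29 or later
We have moved SMP2 from Advanced Methods to full release. From now on most work will use the A3 core. Most A1 and A2 projects are nearly finished; bigadv still has some work left and has not yet been migrated to A3.
A reminder: the AS will not give SMP2 work to SMP clients older than 6.29, so please upgrade as soon as you can.
Also, as SMP1 is phased out, we will release a Windows SMP client that does not need MPI, which should make installation much simpler. The new client has been built and is in early testing; we will release it when it is ready.

Translator's note: Now that the client is in full release, I suggest removing the advanced methods flag for the sake of stability.
At last, an SMP client that does not need MPI, which should make SMP much easier to promote. I hope the new version can be installed as a service (the current one is hard to run as a service because of the MPI library), so it can run quietly in the background. I suspect most FAHers dislike having an extra console window in the taskbar. Once the new client is out, I will move all my CPUs to FAH.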

OP | Posted 2010-5-18 09:59:34
Mon May 17, 2010 10:56 pm
1a. The same passkey must be used both when downloading a WU and when uploading the result. If you change passkeys, the bonus factor = 1.

1b. You may use more than one UserName with the same passkey, but each one must meet the >= 10 WU and >80% qualifications separately.
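
Rules 1a and 1b can be written down as a small check. This is a sketch, not the stats server's logic: the post does not give the bonus formula, so `earned_factor` is simply passed in, and reading ">80%" as the share of assigned WUs that were returned successfully is an assumption. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    """Per-UserName counters under one passkey (hypothetical structure)."""
    completed_wus: int  # WUs returned successfully
    assigned_wus: int   # WUs handed out to this UserName


def qualifies(stats):
    """Rule 1b: each UserName needs >= 10 WUs and a >80% rate on its own."""
    if stats.assigned_wus == 0:
        return False
    rate = stats.completed_wus / stats.assigned_wus
    return stats.completed_wus >= 10 and rate > 0.80


def bonus_factor(download_passkey, upload_passkey, stats, earned_factor):
    """Rule 1a: a passkey change between download and upload means factor 1."""
    if download_passkey != upload_passkey:
        return 1.0
    if not qualifies(stats):
        return 1.0
    return earned_factor
```

With these assumptions, a WU downloaded under one passkey and uploaded under another always gets a factor of 1.0, no matter how good the UserName's record is.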
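Summary:
Minor adjustments to the SMP2 bonus points policy:
1a. The same passkey must be used for downloading and uploading a WU; otherwise that WU earns no bonus.
1b. If several user names share one passkey, the >= 10 WU and >80% success-rate qualifications are counted separately for each user name.

Translator's note: These adjustments seem aimed at "cheaters" and should have no negative effect on normal users.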

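OP | Posted 2010-5-23 14:27:34
A few days ago the project quietly moved b4 to full release. It supposedly fixed the problem of WUs failing when the client restarts, but in practice it has made things worse. The failure rate used to be around 10%; now it is roughly 90%. What a disaster. I have reported the problem to the project and hope it is just my own bad luck.
Nearly all of my machines are now affected: whenever the client restarts, the WU errors out and is lost.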

Posted 2010-5-23 23:49:43
Just started on B4
on an AMD XⅡ240 (I think that is how it is written)
Let's see how my luck holds ^_^

OP | Posted 2010-5-25 11:32:23
Tue May 25, 2010 6:01 am
Open beta release of the GPU3 core
We have a new GPU core (core 15) going into an open beta test for NVIDIA clients. This core requires a new client (see below) as well as the latest drivers (197.45). This core is the first run of the GPU3 technology, derived from the OpenMM project at Stanford (http://simtk.org/home/openmm). You can find more information in our GPU3 FAQ (see url below).

While this release is for NVIDIA only to start, we are actively pushing ATI support (with the help of AMD/ATI), although we have no ETA at the moment. However, please do not use this client with an ATI GPU at the moment.

This is the first open beta test of this new client and core, so there are likely bugs to be found as more donors try this out on more diverse sets of hardware. The documentation (GPU3 FAQ) is also new and may contain some errors. However, the client has been QA'd both internally at Stanford and with our closed group of beta testers and is looking pretty good so far.

Please post bugs or issues in the GPU section of this forum. Some testers in the closed beta test have found problems with 8800 and 9800 class GPUs (we are working on this).


NVIDIA Client download:
SYSTRAY: http://www.stanford.edu/~friedrim/.Folding@home-systray-632.msi (md5sum=effd87ba12c96be28e252bccbe776ff9)
VISTA CONSOLE: http://www.stanford.edu/~friedri ... 2-GPU_Vista-631.zip (md5sum=b41301886881958c64c1907b3ed6acae)
XP CONSOLE: http://www.stanford.edu/~friedri ... in32-GPU_XP-631.zip (md5sum=885e36a477d247487f8009335bd4e3cc)
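
The md5sum values listed with each download can be checked after downloading. The following is a minimal sketch using Python's standard `hashlib`; the helper names are illustrative, and the local file path is whatever you saved the download as.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published):
    """Compare a file's MD5 with the published value, ignoring case and spaces."""
    return md5_of_file(path) == published.strip().lower()
```

For the systray installer, for instance, `matches_published_md5("Folding@home-systray-632.msi", "effd87ba12c96be28e252bccbe776ff9")` should return True for an intact download, assuming the file was saved under that name in the current directory. A mismatch usually means the download was truncated or corrupted and should be repeated.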

GPU3 FAQ:
http://folding.stanford.edu/English/FAQ-NVIDIA-GPU3
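Summary:
GPU3 (core 15) enters open beta
GPU3 requires a new client (download links are in the English text above). It supports NVIDIA only for now and needs driver 197.45 or later. Please do not run this client on an ATI GPU.
Known issue: 8800 and 9800 class GPUs have problems. We are working on this.

Translator's note: Farm owners, start saving for 4XX cards. Those hoping to pick up retired cards should check the forum often, or they will be gone before you get there.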

Posted 2010-5-25 22:55:01
On B4, a 115-point WU took nearly 20 hours
Watching GPU3
GF295 owners, fire away at me~~^_^

OP | Posted 2010-6-1 10:15:23
May 31, 2010
GPU3 (NVIDIA) open beta test update
The GPU3 open beta test for NVIDIA GPUs is going well.  There have been a few issues uncovered and our team is working on it.  However, there haven't been any major show stoppers so far, which is good news.

Here's what lies ahead.  We will continue QA in an open beta format for a little while longer until we can resolve the most significant remaining issues.  Then, the new client will replace the existing GPU client.  However, the science/WS switchover will take much longer.  In general for FAH, science calculations that were started with one core (e.g. core11) will need to be completed with it.  New NVIDIA GPU projects will start up with core15 (although there may be a few projects already in the pipeline that will still use core11), but this switchover to using only core15 could take a while, easily 3 to 6 months, or maybe longer depending on how long we need to complete the existing core11 projects.

Since Fermi boards must run core15, we are prioritizing core15 assigns to Fermi (with only a few core15 projects, we would run out of Fermi work unless we do this).  So, donors without Fermi cards will likely see mostly core11 WUs short term.  This will change as more core15 projects come online in the coming months.

However, there is still a major benefit for GPU donors to run the new client.  As we switch over, more and more WUs will be running in core15, so running the old client would eventually lead to WU shortages, etc.  There is no need to switch over immediately as there will be plenty of core11 WUs for quite a while (eg on the weeks to months timescale).

We are also making a major push for OpenCL on ATI and NVIDIA.  We are working closely with NVIDIA and ATI on this and together we are making progress, although this new core does still seem to be some time out.
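Summary:
The GPU3 (NVIDIA) open beta is going reasonably well, with no major problems found.
The changeover to GPU3 will take quite a while, though. First, some issues remain unresolved. Second, many GPU2 (core11) projects are still unfinished and must be completed before GPU2 can be retired in favor of GPU3. This could easily take 3 to 6 months, or longer.
Because Fermi cards can only run GPU3, the AS gives core15 work to Fermi cards first (there are few core15 projects at present, so we have to do this). In the short term, non-Fermi clients will therefore mostly receive core11 WUs. This will change as more core15 projects come online.
We are working closely with ATI and NVIDIA on OpenCL support for both. We do not yet know when the new core will be ready.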

OP | Posted 2010-6-13 09:48:21
June 12, 2010
SMP server down, leads to SMP WU shortage
One key SMP server is down (vspg9), which brings down all of its associated interfaces (vspg9a, vspg9b).  This is making us very short on SMP WUs.  We are actively working on this one, although our IT staff has told me that this one isn't an easy fix (multiple restarts haven't brought the key RAID back and they are working with the hardware vendor to see what's going on).

I will post updates as we get them.
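Summary:
SMP server down
A key physical SMP server is down, which leaves us very short of SMP WUs. We are working on it, but the IT staff say it is not an easy fix (several restarts have not brought the RAID back, and they are working with the hardware vendor to find the cause).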

OP | Posted 2010-6-21 10:02:45
Wed Jun 16, 2010 6:59 pm
A3 linux core v2.22 released
Sun Jun 20, 2010 5:29 pm
A3 windows core v2.22 released.

OP | Posted 2010-6-25 10:09:32
Thu Jun 24, 2010 9:43 pm
publication of FAH results on membrane fusion and influenza

We are happy to announce the publication of some of our FAH scientific results:

"Atomic-Resolution Simulations Predict a Transition State for Vesicle Fusion Defined by Contact of a Few Lipid Tails"

This paper describes work on the mechanism of vesicle fusion, a process involved in viral infection, the transmission of nerve impulses, and cellular secretion.
FAH project 2681 directly contributed to this work; we are also following up several other avenues.

http://www.ploscompbiol.org/arti ... ournal.pcbi.1000829

A summary follows below:
Membrane fusion is a common underlying process critical to neurotransmitter release, cellular trafficking, and infection by many viruses. Proteins have been identified that catalyze fusion, and mutations to these proteins have yielded important information on how fusion occurs. However, the precise mechanism by which membrane fusion begins is the subject of active investigation. We have used atomic-resolution simulations to model the process of vesicle fusion and to identify a transition state for the formation of an initial fusion stalk. Doing so required substantial technical advances in combining high-performance simulation and distributed computing to analyze the transition state of a complex reaction in a large system. The transition state we identify in our simulations involves specific structural changes by a few lipid molecules. We also simulate fusion peptides from influenza hemagglutinin and show that they promote the same structural changes as are required for fusion in our model. We therefore hypothesize that these changes to individual lipid molecules may explain a portion of the catalytic activity of fusion proteins such as influenza hemagglutinin.
Summary:
FAH results on membrane fusion and influenza have been published

OP | Posted 2010-6-26 11:09:29
June 25, 2010
Update on the v7 client
Here's an update on our v7 client efforts.  The v7 client is a complete rewrite in order to make the client more reliable, simpler to maintain, and easier to extend with features that donors have requested.  The first versions will not have everything that donors have asked for, but there will be some significant changes, such as the integration of classic, SMP, and GPU clients, the ability for a single client to handle all of these (e.g. multi-core + GPU) simultaneously (via multiple cores), a cleaner and more reliable GUI, better and more sophisticated tools for 3rd party developers, and in time the ability to use the FAH client to manage multiple machines easily.

The console version of the v7 client has gone through internal testing and we're starting a very limited alpha release.  There are definitely some rough edges, but I'm excited to see it get this far.  Assuming there are no show stoppers, I hope that we'll have something ready for open beta testing in a month or two, maybe sooner.
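Summary:
News on the v7 client
New features in the v7 client include: the classic, SMP, and GPU clients merged into one (by handling multiple cores), a more reliable GUI, interfaces for third-party tools, and easy management of multiple machines.
The console version has passed internal testing. If all goes well, open beta testing will start within a month or two.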
