Original poster: vmzy

[Independent platform] [Life sciences] Folding@Home

Original poster | Posted 2007-3-23 16:06:10
3/22/2007 Stanford School of Medicine Network Outage
The main network for the School of Medicine (and much of Folding@home) went down at about 8am Pacific time. For now, the stats and assignment servers are down. Many of the data servers are up, as they are spread out across other networks. Stanford IT is working to get it back up ASAP. We will post more news as we hear it.

3/22/2007 UPDATE: network back
The network is back up and running.




Original poster | Posted 2007-3-24 22:25:57
3/23/2007 PS3 launch early results
With the PS3 launch, we've added a lot of computing power to Folding@Home, with the total FLOPS now greatly increased and on its way to a petaflop. Also, we've gotten a lot of crossover interest in the other Folding@Home clients (Win, Lin, OSX; SMP; and GPU), which is also wonderful.

Finally, due to all the interest, our web pages are getting hit very hard. We have been working to improve this, especially with the understanding that more PS3's could be on the way. So far, the performance is still pretty snappy, but we've made changes to make the process run more smoothly. First, we now allow the stats update script exclusive access to the db during updates. This speeds up updates, but limits what donors can do to see their stats during the update period. We have also made several changes to the caching of the stats data to improve overall performance. We are also working to get additional hardware to help out.

With these changes, we should be ready for a lot more clients!






Original poster | Posted 2007-3-26 10:25:12
3/25/2007 Working towards a petaflop
With the addition of more PS3 clients, we're working our way up towards a petaflop. The performance of the project depends on machines being left on running Folding@Home. There was a performance drop as certain machines started taking longer to do work units (most likely because these machines are, naturally, not running Folding@Home 24/7). This drop is expected as we move away from the launch date (when people were running FAH for extended periods) and into a more steady-state set of numbers for PS3 performance. We are also looking into different ways to evaluate FLOPS, as our current method has both pros and cons. As reaching a petaflop is an important milestone, we want to make sure that we use methods which allow our FLOP count to be directly comparable to others cited.




Original poster | Posted 2007-3-27 21:29:41
3/26/2007 Update on flops count
We have been looking into the FLOPS count and its large variations and have found one more issue. The initial stats were based on the average we had seen during testing (yielding approximately 25 GFLOPS for a single PS3). However, the pre-launch testing period used big proteins, which result in higher GFLOPS utilization. When we went live, we started our initial post-launch phase with small proteins to test the scientific validity; these smaller proteins have more overhead (since they spend less time calculating the force -- which is highly optimized) and thus the GFLOPS are lower now. As we switch back to the larger proteins, we expect to see an increase in the FLOPS per machine, and thus the overall FLOPS count will change dramatically. We stress that there is a wide variation in the FLOPS we can get (easily a factor of 3x) and so we expect the number to vary widely until we reach some steady-state average.
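
To make the scale of that variation concrete, here is a rough back-of-the-envelope sketch of how many active PS3s a petaflop would take at the 25 GFLOPS testing average, and at a rate three times lower. Only the 25 GFLOPS and 3x figures come from the post; the function name, the `slowdown` parameter, and the assumption that every machine sustains the same rate are illustrative.

```python
import math

PETAFLOP_IN_GFLOPS = 1_000_000  # 1 PFLOPS = 10^6 GFLOPS


def machines_for_petaflop(gflops_per_machine: float, slowdown: int = 1) -> int:
    """Machines needed to sustain one petaflop.

    Assumes (hypothetically) that every machine runs at the same rate,
    gflops_per_machine / slowdown.
    """
    return math.ceil(PETAFLOP_IN_GFLOPS * slowdown / gflops_per_machine)


# 25 GFLOPS is the pre-launch testing average quoted in the post.
print(machines_for_petaflop(25.0))              # 40000
# Same machines running 3x slower (the variation factor quoted in the post).
print(machines_for_petaflop(25.0, slowdown=3))  # 120000
```

So under these simplified assumptions the same petaflop target could need anywhere from tens of thousands to well over a hundred thousand machines, depending on which proteins are being run.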




Original poster | Posted 2007-3-28 17:02:50
3/27/2007 Two Million CPUs have returned work
We have just passed the 2,000,000 CPU mark -- 2M CPUs have at some time contributed to FAH. Right now, over 200,000 CPUs are actively returning work. With the addition of PS3 donors, Folding@Home is the most powerful distributed computing resource on the planet, and for the calculations we run (parallel independent molecular dynamics trajectories), the most powerful supercomputer of any type (distributed or otherwise).



[ Last edited by vmzy on 2007-3-28 17:09 ]



Original poster | Posted 2007-3-28 17:07:32
General update
We've had a pretty busy few days. Unfortunately, on the day of the PS3 launch, the network went down throughout the whole Stanford Medical School, taking FAH servers off the internet. That unfortunately led to a major outage last Thursday, but that was resolved and now FAH is running smoothly. The PS3 machines are isolated from the rest of FAH in that they have their own AS and data servers, and all seems to be running smoothly there.

The main issue over the last few days has been slow stats and web access. We've rewritten how the stats work to improve performance. There are a couple of new changes:

1) We've changed how the daily_*_summary.txt files get updated and we can now update them more frequently than before the PS3 launch (every 3 hours instead of every 6). Note that our bandwidth scripts flag IPs which download these and other files too often, so to avoid getting caught by that script, keep the downloads of each of these files to under 10 per day. Since there are only 24/3 = 8 updates per day, this should hopefully not be a problem.

2) We've instituted a new policy where we update the stats db every hour with new WU's, but turn off the CGI web pages that read the stats during the update. This will avoid some of the very long updates seen in the past. The main downside is that the stats are down every hour for about 10 minutes (roughly from minute 10 to minute 20 of each hour). We are considering ways to improve this, including updating the stats every 2 hours (leading to less down time).
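
For anyone scripting against the stats files, the two rules above can be sketched roughly as follows. This is only an illustration: the constant and function names are made up here, the 10-to-20-minute window is the approximate figure from the post rather than an exact schedule, and nothing below is taken from the actual bandwidth script.

```python
from datetime import datetime

MAX_DOWNLOADS_PER_DAY = 10     # stay under this per file (figure from the post)
UPDATE_WINDOW = range(10, 20)  # approximate minutes of each hour when stats pages are off


def downloads_per_day(poll_interval_hours: int) -> int:
    """Number of fetches in 24 hours when polling at a fixed interval."""
    return 24 // poll_interval_hours


def polling_is_safe(poll_interval_hours: int) -> bool:
    """True if a fixed polling interval stays under the daily download limit."""
    return downloads_per_day(poll_interval_hours) < MAX_DOWNLOADS_PER_DAY


def stats_pages_available(now: datetime) -> bool:
    """True outside the (approximate) hourly stats update window."""
    return now.minute not in UPDATE_WINDOW


print(polling_is_safe(3))  # True: 8 downloads per day, matching the 3-hour updates
print(polling_is_safe(2))  # False: 12 downloads per day would exceed the limit
print(stats_pages_available(datetime(2007, 3, 28, 17, 15)))  # False
```

Polling once per 3-hour update is the natural choice here, since fetching more often cannot return newer data and only risks tripping the limit.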

So, with the new changes, it looks like FAH is back to running smoothly. The PS3 clients bring a great new capability to our scientific research and so we're excited about what we'll be able to do now. It's important for us to stress that the other clients still play a key role, as the PS3 client (like the GPU clients) is limited in what it can do (although what it does do, it does fast). In particular, we are getting wonderful results and throughput from the SMP client and we expect that to play a very important role for years to come.




Original poster | Posted 2007-3-29 14:27:47
28 Mar, 2007
Catalyst 7.3 compatible with GPU client (w/caveats) [WinXP]

The new Catalyst drivers released 2007/03/28, version 7.3 (8.351) appear to work successfully with the GPU client.

The console client will work as expected with the new drivers.

On some single GPU systems you will have to force the card detection by using the -gpu 0 flag.

The GUI client appears to have a few problems (similar to those found with Cat 7.2):

You cannot start the GUI GPU client with the GUI window open, as the GPU will fail to initialise correctly.

Opening the GUI window in the GUI GPU client (after it has successfully started) will cause the client to pause folding on the GPU. Closing the GUI window will resume folding on the GPU.

Neither of the above cases causes the client to crash or trash WUs, but in both the client appears to run the WU in software mode only, resulting in extremely long frame times, and these WUs will certainly expire if left processing on the CPU. It is recommended that when you start the GPU client (console or GUI versions) you watch the temperatures and current draws using ATItool or ATI Tray Tools to confirm that processing has actually begun on the GPU itself.

In order to guarantee that the GPU will initialise, the GUI window should be closed when starting the client.

At present the cause of these problems is not known, and anyone with more information is requested to post with their experiences.

Note, there appears to be a small folding performance hit compared to previous working versions. It equates to roughly a 2% increase in frame time/drop in PPD.
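
As a quick sanity check on that figure, the sketch below converts a frame-time increase into a PPD change. It assumes PPD is inversely proportional to frame time (fixed points and a fixed number of frames per WU); that assumption and the function name are illustrative, not something stated in the post.

```python
def ppd_change(frame_time_increase: float) -> float:
    """Fractional PPD change for a fractional frame-time increase.

    Assumes PPD is inversely proportional to frame time.
    """
    return 1.0 / (1.0 + frame_time_increase) - 1.0


print(f"{ppd_change(0.02):.2%}")  # -1.96%
```

For small changes the two percentages are nearly equal, which is why a 2% longer frame time can be quoted as roughly a 2% PPD drop.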

The following versions of Catalyst are known to work with the GPU client:
  • 6.5
  • 6.10
  • 6.11
  • 7.2
  • 7.3
The GPU client will work with these drivers with a significant performance hit:
  • 6.6
  • 6.7
The following drivers do not work at all with the GPU client:
  • 6.8
  • 6.9
  • 6.12
  • 7.1

Tested with X1900XT 256 on Windows XP 32bit

For more information see this thread:
Catalyst 7.3 is up




Original poster | Posted 2007-3-30 22:23:01
3/29/2007 Major stats overhaul: Monday April 2
We are going to take the stats down for several hours at 10am Pacific time on Monday April 2 (this coming Monday). We need to make updates to the stats system for v6 and test that these updates are working. When we go back online, we will hopefully have the upgraded stats working and would then be ready to launch v6. We are keeping a backup of the stats, so that we can revert to the old stats system at any time if there are any bugs in the code. So, the stats data is very much safe during this transition, but there may be some unforeseen problems (it's hard to predict those). This is actually mostly unrelated to all the stats work done last week to improve performance for the PS3.




Original poster | Posted 2007-4-19 10:35:21
18 Apr, 2007
Catalyst 7.4 compatible with GPU client

The new Catalyst drivers released 2007/04/18, version 7.4 (8.36) appear to work successfully with the GPU client.

The console client will work as expected with the new drivers.

On some single GPU systems you will have to force the card detection by using the -gpu 0 flag.

The GUI client appears to have a few problems (similar to those found with Cat 7.2 and 7.3):

You cannot start the GUI GPU client with the GUI window open, as the GPU will fail to initialise correctly.

Opening the GUI window in the GUI GPU client (after it has successfully started) will cause the client to pause folding on the GPU. Closing the GUI window will resume folding on the GPU.

Neither of the above cases causes the client to crash or trash WUs, but in both the client appears to run the WU in software mode only, resulting in extremely long frame times, and these WUs will certainly expire if left processing on the CPU. It is recommended that when you start the GPU client (console or GUI versions) you watch the temperatures and current draws using ATItool or ATI Tray Tools to confirm that processing has actually begun on the GPU itself.

In order to guarantee that the GPU will initialise, the GUI window should be closed when starting the client.

At present the cause of these problems is not known, and anyone with more information is requested to post with their experiences.

Note, there appears to be a small folding performance hit compared to Catalyst 7.2, but no performance difference from 7.3. It equates to roughly a 2% increase in frame time/drop in PPD.

The following versions of Catalyst are known to work with the GPU client:
  • 6.5
  • 6.10
  • 6.11
  • 7.2
  • 7.3
  • 7.4
The GPU client will work with these drivers with a significant performance hit:
  • 6.6
  • 6.7
The following drivers do not work at all with the GPU client:
  • 6.8
  • 6.9
  • 6.12
  • 7.1

Tested with X1900XT 256 on Windows XP 32bit
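
For scripts that want to check a driver version against the lists above, one possible encoding is a simple lookup table. The status labels and function name below are made up for illustration; the version lists are copied from this post and only reflect testing on the hardware and OS mentioned. Versions are kept as strings because 6.10 and 6.1 would collide as numbers.

```python
# Catalyst driver status for the GPU client, copied from the lists above.
CATALYST_STATUS = {
    "works": ["6.5", "6.10", "6.11", "7.2", "7.3", "7.4"],
    "slow": ["6.6", "6.7"],                    # works, with a significant performance hit
    "broken": ["6.8", "6.9", "6.12", "7.1"],   # does not work at all
}


def catalyst_status(version: str) -> str:
    """Return the listed status for a Catalyst version, or 'untested' if unlisted."""
    for status, versions in CATALYST_STATUS.items():
        if version in versions:
            return status
    return "untested"


print(catalyst_status("7.4"))   # works
print(catalyst_status("6.12"))  # broken
print(catalyst_status("7.5"))   # untested
```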

Console client users with a single graphics card should use the "-gpu 0" flag if the card is not detected.



Original poster | Posted 2007-4-20 17:24:35
13 Apr, 2007
Strange hardware failure on vspg machines

We are seeing a strange hardware failure on all of the vspg machines. The fact that all the machines are suddenly behaving strangely suggests that the problem is not the machines themselves, but the networking. We are looking into this right now. Unfortunately, it's 10pm Pacific time and Stanford networking support is out for the night, so this will likely go unresolved until the morning.

19 Apr, 2007
Update on server status

Here's an update. We have been working on servers for the last few days. The reset due to the power outage caused problems with 2 servers, which didn't want to come back up after they were shut down.

We've gotten one of the big ones back up (, but we're still working on another ( The latter one is giving some strange results and so we don't have an ETA on it.

Finally, we've found a problem with the collection server and it's now running again, although there may be some problems with the CS taking WU's from We are looking into that.




Original poster | Posted 2007-4-28 10:44:16
26 Apr, 2007
Collection server offline
We have taken the collection server offline while we work on a major upgrade/overhaul for it. Many WU's were being rejected by the CS and we now know why. We're working on a fix, which will require a mix of hardware and software updates.

Stanford IT finds network issue
Stanford IT has sent this to me just now. We'll keep you updated. It's almost 5pm pacific time, so it may be a rough night.

We are currently experiencing a network outage that is affecting a number of buildings and services on campus. Engineers are working on the problem, there is no estimate for repair at this time.

itss-service-alerts@lists is an internal list to notify ITSS staff about service outages and updates. The Incident Report form is available on the web at <http://itss-incident-report.stanford.edu/>.


[ Last edited by vmzy on 2007-4-28 10:45 ]



Original poster | Posted 2007-5-4 00:25:06
Network upgrades at Stanford
Stanford is upgrading the network to one of Folding@home's data centers to help make the system more redundant and to get us on Stanford's new 10Gig backbone. However, this will lead to a network outage on some of our machines (those in the 171.64.65.xx subnet) on two occasions. Tomorrow (Tues) at 5am Pacific time there will be some work, but likely not an outage. On Thursday at 5am Pacific time, there will be an outage for about 1 hour.




Original poster | Posted 2007-5-5 00:06:14
Server and server room upgrades
We've got a series of new servers that will double the server capability of FAH. We've had the hardware for almost two weeks, but we're still waiting on the University to get the networking completed to our new server room. We've been told that this will be completed by the end of next week, so we're likely still 2 weeks out from getting these new servers up and going. We'll give an update as we get more info.




Original poster | Posted 2007-5-7 22:07:45
Update on server and server room upgrades
Looks like we're on schedule to get the new servers on line by the end of the week, which means the new WU's can go out on Monday or Tuesday the week after that, if all goes well. These new machines will basically double the usable storage of FAH, adding almost an extra 150TB and adding 56 cores to the mix.




Original poster | Posted 2007-5-21 10:16:09
Update on collection server and server & server room upgrades
Since people have been curious, I thought it would be important to comment on these issues in a very visible place (I've addressed this on our forum, but it's better to post in a more visible place like this). The main issue with the CS right now is that, with all the FAH WU's, the current CS hardware needs to be upgraded to handle the load. We have a plan to handle it. Short term, we have put only certain machines on the CS to get it at least partially working (better than not working at all). We have been waiting for new hardware to make the real long-term solution possible: new and more hardware. The new hardware will be much beefier to handle the load. Also, we will move to having multiple collection servers, which will also lessen the load on and the requirements of each individual CS. The new machines are here and finally networked (as of Friday). Our sysadmins need to install the OS on the machines and we should be ready to roll.

[ Last edited by vmzy on 2007-5-21 10:17 ]


