[独立平台] [生命科学类] Folding@Home

dis · 发表于 2009-1-30 10:00:57

January 29, 2009
New paper #62: The predicted structure of the headpiece of the Huntingtin protein and its implications for Huntington's Disease

It's still early (since this paper was just accepted), but I wanted to give FAH donors a heads up on our work on Huntington's Disease aggregation, which is just about to come out in the Journal of Molecular Biology. I'll comment on it more in a future post.

See our papers page for more details.
大意：
发表有关Huntington蛋白质的初步研究结果。

vmzy · 发表于 2009-2-19 12:49:53

February 18, 2009

New paper #63: Accelerating Molecular Dynamic Simulation on Graphics Processing Units
We're happy to announce a new paper (#63 at http://folding.stanford.edu/English/Papers). This paper describes the code behind the Folding@home GPU clients, detailing how they work, how we achieved such a significant speed up on GPUs, and other implementation details.
For those curious about the technical details, I've pasted our technical abstract below:

ABSTRACT. We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core.

Also, this software is now available for general use (for scientific research outside of FAH). Please go to http://simtk.org/home/openmm for more details.
大意：
发表与利用显卡进行分子模拟有关的文章。

Folding@home passes the 5 petaflop mark
Based on our FLOP estimate (see http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats), Folding@home has passed the 5 petaflop mark recently. To put it in context, traditional supercomputers have just broken the 1 petaflop mark, and even that level of performance is very challenging to aggregate. The use of GPU's and Cell processors is has been key to this, and in fact the NVIDIA numbers alone have just passed 2 petaflops.

Thanks to all who have contributed and we look forward to the next major milestones to be crossed!

大意：
计算速度超过5P（当今最快的超级计算机才刚到1P），其中NVIDIA 就超过了2P。

[ 本帖最后由 vmzy 于 2009-2-19 12:51 编辑 ]

vmzy · 发表于 2009-3-6 11:08:31

March 05, 2009
New donation page

In addition to those donating computer time, we have been very grateful to people who have also donated funds to the Folding@home project. These funds have been important in purchasing equipment (eg servers) or services (eg professional programmers) that are traditionally difficult to purchase from grants from NIH or NSF.

In coordination with Stanford's development office, Stanford has made a special donation page just for Folding@home:

http://pgnet.stanford.edu/goto/foldinggift

This new web page makes donating funds to Folding@home a lot easier, since one does not have to do input special codes to make sure that the funds are directed to Folding@home.

Stanford is a 5013c non-profit entity and thus a donation of money to Folding@home is tax deductible. Stanford's Federal Tax ID number is 94-1156365. Also, many companies help individuals donate to Stanford by providing matching funds.

You can learn more details at our Donation page:

http://folding.stanford.edu/English/Donate
大意：
官方更新项目捐款页面。

vmzy · 发表于 2009-3-19 11:04:00

March 18, 2009
FLOPS

There has been much interest in our stats page (osstats) detailing different OS's and the FLOPS they produce:

http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats

We've been trying to come up with a way to standardize these numbers so that they can be more easily compared to each other. That has resulted in a new FAQ:

http://folding.stanford.edu/English/FAQ-flops

We also plan on updating the osstats page to include both the Native FLOPS (which is on there now) and the more common x86 FLOPS, which allows for a more "apples to apples" comparison of FLOPS.
大意：
官方将调整积分策略。也把积分统计页面修改了下。

vmzy · 发表于 2009-3-21 21:34:28

March 20, 2009
Change to stats updates to greatly increase update speed
In order to speed the stats updates, we've suspended the updates for the project WU counts.  This does not affect WU counts or points, just the stats pages which tells donors how many WUs they have contributed to each project.

We have polled donors in the past regarding this issue, and there was overwhelming support for this change, especially if it could significantly increase stats updates speeds.  After doing some in house tests, we expect that this will greatly enhance stats updates speeds, so we have rolled this out.

Even if this is the main culprit for the slow stats updates, we may put this back in time, once we work out a better scheme for how to update this information and/or get hardware better suited for it.  The main issue here is that this is a lot of information, with data going back many, many years, and it has become unwieldy.

For now, this should lead to a pretty dramatic difference in stats update times.  It looks like we can now get it done in about 20 minutes, rather than the ~2 hours it was taking previously, for 3 hours worth of stats accumulated.

If this looks good, we will likely switch stats updates to occur more frequently in the future, starting with every 2 hours.
大意：
修改统计代码，现在统计耗时由2小时降至20分钟。

[ 本帖最后由 vmzy 于 2009-4-29 10:00 编辑 ]

vmzy · 发表于 2009-4-29 10:00:49

April 28, 2009
NVIDIA GPU memory checker
One of the members of the Folding@home team, Imran Haque, has developed a memory tester for GPUs.  You can get a copy of it here:

https://simtk.org/home/memtest/

and here's a brief description:

MemtestG80 is a software-based tester to test for "soft errors" in GPU memory or logic for NVIDIA CUDA-enabled GPUs. It uses a variety of proven test patterns (some custom and some based on Memtest86) to verify the correct operation of GPU memory and logic. It is a useful tool to ensure that given GPUs do not produce "silent errors" which may corrupt the results of a computation without triggering an overt error.

Basically, the idea is that we wanted to put out a code to test GPU memory that's roughly equivalent to Memtest on CPUs.  If you run FAH heavily on a GPU, it's a good idea to check out your GPU memory, just as one would run tests on CPU memory.

For now, this is being hosted on Simtk.org, the scientific software repository at Stanford, but we will likely move a copy to the FAH download page in time.  If you're having any problems with the GPU MemtestG80 software, you can leave a bug report on the Simtk.org page.
大意：
N卡的显存稳定性测试工具。推荐大家测试使用。

vmzy · 发表于 2009-5-6 10:03:38

May 05, 2009
NVIDIA GPU memory checker -- Update

One of the members of the Folding@home team, Imran Haque, has developed a memory tester for GPUs. You can find my original post on it here.
We've now moved it to its own download page and included some more text describing how to interpret the results:

If MemtestG80 detects memory errors on your GPU, we recommend taking the following steps:

If your card generates errors and is overclocked (this includes "factory overclocked" or "superclocked" cards - anything with higher-than-reference clock speeds), reset the clock frequencies to the NVIDIA reference frequencies and see if the problem persists.
This is especially true for the memory clock. Errors in the Logic or Random Blocks tests are likely to be at least somewhat sensitive to the shader clock as well. The upshot of this guideline is that if your overclock is generating any errors above the stock frequencies, then it's not a stable overclock.
If after this your card still generates errors in any test OTHER than the Modulo-20 test, these errors are likely indicative of a card that's gone bad somehow. Such a card ought to be replaced.

大意：
NVIDIA显卡内存检测程序更新，并放到下载页面了。

shouldbe · 发表于 2009-5-6 12:41:10

跑FAH超核心超shader但不超显存。这玩意用处不大。

vmzy · 发表于 2009-5-22 10:43:45

May 21, 2009
Issues with GPU AS migration
We have had some issues with the GPU assignment server (AS) migration to new hardware. Several issues have been resolved, including

- issues with ATI GPU client work assignments
- some work servers without work

But there are two remaining issues which we have filed tickets with Stanford IT, including
- port 80 forwarding for the GPU AS
- allowing assignments to 171.64.122.70

Of these remaining issues, the port 80 issue is a big issue for those who have port 8080 blocked and we've been pushing IT to get this resolved ASAP. The second issue (going to 171.64.122.70) is less time critical (that server can still receive completed WUs from clients), but we would like to get more GPU work servers on line in general.
大意：
近期硬件升级，对GPU的任务分配服务器产生了影响，我们正在努力解决问题中。看来有必要添加GPU服务器了。

Bismarck · 发表于 2009-5-22 14:26:44

怪不得最近一到晚上老是接不到包

vmzy · 发表于 2009-5-26 09:57:50

May 25, 2009
Network issues arising

We are seeing the network getting worse right now, especially for the GPU assignment server (eg traceroute to it and other machines on related networks are failing). We have tickets out to IT support to deal with this issue. If this cannot be resolved in a timely manner (24-36 hours) by the IT dept, we will start taking more aggressive measures ourselves.
大意：
网络不稳定，尤其是GPU服务器。我们已经找计算机部门着手解决此事了，如果他们不能及时解决，我们会自己想办法解决。

vmzy · 发表于 2009-6-16 21:55:19

June 16, 2009
GPU server issues
We've had a rough night with GPU servers. One has been down hard over the day yesterday (it crashed hard and now can't find its / partition -- the admins are attempting a rescue disk fsck this morning). Two more went down last night (PST) due to the heavy load, but those were easy to get back up (they are up now).

We are stretched a bit thin as we are implementing the new server infrastructure in parallel with the old one. The upshot is that once the new one has been deployed, we will have much more functional collection servers (CS's) and also get work servers (WS's) that should not need to be restarted nearly as frequently when under heavy load.

We are beginning the roll out of the new WS (v5) code this week onto GPU servers, although these issues have slowed us down a bit.

大意：
GPU服务器出问题了，昨晚挂了一台服务器。我们将会尽快增加新的服务器。本周我们将把新的脚本部署到GPU服务器上。

qiyuwanjia · 发表于 2009-6-16 22:16:15

也许滤*霸更新也需要连接STU的服务器吧

vmzy · 发表于 2009-6-18 13:29:30

June 17, 2009

How does FAH code development and sysadmin get done?One of the more common question I get asked is how we do our client/server/core programming and backend system administration.  Also, others were curious about updates on various core projects.  So, I thought it made sense to answer both in one post, since the answers are related.  This will be a bit of a long answer to several short questions, but hopefully it will help give some insight into how we do what we do.
First, some history.  When we started in 2001, I personally wrote most of the code (client, server, scientific code integration, etc), with some help from a summer student (Dr. Jarod Chapman) and some help from Adam Beberg on general distributed computing issues and the use of his Cosm networking library.  I was just starting out as a professor then, with a relatively small group (4 people at the time), so it was common for the leader of the lab to do a lot of hands on work.

As time went on, the group matured and grew, with increasing funding from NIH and NSF.  This allowed the group to grow to about 10 people in 2005.  At this point, much of the duties were given to different people in the lab: the server code development was performed by (now Prof.) Young Min Rhee and then later by Dr. Guha Jayachandran.  Client development was done by Siraj Khaliq, then Guha, then help from several people (including Adam Beberg as well as volunteers, such as Uncle Fungus).  Core development was done by Dr. Rhee, (now Prof.) Michael Shirts, and others.

This model worked reasonably well, with each team member giving some significant, but not overly onerous amount of his/her time (eg 10% to 20%) to FAH development.  These key developers were able to add a lot of functionality, both to aid the science and the donor experience.

However, in time, this model became unscalable and unsustainable.  As time went on, the individual developers graduated (in academic research, the research is done by graduate students or postdoctoral scholars, both of whom do not stay longer than say 3-5 years).  While the original team was able to build a powerful and complex system, maintaining that system by new generations of students/postdocs became unsustainable.  The code was complex and well known by the original authors, but maintenance by new developers was complex and easy to make errors, due to the complexity of the software.

In parallel with these efforts in code development, we also were maturing in terms of our server backend.  We went from having a few small (10GB hard drives!) servers, to a very large, enterprise style backend, with hundreds of terabytes of storage.  This too became a major challenge to manage by the scientific group.

A new plan. Therefore, in 2007, I started a new plan to migrate these duties (code development and system administration) to professional programmers and system administrators.  Today, most of FAH code development is done by professional programmers, and in time I expect all of it will be done that way. The desire to start with a clean code base lead to new projects, such as the v5 server code, second generation GPU code (GPU2), second generation SMP code (SMP2), new client (v7 client in the works), which have been developed with a clean slate.

There are some differences in how donors will see the fruits of these efforts.  I have found that while the programmers write much cleaner code (much more modular and systematic and maintainable), the code development is typically slower.  While the scientific group can often make certain changes say in a month, the professional programmers may take 2 or 3.  What we get for that extra time is more cleanly written code, no hacks, and a plan for long term sustainability (clean code, well documented code, high level programming practices, etc).  Some projects are still done by the scientific staff (eg Dr. Peter Kasson continues to do great things with the SMP client as well as work towards SMP2), I expect that in time this will all be done by programmers.

Analogously, sysadmin has been pushed to a professional group at Stanford.  Similarly, they are more careful and methodical, but slower to respond due to this.  My hope is that as we migrate away from our older legacy hardware and they set up clean installs with the v5 server code, the issues of servers needing restarts should be greatly improved.  This infrastructure changeover has been much slower than I expected, in part due to the practices used by the sysadmin team to avoid hackish tricks and to keep a well-organized, uniform framework amongst all of the servers (eg scripting and automating common tasks).

One important piece good news is that the people we've got are very good.  I'm very happy to be working with some very strong programmers, including Peter Eastman, Mark Friedrichs, and Chris Bruns (GPU2/OpenMM code), Scott Legrand and Mike Houston (contacts at NVIDIA and ATI, respectively, for GPU2 issues), Joe Coffland and his coworkers (v5 server, Protomol Core, Desmond/SMP2 core, v7 client).  System admin is also now done professionally, via Miles Davis' admin group at Stanford Computer science.  Also, since she has help desk experience, Terri Fedelin (who does University admin duties for me personally) has also been working on the forum helping triage issues.

Where are we now? Much of their work is behind the scenes and we generally only talk about big news when we're ready to release, but if you're curious, you can see some of it publicly, such as tracking GPU2 development via the OpenMM project (http://simtk.org/home/openmm) and the Gromacs/SMP2 core via the http://gromacs.org cvs (look for updates involving threads, since what is new about SMP2 is the use of threads instead of MPI).  You can also follow some more of the nitty gritty details on my Twitter feed (http://twitter.com/vijaypande), where I plan to try to give more day-to-day updates, albeit in a simpler (and less gramatically correct) form; the hope here is to try to have more frequent updates, even if they are smaller and simpler.

As the GPU2 code base matured in functionality, GPU2 core development has been mainly bug fixes, which is a good thing.  SMP2 has been testing in house for a while and I expect it will still take a few weeks.  The main issue is trying to make sure we get good scalability with threads based solutions, removing bottlenecks, etc.  The SMP2 initiative lead to two different cores, one for the Desmond code from DE Shaw Research and another for a Gromacs variant (a variant of the A4 core).  We having been testing both in single cpu-core format (the A4 Gromacs core is a single core version of what will become SMP2) and we hope to release in a week or two a set of single core Desmond jobs.  If those look good, multiple-core versions via threads (not MPI) will follow thereafter.

The v5 system roll out is continuing, with the plan to have a parallel v5 infrastructure (set up by the new sysadmins) with our current one, and have the science team migrate new projects to the new infrastructure.  The v5 code has been running for a while in a few tests and we expect one of the GPU servers to migrate this week, with one or two servers migrating every week as time goes on.  The new code does not crash/hang the way the v3/v4 code does (it hung under high load and needed the process to be killed) and so we expect much more robust behavior from it.  Also, Joe Coffland has been great regarding responding to code changes and bug fixes.

So, the upshot of this new scheme is that donors will likely see more mature software, which also means slower revs between cores, both since fewer revs are needed in the new model (a lot of issues are simplified by the cleaner code base) and because the revs now involve a lot of internal QA and testing and more careful methodical programming.

The long term upshot for FAH is better software and more sustainable software.  It's taking time to get it done, but based on the results so far (eg GPU2 vs GPU), I think it has been worth the wait (but we still have a fair ways to go before we can see all of the fruits of this work).
大意：
FAH代码开发简史
2001年FAH刚启动的时候，几乎所有代码都是由负责人Vijay Pande编写的，而后维护工作交给其他研究人员来做。这样就带来了一系列的问题：1、编码水平有限。经常出问题。2、维护能力有限。因为我们要做很多蛋白质基础研究，没太多的时间来编码与维护。3、人员变换。我们的研究员，大部分是硕士或博士生，一般3~5年就要毕业‘跑路’，然后就要交给后来人维护，造成代码维护难度和成本很高。

2007年起，我们改变了策略，把编码和维护工作，外包给了专业的编程人员来做。他们重写了代码，他们的代码条理更清晰，更模块化，效率更高，维护起来更方便。外包的项目包括：发布中的v5版服务器端程序，已发布的GPU2，即将发布的SMP2，还有未来的V7版客户端程序。虽然这样做，我们的编码速度慢了很多（差不多3倍），但是代码质量提高了很多，而且我们也可以把更多的精力放在蛋白质研究上。我们觉得这样做是值得的。

现在：
GPU2的代码基本上稳定了，大部分都是BUG修正工作。
SMP2正在内测，马上会开始公测。SMP2不再使用MPI库，而是改用线程来实现多核并行计算。SMP2仍然有2个版本。一个是Gromacs A4修改版内核，还有一个是Desmond 内核。我们现在正在对A4修改版内核进行单核任务测试，1、2周内，我们也会开始对Desmond 内核进行单核任务测试。如果一切顺利，我们会开始对这2个内核进行多核并行计算测试。
v5版服务器端程序，本周开始会先部署在GPU服务器上，我们会逐渐对所有服务器程序进行升级。

号外：
Vijay Pande老大开了Twitter （地址是：http://twitter.com/vijaypande），有心人可以去看下，了解下，老大每天都在忙些啥。

译者注：官方在发布GPU客户端时曾说GPU客户端将是最快的客户端，GPU2时代超越了PS3，实现了诺言。同样，官方在发布SMP客户端时也曾说SMP客户端将是最快的客户端，让我们迎接SMP2时代的到来吧，我相信官方不会食言的。

[ 本帖最后由 vmzy 于 2009-6-18 16:19 编辑 ]

Bismarck · 发表于 2009-6-18 16:05:57

嗯？正式版？嘿嘿……

		自动登录	找回密码
密码			新注册用户

[独立平台] [生命科学类] Folding@Home

评分

评分

评分

评分

评分

评分

评分

评分

评分

评分

评分