[BOINC] [Life Sciences] Rosetta@home

Youth · posted on 2006-2-16 09:14:47

Volunteers needed!!

We are seeking volunteers for our new alpha test project, RALPH (http://ralph.bakerlab.org).

There are a number of recent improvements to the Rosetta application, but we need volunteers to speed up testing so that the updated application can be made available for production as soon as possible.

If you are interested in helping to improve Rosetta@home and can spare a few extra cycles for testing, please join RALPH@home.

Grё@thΙll · posted on 2006-2-17 21:46:15

I can't reach the Rosetta@home web pages at all. What's going on? Are WUs impossible to download as well??

Youth · posted on 2006-2-18 11:06:16

Outage Notice: The project will be down starting today at 3pm PST for maintenance. The server should be back online later in the evening.

We will be updating the Rosetta application today. There are a number of new features:

Work units will have a default CPU run time of 8 hours, and users will have the option to change the CPU run time as a project-specific preference. The length of work units will no longer depend on the number of predicted structures. This option was added to allow participants to reduce bandwidth usage per work unit and maintain consistent run times.
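
A minimal sketch of what a time-based work unit could look like, not the actual Rosetta code: the loop keeps producing structures until a CPU-time target is reached, so the run time stays roughly constant however long each structure takes. The names run_work_unit, predict_one and target_cpu_seconds are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, List


def run_work_unit(predict_one: Callable[[], float],
                  target_cpu_seconds: float,
                  clock: Callable[[], float] = time.process_time) -> List[float]:
    """Generate structures until the CPU-time target is reached.

    predict_one is a hypothetical callable that produces one structure and
    returns its energy. The structure in progress is always finished, so at
    least one result is returned and the target can be slightly exceeded.
    """
    start = clock()
    energies = []
    while True:
        energies.append(predict_one())
        if clock() - start >= target_cpu_seconds:
            break
    return energies
```

With a target of 8 hours (8 * 3600 seconds), a slow host would simply return fewer structures than a fast one in about the same time.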

Users will also have the option to change the frame rate and CPU use for graphics.
A new graphics version will be available for Mac OS X users.

Youth · posted on 2006-2-18 13:29:40

Source: http://boinc.bakerlab.org/rosetta/rah_science_news.php

January 25, 2006

I will use this space to give biweekly updates on recent results and the work units planned for upcoming weeks.

Today I will begin by summarizing some of the main results of the last few weeks.

More computing power can significantly improve results. This is illustrated by the 1ogw case. For one of the work unit types (NO_SIM_ANNEAL_BARCODE_30) we ran 60,000 independent jobs, for a total of 600,000 structures. If we take the ten lowest energy structures, the median rmsd is 2.86. If we instead take the ten lowest energy structures from just the first 18,000 jobs, the median rmsd is 4.49. So with more sampling, we are able to land more explorers closer to the global minimum, and get more accurate results.
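
The statistic quoted above (median rmsd of the ten lowest energy structures) can be sketched as follows. This is an illustration under assumptions, not the project's analysis code: the function name and the (energy, rmsd) tuple layout are made up for the example.

```python
from statistics import median


def median_rmsd_of_lowest_energy(structures, n=10):
    """Median rmsd of the n lowest-energy structures.

    structures is an iterable of (energy, rmsd) pairs; this layout is an
    assumption made for the sketch.
    """
    lowest = sorted(structures, key=lambda s: s[0])[:n]
    return median([rmsd for _, rmsd in lowest])
```

Applying the function first to the structures from the first 18,000 jobs and then to the full set would give the two numbers being compared. Adding samples can only lower the energies of the selected ten; the rmsd improves as well only to the extent that low energy goes together with closeness to the true structure.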

Allowing additional flexibility in the chain can significantly improve results (this was the "breakthrough" I described several months ago). In the "NO_VARY_OMEGA" runs, we went back to the pre-breakthrough, less flexible chain, and the results were consistently worse. For example, in the 1ogw case, the median rmsd of the low energy structures increased to 4.50. For 1r69, the median rmsd of the low energy structures increased from 1.29 to 2.80.

The computationally less expensive NO_SIM_ANNEAL methods were no worse at locating low energy, low rmsd structures than the SIM_ANNEAL runs. This is good news, as we can carry out many more of the NO_SIM_ANNEAL searches and so do more searching for the same amount of CPU time.

As Paul Buck anticipated, most of the remaining alternative methods we tested were roughly equivalent (except for the NO_VARY_OMEGA). One way of looking at this is that given the huge space we have to search, all that matters is how many independent explorers are sent out to search, not the details of the instructions each is given about where to search.

Excitingly, for many of the proteins, the lowest energy structures are very close to the true structure (less than 3.0 Å rmsd). For example, in the NO_SIM_ANNEAL_BARCODE_30 runs, the rmsds of the lowest energy structures are:

1dtj: 1.93
1dcj: 2.72
1ogw: 2.65
2reb: 1.46
1r69: 1.79
1di2: 1.40

These results are a significant improvement over anything that has been done before. If we are able to do this consistently for proteins in this size range, it will be a major scientific breakthrough.

Our next step will be to test the computationally efficient NO_SIM_ANNEAL_BARCODE_30 method on 25 new proteins we haven't done calculations on yet. You will see the new proteins on your screen saver by early next week. The "BARCODE_30" means that for every 30 amino acid residue segment in the protein, the values of the angles for one residue are picked at random at the beginning of the run. This directs different runs to explore different regions of the space, and is more or less equivalent to directing different explorers to different latitudes and longitudes.
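
A rough sketch of the barcode idea as described above, not the Rosetta implementation. It assumes that "the angles" are the backbone phi/psi angles and that they are drawn uniformly from [-180, 180] degrees; the real protocol may well choose them differently. The function name make_barcode is hypothetical.

```python
import random


def make_barcode(n_residues, segment_length=30, rng=None):
    """Pick one residue per segment and fix random angles for it.

    Returns {residue_index: (phi, psi)}. Uniform sampling of phi/psi is an
    assumption made for illustration only.
    """
    rng = rng or random.Random()
    barcode = {}
    for start in range(0, n_residues, segment_length):
        end = min(start + segment_length, n_residues)
        residue = rng.randrange(start, end)   # one residue in this segment
        phi = rng.uniform(-180.0, 180.0)
        psi = rng.uniform(-180.0, 180.0)
        barcode[residue] = (phi, psi)
    return barcode
```

Each run would draw its own barcode at the start and keep those angles fixed, which is what sends different runs to different regions of the search space.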

You will also see more "PRODUCTION_AB_INITIO" runs in the next few weeks. In these runs we are testing the first, low resolution part of the search. We will lower the number of trajectories per work unit to avoid the max_cpu_time problem. I think we have largely solved this problem now by going to shorter work units and doubling the max_cpu_time limit.

There will also be tests of calculations for some of the other projects described in the introduction section of the web site. We hope to get the vaccine design calculations running on BOINC in the near future. With regard to the message board posts, we aren't yet doing any work on diabetes or MS specifically, but if we can generate accurate structures of proteins involved in these diseases using the methods you are helping us to develop, it will contribute to efforts to develop therapies.

Thank you again for all of your wonderful contributions!

David Baker

[ Last edited by Youth on 2006-2-18 at 22:51 ]

Youth · posted on 2006-2-18 13:31:16

A post with similar content that David Baker published on the forum at the end of last year:

http://www.equn.com/forum/viewthread.php?tid=11072

Youth · posted on 2006-2-18 14:35:36

Work is flowing again. Today we upgraded our database server. Unfortunately, we will be delaying the application update for a day or two to work out a few minor issues. See Technical News for details.

Grё@thΙll · posted on 2006-2-18 18:20:16

Youth · posted on 2006-2-18 22:46:38

Source: http://boinc.bakerlab.org/rosetta/rah_technical_news.php

November 27, 2005
Welcome to our new technical news bulletin.

Today, we backed up our database and reconfigured the database server to match SETI@home's configuration. We'd like to thank Bob Bankay, SETI@home's database administrator, and David Hammer at Einstein@home for providing useful advice and copies of their my.cnf files. Soon, we will be testing database replication on two test servers (64-bit dual Opterons w/ 8 GB RAM) set up by Keith, and if the tests look good, they will be used for production. The benefits of using replication (as stated in the MySQL documentation) are 1) server robustness (if the master server goes down another can be used as a backup), 2) load balancing for non-updating queries, and 3) server maintenance (such as database backups) without disruptions.
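
Benefit 2) above, load balancing for non-updating queries, can be illustrated with a small routing sketch. This is a generic illustration and not how the BOINC server code talks to MySQL; the class name, the server names and the simple first-word check are all assumptions.

```python
import itertools

READ_ONLY_VERBS = {"SELECT", "SHOW"}


class ReplicatedPool:
    """Route statements between one master and any number of replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(replicas or [master])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in READ_ONLY_VERBS:
            return next(self._read_cycle)  # round-robin over the replicas
        return self.master                 # updates must go to the master
```

Because replicas can lag behind the master, a read that has to see a row written a moment ago would still need to go to the master.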

December 12, 2005
Our work unit feeder is having a tough time keeping up with all the client requests for work. A short-term fix (as has been done before) is to optimize the database tables. We will be doing this later today at 3pm and also backing up the database. As stated before, we are going to expand our servers soon to deal with this issue.

December 20, 2005
Last evening we released updated versions of the Rosetta application for all three platforms. The updates include changes to, again, increase diversity in the searches. For those familiar with Rosetta, the protocol can now use larger protein fragment libraries and run more cycles. There were also minor changes to the graphics to allow rotation of the native structure.

Additionally, a bug was found and fixed by Bin, a postdoc in our lab, that may have been causing the "1%" continual loop. This bug would occur very infrequently in specific circumstances. We do not know for sure yet if this is the only "1%" bug.

We also put our new work unit batch submission system into production. Unfortunately, a batch of work units using this system was not set up correctly. Work units from this batch have names starting with "DEFAULT_xxxxx_205_" where xxxxx are the protein code and chain id. 205 is the batch id.

IF YOU ARE RUNNING ONE OF THESE WORK UNITS, PLEASE ABORT IT. Batch 206 and greater are okay, and should not be aborted.

The work units in batch 205 were set up to predict 1000 structures instead of 10, so they will all reach the run time limit of 12-16 hours before finishing and will eventually error out. WE WILL GRANT CREDIT TO PEOPLE WHO HAVE RUN AND ABORTED THESE WORK UNITS.
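
For anyone checking a long task list, the naming rule above can be turned into a small filter. This is only a sketch: it assumes the name prefix is exactly "DEFAULT_", five characters for protein code plus chain id, then the batch id, as described in the announcement. The function name and the sample names in the usage note are hypothetical.

```python
import re

BAD_BATCH_ID = 205

# "DEFAULT_" + 5 characters (protein code and chain id) + "_" + batch id + "_"
# The chain id is matched with "." because it may itself be an underscore.
_NAME_RE = re.compile(r"^DEFAULT_(?P<protein>.{5})_(?P<batch>\d+)_")


def should_abort(work_unit_name):
    """True if the work unit name belongs to the misconfigured batch 205."""
    match = _NAME_RE.match(work_unit_name)
    return bool(match) and int(match.group("batch")) == BAD_BATCH_ID
```

A made-up name such as "DEFAULT_1ogwA_205_123_0" would be flagged, while the same name with batch id 206 would not.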

Another problem has been identified with some new work units that causes a 0xc0000005 UNHANDLED EXCEPTION error. This is a weird bug that appears to depend on the random number seed, and we are currently looking into its cause. A short-term fix of using the computer clock to generate the seed (as has been done in previous runs) is in place.
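
Seeding from the computer clock, as mentioned above, might look like the following. This is a generic sketch, not the Rosetta code; the 32-bit width and the function name clock_seed are assumptions.

```python
import random
import time


def clock_seed(now_ns=None):
    """Derive a 32-bit seed from the wall clock (nanoseconds since the epoch).

    now_ns can be passed explicitly to make the function testable.
    """
    t = time.time_ns() if now_ns is None else now_ns
    return t & 0xFFFFFFFF


seed = clock_seed()
rng = random.Random(seed)  # logging `seed` makes a failing run repeatable
```

A clock-derived seed is not reproducible unless it is recorded, which is why keeping the seed with the result helps when a bug depends on it.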

In an effort to prevent errors like this in the future, we will set up a local test BOINC server and do quality control after the holidays.

Youth · posted on 2006-2-18 22:59:16

January 6, 2006
Today, we are going to back up the database and optimize tables for general maintenance starting at 3pm PST. We are also going to replace the data fileserver with one that is more robust. Our initial fileserver used a logical volume consisting of five 146 GB Ultra3 SCSI drives, without redundancy. One of the disks has developed a problem, putting the logical volume in peril. As a replacement we've built a new fileserver from a dual 2.8 GHz Xeon with 2 GB RAM running a 6 x 146 GB RAID-5 from an LSI MegaRAID controller, providing redundancy.
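
A quick worked comparison of the two layouts, using only the drive counts and sizes given above. The figures are raw capacities that ignore filesystem and controller overhead, and the function and layout names are made up for the example.

```python
def usable_capacity_gb(n_drives, drive_gb, layout):
    """Return (usable capacity in GB, number of drive failures survived)."""
    if layout == "no_redundancy":
        return n_drives * drive_gb, 0
    if layout == "raid5":
        if n_drives < 3:
            raise ValueError("RAID-5 needs at least three drives")
        # One drive's worth of space is used for parity.
        return (n_drives - 1) * drive_gb, 1
    raise ValueError(f"unknown layout: {layout}")


old = usable_capacity_gb(5, 146, "no_redundancy")  # (730, 0)
new = usable_capacity_gb(6, 146, "raid5")          # (730, 1)
```

So the sixth drive buys the same 730 GB of raw space as before, but the array now survives the loss of any single disk.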

January 12, 2006
We stated earlier that we would grant credit to users who ran and aborted the bad work units that were initially released on December 20th. This has finally been done for aborted and failed results from work units in batch 205 and work units that were issued bad random number seeds. The claimed_credit from these results was added to the total_credit in the user, host, and team database tables. A total of 274,609.56 credits were granted. A tab-delimited list of userid, hostid, teamid, and granted credit is available online (4.2M) for anyone curious: http://boinc.bakerlab.org/rosett ... edit_2006-01-12.txt
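
For anyone who wants to total the granted credit per user, host, or team from that list, a sketch follows. It assumes the file has exactly the four tab-separated columns named above, in that order, with no header row; the function name and the sample rows in the usage note are invented.

```python
import csv
import io
from collections import defaultdict

# Column positions, assumed from the order given in the announcement.
USERID, HOSTID, TEAMID, CREDIT = 0, 1, 2, 3


def total_credit_by(text, column):
    """Sum granted credit per userid, hostid or teamid."""
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        totals[row[column]] += float(row[CREDIT])
    return dict(totals)
```

For example, total_credit_by(open(path).read(), USERID) would give one total per user, where path is wherever the downloaded list was saved.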

January 13, 2006
The University of Washington experienced a campus-wide network slowdown today related to the Windows WMF vulnerability. See more at http://www.washington.edu/cac/outages/show.php?id=64

January 17, 2006
The project will be down for maintenance starting today at 3pm PST. Today's downtime is expected to be a bit longer than usual because, in addition to backing up our database and optimizing tables, we are also going to move our project files over to the file server.

February 14, 2006
We've modified the web server to address the problems connecting to the server. This should improve matters for all.

February 17, 2006
Today we backed up our database and upgraded our production database server, which now uses mysql-max and has a SCSI controller serving a RAID10 of 14 drives.

February 22, 2006
Starting at around 8:20 (PST) this morning, the University of Washington network began to experience widespread connectivity problems. They are working on it.

[ Last edited by Youth on 2006-2-26 at 10:11 ]

Youth · posted on 2006-2-19 15:38:13

Rosetta application update! Graphics are now available for Mac OS X platforms.

// The newest work unit on my machine is already being computed with 4.82, and the screen saver now also shows the model count.

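Youth · posted on 2006-3-4 11:41:38

The default CPU run time is now set at 2 hours instead of 8. This change will affect new work units only.

// This seems to be only to reduce the chance of errors. If your work units are computing without problems, I suggest setting the run time to 8-10 hours, which puts much less load on both the network and the server.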
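
A back-of-the-envelope sketch of why a longer run time lightens the load. It assumes a host that crunches Rosetta around the clock and that every work unit costs about the same download, upload, and scheduler traffic regardless of its run time; the function name is made up.

```python
def work_units_per_day(run_time_hours, cpu_hours_per_day=24.0):
    """Work units one CPU finishes per day at a given target run time."""
    return cpu_hours_per_day / run_time_hours


at_2h = work_units_per_day(2)  # 12.0 work units per day
at_8h = work_units_per_day(8)  # 3.0 work units per day
```

Under these assumptions, going from 2 hours to 8 hours cuts the number of work units fetched and reported per CPU to a quarter.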

[ Last edited by Youth on 2006-3-4 at 11:42 ]

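Kaoh · posted on 2006-3-4 13:46:16

I have mine set to two hours... but since I run many other projects at the same time,
that should be fine, right? As long as I'm not crunching Rosetta as my main project all the time, the load on the server won't be heavy...

Right?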

Youth · posted on 2006-3-4 14:02:16

Right, no need to change it. // I usually run just one project per machine, so I set mine somewhat higher :)

Youth · posted on 2006-3-14 12:38:51

We will now be posting top predictions each day! // Looks like the project team has finally automated this :)

Today's protein: 1tif

Lowest Energy Structure predicted by: BurnHard (Team Dutch Power Cows)
Lowest RMSD Structure predicted by: Al83

For details and pictures of the predicted structures, see the top predictions page.
Congratulations to today's winners!

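Youth · posted on 2006-3-14 12:48:00

Judging from the web page, the updates started on the 9th. Unfortunately no user from China has "won" yet; everyone can keep an eye on this page from now on :)

I wonder whether the other results page will be automated as well. That page shows the details of the work units a user has computed (for example, how far they are from the top prediction).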
