Posted on 2006-2-18 13:29:40
[Share] Rosetta@home Science News
Source: http://boinc.bakerlab.org/rosetta/rah_science_news.php
January 25, 2006
I will use this space to give biweekly updates on recent results and the work units planned for upcoming weeks.
Today I will begin by summarizing some of the main results of the last few weeks.
More computing power can significantly improve results. This is illustrated by the 1ogw case. For one of the work unit types (NO_SIM_ANNEAL_BARCODE_30) we ran 60,000 independent jobs, for a total of 600,000 structures. If we take the ten lowest-energy structures, the median rmsd is 2.86. If we instead take the ten lowest-energy structures from just the first 18,000 jobs, the median rmsd is 4.49. So with more sampling, we are able to land more explorers closer to the global minimum and get more accurate results.
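The comparison above (median rmsd of the ten lowest-energy structures, computed once on a subset of the jobs and once on all of them) can be sketched in a few lines of Python. This is only an illustration, not the project's analysis code: the function names, the `(energy, rmsd)` tuple layout, and the toy model in `toy_decoys` (energy rising with rmsd plus Gaussian noise) are all assumptions made up for the example, and the printed numbers will not match the 1ogw figures.

```python
import heapq
import random
import statistics


def median_rmsd_of_lowest_energy(decoys, k=10):
    """Median rmsd of the k lowest-energy decoys.

    `decoys` is an iterable of (energy, rmsd) pairs (assumed layout).
    """
    lowest = heapq.nsmallest(k, decoys, key=lambda d: d[0])
    return statistics.median(rmsd for _, rmsd in lowest)


def toy_decoys(n, seed=0):
    """Hypothetical toy data: energy tends to rise with rmsd, plus noise."""
    rng = random.Random(seed)
    decoys = []
    for _ in range(n):
        rmsd = rng.uniform(1.0, 15.0)
        energy = rmsd + rng.gauss(0.0, 2.0)
        decoys.append((energy, rmsd))
    return decoys


if __name__ == "__main__":
    decoys = toy_decoys(60_000)
    # Same statistic on a subset and on the full set of samples.
    print("first 18,000:", median_rmsd_of_lowest_energy(decoys[:18_000]))
    print("all 60,000:  ", median_rmsd_of_lowest_energy(decoys))
```

Selecting by energy rather than by rmsd matters here: in a real prediction the true structure is unknown, so energy is the only thing available to rank the candidates.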
Allowing additional flexibility in the chain can significantly improve results (this was the "breakthrough" I described several months ago). In the "NO_VARY_OMEGA" runs, we went back to the pre-breakthrough, less flexible chain, and the results were consistently worse. For example, in the 1ogw case, the median rmsd of the low-energy structures increased to 4.50. For 1r69, the median rmsd of the low-energy structures increased from 1.29 to 2.80.
The computationally less expensive NO_SIM_ANNEAL methods were no worse at locating low-energy, low-rmsd structures than the SIM_ANNEAL runs. This is good news, as we can carry out many more of the NO_SIM_ANNEAL searches and so do more searching for the same amount of CPU time.
As Paul Buck anticipated, most of the remaining alternative methods we tested were roughly equivalent (except for the NO_VARY_OMEGA). One way of looking at this is that given the huge space we have to search, all that matters is how many independent explorers are sent out to search, not the details of the instructions each is given about where to search.
Excitingly, for many of the proteins, the lowest-energy structures are very close to the true structure (less than 3.0 Å rmsd). For example, in the NO_SIM_ANNEAL_BARCODE_30 runs the rmsds of the lowest-energy structures are:
1dtj: 1.93
1dcj: 2.72
1ogw: 2.65
2reb: 1.46
1r69: 1.79
1di2: 1.40
These results are a significant improvement over anything that has been done before. If we are able to do this consistently for proteins in this size range, it will be a major scientific breakthrough.
Our next step will be to test the computationally efficient NO_SIM_ANNEAL_BARCODE_30 method on 25 new proteins we haven't done calculations on yet. You will see the new proteins on your screen saver by early next week. The "BARCODE_30" means that for every 30 amino acid residue segment in the protein, the values of the angles for one residue are picked at random at the beginning of the run. This directs different runs to explore different regions of the space, and is more or less equivalent to directing different explorers to different latitudes and longitudes.
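To make the "BARCODE_30" idea concrete, here is a rough Python sketch of one way such a barcode could be drawn at the start of a run. It is not Rosetta's implementation. Several details are assumptions made for the example: the function name `make_barcode`, treating the segments as consecutive non-overlapping windows of 30 residues, fixing only the phi and psi backbone angles, and drawing them uniformly between -180 and 180 degrees.

```python
import random


def make_barcode(n_residues, segment_len=30, seed=None):
    """Pick one residue per segment and assign it random backbone angles.

    Returns {residue_index: (phi, psi)} with angles in degrees.
    Segment layout and angle distribution are assumptions for this sketch.
    """
    rng = random.Random(seed)
    barcode = {}
    for start in range(0, n_residues, segment_len):
        end = min(start + segment_len, n_residues)  # last segment may be short
        residue = rng.randrange(start, end)
        phi = rng.uniform(-180.0, 180.0)
        psi = rng.uniform(-180.0, 180.0)
        barcode[residue] = (phi, psi)
    return barcode


if __name__ == "__main__":
    # Two runs with different seeds get different constraints,
    # so they start out exploring different regions of the space.
    for seed in (1, 2):
        print(seed, make_barcode(95, seed=seed))
```

Each run holding a different barcode is the code-level version of sending each explorer to a different latitude and longitude.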
You will also see more "PRODUCTION_AB_INITIO" runs in the next few weeks. In these runs we are testing the first low resolution part of the search. We will lower the number of trajectories per work unit to avoid the max_cpu_time problem. I think we have largely solved this problem now by going to shorter work units and doubling the max_cpu_time limit.
There will also be tests of calculations for some of the other projects described in the introduction section of the web site. We hope to get the vaccine design calculations running on BOINC in the near future. With regard to the message board posts, we aren't yet doing any work on diabetes or MS specifically, but if we can generate accurate structures of proteins involved in these diseases using the methods you are helping us to develop, it will contribute to efforts to develop therapies.
Thank you again for all of your wonderful contributions!
David Baker
[ Last edited by Youth on 2006-2-18 at 22:51 ]