RNA World: Frequently Asked Questions

Revision as of 19:43, 11 January 2011 (Tuesday) by Youth

Frequently asked questions about RNA World


In simple words, what are the goals of RNA World and why should I participate?

RNA World focuses on RNA research and as such is the first distributed computing project of its kind. If you look at the discoveries for which recent Nobel Prizes in chemistry and medicine were awarded (telomerase and the ribosome, both RNA-based cell machineries, and before that RNA interference / miRNAs, small RNAs that regulate cell development and are involved in cancer), you will realize that RNA research is a highly important subject. Perhaps you have also taken antibiotics at some point to help your immune system battle a bacterial infection. Such antibiotics usually bind to the RNA components of bacterial ribosomes and thereby inactivate protein synthesis and, consequently, the growth of these microbes. Other small RNAs in bacteria are required for attaching to and eventually even invading human host cells. And of course, many aspects of fundamental cellular processes in which RNAs are involved are of additional interest to researchers, for example the question whether microorganisms such as bacteria might even possess some sort of primitive immune system (CRISPR RNA family). So as you can see, identifying non-protein-coding RNAs in as many organisms as possible establishes a fundamental basis of knowledge for a diverse range of questions and may even promote the development of drugs to combat diseases. For further information, please see the project description.

What applications are currently supported by RNA World?

InReAlyzer 1.0: Converter to produce high-resolution graphics from INFERNAL output files (in-house development)

What types of work units are available?

At present we have four types of work units, which are based on (1) CMBUILD, (2) CMCALIBRATE, (3) CMSEARCH and (4) InReAlyzer. CMBUILD work units produce an RNA co-variance model from a text alignment of RNAs belonging to the same RNA family. CMCALIBRATE work units calibrate an RNA co-variance model produced by CMBUILD such that it can be used to score the probability that a potential RNA identified as a member of a certain RNA family is indeed a true candidate of that family. CMSEARCH work units use the output of CMCALIBRATE, i.e. a calibrated co-variance model, to search for an RNA family in the genome of a specified organism. InReAlyzer work units convert the somewhat cryptic text output of CMSEARCH work units into high-resolution PNG graphics that allow for convenient visual judgement of whether or not a given CMSEARCH candidate belongs to the RNA family under investigation.

Does RNA World support GPU/CUDA/STREAM processing?

At present, RNA World applications do not profit from GPU processing. We can, however, think of applications that will, and once such applications are added we will certainly try our best to make clients available that support such performance improvements.

Does RNA World support checkpointing?

RNA World is a universal framework comprising a diverse set of RNA-relevant bioinformatic tools. Originally, these tools were not designed for incorporation in a distributed computing environment, although most of them require significant compute resources. Traditional checkpointing, however, must be supported by the scientific application itself. Consequently, checkpointing is not available. The RNA World development team is nevertheless seeking ways to establish novel, universal checkpointing procedures. These novel methods rely either on creating "suspend RAM to disk" images at the system level or on establishing RNA World as a system that generally runs in a virtual machine (VM). For the former we have a functional 32-bit Linux workaround that, however, requires memory randomization to be deactivated in the Linux kernel. For the latter we have a promising cooperation with CERN.

What is the RAM requirement of RNA World?

RAM requirements:
usually below 600 MB of RAM
CMSEARCH: up to 1 GB of RAM, usually below 100 MB of RAM
InReAlyzer 1.0: below 50 MB of RAM

Please note one important thing: if you are using a multi-core machine, RNA World may run on all of your cores simultaneously. As long as we do not use the OpenMPI-based multi-threaded application mode, each core processes its own work unit, which means that the RAM required to successfully complete all of the work units is additive.
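To make the additive RAM requirement concrete, here is a minimal Python sketch. The function name `total_ram_mb` and the quad-core scenario are illustrative assumptions; the 1 GB figure is the CMSEARCH upper bound quoted above, taken here as 1024 MB.

```python
def total_ram_mb(per_work_unit_ram_mb):
    """Sum the RAM of work units that run at the same time (one per core)."""
    return sum(per_work_unit_ram_mb)


# Hypothetical quad-core host: four CMSEARCH work units, each at the
# 1 GB upper bound (assumed to be 1024 MB), running simultaneously.
worst_case = total_ram_mb([1024] * 4)
print(f"Worst case for four concurrent CMSEARCH work units: {worst_case} MB")
```

In practice most work units stay well below their upper bound, so this is a worst-case estimate rather than a typical one.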

What are the typical running times of RNA World work units using an Intel P8600 Core2Duo 2.4 GHz machine with Ubuntu Linux x64?

INFERNAL 1.0.2 CMCALIBRATE: 20 minutes to 10 days, typically a day
CMSEARCH: a few minutes to 100 hours, typically a few hours
InReAlyzer 1.0: a few seconds to 5 hours, typically a few minutes

Does RNA World support multithreaded (MT) applications?

RNA World supports whatever its scientific applications support. In the case of INFERNAL, multithreaded applications are well supported on the basis of OpenMPI. While MT applications run well in a local computing environment under Linux, validation of results generated by MT applications in a distributed computing environment fails in a subset of cases for unknown reasons, although technically the MT applications work excellently. We have not yet identified the cause of this inconsistent behavior, but INFERNAL makes use of random number generation that is tied to the processor core time at which individual threads are started. On a multicore system, an MT application is therefore initiated at slightly different times on each of the cores involved. Consequently, on a different (distributed) system, the random numbers differ and so do the results, although the differences are only minor. If the random number generator is supplied with a fixed value, the issues are largely solved but, strangely, not entirely, and that is a problem for BOINC-based WU validation. The RNA World team is currently corresponding with the INFERNAL developers to solve this issue and to fully enable distributed MT applications as soon as possible.
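The effect of a fixed seed on reproducibility can be illustrated with a small Python sketch. It uses Python's own `random` module and a hypothetical helper `sample_scores`; it only demonstrates the general principle and says nothing about how INFERNAL's random number generator is implemented.

```python
import random


def sample_scores(seed, n=5):
    """Draw n pseudo-random numbers from a generator started with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Two hosts that use the same fixed seed obtain identical numbers,
# so their results can be compared value by value during validation.
same = sample_scores(42) == sample_scores(42)

# Hosts whose seeds differ (for example because they are derived from
# the start time of a thread) obtain different numbers.
different = sample_scores(1) != sample_scores(2)

print(same, different)
```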

What is a multithreaded (MT) application?

A multithreaded application makes use of multiple CPU cores on modern machines to compute a single RNA World work unit. Our measurements for CMCALIBRATE running in a Linux environment showed that a quadcore crunches one work unit in even slightly less (!) than a fourth of the computation time that was required for singlethreaded computation of the same work unit. This shows that CMCALIBRATE scales excellently in MT mode.
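The scaling statement can be expressed as speedup and parallel efficiency. The following Python sketch uses invented timings (24 hours singlethreaded, 5.9 hours on four cores) purely for illustration; they are not measurements from the project.

```python
def speedup(t_single, t_multi):
    """How many times faster the multithreaded run is."""
    return t_single / t_multi


def parallel_efficiency(t_single, t_multi, cores):
    """Speedup divided by the number of cores; 1.0 means ideal scaling."""
    return speedup(t_single, t_multi) / cores


# Hypothetical timings in hours, chosen only to illustrate the case
# "slightly less than a fourth of the singlethreaded time on a quadcore".
s = speedup(24.0, 5.9)
e = parallel_efficiency(24.0, 5.9, 4)
print(f"speedup: {s:.2f}, efficiency: {e:.2f}")
```

An efficiency at or slightly above 1.0 corresponds to the observation described above.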

Will RNA World research ultimately lead to the development of medications?

Although nobody can foresee this for sure, it seems highly likely that the results generated by RNA World will contribute to the development of medications (see application section of this F.A.Q.).

What are potential applications of the RNA World project?

RNAs, like proteins, are macromolecules of defined structure that serve vital cellular tasks. Consequently, structural knowledge of RNAs is of similar importance for drug design as it is for proteins. What you know as antibiotics are actually small-molecule drugs that mainly target RNA components of bacterial ribosomes. This illustrates nicely how important RNAs are and what huge potential they represent in the context of drug design. However, before we can use RNA structures to design drugs, we first need to identify which RNAs are present and where exactly they are located in the genome of any given organism. This is what we presently try to accomplish in a global manner using the RNA World supercomputer.

Will the RNA World project results be published and if so under what license?

All RNA World results will be published in high-quality peer-reviewed science journals, preferably in those offering an open access policy for the general public.

Are work units generated automatically or manually?

BOINC-based work units are always generated in a fully automated manner from operator-supplied input files. RNA World currently relies on operator-curated input archives that are automatically processed by the RNA World server to yield several thousand work units per archive. Archives can be placed in an on-hold queue such that once the server is running low on work units, it can process new ones from this supply.

We are currently working on implementing user job submission interfaces such that, under strict security guidelines, researchers can use the RNA World distributed supercomputer to process their own project files. For security reasons we do not plan to allow batch job processing here, and users will have to register and use a digital certificate for clear identification.

It is also planned to derive work units fully automatically by regularly scanning RNA-relevant databases for novel sequences that could be analyzed.

Is a continuous work supply guaranteed?

Our objective is to continuously recruit more and more RNA-relevant bioinformatic tools to RNA World. Moreover, the data sources containing RNA-relevant information that require analysis by RNA World are growing daily. Given these two facts, we expect that RNA World will require increasing compute capacity and consequently should not be expected to run out of work soon. However, we are computing on an individual project basis, and we also try to build up databases containing pre-computed results, e.g. for listing potential RNA candidate genes in any given organism. Once our objectives are reached, we will naturally stop sending out work units until we have new projects in store. This will be announced in time on the RNA World website to keep machines from running idle.

Are the work units identical for different operating systems?

RNA World delivers work units based on available applications. If an RNA World application is available for more than one operating system, then the work units assigned are usually identical. This, however, certainly does not mean that identical work units will be computed equally fast on systems that have identical hardware but different operating systems. One reason for this, among others, is differences in compiler performance.

Does RNA World award project performance certificates?



How can I save my work for non-checkpointed RNA World sub-applications when I need to turn my machine off?

If you run RNA World on a laptop or if you have multiple operating systems installed, you will need to turn your machine off from time to time. The current RNA World sub-applications INFERNAL and InReAlyzer do not support checkpointing, so your work would be lost if you shut down your machine the standard way. To avoid losing your work, set the machine to enter hibernation (the sleep mode that suspends to disk) instead. Under these conditions the entire RAM is saved to hard disk, and when the system is started again your work is reloaded into memory such that RNA World can continue to calculate where it left off.

What operating systems (32/64-bit) are supported by RNA World?

Since RNA World is a framework consisting of many different bioinformatic applications, support of operating systems will always depend on the individual application. If its source code is available, the RNA World development team will do its best to make it available for as many operating systems as possible, i.e. Linux, Windows and OSX at the minimum. In addition, care will be taken to support 32-bit as well as 64-bit versions and special assembly codes (SSE, SSE2, etc.) wherever possible and useful.

What are the costs of participating in RNA World?

None, except for your private electricity, network and hardware maintenance costs.


Are there privileges that apply only to Rechenkraft.net team members?

As fairness dictates: No.


What are the plans concerning language support on the RNA World website?

Just take a look here: http://www.rnaworld.de/rnaworld/language_select.php


Will there be CPU-optimized applications?

These are already implemented, i.e. the initially downloaded package contains a program that checks which type of application is optimal for your machine (x86/x64, diverse SSE versions).

Is it possible to exclude participation in certain sub-applications?

Yes. In your RNA World project settings you can decide on your own which RNA World sub-applications to support and which not. In fact, if you have an older machine it might be recommended, e.g. to exclude CMCALIBRATE as work units for this program demand huge amounts of RAM and often take a long time to complete.

Is it possible to enable/disable participation in alpha/beta tests using a simple switch in the RNA World user profile?


Can I as a scientist submit tasks to RNA World?

Hopefully soon. We are working on implementing user job submission interfaces for each of the various RNA World applications. However, you will have to register and receive a digital certificate such that jobs submitted to our system are clearly correlated to an individual known to us.

What systems are going to be supported in the future?

At present we support Linux, Windows and Mac wherever possible. PS3 most likely will not be supported due to its small RAM capacity although it might be possible to use it for other applications in the future which, at present, we have not implemented yet. If we manage to establish a virtual machine approach, however, it might be possible to even support a number of additional systems.

Are BOINC and the RNA World applications safe, i.e. free from viruses and other malware?

Yes. BOINC as well as all RNA World applications are open source, i.e. they can be inspected by anyone who is interested. The RNA World applications are compiled in-house using widely used public domain compiler tools, which are also used, for example, to produce the code of the majority of today's web servers. Consequently, if these were malicious, we would already face a much bigger problem.

How much hard disk space is required to run RNA World?

The required disk space varies depending on the types of work units you are assigned. At present, RNA World core files require around 25 MB of hard disk space. CMBUILD and CMCALIBRATE work units should not require more than approximately 10 MB, while CMSEARCH work units may use up to 300 MB at maximum, typically 2-20 MB. InReAlyzer hard disk space requirements cannot be predicted reliably because the number of images generated depends on the CMSEARCH output file size. However, we do not expect InReAlyzer to require much more than 1-100 MB. Remember that the total hard disk space required is the sum of the work units that have been downloaded plus the RNA World core files. Currently, a maximum of 10 work units can be downloaded per CPU core. Note that you can manually specify the maximum hard disk space that RNA World is allowed to occupy, either from within your local BOINC manager or on the RNA World website.
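As a rough worked example of this sum, the Python sketch below computes an upper bound for a host whose download queue is completely filled with work units of one size. The function name `max_disk_mb` and the dual-core scenario are illustrative assumptions; the 25 MB and 10-work-units-per-core figures come from the text.

```python
CORE_FILES_MB = 25       # RNA World core files (figure from the text)
MAX_WUS_PER_CORE = 10    # download limit per CPU core (figure from the text)


def max_disk_mb(cores, wu_size_mb):
    """Upper bound: core files plus a full queue of equally sized work units."""
    return CORE_FILES_MB + cores * MAX_WUS_PER_CORE * wu_size_mb


# Hypothetical dual-core host holding typical 20 MB CMSEARCH work units.
print(max_disk_mb(cores=2, wu_size_mb=20), "MB")
```

Real queues mix work unit types and sizes, so the actual usage will normally be lower than this bound.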

What Internet traffic can be expected?

All files are transferred in compressed format, and most files contain simple ASCII data such that the compression rate is around 30%, i.e. files are reduced to about 30% of their original size. In general, CMBUILD and CMCALIBRATE work units are the smallest and should require less than 1 MB (usually even less than 100 kB) of data traffic. CMSEARCH work units cause somewhat higher download traffic depending on the size of the genome that is going to be analyzed. Current upper limit: with a maximum of 512 MB for one of the chromosomes of an opossum (uncompressed file size), 150 MB would have to be transferred (compressed file size) for a CMSEARCH work unit plus a few kB for additional control files. Of course, the upload traffic only contains the result file and not the genome that was searched for RNA presence, and consequently will be much, much smaller. Normal traffic: a typical bacterial genome such as that of E. coli is about 4.6 MB (uncompressed) in size. Hence, 1.3 MB (compressed) of data plus the control files (just a few kB) will be transferred. Lower limit: many viral genomes as well as plasmid sequences contain less than 10 kB of data in uncompressed format. However, note that small CMSEARCH work units are expected to complete quickly such that your machine may request new data over and over again, depending on your system's performance.
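The download figures above follow from the 30% rule of thumb. The short Python sketch below reproduces the arithmetic; the function name `compressed_mb` is an illustrative assumption, and the results are approximations that the text rounds to 150 MB and 1.3 MB.

```python
COMPRESSED_FRACTION = 0.30  # compressed size relative to original (from the text)


def compressed_mb(uncompressed_mb, fraction=COMPRESSED_FRACTION):
    """Estimate the transfer size of a file after compression."""
    return uncompressed_mb * fraction


# Genome sizes quoted in the text (uncompressed, in MB).
for name, size_mb in [("opossum chromosome", 512), ("E. coli genome", 4.6)]:
    print(f"{name}: about {compressed_mb(size_mb):.1f} MB to download")
```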

Can RNA World be operated in offline mode?

Currently not: we do not allow caching of large sets of work unit packages because we require a short turnaround time. In the case of CMCALIBRATE work units this is quite easy to understand, since their results are the basis for all subsequent CMSEARCH work units. Generally, CMBUILD results are the basis for CMCALIBRATE calculations and CMCALIBRATE results are required as input data for CMSEARCH. The output of CMSEARCH in turn then serves as input for InReAlyzer. However, since we are planning to add a set of additional applications in the future which lack these strict interdependencies, it is very likely that certain types of future work units will allow for offline computation.
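The interdependencies described here form a simple chain, which can be written down as a dependency graph. The Python sketch below is only an illustration (the dictionary `DEPENDS_ON` and the function `processing_order` are names invented for this example, not part of the RNA World server); it uses the standard library's `graphlib`, available from Python 3.9.

```python
from graphlib import TopologicalSorter

# Each application maps to the set of applications whose results it needs.
DEPENDS_ON = {
    "CMBUILD": set(),
    "CMCALIBRATE": {"CMBUILD"},
    "CMSEARCH": {"CMCALIBRATE"},
    "InReAlyzer": {"CMSEARCH"},
}


def processing_order(depends_on):
    """Return an order in which every application runs after its inputs exist."""
    return list(TopologicalSorter(depends_on).static_order())


print(" -> ".join(processing_order(DEPENDS_ON)))
```

Because every stage waits for the previous one, a slow return of early results delays everything downstream, which is why a large offline cache does not fit this workflow.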

What are the minimal CPU requirements for participation in RNA World?

Concerning the CPU, even processors lacking SSE such as Intel Pentium II (or older) could in principle participate since we have the appropriate applications ready for delivery. However, you should consider the average run times, deadlines and RAM requirements for certain types of work units as detailed elsewhere in this F.A.Q.

What are the network requirements for participation in RNA World?

Currently we recommend RNA World participation only for machines that are connected to the Internet on a 24/7 basis, i.e. all around the clock.

Does RNA World offer a screensaver function?


According to the server status page, work units should be available, so why don't I get any?

Assuming you have activated the type of work units announced to be available for processing in your RNA World project profile, the reason is RNA World's homogeneous redundancy policy: a work unit delivered to a Linux x64 machine of a certain CPU type, for example, will only be sent for validation to another Linux x64 machine that has the same CPU type installed. If your system no longer gets any work units, then the remaining work indicated on the server status page can only be delivered to machines that provide an operating system and/or CPU different from yours.
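The matching rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not BOINC's scheduler code: the `HostClass` type, its two fields and the function `can_receive` are hypothetical names.

```python
from typing import NamedTuple, Optional


class HostClass(NamedTuple):
    os_name: str
    cpu_type: str


def can_receive(host: HostClass, wu_class: Optional[HostClass]) -> bool:
    """A work unit with no class yet may go to any host; once its first
    result has been sent, only hosts of that same class are eligible."""
    return wu_class is None or host == wu_class


linux_intel = HostClass("Linux x64", "Intel")
windows_intel = HostClass("Windows x64", "Intel")

print(can_receive(linux_intel, None))            # fresh work unit
print(can_receive(windows_intel, linux_intel))   # already bound to Linux x64
```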

I came home and my machine was basically unresponsive with multiple RNA World screensaver windows open - what is going on here?

This (rarely occurring) strange behaviour is not yet completely understood. For simplicity, our current screensaver makes use of Adobe FlashPlayer. Consequently, the problem you describe can occur only on machines where FlashPlayer is installed (on others, the screensaver function will not work). To resolve the issue, it seems you need to either upgrade to the latest FlashPlayer version or uninstall it completely from your machine (of course, uninstalling is not really a good suggestion as many websites make use of Flash).

It seems that the entire RNA World website is available only in German?

No, you can individually customize your display language. Forum settings for example are found here. Setting BOINC pages to English (only necessary if it doesn't work properly with the browser's ACCEPT setting) can be done here. Since 18 January 2010 we have also incorporated the BOINC translation system (BTS).

I got the message "redundant result", what exactly does that mean?

First, a few remarks on the terms used in BOINC. A work unit is defined as a computational job which we would like participants to complete. A result, by contrast, is a collective term for the files which the server generates and sends to the participants. If enough results (the quorum) are successful (this includes the data transfer to the participant, computation of the job, return of the result files to the server, etc.) and have been validated (i.e. each is identical to at least one other result successfully returned to the server), then the work unit is complete. For example, in RNA World, three results are generated for each CMSEARCH-based work unit and sent to three different machines. If two of these (the quorum) are successful and get validated, the work unit is completed. As a consequence, the third result is no longer required, i.e. it is redundant (redundant result). This third result then (1) will not be sent out (if it has not yet been sent out), (2) will be aborted on the client machine if it has been sent out but computation has not yet commenced, or (3) will be completed and receive credit if its computation has already started. We generate more results (three) per work unit than required for the quorum (two) in order to collect results more quickly. If we did not do it this way, we would always have to wait for the deadline to pass before the server detects that a client is not sending anything in. Only then would the server generate an additional result, send it out, and again wait for incoming data.
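The quorum logic and the three possible fates of a redundant result can be sketched as follows. This is a simplified Python illustration with hypothetical function names (`validate`, `fate_of_redundant_result`); it compares outputs for exact equality and is not the validator that runs on the server.

```python
from collections import Counter


def validate(returned_outputs, quorum=2):
    """Return the agreed output once `quorum` identical outputs exist, else None."""
    if not returned_outputs:
        return None
    output, count = Counter(returned_outputs).most_common(1)[0]
    return output if count >= quorum else None


def fate_of_redundant_result(sent, started):
    """What happens to a result that is no longer needed for the quorum."""
    if not sent:
        return "not sent out"
    if not started:
        return "aborted on the client"
    return "completed and credited"


# Three results were issued; two matching ones have come back.
print(validate(["hit-list-A", "hit-list-A"]))
print(fate_of_redundant_result(sent=True, started=False))
```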

The progress bar is at 100% and seems to sit there for hours - what is happening here?

This is common behavior in BOINC projects, especially if you have just switched from another project to RNA World or if the work units of a given BOINC project are very heterogeneous compared to each other. RNA World work units are de facto extremely heterogeneous in their system requirements. For each computation, a series of small mini simulations is run on the server to estimate the time required for completion on the server. Since your machine differs from our server hardware, information based on the benchmarks performed from time to time on your machine is used to scale the duration determined for that work unit on the server to your machine. This scaling process is good but not perfectly accurate. So, the first work units often differ detectably in completion time from what the progress bar indicates. But, with more and more work units of that type arriving on your system, a BOINC-integrated calculation mechanism progressively corrects for that deviation. So, with time, this "sitting at 100%" should become more and more rare. However, if the incoming work units are extremely different from each other in type (as is often the case for RNA World work units even if based on the same application), this adjustment might again turn out inaccurate for these new work units and an automatic re-adjustment will take place. In the worst case scenario, this might lead to the perception of an apparently constant unreliability of the progress bar indicator. The bottom line is that you should just expect a work unit to take longer than indicated and not conclude there is something wrong with the work unit or your hardware.
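The scaling and progressive correction can be pictured with the following Python sketch. It is a simplified model built on assumptions: the function names, the use of a single benchmark ratio and the smoothing `weight` are all invented for illustration and do not reproduce BOINC's actual formula.

```python
def scaled_estimate(server_seconds, server_benchmark, host_benchmark, correction=1.0):
    """Scale a duration measured on the server to a host of different speed."""
    return server_seconds * (server_benchmark / host_benchmark) * correction


def updated_correction(correction, estimated_seconds, actual_seconds, weight=0.5):
    """Move the correction factor part of the way towards the observed ratio."""
    observed = correction * (actual_seconds / estimated_seconds)
    return (1 - weight) * correction + weight * observed


# Hypothetical host half as fast as the server; the first work unit
# takes twice as long as the scaled estimate predicted.
estimate = scaled_estimate(3600, server_benchmark=2.0, host_benchmark=1.0)
correction = updated_correction(1.0, estimate, actual_seconds=2 * estimate)
print(estimate, correction)
```

With very heterogeneous work units the observed ratio keeps changing, so a correction of this kind never fully settles, which matches the behavior described above.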

Why is RNA World not using the standard BOINC forum?

The RNA World forums are multilingual, which means there is more than just one forum, and these are indeed located on a server different from the BOINC servers and from the RNA World server. The reason is that we need to make sure that forum communication remains intact even if the BOINC and the RNA World project servers are non-functional. It is actually surprising that several other DC projects do not do it the same way as we do. The single drawback is that you have to register on our forum server to make use of it, but we feel that, given the advantages, this drawback is bearable.

It seems a long-named RNA World work unit is blocking my entire BOINC system

This is a known issue which occurs only very rarely and relates to an as yet unresolved bug in the BOINC manager. It also happens exclusively on Windows-based machines. The source of the error is the fact that Windows allows a maximum of only 256 characters for the combined length of path name plus file name. RNA World uses explicit file names, i.e. from the long file names the user can easily derive what is being computed on his or her machine. We would like to keep it like that to allow third-party developers to conveniently construct RNA World monitoring programs. The point is that if such a long-named work unit is sent to your Windows machine, it will get stuck in the downloading process because it cannot be written to your hard drive. As a baffling consequence, your BOINC manager will stop downloading work units for any DC project it is hooked up to. To resolve the issue you just have to delete that WU from within the BOINC manager. We hope that the BOINC developers will fix this issue soon.
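A rough way to check whether a file name would run into this limit is sketched below in Python. The function name `fits_windows_limit` and the example directory are hypothetical, and the 256-character limit is simply the figure quoted in the text, passed as a parameter so it can be adjusted.

```python
import ntpath

WINDOWS_LIMIT = 256  # combined path + file name length quoted in the text


def fits_windows_limit(directory, file_name, limit=WINDOWS_LIMIT):
    """True if the joined Windows path stays within `limit` characters."""
    return len(ntpath.join(directory, file_name)) <= limit


# Hypothetical project directory and file names.
project_dir = r"C:\BOINC\projects\example_project"
print(fits_windows_limit(project_dir, "short_name.txt"))
print(fits_windows_limit(project_dir, "x" * 300))
```

`ntpath` applies Windows path rules on any platform, so the check also works when run on Linux or Mac.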

Can't you refresh the server status page more frequently?

We could, but we will not do that because we need to give priority to server performance. Updating the status page more frequently would cause a considerable increase in database queries, which in turn cause additional server load. We prefer to dedicate that power to more important tasks such as WU processing and serving. In any case, the status page is refreshed every 10 minutes, and if your browser shows something different, then you have a caching issue.

I started RNA World on my Debian Etch machine and got a lot of 'compute error' messages whereas running it on the same machine using RedHat gives no errors at all - is it possible that I experience an OS-dependent problem here?

Yes. If you have an older glibc version, please check your Linux distribution for an upgrade that employs glibc version 2.4. This will resolve the problem.
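One way to check the installed glibc version is sketched below in Python, using the standard library's `platform.libc_ver()`. The helper names `parse_version` and `glibc_is_at_least` are illustrative assumptions, and treating 2.4 as a minimum (rather than an exact match) is also an assumption made for this example.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.3.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_at_least(required="2.4", found=None):
    """Compare a glibc version against `required`.

    If `found` is not given, ask the running Python interpreter which libc
    it is linked against. Returns None when the version cannot be determined
    (for example on Windows or Mac).
    """
    if found is None:
        found = platform.libc_ver()[1]
    if not found:
        return None
    return parse_version(found) >= parse_version(required)


print(glibc_is_at_least("2.4", found="2.3.6"))  # older glibc than required
print(glibc_is_at_least("2.4"))                 # result depends on this machine
```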

How can I monitor the RAM usage of individual RNA World WUs?

The simplest way to do this under Windows is to use the task manager (TM). But you need to tweak its settings a bit: start the TM -> choose the tab for 'processes' -> go to the TM menu and click 'view' -> 'select columns' and now check the checkbox of 'Peak Memory Usage'.