Original poster | Posted on 2007-7-4 11:03:17
A rough, clumsy translation is attached.
10:39 June 22, 2007 SciLINC Update
When development of the SciLINC project began, it had four primary goals. Edited for brevity, they were:
1. Increase public access to nationally significant scientific literature.
2. Enhance the usefulness of digitized materials by creating a Web repository of scanned literature, keywords, and online resources with tools for searching and analysis.
3. Create an educational tool for learning about plant life. While the screensaver application is indexing keywords, the participant's computer will display information about plant life within the United States and around the world. The information displayed will describe each plant name or term currently being indexed on the participant's computer, and will include descriptive data, images, maps, and the annotated outlinks for that term.
4. Provide a model for adopting public-resource computing applications within the library community.
Botanicus is doing a wonderful job of meeting goals 1 and 2, including processing data generated by SciLINC. The project has certainly also met goal 4.
We have learned much about grid-based, distributed, public-resource computing applications and the BOINC architecture. There are thoughts and plans for analyses down the road that will be much more computationally intensive than the original SciLINC analysis, and we look forward to bringing these projects to you.
While the amount of data that SciLINC has to analyze will increase greatly in the days ahead, it does not appear that increasing the volume of information will improve the user experience of running the SciLINC client.
It has been suggested that we repackage our data into single files instead of uploading and downloading 50 files per workunit as we currently do. This suggestion has been heeded and implemented. We had planned on doing it before SciLINC was rolled out, but scheduling prevented it and the community discovered the project before we were ready to announce it. We expect that testing will show the repackaging lessens the load placed upon the core BOINC client software. However, it does not change the amount of data being transferred.
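As a rough illustration of the repackaging idea, and assuming nothing about SciLINC's real file formats, the sketch below bundles a workunit's many input files into one archive. The helper names `pack_workunit` and `unpack_workunit` are hypothetical. It also shows why the payload does not shrink: the archive still carries every byte of every member, plus a small header for each.

```python
from __future__ import annotations

import io
import tarfile


def pack_workunit(files: dict[str, bytes]) -> bytes:
    """Bundle many small files into one uncompressed tar archive (hypothetical helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # the tar header must state the member size
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack_workunit(blob: bytes) -> dict[str, bytes]:
    """Recover the original files from the single archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return {m.name: archive.extractfile(m).read() for m in archive.getmembers()}
```

One transfer replaces fifty, which is the load reduction the team hopes for, while the number of bytes on the wire stays essentially the same.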
The truth is that the workunits fly by so rapidly that implementing goal 3 never became realistic.
When development of SciLINC began, the project lead's understanding was that from a technological and economic standpoint it makes sense to use public-resource computing in place of an internal grid computing architecture whenever less than a gigabyte of data is required per cpu-day of computation. Using the BOINC framework to transfer the data to clients, SciLINC meets this volume-of-computation guideline.
However, our brief experience with the dedicated BOINC community over the last couple of weeks has shown that, to the community, these numbers may differ somewhat. In its original form, SciLINC would have needed to transfer roughly 250MiB of compressed data in order to occupy a modern CPU for a day. This would expand to nearly 660MiB of input data. Then the client would need to upload about 44MiB of results, which would compress to 17MiB. These numbers have only grown as SciLINC has been improved and made more efficient.
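The figures above can be checked against the project lead's guideline with a few lines of arithmetic. This is only a sketch under two assumptions not stated in the announcement: that "a gigabyte" means 1024 MiB, and that the guideline counts compressed download and upload together. The function names are made up for illustration.

```python
def check_guideline(download_mib: float, upload_mib: float,
                    limit_mib: float = 1024.0) -> tuple[float, bool]:
    """Return the total transfer per CPU-day and whether it is under the limit.

    limit_mib defaults to 1024 MiB, an assumed reading of "a gigabyte".
    """
    total = download_mib + upload_mib
    return total, total < limit_mib


def compression_ratio(raw_mib: float, compressed_mib: float) -> float:
    """How many times larger the raw data is than its compressed form."""
    return raw_mib / compressed_mib


# Figures quoted in the announcement (compressed sizes travel over the wire).
total, ok = check_guideline(download_mib=250, upload_mib=17)
input_ratio = compression_ratio(660, 250)   # input data: expanded vs. compressed
result_ratio = compression_ratio(44, 17)    # results: raw vs. compressed
```

With these inputs the total is 267 MiB per CPU-day, well under the assumed 1024 MiB limit, which matches the statement that SciLINC meets the guideline even though the community finds the volume too high.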
This is not acceptable to the average BOINC user.
Looking at the numbers from the perspective of someone on dial-up, if they set SciLINC to only 1% of their BOINC time, this would be roughly 15 minutes out of a day. For these 15 minutes they would have needed to download around 2.5MiB of data. This may not be a huge issue for broadband users, but if someone is on dial-up (as we have learned many BOINC fans still are), the transfer time would exceed the computation time.
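The scaling in this paragraph can be reproduced as follows. The sketch assumes BOINC time runs around the clock and takes the 250MiB-per-CPU-day figure quoted earlier. The link speed is an input, not a value from the announcement, and real dial-up throughput varies widely, so how transfer time compares with compute time depends on the speed a given connection actually sustains.

```python
MINUTES_PER_DAY = 24 * 60


def share_of_day(share: float, mib_per_cpu_day: float = 250.0) -> tuple[float, float]:
    """Compute minutes and download volume for a fractional share of a CPU-day."""
    compute_minutes = share * MINUTES_PER_DAY
    download_mib = share * mib_per_cpu_day
    return compute_minutes, download_mib


def transfer_minutes(size_mib: float, link_kbit_per_s: float) -> float:
    """Minutes needed to move size_mib over a link of the given sustained speed.

    link_kbit_per_s is the effective throughput in kilobits per second
    (an assumption supplied by the caller, not a figure from the announcement).
    """
    bits = size_mib * 1024 * 1024 * 8
    seconds = bits / (link_kbit_per_s * 1000)
    return seconds / 60


compute_min, download_mib = share_of_day(0.01)  # a 1% share of BOINC time
```

A 1% share works out to about 14.4 minutes of computing and 2.5MiB of download per day, which is where the "roughly 15 minutes" and "around 2.5MiB" figures come from.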
So, where are we now?
Even if the transfer:credit ratios were acceptable to the community, we do not have enough data to realistically occupy hundreds or thousands of BOINC enthusiasts for a lengthy period of time. As we have already seen on various community boards, a relatively small amount of credit is earned for a comparatively large load on their system resources. Any computational and transport related improvements that have been tested have only resulted in more data needing to be transferred.
As stated above, we are investigating the possibility of performing much more computationally intensive analyses in the months ahead. It is expected that these will be a much better fit for a BOINC project than the current task of text-indexing and taxonomic analysis, which has a relatively low mathematical complexity.
Because of this, it has been decided that for now all SciLINC computation will be performed internally. When we have something with a better credit-reward ratio (and a nicer screensaver), it will be made available to the community.
Thank you again for your interest and support. We look forward to working with you in the future.
The SciLINC Team
This has been cross-posted to the forums for discussion and feedback.
*********************************************************************
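When the SciLINC project began, it had four basic goals. Put simply, they were:
1. Increase the public's access to nationally significant scientific literature.
2. Enhance the usefulness of digitized materials by creating a Web repository of scanned literature, keywords, and online resources, with tools for searching and analysis.
3. Create an educational tool for learning about plants. While the screensaver program is indexing keywords, the participant's computer will display information about plants in the United States and around the world. The displayed information will describe each plant name or term currently being indexed on the participant's computer, and will include descriptive data, images, maps, and external links related to the term.
4. Provide a model for bringing public-resource computing into the library community.
Botanicus is doing very good work toward goals 1 and 2, including processing the data generated by SciLINC. The project has certainly also achieved goal 4.
We have learned a great deal about grid-based, distributed, public-resource computing applications and the BOINC architecture. There are ideas and plans for future analyses that will be far more computationally intensive than the original SciLINC analysis, and we hope to bring these projects to you in due course.
Although the amount of data SciLINC needs to analyze is about to increase greatly, there is no sign that the added volume of information will improve the experience of running the SciLINC client.
It was suggested that we repackage our data into a single file, replacing the current approach in which each WU uploads and downloads 50 files. We took note of this suggestion and have implemented it. We had planned to do so before SciLINC was rolled out, but scheduling prevented it, and the project was discovered by the community before we were ready to announce it. We expect testing to show that the repackaged files lighten the load on the BOINC client software. However, this does not change the amount of data being transferred.
The fact is that WUs go by so quickly that goal 3 never became practical to implement.
When development of SciLINC began, the project lead's view was that, from a technological and economic standpoint, it makes sense to use public-resource computing in place of an internal grid computing architecture whenever less than one GB of data is needed per CPU-day of computation (?). Using the BOINC framework to deliver data to clients, SciLINC meets this volume-of-computation guideline.
However, our brief experience with the BOINC community over the past two weeks has shown that, to the community, these numbers are somewhat different. In its original form, SciLINC would have needed to transfer roughly 250MB of compressed data to occupy an ordinary CPU for a day. This would expand to nearly 660MB of raw input data. The client would then need to upload about 44MB of results (compressed to 17MB). These numbers have only grown as SciLINC has been improved and made more efficient.
This is not acceptable to the average BOINC user.
From the perspective of someone on dial-up, if they set SciLINC to only 1% of their BOINC time, that is roughly 15 minutes a day. For those 15 minutes they would need to download about 2.5MB of data. This may not be a big problem for broadband users, but on dial-up (which, as we have learned, many BOINC fans still use) the transfer time would exceed the computation time.
So, where are we now?
Even if the transfer:credit ratio were acceptable to the community, we do not have enough data to realistically occupy thousands of BOINC enthusiasts for a long period. As we have already seen on various discussion boards, relatively little credit is earned for a comparatively heavy load on system resources. Every computation- and transfer-related improvement that has been tested has only resulted in more data needing to be transferred.
As stated above, we are investigating the possibility of performing much more computationally intensive analyses in the coming months. We expect these to be a much better fit for a BOINC project than the current task of text indexing and taxonomic analysis, which has relatively low mathematical complexity.
We have therefore decided that, for now, all SciLINC computation will be performed internally. When we have something with a better credit-reward ratio (and a nicer screensaver), we will make it available to the community.
Thank you again for your interest and support. We look forward to working with you in the future.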
The SciLINC Team
[ This post was last edited by Julian_Yuen on 2007-7-4 16:42 ]