Original poster: vmzy

[Project News] United Devices [Ended]

Original poster | Posted on 2005-12-22 23:41:22
We have made some great progress on getting the new cancer job ready. There was an issue with the format of the data we received from Oxford and we believe that we have resolved this. I have been able to get successful cancer runs using the new data on an internal system. Hopefully we will be able to get some of the data loaded onto grid.org today or tomorrow.

In the past we have utilized the beta server and the beta tester team to help verify that things are working correctly before moving to grid.org. Due to hardware issues this is not going to be possible this time. If you are one of the beta tester members please do not be offended at this. We plan to reinstate the beta tester group when we migrate to the new system. I know everyone is anxious about the cancer data so we will not wait.

What this means is that it is possible that there may be a problem with the job when we submit it. I will be keeping an eye on the system and will be able to flip back to the current job quickly if necessary. Since the current job is basically stale data anyway this should not upset anyone. I will post again once the new job is submitted.

I have submitted (one of) the new cancer jobs. I am currently processing ligand 29 of 600 and it appears to be working correctly. We will keep a close eye on it for any problems. The old cancer job has been set to not dispatch any more workunits, but is still active so that any outstanding results can be returned. As I mentioned, there may be a need to do some additional tweaking on the new job, but as of now it looks good.




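I have submitted one of the new cancer jobs. I am currently on ligand 29 of 600 and it appears to be running correctly. We will keep an eye on it for problems. The old cancer job has been set to stop dispatching workunits, but it remains active so that any outstanding results can still be uploaded. As I mentioned earlier, the new job may still need some additional tweaking, although for now everything looks fine.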



Original poster | Posted on 2005-12-30 22:43:22
Unable to connect to UD server

The City of Austin had a transformer blow that affected power to UD during the last 24 hours. This caused Grid.org to be unavailable temporarily. The problem has since been remedied and I have verified that I am able to pull a workunit now from the UD server. Since there are many devices all trying to hit the UD server at the same time, there are still some backoff messages occurring. It took me about 10 minutes to get a successful connection and a download to occur. These will go away as the load subsides. Give it a little time and everyone should be crunching away normally again.
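The "backoff" behaviour mentioned above can be illustrated with a generic retry loop. This is only a sketch of capped exponential backoff with random jitter, a common way to spread out reconnecting clients; the post does not describe the UD agent's actual algorithm, and `fetch`, `base`, and `cap` are hypothetical names and values chosen for the example.

```python
import random
import time


def backoff_delay(attempt, base=5.0, cap=600.0, rng=random):
    """Return a randomized wait in seconds before retry number `attempt` (0-based)."""
    # The upper bound doubles with every failed attempt but never exceeds `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Picking a random point below the bound keeps clients from retrying in lockstep.
    return rng.uniform(0, ceiling)


def fetch_with_backoff(fetch, max_attempts=8, sleep=time.sleep, rng=random):
    """Call `fetch` until it succeeds or the attempts run out.

    `fetch` is a hypothetical callable that returns a workunit, or raises
    ConnectionError while the server is still overloaded.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the last allowed attempt
            sleep(backoff_delay(attempt, rng=rng))
```

With these assumed values the first retries come within seconds and later ones are spaced up to ten minutes apart, so the load on the server eases as more devices get through.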





Original poster | Posted on 2006-1-4 18:00:13
I hope everyone had a happy new year. Here is the latest status:

Grid.org Outage - As I posted previously, we had a temporary outage due to a blown City of Austin transformer (a rogue squirrel is suspected). This outage should not have required any reregistrations. There was a temporary period where devices may have been "Backing off..." due to a high load after the servers became available, but this should no longer be occurring.

New Cancer Data - A small batch of the new cancer data has been uploaded and members are currently crunching away at it. This is a very small subset of the data we received, ~20 WUs, just so we can verify that we are getting the results we expect. I plan on sending some of the results to our Oxford contact today to verify these are the results they expect. Once that is confirmed, I will upload more data. There have been a few members complaining of occasional lost results. I am not sure yet how widespread this is, nor what causes the loss. Other members have reported complete success so we will need to investigate this more closely. Note that the old cancer job has been disabled so if you are working on the cancer project, you are crunching the new data.


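Grid.org Outage - As I said in my earlier post, we had a temporary outage because a City of Austin transformer blew (a rogue squirrel is suspected). This outage should not have required anyone to re-register. Because of the high load after the servers came back, some devices may have shown "Backing off..." for a while, but this should no longer be happening.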

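New Cancer Data - A small batch of the new cancer data has been uploaded and members are crunching it now. It is a very small part of the data we received, about 20 workunits, which we are using to check that we get the results we expect. I plan to send some of the results to our Oxford contact today to verify that they are what Oxford expects. Once they are confirmed, I will upload more data. A few members have reported that results are occasionally lost. I am not sure how widespread this is, nor what causes it. Other members report that everything works, so we will need to investigate closely. Note that the old cancer job has been disabled, so if you are working on the cancer project you are crunching the new data.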




Original poster | Posted on 2006-1-5 15:51:24
Unable to connect for Cancer job

There was a small issue with the cancer job that caused devices to not be able to download a new cancer WU. This was an unforeseen side effect of trying to retrieve results so that we can have them verified with our Oxford contact. The process consists of running a script that verifies minimum successful results, retrieves them from the grid server, and aggregates them into a known format. Apparently the script marks a WU as complete during the final stages of processing. I have since reset the WUs so they will dispatch again.
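To make the side effect easier to picture, here is a rough sketch of a retrieval step of the kind described: check for a minimum number of successful results, collect them, and mark the WU complete. It is not the actual UD script; `WorkUnit`, `aggregate_results`, `reset_for_dispatch`, the `min_successful` threshold, and the `mark_complete` switch are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    wu_id: str
    results: list = field(default_factory=list)  # each result: {"ok": bool, "data": str}
    complete: bool = False  # in this sketch, a complete WU is no longer dispatched


def aggregate_results(workunits, min_successful=2, mark_complete=True):
    """Collect successful results from every WU that has enough of them."""
    aggregated = {}
    for wu in workunits:
        good = [r["data"] for r in wu.results if r["ok"]]
        if len(good) < min_successful:
            continue  # not enough successful results yet; leave this WU alone
        aggregated[wu.wu_id] = good
        if mark_complete:
            wu.complete = True  # side effect: dispatching stops for this WU
    return aggregated


def reset_for_dispatch(workunits):
    """Clear the completion flag so the WUs are handed out again."""
    for wu in workunits:
        wu.complete = False
```

In this sketch, calling `aggregate_results(..., mark_complete=False)` would pull a sample of results without stopping dispatch, while `reset_for_dispatch` corresponds to the reset described above.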

To try to prevent this from happening again, I am going to submit another cancer job (a duplicate of the current one) that I can experiment with hopefully without affecting the current one.

Note that until I can successfully get the retrieval script to work and get the results to our Oxford contact for validation, I will not be able to upload the bulk of the new data. Please have a little patience during this phase of the project. As I mentioned, in the past we had the luxury of a beta test system to work these issues out, but unfortunately that is no longer the case. This requires us to do some testing on the production system which we really do not like to do.





I strongly despise grid.org's wasteful behavior!



Original poster | Posted on 2006-1-11 23:18:54
January 10, 2006
Sorry for the late status, but I wanted to finish a test before posting:

New Cancer Data - I have been running some tests on an internal system and believe that I have some good news. It appears from my testing that everything is working as expected. There was a false alarm last week when I reported that we were ready to send results to our Oxford contact. Upon further investigation it was seen that there were some missing output files. This problem has been resolved internally so I am ready to grab the results from the new test job that has been running on Grid.org to see if the results are the same.

I will need to run the result aggregation script, which is always run after a job is complete. A side effect of running the script is that the workunits will all be marked complete and dispatching will stop. This may result in a lost workunit result or two. I will re-enable the job as soon as the result script completes so the outage will be minimal.

Lost Workunits - There have been some complaints of lost workunits with the new data. Since we have many results for each workunit, this is not a problem with the new cancer data per se. We will keep investigating this issue until we understand what is happening.


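New Cancer Data - I ran some tests on an internal system and believe I have good news. From my testing, everything is working as expected. There was a false alarm last week when I reported that we were ready to send results to our Oxford contact. Further investigation showed that some output files were missing. That problem has been resolved internally, so I am ready to pull the results from the new test job running on Grid.org and see whether they match.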


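Lost Workunits - Some members have complained of lost workunits with the new data. Since we have many results for each workunit, this is not a problem with the new cancer data itself. We will keep investigating until we understand what is going wrong.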

January 10, 2006
I have sent the results from our small test job to our Oxford contact. We will now have to wait to see what they say. As soon as I get confirmation, I can upload all of the new data. Note that we know that there is a problem with the format of the data. I had to manually convert the data in order to run our test. I am asking our contact to provide us with the correct format so that we know the input data is valid from their point of view.




Original poster | Posted on 2006-1-17 14:58:14
January 16, 2006
Not a lot to mention this week:

New Cancer Data - The results from the current job have been sent to our Oxford contact. We must now wait for the results to be verified. Note that there is already one issue. Oxford is requesting an additional tag Molecule_ID be added to the input data. Hopefully this is something they will be able to deliver. As soon as I hear back, I will post.

Lost Workunits - One of our 3 servers that handle dispatch and result retrieval was not working properly. I do not see how that would cause workunits to be lost, but I guess it is possible. That server has been fixed and is working properly now. Let's keep an eye on the workunits and see if this has a positive effect on those experiencing the problem.


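New Cancer Data - The results from the current job have been sent to our Oxford contact. We must now wait for them to be verified. Note that one issue has already come up: Oxford is asking for an additional Molecule_ID tag to be added to the input data. Hopefully this is something they can deliver. I will post as soon as I hear back.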

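Lost Workunits - One of our 3 servers that handle dispatch and result retrieval was not working properly. I do not see how that would cause workunits to be lost, but I suppose it is possible. The server has been fixed and is working properly now. We will keep an eye on the workunits and see whether this helps those experiencing the problem.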



Original poster | Posted on 2006-2-11 15:06:25
January 20, 2006

We recently had a problem with the forums. The problem occurred due to a logfile exceeding a maximum size which caused the apache web server to crash. It took a while to track down this issue, but the forums should be working correctly now. Note that this problem only affected the forums and not any of the Grid.org job processing.





Original poster | Posted on 2006-2-11 15:26:54
January 26, 2006

I apologize for not posting a status on Monday, but we are having a company conference this week which is consuming most of my time. The status of the new cancer data is the topic everyone is interested in so here is a very quick status.

I have a conference call with our Oxford contact tomorrow morning to discuss the new data and the results I sent a couple of weeks ago. After the call, I hope to have much more information to share. I know there is frustration about not having all of the new data loaded for members to work on. Since Oxford is the one that will ultimately be using the results, I have had to wait for them to respond as to the validity of the results. That is out of my control.

Regardless of what Oxford has to say, I will load some more of the new data next week. If the data must be recrunched later, so be it. I was trying to minimize the amount of data that must be reworked to cut down on the complaints about wasted time later. Since there are already complaints about time wasted crunching the same workunits, I guess it does not matter.

Please understand that having the data validated by Oxford is the bottleneck here and that United Devices is ready to upload all the new data as soon as the results are confirmed.





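Please understand that validation of the data by Oxford is the bottleneck here, and that United Devices will upload all of the new data as soon as the results are confirmed.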



Original poster | Posted on 2006-2-11 15:38:02
January 30, 2006

New Cancer Data - I had a conference call with our Oxford contact on Friday. There is an additional field they would like in the results so they have sent a sample data file containing this information. I will be uploading this later today or tomorrow. Hopefully this will produce the results they are expecting and we will be able to make all of the new data available for members later this week or next.

Rosetta - The current batch has been processed and we will be making a new batch available later this week. As always, there will be a two week period for any outstanding workunits to be credited.

Team Stats - For some reason these are missing for the 26th. The stats job will be rerun shortly to pick up this day.

Thanks to everyone for their contribution.


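New Cancer Data - I had a conference call with our Oxford contact on Friday. They want some additional information in the results, so they have sent us a new sample data file containing it. I will upload it later today or tomorrow. Hopefully this produces the results they want, and we can make all of the new data available to members later this week or next.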

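Rosetta - The current batch has been processed and we will make a new batch available later this week. As usual, there will be a two-week period for outstanding workunits to be credited.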

Team Stats - For some reason the stats for the 26th are missing. The stats job will be rerun shortly to pick up that day.


[ Last edited by vmzy on 2006-2-11 at 15:48 ]



Original poster | Posted on 2006-2-11 15:49:33
February 02, 2006

I have just uploaded a portion of the latest data I received from Oxford. My internal tests looked good as far as the workunits being able to be processed. We will let this job run for a while before shipping the results back to Oxford for verification. Note that the previous cancer job is still running as well, so it is up to the dispatcher as to whether you get one of the new workunits or not. As soon as I see a few successful results, I will stop the previous job from dispatching so everyone will be crunching the new data.





Original poster | Posted on 2006-2-11 15:56:34
February 03, 2006

I have received confirmation from our Oxford contact that the sample result data from the latest cancer job looks good. This means that we are ready to start crunching all of the new data. The new data will be uploaded in pieces (jobs) just like we have been doing with Rosetta. When a job has completed, I will allow at least a week for any outstanding results to be uploaded by members.

This is great news for all of us. Thank you for your continued patience and contribution.






Original poster | Posted on 2006-2-11 16:37:28
February 06, 2006

Cancer data - As I mentioned previously, Oxford has verified our previous results and has released all of the new data. There is plenty to keep us busy for a while. They also wish to run all of the new data against the previous protein when we are done with the current protein.

Some members are experiencing aborted WUs with the new data. We are currently investigating this issue. The problem is a bit elusive since not all members are experiencing it and some members are experiencing it much more frequently than others. I have verified that there are results for each and every WU and that the total is approximately the same for all. I have also verified that there are approximately the same (small) number of errors returned for each WU.

This tells me that there is not a problem with any particular WU, but some other issue. If there was a bad WU, we would see either a significantly lower number of results or a significantly greater number of errors. This is not the case. We will keep looking into this issue until we find a solution.
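The reasoning above (a bad WU would stand out through far fewer results or far more errors than its peers) can be written down as a small check. This is a sketch only, not the query actually run at UD; the function name, the median comparison, and the `low_ratio` and `high_ratio` thresholds are assumptions made for the example.

```python
from statistics import median


def suspicious_workunits(stats, low_ratio=0.5, high_ratio=2.0):
    """Flag WUs with far fewer results or far more errors than the median.

    `stats` maps a WU id to a (result_count, error_count) pair.
    """
    if not stats:
        return []
    med_results = median(r for r, _ in stats.values())
    med_errors = median(e for _, e in stats.values())
    flagged = []
    for wu_id, (results, errors) in stats.items():
        too_few = results < low_ratio * med_results
        # max(..., 1) keeps a median of zero errors from flagging every WU with one error.
        too_many = errors > high_ratio * max(med_errors, 1)
        if too_few or too_many:
            flagged.append(wu_id)
    return sorted(flagged)
```

If every WU has roughly the same totals, as reported above, the function returns an empty list, which points away from a single bad WU and toward some other cause.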

These WUs appear to be processing much faster than the previous batches. I am not sure of the reason for this and will rely on Oxford to tell us if something is not right. Note that there is a bunch of data to be processed. I am not quick to modify the job configuration (adding more Ligands) since the next ones I upload may take longer.

It has been noticed that the number of hits is high for some of the WUs. Again I do not know the reason for this and will have to rely on Oxford to tell us if something is wrong.

Thank you for your contribution.


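Cancer Data - As mentioned earlier, Oxford has verified the results we sent and has released all of the new data. That is enough to keep us crunching for a while. They would also like us to run all of the new data against the previous protein once we finish the current one.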








Original poster | Posted on 2006-2-11 16:38:20
February 10, 2006

We have finished processing the current Rosetta job. A new one will be uploaded tonight. Some members may have experienced a "Cannot connect" message due to this. Until the new job is active, I have reset the current one to continue to dispatch to prevent these messages.

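We have finished processing the current Rosetta job. A new batch will be uploaded tonight. Some members may see a "Cannot connect" message because of this. Until the new job is active, I have reset the current one to keep dispatching so that these messages do not appear.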






