Posted on 2010-10-14 18:02:30
This post was last edited by vmzy on 2010-10-15 09:50
Filesystem Error
Oct 13, 2010 8:22:30 PM
We just experienced an error with the BOINC filesystem. As a result, the operating system re-mounted it in read-only mode. We are now in the process of taking it offline and running an fsck on the filesystem. While that runs, BOINC will be offline and not able to send or receive files.
Oct 13, 2010 10:21:58 PM
The fsck is likely to take many hours to run.
Oct 13, 2010 11:47:12 PM
We have modified the server to return a 503 error with a one hour retry for all attempts to download files.
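As a rough illustration of what "a 503 error with a one hour retry" can look like, here is a minimal Python sketch. It assumes the retry hint is sent with the standard HTTP `Retry-After` header; the update does not say how the project's servers were actually configured, and `maintenance_response` and `MaintenanceHandler` are hypothetical names made up for this example.

```python
from http.server import BaseHTTPRequestHandler

RETRY_AFTER_SECONDS = 3600  # one hour, matching the update above


def maintenance_response(retry_after=RETRY_AFTER_SECONDS):
    """Build (status, headers, body) for a 'try again later' reply."""
    body = b"File downloads are temporarily unavailable.\n"
    headers = {
        # Standard HTTP header: seconds the client should wait before retrying.
        "Retry-After": str(retry_after),
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 503, headers, body


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with the 503 above."""

    def do_GET(self):
        status, headers, body = maintenance_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

Passing `MaintenanceHandler` to `http.server.HTTPServer` would serve this reply for every download request, so clients back off instead of hammering a filesystem that is offline.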
Oct 14, 2010 12:21:52 AM
The fsck is likely to take a long while (many hours). We are still in phase 1 of fsck (out of 5).
Oct 14, 2010 11:21:45 AM
We have brought the filesystem online but we aren't sure what condition it is in. Instead of making it available, we have made it read-only and we are copying the data to the new filesystem that we were already in the process of creating. The copy is now running, but it will still be a number of hours before the data is available.
Oct 14, 2010 11:32:35 AM
Copy is 14% complete. I will update this count periodically.
Oct 14, 2010 12:20:20 PM
21.8% copied
Oct 14, 2010 1:02:50 PM
32.3% copied.
Oct 14, 2010 1:02:50 PM
We have two sets of copies running. One for the workunit input files (download files) and one for the result files (upload). The upload is running faster than the download. As a result, we might be able to bring the uploads online before the downloads are available.
Oct 14, 2010 2:13:18 PM
Overall 50.5% copied
Upload 81.5% copied
Oct 14, 2010 3:00:44 PM
Uploads are back online and handling the surge easily.
Overall copying is at 61.2%. We are watching to see how fast this progresses now that the upload copies are done.
Oct 14, 2010 4:23:17 PM
The new filesystem is supporting the uploads just fine so that looks good.
We are at 77.2% complete for the overall copy.
When the copy finishes we will need to check a few things out and then we will open up the scheduler and allow new work to be issued and file downloads to occur.
Oct 14, 2010 5:58:02 PM
We are at 94.8% complete.
Oct 14, 2010 6:32:47 PM
We are at 101.2% complete.
The denominator in my % complete estimate doesn't include the additional data that has been uploaded to the servers since we re-enabled the uploads. So we are close - but I don't know exactly how close we are.
Oct 14, 2010 7:06:29 PM
Sorry folks - it is going to be a bit longer. The block size is different between the two filesystems so the amount of data copied was being shown differently. As a result, my % complete computations were incorrect. We still have some distance to go. I'll work on a better way to inform you of the % complete.
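To see why a block-size difference can push a "% complete" figure past 100, here is a small Python sketch. It assumes the estimate compared allocated disk space on the two filesystems, which the update does not actually state; the block sizes and file sizes are invented for illustration.

```python
def allocated_bytes(file_sizes, block_size):
    """Disk space used when every file is rounded up to whole blocks."""
    return sum(-(-size // block_size) * block_size for size in file_sizes)


def percent_complete(copied_sizes, all_sizes):
    """Progress measured in logical bytes, independent of block size."""
    return 100.0 * sum(copied_sizes) / sum(all_sizes)


# Hypothetical data: 100 small files of 1000 bytes, half of them copied.
all_sizes = [1000] * 100
copied_sizes = all_sizes[:50]

SOURCE_BLOCK = 4096   # assumed block size of the old filesystem
DEST_BLOCK = 16384    # assumed block size of the new filesystem

# Naive estimate: space used on the destination / space used on the source.
naive = 100.0 * allocated_bytes(copied_sizes, DEST_BLOCK) / allocated_bytes(
    all_sizes, SOURCE_BLOCK
)
actual = percent_complete(copied_sizes, all_sizes)
```

With these made-up numbers the naive ratio comes out at 200% while only half the data has been copied. Counting logical file bytes (or files) on both sides avoids the distortion.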
Oct 14, 2010 8:11:12 PM
Ok - copies are done. We have a bit of work to do to reconfigure things and we have stopped uploads again while we do this work.
Once we are done with the reconfiguration we should be able to bring everything back online.
Oct 14, 2010 8:54:08 PM
We have finished configuring things on the backend. Work is flowing to the members.
At the moment the servers are being hit hard and you might get some connection errors. They will recover over the next 30 minutes or so.
However, we know that some number of files have been corrupted. BOINC will download these files and verify them with the signature created before the filesystem was corrupted. If the file does not match the signature, it will report the result as an error. We expect to see some of these over the next week or two. We ask your patience as we work through these errors and get the workunits reloaded with fixed files.
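The check described above boils down to comparing a downloaded file against a value recorded while the file was still known to be good. The Python sketch below uses a plain SHA-256 digest as a stand-in; it is not BOINC's actual signature scheme, which the update does not describe, and the function names are hypothetical.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(stream, recorded_digest):
    """True if the stream's content still matches the recorded digest."""
    return stream_digest(stream) == recorded_digest


# Digest recorded before the corruption (illustrative data only).
original = b"workunit input data"
recorded = stream_digest(io.BytesIO(original))

# Later, the downloaded copy is checked against that recorded value.
intact = matches_recorded_digest(io.BytesIO(original), recorded)
damaged = matches_recorded_digest(io.BytesIO(b"workunit input dat\x00"), recorded)
```

A file opened with `open(path, "rb")` can be passed in place of the `io.BytesIO` objects. A mismatch is what would surface to members as an errored result.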
Summary:
The BOINC server's filesystem went down and is being recovered; this is expected to take many hours.