Posted on 2005-6-14 16:20:16
Recent technical news:
June 13, 2005 - 19:00 UTC
There have been many failures over the past week. Another bug was found (and quickly patched) in our upload/download file server causing it to hang until reboot. Once that was remedied, both the scheduling server and the main web server had separate issues due to extremely high load.
In case you haven't noticed, we recently changed the URL setiathome.ssl.berkeley.edu - instead of pointing to the old SETI@home "classic" project, it now leads users to the new BOINC-based version. As expected, this vastly increased the number of new users joining the BOINC project, and therefore increased the strain on our back-end servers. Soon we will stop new classic account sign-ups altogether, and eventually stop accepting classic results outright (with advance warning) - each step potentially increasing the demands on our hardware.
At the moment there is no spare hardware that BOINC could fall back on when its various servers fail for one reason or another. This is because the classic project is still active and using up half of our server farm. This will soon change.
The classic "master science database server" (a 6 CPU Sun E3500) will be the first machine to be repurposed. We're busy migrating most of its data onto a new database server (an 8 CPU E3500). This migration had been slowed by recent (recoverable) disk failures, but should finish in a month or so. Before then, however, we are going to move the BOINC scheduler onto it. The actual file upload/download handler will remain on its current server, thereby spreading the whole scheduler system over two machines.
As soon as possible, we will add a second web server (and maybe a third). The BOINC web site contains far more dynamically-generated content than the classic site, and therefore needs more power behind it. We don't really have any spares, so some machines will have to serve as web servers on top of whatever else they are currently doing.
And as if that wasn't enough to worry about, the BOINC replica database has been falling further and further behind the master database (because the load on the master keeps increasing and the replica hardware is relatively inferior). Then yesterday it was rendered useless when a binary log on the master got corrupted. This didn't damage the master database - only the replica. So we're going to have to rebuild the replica from scratch (or hold off until we somehow obtain better hardware for that).
More to come as things progress...
[ Last edited by Youth on 2005-6-14 at 16:24 ]