WCG popped up this message... could someone translate it?
It is certainly time for another update:
The rice structures have been clustered to find the best representative structures and these structures have been compared to libraries of known structures to determine their function. This took several months on our local computing cluster.
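A minimal sketch of what "finding the best representative structure" of a cluster can mean, assuming the representative is taken to be the medoid (the member with the smallest total distance to the other members). The update does not say which criterion the project actually uses; the function name `medoid` and the RMSD values below are made up for illustration.

```python
def medoid(distances, members):
    """Return the member whose summed distance to all members is smallest."""
    return min(members, key=lambda i: sum(distances[i][j] for j in members))


# Hypothetical pairwise RMSD values (in angstroms) between four models.
rmsd = [
    [0.0, 1.0, 1.2, 6.0],
    [1.0, 0.0, 0.9, 6.5],
    [1.2, 0.9, 0.0, 6.2],
    [6.0, 6.5, 6.2, 0.0],
]

# Models 0, 1 and 2 are assumed to have been grouped into one cluster.
cluster = [0, 1, 2]
representative = medoid(rmsd, cluster)
print(representative)
```

The representative would then be the structure compared against the libraries of known structures, instead of every model in the cluster.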
We have also used an independent method based only on sequence to determine the function of the genes. This gave us confident assignments in only about 1% of cases. Nevertheless, this will give us a decent-sized gold-standard set with which we can assess how accurate the NRW structure-based methods are for predicting gene function.
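As a rough illustration of assessing predictions against a gold-standard set, the sketch below scores structure-based predictions only on genes that also have a confident sequence-based assignment. The gene IDs, function labels, and the helper name `accuracy_on_gold` are hypothetical, and plain accuracy is an assumed metric, not necessarily the one the project uses.

```python
def accuracy_on_gold(predicted, gold):
    """Fraction of gold-standard genes whose predicted function matches.

    Genes missing from `predicted` are skipped; returns None if no gene
    is shared between the two sets.
    """
    shared = [gene for gene in gold if gene in predicted]
    if not shared:
        return None
    correct = sum(1 for gene in shared if predicted[gene] == gold[gene])
    return correct / len(shared)


# Hypothetical confident sequence-based assignments (the gold standard).
gold = {"g1": "kinase", "g2": "transporter", "g3": "protease", "g4": "kinase"}

# Hypothetical structure-based predictions for a larger set of genes.
predicted = {
    "g1": "kinase",
    "g2": "hydrolase",
    "g3": "protease",
    "g4": "ligase",
    "g5": "kinase",  # not in the gold standard, so it is not scored
}

score = accuracy_on_gold(predicted, gold)
print(score)
```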
However, as I was benchmarking the clustering method, I found, quite unexpectedly (which is why you do the benchmarking...), a very good alternative method for clustering the structures that should improve the accuracy of the final predictions.
The structures are being reclustered using the new method.
The comparison and analysis of the function predictions from the original clustering method are underway while the reclustering is being done.
Once this is done, we will publish the results and put all the structures up on the website.
Another paper is close to being submitted detailing the new, very fast and accurate clustering methods that were developed to deal with the large datasets generated by NRW. The resulting software package, Protinifo-cluster, is GPU/SSE/AVX accelerated and optimised. It will be released for use without restriction and should be of use to the general protein folding community. This is the software being used for the re-clustering.
As for further NRW projects: there are no plans for an NRW2, but there are some plans in development for one based on the 1000 plant project, which is about to release the sequences of 1000 plant genomes. This will depend, of course, on how well the methodology worked with rice.
Hong