Predictor@home

From BOINC Projects
Revision as of 13:45, 3 June 2026 by Al Piskun (talk | contribs) (first light)

Predictor@home
Project
Status: Completed
Category: Biology and Medicine
Compute: CPU
Requires: None
Development
Developer: Michela Taufer, C.L. Brooks III
Author: Michela Taufer
Sponsor: The Scripps Research Institute
Maintainer: Michela Taufer
Initial release: May 4, 2004 (22 years ago)
Discontinued: June 10, 2009 (17 years ago)
Software
Written in: C, C++
Operating system: Windows, Linux, macOS (x86)
Metadata
Website: http://predictor.chem.lsa.umich.edu/

Predictor@home (also known as ProteinPredictorAtHome or P@H) was a volunteer computing project that used the Berkeley Open Infrastructure for Network Computing (BOINC) framework to predict protein tertiary structure from amino acid sequences. It was developed and run by Michela Taufer and Charles L. Brooks III at The Scripps Research Institute in La Jolla, California.[1] It was the first independent BOINC-based project to launch, going live on 9 June 2004.[2][3]

Background

Protein structure prediction is the challenge of determining the three-dimensional tertiary structure of a protein solely from its amino acid sequence. It is one of the most important problems in computational biology, because a protein's structure governs its function, and knowing that structure can open the door to understanding diseases and designing new drugs.[4]

Experimental techniques such as X-ray crystallography, NMR spectroscopy, and cryo-EM can determine protein structures with high accuracy, but they are expensive and time-consuming. Computational methods can model protein structures much more quickly, but they require extensive sampling of the conformational space of possible protein shapes in order to find the most energetically stable native state.

For a protein with n residues, the number of possible backbone conformations grows exponentially, making exhaustive search computationally infeasible on any single machine. Predictor@home addressed this by distributing the sampling problem across thousands of volunteer computers worldwide.
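
The exponential growth can be made concrete with a rough back-of-the-envelope estimate. The values below are illustrative assumptions, not figures from the Predictor@home papers: each residue is taken to have k = 3 accessible backbone states, the chain has n = 100 residues, and a single machine is assumed to evaluate r = 10^9 conformations per second.

```latex
N_{\mathrm{conf}} \approx k^{n}, \qquad
k = 3,\; n = 100 \;\Rightarrow\; N_{\mathrm{conf}} \approx 3^{100} \approx 5.2 \times 10^{47}

t \approx \frac{N_{\mathrm{conf}}}{r}
  = \frac{5.2 \times 10^{47}}{10^{9}\ \mathrm{s^{-1}}}
  \approx 5.2 \times 10^{38}\ \mathrm{s}
  \approx 1.6 \times 10^{31}\ \text{years}
```

Under these assumptions enumeration is out of reach at any realistic evaluation rate, which is why prediction methods sample the space rather than search it exhaustively.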

The project was timed to align with the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition, specifically the sixth round (CASP6) held in 2004. CASP is an international blind prediction challenge in which researchers predict the structures of proteins whose experimental structures have been determined but not yet published, providing a fair, independent benchmark of prediction methods.[5]

History

Foundation and Launch

The BOINC framework was first released in April 2002 by David P. Anderson at the University of California, Berkeley, initially developed to support SETI@home.[6] Predictor@home became the first independent project to use BOINC outside of Berkeley, launching publicly on 9 June 2004 — two weeks before SETI@home's own BOINC-based relaunch on 22 June 2004.[2][3]

The project was set up and run by Michela Taufer, then a postdoctoral fellow at The Scripps Research Institute and the University of California San Diego (UCSD), working alongside principal investigator Charles L. Brooks III in the Department of Molecular Biology.[1][7] The project website was initially hosted at predictor.scripps.edu.

CASP6 Participation (2004)

Predictor@home was deployed specifically to compete in CASP6 and to test whether volunteer computing could deliver meaningful improvements over traditional cluster computing for protein structure prediction. During this period, the project attracted 6,786 users and accumulated over 12 billion seconds of total compute time.[1]

The project increased sampling capacity by one to two orders of magnitude compared with what was achievable on a local computer cluster, improving the chances of finding near-native protein conformations. For 81% of the CASP6 target proteins, Predictor@home achieved more than 3,000 independent structure samples; for 48% of targets, it exceeded 10,000 samples.[1]

Results showed that Predictor@home produced better predictions than a traditional local cluster for medium-difficulty and hard protein targets. However, very large "new fold" targets exceeding 300 residues proved more difficult, because the vast conformational space and limitations of the sampling algorithm made it hard to converge on good structures within the allotted time.[8]

During the peak of the CASP6 period, the server infrastructure could not keep pace with the rapid growth in participants, and user account creation was temporarily suspended to manage server load.[4]

The Scripps Research Institute in La Jolla, California, where Predictor@home was developed.

Later Development and Shutdown

On 6 September 2006, Predictor@home was temporarily taken offline, with no new work units being distributed. In May 2008, the project reverted to alpha status while researchers experimented with new structural prediction methods. Over the summer of 2008, the project servers were migrated from Scripps to the University of Michigan, with the new URL at predictor.chem.lsa.umich.edu.[9][10]

By December 2008, the project had not sent out any work units for several months, and BOINC statistics sites could no longer obtain updated XML data because the project team had suspended the export. On 10 June 2009, the Predictor@home website and forums were shut down permanently.[11]

Science

The Protein Folding Problem

Proteins are polymers made of chains of amino acids. In the cell, each newly synthesised protein spontaneously folds into a specific three-dimensional shape that determines its biological function. The sequence-to-structure relationship is encoded in the energy landscape of the protein, and the native (functional) structure corresponds to a global free-energy minimum.

For a protein chain of n amino acid residues, each with backbone dihedral angles φ_i and ψ_i, the number of possible conformations grows exponentially with n, which is why exhaustive enumeration is impractical. In practice, Predictor@home used physics-based all-atom force fields with implicit solvation models (the Generalized Born approximation) to score sampled conformations, combined with Monte Carlo conformation sampling methods developed in the Brooks lab.[4]
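
The general idea of Monte Carlo sampling over dihedral angles can be illustrated with a minimal Metropolis sketch. This is not the Brooks lab code: the `toy_energy` function is a hypothetical stand-in for an all-atom force field with implicit solvation, and the step size and temperature are arbitrary illustrative choices.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for a force-field score (lower is better).

    The minimum, 0.0, is reached when every dihedral equals -60 degrees.
    A real scoring function would evaluate all-atom interactions instead.
    """
    target = math.radians(-60.0)
    return sum(1.0 - math.cos(a - target) for a in angles)


def metropolis_sample(n_angles, n_steps, temperature, seed):
    """Run one Metropolis trajectory; return the best angles and energy seen."""
    rng = random.Random(seed)  # private RNG so each run is reproducible
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    energy = toy_energy(angles)
    best_angles, best_energy = list(angles), energy

    for _ in range(n_steps):
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, 0.5)  # perturb one dihedral
        new_energy = toy_energy(angles)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        else:
            angles[i] = old  # reject the move and restore the old value

    return best_angles, best_energy
```

Calling `metropolis_sample(10, 5000, 0.1, seed=1)` returns one candidate conformation and its score. Repeating the call with different seeds yields the kind of independent samples that a sampling-based predictor ranks afterwards.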

The task assigned to each volunteer computer was to generate independent samples of protein conformations for a given target sequence and return the resulting structures to the central server. Because each work unit was independent, the sampling was embarrassingly parallel, making volunteer computing an ideal fit.
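
Why independent work units parallelise so easily can be sketched as follows. The names `run_work_unit` and `collect` are hypothetical, the random scores are placeholders for conformation energies, and a local thread pool stands in for volunteer machines; none of this reflects the actual BOINC client or server interfaces.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_work_unit(seed, n_samples=100):
    """Hypothetical work unit: sample with a private RNG, report the best score."""
    rng = random.Random(seed)  # no state is shared with other work units
    scores = [rng.random() for _ in range(n_samples)]  # placeholder energies
    return {"seed": seed, "best_score": min(scores)}


def collect(seeds, workers=4):
    """Run every work unit independently, then keep the lowest-scoring result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_work_unit, seeds))
    return min(results, key=lambda r: r["best_score"])
```

Because no work unit reads another's state, the merged answer is the same whether the units run one after another on a single machine or concurrently on many, which is the property the paragraph above calls embarrassingly parallel.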

Predictor@home visualisation of a protein structure prediction run.

Homogeneous Redundancy

Because volunteer computers run different hardware, operating systems, and software versions, and because participants can tamper with results, data integrity is a significant concern for volunteer computing projects. Predictor@home addressed this through a technique called Homogeneous Redundancy (HR), which validates results by sending the same work unit to multiple volunteers and using strict equality comparisons to confirm that the returned results agree.[12] Restricting the replicas of a work unit to hosts with equivalent platforms means that correct results match exactly, so floating-point differences between heterogeneous machines are not mistaken for errors.
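
A minimal sketch of this replicate-and-compare idea is shown below. It assumes that hosts are grouped into equivalence classes by operating system and CPU vendor and that results arrive as byte strings; the function names, the host dictionary fields, and the quorum parameter are illustrative and are not taken from the BOINC server code.

```python
from collections import defaultdict


def platform_class(host):
    """Hypothetical equivalence class: same OS and same CPU vendor."""
    return (host["os"], host["cpu_vendor"])


def assign_replicas(hosts, n_replicas):
    """Pick n_replicas hosts that share one platform class, or None."""
    by_class = defaultdict(list)
    for host in hosts:
        by_class[platform_class(host)].append(host)
    for members in by_class.values():
        if len(members) >= n_replicas:
            return members[:n_replicas]
    return None  # no class is large enough yet


def validate(results, quorum):
    """Return the result that at least `quorum` replicas agree on exactly."""
    counts = defaultdict(int)
    for result in results:
        counts[result] += 1  # strict byte-for-byte equality, no tolerance
    for value, count in counts.items():
        if count >= quorum:
            return value
    return None  # no agreement: the work unit would need to be reissued
```

With `validate([b"abc", b"abc", b"abd"], quorum=2)` the two matching replicas win and the outlier is discarded. The design trade-off is that strict equality needs no application-specific tolerance, at the cost of restricting which hosts can share a work unit.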

GDT Score and Prediction Accuracy

CASP evaluates predictions using the Global Distance Test total score (GDT_TS), which measures the percentage of C-alpha atoms in a predicted model that can be superimposed onto the experimental structure within a distance cutoff, averaged over cutoffs of 1, 2, 4, and 8 Å. A GDT_TS score of 100 indicates a perfect prediction. For homology modeling and fold recognition targets, Predictor@home consistently matched or outperformed the best structures found on a local cluster, demonstrating the value of massive distributed sampling.[1]
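
The scoring idea can be sketched in a few lines. This simplified version assumes the model and experimental C-alpha coordinates are already superimposed and uses the conventional 1, 2, 4, and 8 Å cutoffs; the full GDT procedure also searches over many superpositions to maximise the count at each cutoff, which is omitted here.

```python
import math


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for pre-superimposed C-alpha coordinates.

    `model` and `reference` are equal-length sequences of (x, y, z) tuples
    in angstroms, one per residue.
    """
    if not model or len(model) != len(reference):
        raise ValueError("model and reference must be non-empty and equal length")
    distances = [math.dist(p, q) for p, q in zip(model, reference)]
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)
```

For example, a four-residue model whose atoms lie 0.5, 1.5, 3.0, and 10.0 Å from their experimental positions has 1, 2, 3, and 3 residues within the four cutoffs, giving a score of 56.25.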

Comparison with Related Projects

Predictor@home was complementary to Folding@home, the distributed computing project run by Vijay Pande at Stanford University. While Folding@home studies the dynamics of the protein folding process (how a protein transitions from an unfolded to a folded state over time), Predictor@home was focused on identifying the final tertiary structure — the endpoint itself — regardless of the pathway taken to get there.[11] The two projects also differed fundamentally in infrastructure: Predictor@home used the open BOINC platform, whereas Folding@home maintained its own entirely separate software stack.[11]

Predictor@home also operated alongside Rosetta@home, launched in 2005 by the David Baker lab at the University of Washington, which uses a different fragment-assembly approach (the Rosetta algorithm) to predict protein structures. Each project represented a different computational philosophy for tackling the same underlying biological problem.[11]

Comparison of protein structure prediction volunteer computing projects
Project | Institution | BOINC? | Approach | Status
Predictor@home | Scripps Research / U. Michigan | Yes (first) | Physics-based Monte Carlo sampling | Completed (2009)
Rosetta@home | University of Washington | Yes | Fragment assembly (Rosetta) | Active
Folding@home | Stanford University | No (own platform) | Molecular dynamics of folding process | Active
proteins@home | Ecole Polytechnique | Yes | Large-scale non-profit prediction | Completed (2008)

Publications

The Predictor@home team produced several peer-reviewed publications describing the project's methods and results. Below is a selected bibliography.

  1. Taufer, M., An, C., Kerstens, A., Brooks III, C.L. (2006). "Predictor@Home: A 'Protein Structure Prediction Supercomputer' Based on Global Computing". IEEE Transactions on Parallel and Distributed Systems 17(8): 786-796. doi:10.1109/TPDS.2006.148 [1]
  2. Taufer, M., An, C., Kerstens, A., Brooks III, C.L. (2005). "Predictor@Home: A 'Protein Structure Prediction Supercomputer' Based on Public-Resource Computing". Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2005). doi:10.1109/IPDPS.2005.335 [4]
  3. Taufer, M., Anderson, D.P., Cicotti, P., Brooks III, C.L. (2005). "Homogeneous Redundancy: a Technique to Ensure Integrity of Molecular Simulation Results Using Public Computing". Heterogeneous Computing Workshop, IPDPS 2005, Denver, April 4-8 2005. [12]
  4. Estrada, T., Flores, D.A., Taufer, M., Teller, P.J., Kerstens, A., Anderson, D.P. (2006). "The Effectiveness of Threshold-Based Scheduling Policies in BOINC Projects". 2006 Second IEEE International Conference on e-Science and Grid Computing. doi:10.1109/E-SCIENCE.2006.261172
  5. Taufer, M., Kerstens, A., Estrada, T.P., Flores, D.A., Zamudio, R., Teller, P.J., Armen, R., Brooks, C.L. (2007). "Moving Volunteer Computing towards Knowledge-Constructed, Dynamically-Adaptive Modeling and Scheduling". 2007 IEEE International Parallel and Distributed Processing Symposium. doi:10.1109/IPDPS.2007.370668

Legacy

As the first project to go live under the BOINC framework, Predictor@home served as an early proof of concept that the platform could support independent scientific projects beyond SETI@home. Its techniques for managing result integrity (Homogeneous Redundancy) and for scheduling volunteer resources influenced later BOINC projects, including Docking@home, a successor project also run by Michela Taufer that applied the same infrastructure to protein-ligand docking problems in drug discovery.

The Predictor@home experience also informed subsequent work on BOINC performance simulation, leading to the development of EmBOINC and SimBA (Simulator of BOINC Applications), tools for predicting how BOINC projects will perform under varying volunteer conditions, which were published by the same research group.[13]

See also

References

  1. Taufer, M., An, C., Kerstens, A., Brooks III, C.L. (2006). "Predictor@Home: A 'Protein Structure Prediction Supercomputer' Based on Global Computing". IEEE Transactions on Parallel and Distributed Systems 17(8): 786-796. doi:10.1109/TPDS.2006.148
  2. Marc Seil (17 January 2007). "BOINC History". BOINC message boards. Retrieved 2022-11-05.
  3. Predictor@home. Wikidata. Retrieved 2024-01-01.
  4. Taufer, M., An, C., Kerstens, A., Brooks III, C.L. (2005). "Predictor@Home: A 'Protein Structure Prediction Supercomputer' Based on Public-Resource Computing". Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2005). doi:10.1109/IPDPS.2005.335
  5. CASP6. Prediction Center. Retrieved 2024-01-01.
  6. Anderson, D.P. (2004). "BOINC: A System for Public-Resource Computing and Storage". 5th IEEE/ACM International Workshop on Grid Computing, pp. 365-372. doi:10.1109/GRID.2004.14
  7. Michela Taufer. Computing Research Association. Retrieved 2024-01-01.
  8. Taufer, M. et al. "Predictor@Home: A 'Protein Structure Prediction Supercomputer' Based on Global Computing". Retrieved 2024-01-01.
  9. "Predictor has finished moving to Michigan" (2008-02-16). Team MacNN. Retrieved 2011-09-21.
  10. "Predictor@Home (archived)" (2008-11-08). Retrieved 2022-11-05.
  11. Predictor@home. Wikipedia. Retrieved 2024-01-01.
  12. Taufer, M., Anderson, D.P., Cicotti, P., Brooks III, C.L. (2005). "Homogeneous Redundancy: a Technique to Ensure Integrity of Molecular Simulation Results Using Public Computing". Heterogeneous Computing Workshop, IPDPS 2005, Denver.
  13. Taufer, M., Kerstens, A., Estrada, T., Flores, D., Teller, P.J. (2007). "SimBA: A Discrete Event Simulator for Performance Prediction of Volunteer Computing Projects". 21st International Workshop on Principles of Advanced and Distributed Simulation. doi:10.1109/PADS.2007.27

External links