Proteins@home
Revision as of 14:26, 8 June 2026
Proteins@home (styled proteins@home) was a non-profit volunteer computing project built on the Berkeley Open Infrastructure for Network Computing (BOINC) platform.[1] The project ran from December 28, 2006 to June 2008 and was operated by the Laboratoire de Biochimie (CNRS UMR 7654) in the Department of Biology at École Polytechnique, located in Palaiseau, near Paris, France.[2] Its scientific goal was to map the inverse protein folding problem across approximately 1,500 representative protein folds, building a database of pairwise energy functions that could be used to predict protein structure, understand protein evolution, and design new proteins with potential biomedical applications.[3]
Background
Protein folding and the inverse problem
Every protein is a chain of amino acids. The linear sequence of the chain (the primary structure) ultimately determines the protein's three-dimensional shape, or fold. Formally, a protein of length $N$ has a primary structure $S = (a_1, a_2, \dots, a_N)$, where each $a_i$ belongs to the set of 20 standard amino acids. The chain folds by minimising its free energy, which includes contributions from electrostatics, van der Waals forces, and interactions with the solvent.
Predicting the fold from the sequence is the forward problem. The inverse problem asks: given a known three-dimensional fold, which amino acid sequences are compatible with it? This is known as the inverse protein folding problem or computational protein design (CPD). It has applications in understanding protein evolution, identifying stabilising mutations, and engineering entirely new proteins for biomedical or industrial purposes.
A key feature that made the problem tractable for distributed computing is that the energy can be expressed as a sum over all pairs of residue positions:
$$E = \sum_{i<j} E_{ij}(a_i, r_i;\, a_j, r_j)$$
where $E_{ij}(a_i, r_i; a_j, r_j)$ is the pairwise interaction energy between amino acid types $a_i$ and $a_j$ at positions $i$ and $j$ in rotamer conformations $r_i$ and $r_j$. Because each pairwise term is independent of all others, the energy table can be precomputed in parallel across thousands of volunteer computers with almost no communication required.[4]
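The pairwise decomposition described above can be illustrated with a minimal Python sketch. The table layout, the `total_energy` helper, and all energy values are hypothetical and chosen only for illustration; they are not the project's actual data format.

```python
from itertools import combinations

def total_energy(assignment, pair_table):
    """Sum precomputed pairwise energies over all position pairs i < j.

    assignment maps a position index to a state, where a state is a
    hypothetical (amino_acid, rotamer_index) tuple.
    """
    positions = sorted(assignment)
    return sum(
        pair_table[(i, assignment[i], j, assignment[j])]
        for i, j in combinations(positions, 2)
    )

# Toy table with made-up energies, keyed by (i, state_i, j, state_j) with i < j.
pair_table = {
    (0, ("ALA", 0), 1, ("LEU", 0)): -1.5,
    (0, ("ALA", 0), 2, ("SER", 1)): 0.25,
    (1, ("LEU", 0), 2, ("SER", 1)): -0.75,
}
assignment = {0: ("ALA", 0), 1: ("LEU", 0), 2: ("SER", 1)}
energy = total_energy(assignment, pair_table)
print(energy)
```

Because the total is only a sum of table lookups, each entry can be computed on a different machine and the table assembled afterwards.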
Project description
Launch and operation
Proteins@home was formally announced as open on December 28, 2006, when BOINC project administrator David Anderson posted on the BOINC message boards that the project was "now open" and "based at the École Polytechnique in Paris."[2] Volunteers could register and download the BOINC client to begin donating CPU cycles to the project.
The research team was led by Thomas Simonson, with contributions from Marcel Schmidt am Busch, Anne Lopes, David Mignon, Thomas Gaillard, Najette Amara, and Christine Bathelt, all based at the Laboratoire de Biochimie (CNRS UMR 7654), Department of Biology, École Polytechnique, 91128 Palaiseau, France.[6][7]
The BOINC news feed recorded on February 7, 2008 that "Proteins@Home has resumed operations",[8] indicating a temporary interruption before the project reopened to participants. The project concluded in June 2008.
During its operational period, the Proteins@home distributed computing platform was used by volunteers in over 100 countries.[9]
Computational methodology

Each work unit sent to a volunteer computer contained the structural coordinates of one or more protein backbone templates drawn from a representative subset of the Structural Classification of Proteins (SCOP) database. For each template, the XPLOR molecular modelling program was used to precompute the pairwise interaction energy between all pairs of residue positions, considering all possible amino acid types and rotamer conformations at each position.[9]
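As a rough illustration of how such a precomputation could be divided among volunteers, the sketch below batches the position pairs of one backbone into fixed-size groups. The batching scheme, the `make_work_units` helper, and the batch size are assumptions made for this example; the sources cited here do not specify how the project actually partitioned its work units.

```python
from itertools import combinations

def make_work_units(n_positions, pairs_per_unit):
    """Split all position pairs (i, j) with i < j into fixed-size batches."""
    pairs = list(combinations(range(n_positions), 2))
    return [
        pairs[k:k + pairs_per_unit]
        for k in range(0, len(pairs), pairs_per_unit)
    ]

# A toy backbone with 5 designable positions has 10 position pairs.
units = make_work_units(n_positions=5, pairs_per_unit=4)
for index, unit in enumerate(units):
    print(index, unit)
```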
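The interaction energy used a classical molecular mechanics model that combined a Coulomb electrostatics term with an accessible surface area (ASA) implicit solvation correction. Protein stability was estimated by comparing the energy of the folded state to that of an extended, unfolded-state model constructed from a library of tripeptide structures.[6] An effective estimate of folding free energy change upon mutation is:
$$\Delta\Delta G \approx \left[E_{\mathrm{f}}(S_{\mathrm{mut}}) - E_{\mathrm{u}}(S_{\mathrm{mut}})\right] - \left[E_{\mathrm{f}}(S_{\mathrm{wt}}) - E_{\mathrm{u}}(S_{\mathrm{wt}})\right]$$
where $S_{\mathrm{wt}}$ and $S_{\mathrm{mut}}$ are the wild-type and mutant sequences respectively, and $E_{\mathrm{f}}$ and $E_{\mathrm{u}}$ denote the energies of the folded and unfolded-state models.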
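The comparison of folded and unfolded energies for a wild-type and a mutant sequence can be worked through numerically. The sketch below assumes the common sign convention in which a positive value means the mutation is destabilising; the function names and the energy values are hypothetical.

```python
def stability(e_folded, e_unfolded):
    """Folded-state energy relative to the unfolded-state model."""
    return e_folded - e_unfolded

def ddg_mutation(wild_type, mutant):
    """Estimated change in folding free energy upon mutation.

    Each argument is a (folded_energy, unfolded_energy) tuple.
    """
    return stability(*mutant) - stability(*wild_type)

# Made-up energies in arbitrary units, for illustration only.
wild_type = (-120.0, -100.0)
mutant = (-118.5, -100.5)
ddg = ddg_mutation(wild_type, mutant)
print(ddg)
```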
Once all energy tables for a given backbone were returned from volunteers and assembled, a heuristic search algorithm rapidly explored the full sequence and conformational space, generating between 200,000 and 300,000 candidate sequences per backbone template and retaining the lowest-energy ones.[9]
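The text does not spell out which heuristic was used, so the following is only a sketch of one common style of search over a pairwise energy table: start from a random assignment, repeatedly re-optimise one position at a time with the others held fixed, and repeat from many random starts while keeping the lowest-energy results. The functions `heuristic_cycle`, `search`, and `toy_pair_energy` are hypothetical, and the toy energy function simply rewards identical neighbours.

```python
import random
from itertools import combinations

def total_energy(states, pair_energy):
    """Sum the pairwise energy over all position pairs i < j."""
    return sum(
        pair_energy(i, states[i], j, states[j])
        for i, j in combinations(range(len(states)), 2)
    )

def heuristic_cycle(choices, pair_energy, rng, max_passes=20):
    """One cycle: random start, then single-position improvement sweeps."""
    states = [rng.choice(c) for c in choices]
    for _ in range(max_passes):
        changed = False
        for i in range(len(states)):
            def cost(s, i=i):
                # Energy of position i in state s against all other positions.
                return sum(
                    pair_energy(i, s, j, states[j])
                    for j in range(len(states)) if j != i
                )
            best = min(choices[i], key=cost)
            if cost(best) < cost(states[i]):
                states[i] = best
                changed = True
        if not changed:
            break  # no single-position change lowers the energy
    return tuple(states), total_energy(states, pair_energy)

def search(choices, pair_energy, n_cycles, keep, seed=0):
    """Run many cycles and retain the lowest-energy distinct assignments."""
    rng = random.Random(seed)
    found = {}
    for _ in range(n_cycles):
        states, energy = heuristic_cycle(choices, pair_energy, rng)
        found[states] = energy
    return sorted(found.items(), key=lambda item: item[1])[:keep]

def toy_pair_energy(i, a, j, b):
    """Symmetric toy energy: identical states attract, different ones repel."""
    return -1.0 if a == b else 0.5

choices = [["A", "L", "S"]] * 4  # same three candidate states at 4 positions
best = search(choices, toy_pair_energy, n_cycles=50, keep=3)
print(best)
```

A search of this kind only finds local minima, which is why many independent cycles are run and the results ranked by energy.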
The BOINC infrastructure
The project leveraged BOINC's client-server model. The Proteins@home server distributed work units (protein backbone files plus parameter inputs) to volunteers, who ran the energy table precomputation using idle CPU time. Completed energy tables were validated by quorum (comparing results from multiple independent hosts) before being accepted and assembled into the central database.[4] The project was listed in BOINC's official project directory at the URL http://biology.polytechnique.fr/proteinsathome.[11]
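Quorum validation of this kind can be sketched as follows. This is not BOINC's validator API: the `validate_by_quorum` function, the quorum size of two, and the numerical tolerance are all assumptions chosen for illustration.

```python
import math

def validate_by_quorum(results, min_quorum=2, rel_tol=1e-6):
    """Return a result that at least min_quorum hosts agree on, else None.

    Each result is a list of floats returned by one host for the same
    work unit. Two results agree if they have the same length and all
    values match within the relative tolerance.
    """
    for candidate in results:
        agreeing = [
            r for r in results
            if len(r) == len(candidate)
            and all(
                math.isclose(a, b, rel_tol=rel_tol)
                for a, b in zip(r, candidate)
            )
        ]
        if len(agreeing) >= min_quorum:
            return candidate
    return None

# Three hypothetical hosts; the third returned a corrupted first value.
host_results = [[-1.5, 0.25], [-1.5, 0.25000000001], [9.9, 0.25]]
accepted = validate_by_quorum(host_results)
rejected = validate_by_quorum([[1.0], [2.0]])
print(accepted, rejected)
```

A tolerance is used instead of exact equality because floating-point results can differ slightly between hosts with different hardware or compilers.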
Proteins@home was one of several BOINC-based projects focused on protein science active in the mid-2000s. Rosetta@home, operated by the Baker Lab at the University of Washington, focused on forward structure prediction and protein-protein docking, and is still active. Predictor@home, based at the Burnham Institute, was the first independent BOINC project ever launched and entered predictions in the CASP biennial evaluation of protein structure prediction methods.[12] POEM@Home, hosted at the Karlsruhe Institute of Technology, modelled protein folding on the basis of Anfinsen's dogma and ran from 2007 to 2016.[13]
What distinguished Proteins@home from these projects was its focus on the inverse problem — designing sequences to fit given folds — rather than predicting folds from sequences. It also aimed to cover a large, systematic slice of protein fold space (roughly 1,500 folds) rather than working on individual targets or a specific set of challenge proteins.
Scientific publications

The Proteins@home computing platform directly enabled several peer-reviewed publications from the Simonson group.
Computational protein design: software and benchmarks (2008)
The primary methods paper, published in the Journal of Computational Chemistry in 2008, described the Proteins@home software pipeline, its parameter optimisation, and the performance of a simple molecular mechanics model.[4] The paper validated the approach against experimental data and described the BOINC-distributed workflow in detail.
Testing the Coulomb/ASA solvent model (2008)
A companion study in BMC Bioinformatics used the Proteins@home platform to evaluate the Coulomb/accessible-surface-area implicit solvent model for protein stability, ligand binding free energies, and protein design.[6] The calculations were performed using volunteer computers in over 70 countries. The model was benchmarked against experimental mutation free energies and binding affinities across a range of proteins and peptides.
Fold recognition via computational design (2010)
A follow-up study published in PLOS ONE after the project's conclusion used the Proteins@home-generated sequence libraries to investigate whether computationally designed sequences could supplement natural sequences for protein fold recognition and homology searching.[9] Four SCOP families were redesigned — Small Kunitz-type inhibitors, Interleukin-8 chemokines, PDZ domains, and Caspase catalytic subunits — across 43 backbone templates. The SUPERFAMILY profile Hidden Markov Model library recognised 85% of the low-energy designed sequences as native-like, supporting the utility of designed sequences as diverse complements to experimental databases.
Legacy and successor work
Although the volunteer computing phase of the project ran for only about 18 months, its distributed energy table calculations made possible a systematic exploration of protein sequence space at a scale that would not have been feasible on the group's local hardware alone.
The insights and code base from Proteins@home fed directly into the Proteus software package, developed by the same group at École Polytechnique and their collaborators.[7] Proteus extended the pairwise decomposition framework with additional capabilities, including generalised Born solvation, Monte Carlo simulation at constant pH, and improved rotamer libraries. It has been applied to problems such as enzyme active site redesign and aminoacyl-tRNA synthetase specificity engineering. The first full description of Proteus was published in the Journal of Computational Chemistry in 2013.[15]
See also
- BOINC
- Rosetta@home
- Predictor@home
- POEM@Home
- Protein structure prediction
- Computational protein design
- Folding@home
- École Polytechnique
References
- [1] Proteins@home. Wikipedia. Retrieved 2026-06-08.
- [2] The Proteins@Home project is now open. BOINC Message Boards (2006-12-28). Retrieved 2026-06-08.
- [3] proteins@home. Wayback Machine. Retrieved 2026-06-08.
- [4] Computational protein design: software implementation, parameter optimization, and performance of a simple model. Journal of Computational Chemistry (2008-05-29), pp. 1092–1102. DOI: 10.1002/jcc.20870.
- [5] File:Ecole Polytechnique France seen from lake DSC03389.JPG. Wikimedia Commons. Retrieved 2026-06-08.
- [6] Testing the Coulomb/Accessible Surface Area solvent model for protein stability, ligand binding, and protein design. BMC Bioinformatics (2008-03-13). DOI: 10.1186/1471-2105-9-148.
- [7] The Proteus software for computational protein design. École Polytechnique. Retrieved 2026-06-08.
- [8] boinc_news.php (BOINC site source). GitHub / BOINC. Retrieved 2026-06-08.
- [9] Computational Protein Design: Validation and Possible Relevance as a Tool for Homology Searching and Fold Recognition. PLOS ONE (2010-05-05). DOI: 10.1371/journal.pone.0010410.
- [10] File:Protein Structure Gif.gif. Wikimedia Commons. Retrieved 2026-06-08.
- [11] old_projects.inc. GitHub / BOINC. Retrieved 2026-06-08.
- [12] Predictor@home. Wikipedia. Retrieved 2026-06-08.
- [13] POEM@Home. Wikipedia. Retrieved 2026-06-08.
- [14] File:Protein-structure.png. Wikimedia Commons. Retrieved 2026-06-08.
- [15] Computational protein design: the Proteus software and selected applications. Journal of Computational Chemistry (2013), pp. 2472–2484. DOI: 10.1002/jcc.23418.
External links
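- Proteins@home official website (Wayback Machine): https://web.archive.org/web/20080101000000*/http://biology.polytechnique.fr/proteinsathome/
- Proteus, successor software package, École Polytechnique: https://proteus.polytechnique.fr/
- Proteins@home at Wikimedia Commons: https://commons.wikimedia.org/wiki/Category:Proteins@home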
- Publications by BOINC Projects at boinc.berkeley.edu
