proteins@home

From BOINC Projects
Revision as of 16:13, 8 June 2026 by Al Piskun (talk | contribs) (External links)

proteins@home
The proteins@home screensaver, visualising a rotating protein structure
Project
Status: Completed
Category: Biochemistry
Compute: CPU
Requires: None
Development
Developer: Thomas Simonson et al., École Polytechnique
Author: Thomas Simonson
Sponsor: Laboratoire de Biochimie (CNRS UMR 7654), École Polytechnique
Initial release: December 28, 2006
Completed: June 2008
Software
Operating system: Windows, Linux, macOS
Metadata
Website: http://biology.polytechnique.fr/proteinsathome/ (archived)


proteins@home was a non-profit volunteer computing project built on the Berkeley Open Infrastructure for Network Computing (BOINC) platform.[1] The project ran from December 28, 2006 to June 2008 and was operated by the Laboratoire de Biochimie (CNRS UMR 7654) in the Department of Biology at École Polytechnique, located in Palaiseau, near Paris, France.[2] Its scientific goal was to map the inverse protein folding problem across approximately 1,500 representative protein folds, building a database of pairwise energy functions that could be used to predict protein structure, understand protein evolution, and design new proteins with potential biomedical applications.[3]

Background

Protein folding and the inverse problem

Every protein is a chain of amino acids. The linear sequence of the chain, the primary structure, ultimately determines the protein's three-dimensional shape, or fold. Formally, a protein of length $n$ has a primary structure $s = (a_1, a_2, \ldots, a_n)$, where each $a_i$ belongs to the set of 20 standard amino acids. The chain folds by minimising its free energy, which includes contributions from electrostatics, van der Waals forces, and interactions with the solvent.

Predicting the fold from the sequence is the forward problem. The inverse problem asks: given a known three-dimensional fold, which amino acid sequences are compatible with it? This is known as the inverse protein folding problem or computational protein design (CPD). It has applications in understanding protein evolution, identifying stabilising mutations, and engineering entirely new proteins for biomedical or industrial purposes.

A key feature that made the problem tractable for distributed computing is that the energy can be expressed as a sum over all pairs of residue positions:

$$E_{\text{total}} = \sum_{i<j} E(a_i, r_i, a_j, r_j)$$

where $E(a_i, r_i, a_j, r_j)$ is the pairwise interaction energy between amino acid types $a_i$ and $a_j$ at positions $i$ and $j$ in rotamer conformations $r_i$ and $r_j$. Because each pairwise term is independent of all others, the energy table can be precomputed in parallel across thousands of volunteer computers with almost no communication required.[4]
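
As a minimal sketch of this pairwise decomposition, the following Python sums precomputed pair terms over all position pairs with i < j. The table layout (a dictionary keyed by position pair, then by the two residue states) and the names `total_energy` and `pair_table` are assumptions made for illustration. They are not the project's actual data format, and the energies are made up.

```python
from itertools import combinations

# Hypothetical layout: pair_table[(i, j)][(state_i, state_j)] -> energy,
# where i < j and a state is an (amino_acid, rotamer_index) tuple.
def total_energy(states, pair_table):
    """Sum the pairwise energies over all position pairs i < j."""
    return sum(
        pair_table[(i, j)][(states[i], states[j])]
        for i, j in combinations(range(len(states)), 2)
    )

# Toy three-position example with made-up energies (arbitrary units).
states = [("ALA", 0), ("LEU", 1), ("SER", 0)]
pair_table = {
    (0, 1): {(states[0], states[1]): -1.5},
    (0, 2): {(states[0], states[2]): 0.25},
    (1, 2): {(states[1], states[2]): -0.75},
}
energy = total_energy(states, pair_table)  # -1.5 + 0.25 - 0.75
```

Because each entry of the table depends on only two positions, separate entries can be computed on different machines and merged afterwards, which is the property the text relies on.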

Project description

The campus of École Polytechnique at Palaiseau, France, home of the Laboratoire de Biochimie (CNRS UMR 7654) that ran proteins@home.[5]

Launch and operation

proteins@home was formally announced as open on December 28, 2006, when BOINC project administrator David Anderson posted on the BOINC message boards that the project was "now open" and "based at the École Polytechnique in Paris."[2] Volunteers could register and download the BOINC client to begin donating CPU cycles to the project.

The research team was led by Thomas Simonson, with contributions from Marcel Schmidt am Busch, Anne Lopes, David Mignon, Thomas Gaillard, Najette Amara, and Christine Bathelt, all based at the Laboratoire de Biochimie (CNRS UMR 7654), Department of Biology, École Polytechnique, 91128 Palaiseau, France.[6][7]

The BOINC news feed recorded on February 7, 2008 that "proteins@Home has resumed operations",[8] indicating a temporary interruption before the project reopened to participants. The project concluded in June 2008.

During its operational period, the proteins@home distributed computing platform was used by volunteers in over 100 countries.[9]


Computational methodology

A rotating 3D protein structure, illustrating the kind of tertiary fold geometry that Proteins@home worked to map using distributed volunteer computing.[10]

Each work unit sent to a volunteer computer contained the structural coordinates of one or more protein backbone templates drawn from a representative subset of the Structural Classification of Proteins (SCOP) database. For each template, the XPLOR molecular modelling program was used to precompute the pairwise interaction energy between all pairs of residue positions, considering all possible amino acid types and rotamer conformations at each position.[9]

The interaction energy used a classical molecular mechanics model that combined a Coulomb electrostatics term with an accessible surface area (ASA) implicit solvation correction. Protein stability was estimated by comparing the energy of the folded state to that of an extended, unfolded-state model constructed from a library of tripeptide structures.[6] An effective estimate of folding free energy change upon mutation is:

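$$\Delta\Delta G_{\text{fold}} \approx E_{\text{folded}}(s') - E_{\text{folded}}(s) - \left[ E_{\text{unfolded}}(s') - E_{\text{unfolded}}(s) \right]$$

where $s$ and $s'$ are the wild-type and mutant sequences respectively.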
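
The estimate can be illustrated with a short worked example: the change in folded-state energy upon mutation, minus the corresponding change in unfolded-state energy. The function name `ddg_fold`, the mutant-minus-wild-type sign convention, and all numerical values below are assumptions chosen for illustration only.

```python
def ddg_fold(e_folded_wt, e_folded_mut, e_unfolded_wt, e_unfolded_mut):
    """Folded-state energy change minus unfolded-state energy change.

    Assumed convention: mutant minus wild-type, so a positive value
    means the mutation destabilises the fold in this model.
    """
    return (e_folded_mut - e_folded_wt) - (e_unfolded_mut - e_unfolded_wt)

# Made-up energies in arbitrary units.
value = ddg_fold(
    e_folded_wt=-120.0,
    e_folded_mut=-118.5,
    e_unfolded_wt=-40.0,
    e_unfolded_mut=-39.5,
)
# Folded change is +1.5, unfolded change is +0.5, so value is 1.0.
```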

Once all energy tables for a given backbone were returned from volunteers and assembled, a heuristic search algorithm rapidly explored the full sequence and conformational space, generating between 200,000 and 300,000 candidate sequences per backbone template and retaining the lowest-energy ones.[9]
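
The source does not specify the heuristic here, so the following is only a sketch of one common style of search over a pairwise energy table: start from random assignments, repeatedly replace the state at each position with the best available alternative, and keep the lowest-energy results. The function names, parameters, and toy energy function are hypothetical, and the toy energy depends only on residue type, whereas a real table would also depend on position and rotamer.

```python
import random
from itertools import combinations

def total_energy(states, pair_energy):
    """Sum the pairwise energy over all position pairs i < j."""
    return sum(
        pair_energy(states[i], states[j])
        for i, j in combinations(range(len(states)), 2)
    )

def heuristic_search(choices, pair_energy, n_starts=50, max_sweeps=10,
                     keep=5, seed=0):
    """Random restarts plus greedy single-position improvement."""
    rng = random.Random(seed)
    found = {}
    for _ in range(n_starts):
        # Random starting assignment: one option per position.
        states = [rng.choice(options) for options in choices]
        for _ in range(max_sweeps):
            changed = False
            for pos, options in enumerate(choices):
                def energy_with(option):
                    trial = states[:pos] + [option] + states[pos + 1:]
                    return total_energy(trial, pair_energy)
                best = min(options, key=energy_with)
                # Accept only strict improvements so sweeps terminate.
                if energy_with(best) < energy_with(states[pos]):
                    states[pos] = best
                    changed = True
            if not changed:
                break
        found[tuple(states)] = total_energy(states, pair_energy)
    # Return the `keep` distinct assignments with the lowest energy.
    return sorted(found.items(), key=lambda item: item[1])[:keep]

# Toy energy: opposite charges attract, like charges repel (made up).
CHARGE = {"K": 1, "E": -1}

def toy_pair_energy(a, b):
    return float(CHARGE.get(a, 0) * CHARGE.get(b, 0))

choices = [["A", "K", "E"]] * 3
results = heuristic_search(choices, toy_pair_energy)
```

Each entry of `results` is a `(sequence, energy)` pair, sorted from lowest to highest energy.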

The BOINC infrastructure

The project leveraged BOINC's client-server model. The Proteins@home server distributed work units (protein backbone files plus parameter inputs) to volunteers, who ran the energy table precomputation using idle CPU time. Completed energy tables were validated by quorum (comparing results from multiple independent hosts) before being accepted and assembled into the central database.[4] The project was listed in BOINC's official project directory under the URL http://biology.polytechnique.fr/proteinsathome.[11]
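
The idea of quorum validation can be sketched as follows: a result is accepted once enough independently returned copies of the same work unit agree within a numerical tolerance. This is an illustration of the concept only. It does not use BOINC's validator interface, and the function names, quorum size, and tolerance are assumptions.

```python
def results_agree(a, b, tol=1e-6):
    """True if two energy tables have equal length and close values."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

def find_canonical(results, quorum=2, tol=1e-6):
    """Return the index of the first result that agrees with at least
    quorum - 1 other results, or None if no quorum is reached."""
    for i, candidate in enumerate(results):
        matches = sum(
            results_agree(candidate, other, tol)
            for j, other in enumerate(results)
            if j != i
        )
        if matches + 1 >= quorum:
            return i
    return None

# Three hosts return the same work unit; the third result is corrupted.
returned = [
    [-1.50, 0.25, -0.75],
    [-1.50, 0.25, -0.75],
    [-1.50, 9.99, -0.75],
]
canonical = find_canonical(returned)            # index 0 reaches quorum
no_quorum = find_canonical([[1.0], [2.0]])      # None: results disagree
```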

Comparison with related BOINC protein projects

Proteins@home was one of several BOINC-based projects focused on protein science active in the mid-2000s. Rosetta@home, operated by the Baker Lab at the University of Washington, focused on forward structure prediction and protein-protein docking, and is still active. Predictor@home, based at the Burnham Institute, was the first independent BOINC project ever launched and entered predictions in the CASP biennial evaluation of protein structure prediction methods.[12] POEM@Home, hosted at the Karlsruhe Institute of Technology, modelled protein folding on the basis of Anfinsen's dogma (the hypothesis that a protein's native structure corresponds to its free-energy minimum) and ran from 2007 to 2016.[13]

What distinguished Proteins@home from these projects was its focus on the inverse problem — designing sequences to fit given folds — rather than predicting folds from sequences. It also aimed to cover a large, systematic slice of protein fold space (roughly 1,500 folds) rather than working on individual targets or a specific set of challenge proteins.

Scientific publications

The four levels of protein structure — from the primary amino-acid sequence through to a tertiary fold. The relationship between sequence and fold was at the heart of the Proteins@home project.[14]

The Proteins@home computing platform directly enabled several peer-reviewed publications from the Simonson group.

Computational protein design: software and benchmarks (2008)

The primary methods paper describing the Proteins@home software pipeline, parameter optimisation, and performance on a simple molecular mechanics model was published in the Journal of Computational Chemistry in 2008.[4] The paper validated the approach against experimental data and described the BOINC-distributed workflow in detail.

Testing the Coulomb/ASA solvent model (2008)

A companion study in BMC Bioinformatics used the Proteins@home platform to evaluate the Coulomb/accessible-surface-area implicit solvent model for protein stability, ligand binding free energies, and protein design.[6] The calculations were performed using volunteer computers in over 70 countries. The model was benchmarked against experimental mutation free energies and binding affinities across a range of proteins and peptides.

Fold recognition via computational design (2010)

A follow-up study published in PLOS ONE after the project's conclusion used the Proteins@home-generated sequence libraries to investigate whether computationally designed sequences could supplement natural sequences for protein fold recognition and homology searching.[9] Four SCOP families were redesigned — Small Kunitz-type inhibitors, Interleukin-8 chemokines, PDZ domains, and Caspase catalytic subunits — across 43 backbone templates. The SUPERFAMILY profile Hidden Markov Model library recognised 85% of the low-energy designed sequences as native-like, supporting the utility of designed sequences as diverse complements to experimental databases.

Legacy and successor work

Although the volunteer computing phase of the project ran for only about 18 months, its distributed energy table calculations made possible a systematic exploration of protein sequence space at a scale that would not have been feasible on the group's local hardware alone.

The insights and code base from Proteins@home fed directly into the Proteus software package, developed by the same group at École Polytechnique and their collaborators.[7] Proteus extended the pairwise decomposition framework with additional capabilities, including generalised Born solvation, Monte Carlo simulation at constant pH, and improved rotamer libraries. It has been applied to problems such as enzyme active site redesign and aminoacyl-tRNA synthetase specificity engineering. The first full description of Proteus was published in the Journal of Computational Chemistry in 2013.[15]

References

  1. Proteins@home. Wikipedia. Retrieved 2026-06-08.
  2. (2006-12-28). The Proteins@Home project is now open. BOINC Message Boards. Retrieved 2026-06-08.
  3. proteins@home. Wayback Machine. Retrieved 2026-06-08.
  4. (2008-05-29). Computational protein design: software implementation, parameter optimization, and performance of a simple model. Journal of Computational Chemistry. pp. 1092–1102. DOI: 10.1002/jcc.20870.
  5. File:Ecole Polytechnique France seen from lake DSC03389.JPG. Wikimedia Commons. Retrieved 2026-06-08.
  6. (2008-03-13). Testing the Coulomb/Accessible Surface Area solvent model for protein stability, ligand binding, and protein design. BMC Bioinformatics. DOI: 10.1186/1471-2105-9-148.
  7. The Proteus software for computational protein design. École Polytechnique. Retrieved 2026-06-08.
  8. boinc_news.php (BOINC site source). GitHub / BOINC. Retrieved 2026-06-08.
  9. (2010-05-05). Computational Protein Design: Validation and Possible Relevance as a Tool for Homology Searching and Fold Recognition. PLOS ONE. DOI: 10.1371/journal.pone.0010410.
  10. File:Protein Structure Gif.gif. Wikimedia Commons. Retrieved 2026-06-08.
  11. old_projects.inc. GitHub / BOINC. Retrieved 2026-06-08.
  12. Predictor@home. Wikipedia. Retrieved 2026-06-08.
  13. POEM@Home. Wikipedia. Retrieved 2026-06-08.
  14. File:Protein-structure.png. Wikimedia Commons. Retrieved 2026-06-08.
  15. (2013). Computational protein design: the Proteus software and selected applications. Journal of Computational Chemistry. pp. 2472–2484. DOI: 10.1002/jcc.23418.
