Predicting the proteome: unveiling the secrets of biology’s molecular marvels through AI



Science writer Freya Masters reports on the game-changing impact of AlphaFold.

Have you ever stopped to wonder about your proteome?

No? Well, it certainly wonders about you! 

You have a vast collection of proteins in your body (estimated at anything from 80,000 to 400,000 individual proteins), which mediate nearly all of its functions.

As you read this, tiny proteins are whizzing around in every one of your cells – all thirty trillion of them.

They range from enzymes – the crucial molecular machines that bind their target molecules in a myriad of biochemical reactions to provide you with energy – to the army of Y-shaped antibodies with which your immune system targets invaders.

Other proteins such as collagen – the most abundant protein in your body – provide the structural connections in your bones or skin.

Proteins are formed from specific sequences of amino acids. At a more fundamental level, your DNA determines the sequence of those amino acids.

In turn, the specific folding of the amino acid sequences governs the protein’s specific three-dimensional structure.

This 3D structure confers the protein’s function, whether the protein is an enzyme digesting your latest meal (for instance amylase, which digests starch into sugars) or a receptor relaying messages between nerve cells (like those that detect the serotonin released as you exercise).

A protein could in principle adopt a tremendous number of conformations before reaching its final 3D structure – so many that trying them all one by one would take longer than the known age of the universe.

Even more fascinating is the fact that proteins spontaneously achieve this in a matter of milliseconds. As if they weren’t impressive enough already.
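
To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. Every figure in it is an illustrative assumption rather than a measurement: a chain of 100 amino acids, three possible shapes per amino acid, and ten trillion shapes tried per second.

```python
# Back-of-the-envelope estimate of an exhaustive conformation search.
# All constants below are illustrative assumptions, not measured values.
RESIDUES = 100                 # assumed chain length
STATES_PER_RESIDUE = 3         # assumed shapes available to each amino acid
SAMPLES_PER_SECOND = 1e13      # assumed rate of trying new conformations
SECONDS_PER_YEAR = 365.25 * 24 * 3600
UNIVERSE_AGE_YEARS = 1.38e10   # roughly 13.8 billion years


def exhaustive_search_years(residues=RESIDUES,
                            states=STATES_PER_RESIDUE,
                            rate=SAMPLES_PER_SECOND):
    """Years needed to try every conformation once at the given rate."""
    conformations = states ** residues
    return conformations / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_search_years()
    print(f"Exhaustive search: about {years:.1e} years")
    print(f"Multiples of the universe's age: {years / UNIVERSE_AGE_YEARS:.1e}")
```

Under these assumptions the search would outlast the universe many times over, which is why a protein cannot be finding its shape by blind trial and error.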

Now, an error in the DNA that encodes a protein may alter its amino acid sequence. If such a genetic mutation changes the sequence, the protein’s 3D structure may change too – and (you guessed it) as structure leads to function, that might not be good news, resulting in a faulty or malformed protein.
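
As a minimal sketch of that chain of events, the Python snippet below translates a short stretch of DNA into amino acids and shows how changing a single letter swaps one amino acid for another. The two sequences are made up for illustration, and the lookup table is only a small excerpt of the standard genetic code.

```python
# Excerpt of the standard genetic code (DNA codons on the coding strand).
# Only the handful of codons needed for this toy example are listed.
CODON_TABLE = {
    "ATG": "Met",
    "CCT": "Pro",
    "GAG": "Glu",
    "GTG": "Val",
    "ACT": "Thr",
    "TAA": "Stop",
}


def translate(dna):
    """Read DNA three letters at a time and return the amino acids encoded."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        residues.append(amino_acid)
    return residues


# Hypothetical sequences: the second differs by one letter (A to T).
normal_dna = "ATGCCTGAGTAA"
mutated_dna = "ATGCCTGTGTAA"

if __name__ == "__main__":
    print(translate(normal_dna))
    print(translate(mutated_dna))
```

One changed letter in the DNA is enough to put a different amino acid into the chain, and that in turn can change how the chain folds.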

For instance, if the altered protein structure in question is that of an enzyme, the enzyme may not be able to carry out its function.

An enzyme’s structure is highly specific to its function. The part of the enzyme where catalysis (the reaction itself) takes place is called its ‘active site’.

This active site is a specific shape such that it can bind its target (its substrate) in order for the appropriate reaction to occur.

Returning to our previous example, amylase will bind its substrate starch at its active site and thus break it down to sugars.

A mutation in the DNA that affects the amino acids conferring the shape of amylase’s active site may leave the enzyme unable to bind starch.

Misfolded proteins, which can arise from genetic mutations, are implicated in neurodegenerative diseases including Alzheimer’s and Parkinson’s, and in non-neurological diseases such as type 2 diabetes.

Proteins really are at the core of all our lives – from the receptors that sense serotonin and mediate that feeling of reward after exercise to the misshapen proteins that lead to the devastating consequences of disease.

The challenge of predicting just how amino acid chains will fold into the elaborate 3D structure of the humble protein is known as the ‘protein folding problem’.

This has been considered one of the grandest challenges in the world of biology – one whose solution has remained elusive for over fifty years.

Complex protein structures from all domains of life have traditionally been determined by methods considered to be the ‘gold standards’ (cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography). In practice, these processes are long, painstaking and expensive.

The development of machine learning approaches in artificial intelligence (AI) such as deep learning (which uses layered artificial neural networks, loosely inspired by the brain, to learn patterns from large amounts of data) has revolutionised biology and, to a very impressive extent, has overcome the ‘protein folding problem’ which has baffled scientists for decades.

The biennial global championship – ‘Critical Assessment of Techniques for Protein Structure Prediction’ (CASP) – is a competition for predicting protein structures and measuring the progress of the most novel methods which aim to improve the accuracy of the predictions.

It involves around 100 target proteins or protein domains (structural protein units) which are released over a period of several months to teams.

Subsequently, the teams submit their predicted structures, which are assessed against the experimentally determined structures.

In 2018, the 13th CASP championship was wowed by a new team from DeepMind (an AI company whose deep learning systems had previously learned to play Atari video games), which developed the AI programme ‘AlphaFold’.

That year, AlphaFold was able to predict interatomic distances for the most difficult protein structures presented by the competition and was awarded first place.

At that stage, the approach was like slotting together some but not all parts of a jigsaw puzzle, with the overall 3D structure of the select proteins not yet achieved. 

Two years later, in the 2020 CASP competition, DeepMind’s ‘AlphaFold2’ programme astoundingly outperformed 100 other teams, solving proteins all the way from their sequences to 3D structures.

In the global distance test (GDT) – a primary means of assessment in the competition, which measures how closely a prediction matches the experimental structure – AlphaFold2 scored above 90 out of 100 for around two thirds of the presented proteins.
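
For the curious, the idea behind a GDT-style score can be sketched in a few lines of Python. This is a simplified illustration, not the official CASP scoring code: it assumes the predicted and experimental structures have already been superimposed, and it uses the commonly quoted distance cutoffs of 1, 2, 4 and 8 ångströms. The function name and the toy coordinates are hypothetical.

```python
import math

# Commonly quoted GDT_TS distance cutoffs, in angstroms.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, experimental):
    """Simplified GDT-style score (0 to 100) for two pre-aligned structures.

    Each structure is a list of (x, y, z) positions, one per amino acid.
    The real test also searches for the best superposition at each cutoff;
    this sketch skips that step and assumes the alignment is already done.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


if __name__ == "__main__":
    # Toy two-residue example: the second residue is off by 3 angstroms.
    experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0)]
    print(gdt_ts(predicted, experimental))
```

A perfect prediction scores 100, and the score drops as more amino acids sit further from where the experiment says they should be.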

Indeed, the programme has been trained on more than 170,000 proteins (by showing it their sequences and structures) and can accurately predict a protein’s shape from its amino acid sequence – in a matter of days.

But why hasn’t AlphaFold entirely solved the protein folding problem? Some argue that it does not characterise the rules which govern protein folding. 

Nevertheless, AlphaFold presents a transformational approach to one of the greatest mysteries in biology, establishing the crucial foundations for continuing to solve it.

The immense potential for this programme to rapidly aid humanity in numerous creative ways has been recognised.

So far, we only know the structures of around half of all proteins made by a human cell. Therefore, AlphaFold could greatly progress our understanding of just how our bodies tick.

As touched upon, the disease implications of malformed proteins are significant – thus, AlphaFold could improve predictions of protein structures in rare disease research, helping our understanding of how gene variations between people cause disease.

Ultimately, this could help lead to treatments for such diseases, whilst reducing the economic costs of research.

The programme could significantly increase the efficiency of drug discovery for the treatment of diseases, helping to more effectively decipher the shape and structure of drug targets. 

Increasing our understanding of protein structures will also enhance our ability to manipulate proteins for other useful purposes.

For instance, in 2018, researchers modified specific amino acids situated on the surface of a plastic-degrading enzyme, ‘PETase’.

In nature, this enzyme is produced by a bacterium to digest plastics such as polyethylene terephthalate (PET) and so obtain energy.

The modification led to the serendipitous discovery that the altered PETase could digest plastics in a matter of days, a striking finding when you consider that the same degradation process takes hundreds of years in nature.

More recently, PETase was engineered by fusing it with a second enzyme ‘MHETase’ to create a ‘super-enzyme’ which exhibited an even greater ability to break down specific plastics such as PET or polyethylene furanoate (PEF).

Professor McGeehan, director of the Centre for Enzyme Innovation at the University of Portsmouth, and colleagues made use of X-ray beams at the Diamond Light Source synchrotron facility in Oxfordshire to elucidate the 3D structures of the enzymes.

With AlphaFold shining a light on how proteins fold, it will become easier to design plastic-degrading proteins (like this super-enzyme) as crucial novel solutions for dealing with environmental pollutants such as plastic.

AlphaFold has even predicted the structures of six proteins encoded by the SARS-CoV-2 viral genome (the structure of one of these, ‘ORF3a’, was subsequently determined experimentally), with predictions of even greater accuracy made for another SARS-CoV-2 protein, ORF8.

Thus, the programme could be pivotal in its potential for preparing us to combat a future pandemic. 

Whilst AlphaFold has demonstrated the sheer ingenuity of science in unveiling the secrets of the proteome, there is still a long way to go.

There is a need for the programme to solve protein complexes (which are formed when proteins interact with other proteins) or the structures of proteins embedded in cell membranes. 

Well, now you have sufficiently wondered about your proteome! 

We may know of around 200 million proteins (including those in your proteome) – but with another 30 million proteins discovered each year, just imagine how many more of these molecular marvels we are yet to unveil.
