When does mRNA level not predict protein level? A new paper from our lab revisited the question of how well mRNA levels reflect protein variances across different tumors and normal tissues using CPTAC data.

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010702

Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners

Author summary The abundance of mRNA is often measured as a surrogate variable of protein levels, but how well the mRNA level of different genes correlate with their protein across samples remains incompletely understood. Here we trained machine learning models over large RNA sequencing and mass spectrometry data from up to 8 cancer types in the CPTAC data sets to evaluate how well protein level variances across samples can be predicted from their transcripts. Despite voluminous data, up to one-third of genes shows poor mRNA-protein correlation suggesting their protein abundance is not primarily regulated from cognate transcripts. The inclusion of mRNA level information from protein interaction partners into the prediction models substantially improved prediction performance for a subset of genes, suggesting their protein abundance may be primarily regulated post-transcriptionally through protein-protein interactions. Notably, these proteins involve not only subunits of large multi-protein complexes such as the ribosome as previously suspected, but many proteins that form stable interactions with one or few other partners, including the propionyl-CoA carboxylase, mitochondrial calcium uniporter, calcineurin, and others. The results add to emerging evidence of independent regulation of protein levels from their cognate transcripts and suggest avenues to improve the interpretation of transcriptomics data.

Building on prior work, we trained machine learning models to predict the across-sample protein variance from RNA-seq data. We saw huge gene-wise differences in predictability. We found that up to 1/3 of proteins are poorly predicted by mRNA.
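To make "gene-wise predictability" concrete, here is a minimal sketch of the simplest baseline: the per-gene Pearson correlation between mRNA and protein across samples. It is not the paper's pipeline. The function name `genewise_correlation`, the (samples x genes) array layout, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def genewise_correlation(mrna, protein):
    """Pearson r between each gene's mRNA and protein across samples.

    Both inputs are (n_samples, n_genes) arrays with matching columns.
    (Hypothetical helper, not taken from the paper.)
    """
    mrna = np.asarray(mrna, dtype=float)
    protein = np.asarray(protein, dtype=float)
    m = mrna - mrna.mean(axis=0)        # center each gene across samples
    p = protein - protein.mean(axis=0)
    denom = np.sqrt((m ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return (m * p).sum(axis=0) / denom

# Simulated data only; no CPTAC values are used here.
rng = np.random.default_rng(0)
n_samples = 200
mrna = rng.normal(size=(n_samples, 2))
protein = np.column_stack([
    # Gene 0: protein tracks its own transcript.
    mrna[:, 0] + 0.3 * rng.normal(size=n_samples),
    # Gene 1: protein tracks gene 0's transcript, not its own.
    mrna[:, 0] + 0.3 * rng.normal(size=n_samples),
])
r = genewise_correlation(mrna, protein)
print(r)  # one correlation per gene
```

In this toy setup gene 0 is well predicted by its own mRNA while gene 1 is not, which is the kind of gene-wise contrast the thread describes.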
Inspecting further, we saw that many proteins correlate very poorly with their cognate mRNA but strongly with another transcript, usually but not always that of a known protein-protein interaction partner.
The data suggests degradation of supernumerary interactors is a driver of protein levels. While this was known for large complexes, this phenomenon is widespread and affects many small stable complexes incl. propionyl-CoA carboxylase, mito. calcium uniporter, calcineurin, etc.
Prior selection of mRNA features not only improved protein predictions, but may also help find new protein-level driver genes. E.g., using a directed graph model, we predict that the LACTB mRNA may have an outsized effect on mitochondrial ribosome protein abundance.
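As a rough illustration of why adding a partner's transcript as a feature can help, the sketch below compares a self-only model with a self-plus-partner model on simulated data. Ordinary least squares stands in for the machine learning models, which the thread does not specify, and the helper `holdout_r2`, the effect sizes, and the train/test split are all assumptions.

```python
import numpy as np

def holdout_r2(X, y, n_train):
    """Fit OLS on the first n_train samples and return R^2 on the rest.

    (Hypothetical helper; a stand-in for the paper's models.)
    """
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - X[n_train:] @ coef
    total = y[n_train:] - y[n_train:].mean()
    return 1.0 - (resid @ resid) / (total @ total)

# Simulated data only: the protein level is set mostly by the partner's
# transcript, mimicking a subunit whose excess copies are degraded.
rng = np.random.default_rng(1)
n = 300
self_mrna = rng.normal(size=n)
partner_mrna = rng.normal(size=n)
protein = 0.1 * self_mrna + 1.0 * partner_mrna + 0.3 * rng.normal(size=n)

r2_self = holdout_r2(self_mrna[:, None], protein, n_train=200)
r2_both = holdout_r2(np.column_stack([self_mrna, partner_mrna]), protein, n_train=200)
print(f"self only: {r2_self:.2f}, self + partner: {r2_both:.2f}")
```

Scoring on held-out samples rather than the training set keeps the comparison honest, since adding features always improves in-sample fit.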
The paper “widespread post-transcriptional regulation of protein abundance by interacting partners” is on PLOS Comp Biol. Thanks to Himangi Srivastava, Mike Lippincott, Jordan Currie, and the Maggie Lam Lab for this collaboration, and the reviewers and editors for their constructive comments!
@edwardlau Fascinating stuff, and great to see a more nuanced assessment of the mRNA/protein abundance relationship in the context of protein interactions. Many moons ago, I looked at the link between genomic copy-number and expression for proteins that form complexes (or not) and found an eerie lack of correlation in many cases of weak binding. I never dug deeper, so really excited about this new analysis! (for ref: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0009474)
Dosage Sensitivity Shapes the Evolution of Copy-Number Varied Regions

Dosage sensitivity is an important evolutionary force which impacts on gene dispensability and duplicability. The newly available data on human copy-number variation (CNV) allow an analysis of the most recent and ongoing evolution. Provided that heterozygous gene deletions and duplications actually change gene dosage, we expect to observe negative selection against CNVs encompassing dosage sensitive genes. In this study, we make use of several sources of population genetic data to identify selection on structural variations of dosage sensitive genes. We show that CNVs can directly affect expression levels of contained genes. We find that genes encoding members of protein complexes exhibit limited expression variation and overlap significantly with a manually derived set of dosage sensitive genes. We show that complexes and other dosage sensitive genes are underrepresented in CNV regions, with a particular bias against frequent variations and duplications. These results suggest that dosage sensitivity is a significant force of negative selection on regions of copy-number variation.

@bensb Interesting, thanks for sharing! There was a 2017 paper that showed CNV being buffered at the protein level in CPTAC data ... would be interesting to see how that relates to gene coordinates https://www.sciencedirect.com/science/article/pii/S240547121730385X