In separate papers published this week, two independent teams have drafted the first maps of the human proteome, the full set of proteins that make up a person. Both teams also found that some proteins come from DNA sequences labeled “noncoding.”
The proteome is an important complement to the genome and transcriptome, and together they create a more complete resource for researching health and disease. Genes determine many of our characteristics, and they do so by providing instructions for making proteins. So these draft maps -- which you can think of as the Human Genome Project for proteins -- consist of profiles of the proteins expressed in many different human cell types. Both drafts were generated using mass spectrometry.
One of the teams, led by Akhilesh Pandey from Johns Hopkins University, identified and annotated proteins encoded by 17,294 genes. That accounts for around 84 percent of all the genes in the human genome that are predicted to encode proteins (one estimate puts that number at 19,629). The team extracted proteins from samples of 30 different tissues, then used enzymes to cut them into small pieces called peptides. They ran the peptides through a series of instruments to identify them and measure their relative abundance.
They also discovered 193 novel proteins that come from regions of the genome that haven't been predicted to code for proteins. Within the genome, there are stretches of DNA whose sequences don’t follow a conventional protein-coding gene pattern -- these have been labeled as noncoding. “The fact that 193 of the proteins came from DNA sequences predicted to be noncoding means that we don’t fully understand how cells read DNA, because clearly those sequences do code for proteins,” Pandey explains in a news release.
The other team, led by Bernhard Kuster of Technische Universität München (TUM) in Germany, assembled protein evidence for over 18,000 genes (or 92 percent of the entire proteome) by compiling raw mass spec data from databases and other analyses that were already available. To fill in the gaps, they generated their own mass spec data by analyzing 60 human tissues, 13 body fluids, and 147 cancer cell lines. The assembled proteome includes a core of 10,000–12,000 proteins expressed in several different tissues.
Like the Hopkins team, they found evidence of translation from DNA regions that were not thought to be translated, including more than 400 translated long intergenic non-coding RNAs (lincRNAs). "While we have a good idea of what the genome looks like, we didn't know how many of those potentially 20,000 protein-coding genes would actually make protein," Kuster tells BBC. The team also identified protein markers that may predict an individual’s resistance or sensitivity to drugs for diseases like cancer.
“You can think of the human body as a huge library where each protein is a book,” Pandey says. “The difficulty is that we don’t have a comprehensive catalog that gives us the titles of the available books and where to find them.” Now it looks like we’ve got two first drafts of that comprehensive catalog. Each group has built a publicly accessible, interactive database of their datasets: Human Proteome Map and ProteomicsDB.
Although they had seen each other's work at conferences, both Pandey and Kuster tell BBC they had "no idea" they were headed towards publishing simultaneously. And now they share a Nature cover. "We never saw this as a race to be first," Kuster says. "My interpretation is that when the time is right, somebody's going to just do it. And perhaps two people are going to do it!" Here's the human body map of protein expression.
Images: H. Hahne, TUM