Proteome Journal
Proteomics: Large-Scale Protein Analysis
by Justin Saeks
jsaeks@zipmail.com
Justin Saeks is the Proteome Society's first
contributing author. Justin is a biotechnology consultant in the San
Francisco Bay Area. This article is also posted in the June
2001 issue of the Proteome News.
In the past, molecular biology research has been
oriented around the gene rather than the protein. This is not to say that
researchers have neglected to study proteins, but rather that the approaches and
techniques most commonly used have looked primarily at the nucleic acids and
then later at the protein(s) implicated. The main reason for this has been that
the technologies available, and the inherent characteristics of nucleic acids,
have made genes the low-hanging fruit. This situation has changed
recently and continues to change as larger scale, higher throughput methods are
developed for both nucleic acids and proteins. The majority of processes
that take place in a cell are not performed by the genes themselves, but rather
by the proteins that they code for. A disease can arise when a
gene/protein is over- or under-expressed, or when a mutation in a gene results
in a malformed protein, or when post-translational modifications alter a
protein's function. Thus, to truly understand a biological process, the
relevant proteins must be studied directly.
Studying proteins poses more challenges than studying genes, however, because
a protein's complex 3-D structure is closely tied to its function, much as the
form of a machine is tied to what it does. Proteomics is defined as the systematic
large-scale analysis of protein expression under normal and perturbed (stressed,
diseased, and/or drugged) states, and generally involves the separation,
identification, and characterization of all of the proteins in a cell or tissue
sample. The term has also broadened, and is now used
loosely to refer to analyzing which proteins a particular type
of cell synthesizes, how much of each it synthesizes, how cells modify proteins
after synthesis, and how all of those proteins interact.
An organism has orders of magnitude more proteins than genes: given
alternative splicing (several variants per gene) and post-translational modifications
(over 100 known), there are estimated to be a million or more distinct proteins. Fortunately,
features such as folds and motifs allow proteins to be categorized
into groups and families, making the task of studying them more tractable.
There is a broad range of technologies used in proteomics, but the central
paradigm has been the use of 2-D gel electrophoresis (2D-GE) followed by mass
spectrometry (MS). 2D-GE is used to first separate the proteins by
isoelectric point and then by size. The individual proteins are
subsequently removed from the gel and prepared, then analyzed by MS to determine
their identity and characteristics.
There are various types of mass analyzers used in proteomics MS, including
quadrupole, time-of-flight (TOF), and ion trap, and each has its own particular
capabilities. Tandem arrangements are often used, such as quadrupole-TOF,
to provide more analytical power. The recent development of soft
ionization techniques, namely matrix-assisted laser desorption/ionization (MALDI)
and electrospray ionization (ESI), has allowed large biomolecules to be
introduced into the mass analyzer without completely decomposing their
structures, or even without fragmenting them at all, depending on the design of the
experiment. There are techniques which incorporate liquid chromatography
(LC) with MS, and others that use LC by itself. Robotics has been applied
to automate several steps in the 2D-GE/MS process, such as spot excision and
enzyme digests.
To determine protein structures, X-ray diffraction (XRD) and nuclear magnetic
resonance (NMR) techniques are being improved to reach higher throughput and
better performance. For example, automated
high-throughput crystallization methods are being used upstream of XRD to
alleviate that bottleneck. For NMR, cryo-probes and flow probes shorten
analysis time and decrease sample volume requirements. The hope is that
determining about 10,000 protein structures will be enough to characterize the
estimated 5,000 or so folds, which will feed into more reliable in silico
structural prediction methods. Structure by itself does not provide all of
the desired information, but is a major step in the right direction.
Protein chips are being developed for many of the processes in proteomics.
For example, researchers are developing protocols for protein microarrays
at institutions such as Harvard and Stanford as well as at several companies.
These chips - grids of attached peptide fragments, attached antibodies, or
gel "pads" with proteins suspended inside - will be used for various
experiments such as protein-protein interaction studies and differential
expression analysis. They can also be used to filter out high-abundance
proteins before further experiments; one of the major challenges in proteomics
is isolating and analyzing the low-abundance proteins, which are thought to be
the most important.
There are many other types of protein chips, and the number will continue to
grow. For example, microfluidics chips can combine the sample preparation steps
prior to MS, such as enzyme digests, with nanoelectrospray ionization, all on
a single chip. Alternatively, the samples can be ionized directly from the surface
of the chip, similar to a MALDI target. Microfluidics chips are also being
combined with NMR. In the next few years, various protein chips will be
used increasingly in diagnostic applications as well.
The bioinformatics side of proteomics includes both databases and analysis
software. There are many public and private databases containing protein data
ranging from sequences to functions to post-translational modifications.
Typically, a researcher will first perform 2D-GE followed by MS; this
yields a peptide mass fingerprint, a molecular weight, or even a sequence for each protein of
interest, which can then be used to query databases for similarities or other
information. Swiss-Prot and TrEMBL, developed in a collaboration between
the Swiss Institute of Bioinformatics and the European Bioinformatics Institute,
are currently the major databases dedicated to cataloging protein data, but
there are dozens of more specialized databases and tools.
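The fingerprint-based database query described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the article: trypsin is taken as the digestion enzyme (cleaving after K or R unless the next residue is P, with no missed cleavages), peptide masses are computed from standard monoisotopic residue masses, and the two "database" sequences are hypothetical toy entries rather than real Swiss-Prot records. It is not how any particular search tool works, only the general idea of comparing observed peptide masses with theoretical ones.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cut after K or R unless followed by P (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, theoretical, tolerance=0.2):
    """Number of observed masses within `tolerance` Da of any theoretical mass."""
    return sum(any(abs(o - t) <= tolerance for t in theoretical) for o in observed)


def best_match(observed, database, tolerance=0.2):
    """Score every database entry and return (best entry name, all scores)."""
    scores = {
        name: count_matches(
            observed, [peptide_mass(p) for p in tryptic_digest(seq)], tolerance
        )
        for name, seq in database.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy database; the names and sequences are invented for illustration.
TOY_DATABASE = {
    "toy_protein_A": "MKTAYIAKQR",
    "toy_protein_B": "GASPVKLLER",
}

if __name__ == "__main__":
    # Pretend these masses came off the mass spectrometer.
    observed = [peptide_mass(p) for p in tryptic_digest("MKTAYIAKQR")]
    print(best_match(observed, TOY_DATABASE))
```

Real search tools also have to handle missed cleavages, modifications, charge states, and statistical scoring, all of which this sketch leaves out.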
New bioinformatics approaches are constantly being introduced. Recent
customized versions of PSI-BLAST can, for example, utilize not only the curated
protein entries in Swiss-Prot but also linguistic analyses of biomedical journal
articles to help determine protein family relationships. Publicly
available databases and tools are popular, but there are also several companies
offering subscriptions to proprietary databases, which often include
protein-protein interaction maps generated using the yeast two-hybrid (Y2H)
system.
The proteomics market comprises instrument manufacturers, bioinformatics
companies, laboratory product suppliers, service providers, and other
biotech-related companies that defy categorization. A given company often
spans more than one of these areas. Many of the companies involved in
the proteomics market actually have drug discovery as their major focus,
while partnering with other companies, or providing them with services or
subscriptions, to generate short-term revenues.
The market for proteomics products and services was estimated at $1.0B in
2000 and projected to grow at a compound annual growth rate (CAGR) of 42% to about $5.8B in 2005. The major drivers
will continue to be the biopharmaceutical industry's pursuit of blockbuster
drugs and the recent technological advances which have allowed large-scale
studies of genes and proteins.
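As a quick check on the market figures quoted above, the compounding arithmetic can be sketched in a few lines of Python. The function names are illustrative only, and the sketch assumes five full years of growth between 2000 and 2005.

```python
def project(value, rate, years):
    """Compound `value` forward at a constant annual growth `rate`."""
    return value * (1 + rate) ** years


def cagr(start, end, years):
    """Constant annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # $1.0B in 2000 at 42% per year for five years: about $5.77B.
    print(f"Projected 2005 market: ${project(1.0, 0.42, 5):.2f}B")
    # Growth rate implied by $1.0B -> $5.8B over five years: about 42%.
    print(f"Implied CAGR: {cagr(1.0, 5.8, 5):.1%}")
```

The two figures agree to within rounding: 42% compounded over five years multiplies the starting value by about 5.77.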
Alliances are becoming increasingly important in this field, because it is
challenging for companies to find all of the necessary expertise to cover the
different activities involved in proteomics. Synergies must be created by
combining forces. For example, many companies working with mass
spectrometry, both manufacturers and end-user labs, are collaborating with
protein chip-related companies. The technologies are a natural fit for
many applications, such as microfluidic chips which provide nanoelectrospray
ionization into a mass spectrometer. There are many combinations of
diagnostics, instrumentation, chip, and bioinformatics companies which create
effective partnerships.
In general, proteomics appears to hold great promise in the pursuit of
biological knowledge. There has been a general realization that the
large-scale approach to biology, as opposed to the strictly hypothesis-driven
approach, will rapidly generate much more useful information. The two
approaches are not mutually exclusive, and the happy medium seems to be the
formation of broad hypotheses which are subsequently investigated by designing
large-scale experiments and selecting the appropriate data. Proteomics and
genomics, and other varieties of 'omics', will all continue to complement each
other in providing the tools and information for this type of research.