Proteins II
Last time we saw some definitions we will need in the study of proteins, and we analyzed some of the techniques used to separate, purify, and identify proteins according to mass and electric properties. Now we have to start talking about the fun part, their structure and function.
Protein Structure
There are different levels of organization of proteins, and each level of organization is termed a structure. The first level, or primary structure, is, as we will see shortly, the arrangement of amino acids in a polypeptide chain. Secondary structure refers to the spatial arrangement, or conformation, of relatively short segments of polypeptide chain. The tertiary structure of a protein reflects how the elements of secondary structure are organized with respect to one another. Finally, quaternary structure refers to the organization of different subunits or monomers of a protein complex. One thing that we have to keep in mind is that primary structure pretty much determines everything else!
Protein primary structure
From what we know so far, we can combine our 20 standard amino acids using peptide bonds to form chains anywhere from 50 to 8,000 residues long. Let's be conservative and say that we want to make a 100-residue polypeptide. In principle, there is no restriction as to how we can combine our residues, which gives us up to 20^100 (roughly 1.3 x 10^130, a number with 131 digits) possible polypeptide chains. This means that we can create that many different proteins using the 20 standard amino acids and combining them in chunks of 100. Therefore, with just 100 residues we will have an obscene number of possibilities, and a humongous number of possible different activities.
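As a quick check of that count, here is a minimal Python sketch. The function name count_sequences is ours, and it simply assumes that every position can hold any of the 20 residues independently of the others:

```python
def count_sequences(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct linear chains with `length` positions,
    assuming each position can hold any of `alphabet_size` residues."""
    return alphabet_size ** length

n = count_sequences(100)
print(f"{n:.3e}")      # 1.268e+130
print(len(str(n)))     # 131 digits
```

Python integers have arbitrary precision, so the exact value is available if we ever want it.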
An indication of this enormous number of possibilities is the number of different proteins deposited in databases such as PIR and SWISS-PROT. The latest release of SWISS-PROT (#38, July 1999) contains over 80,000 different protein sequences that have been determined (the meaning of "sequence" is explained below).
The arrangement and type of amino acids present in a protein is called its primary structure, which is determined by the sequence of amino acids in the polypeptide. As we will see in the very near future, the primary structure of the protein determines the secondary and to a large extent the tertiary structures of a protein (basically the shape and conformation of that protein).
Following our intuition, we can postulate that the biological activity of a protein will depend on its primary structure. How can we prove that this very obvious statement is true?
First, we can look at proteins with the same number of residues that do different things. We see that function changes a lot with the amino acids present and their order, and that the number of residues alone does not determine activity:
H2N-Tyr-Gly-Gly-Phe-Met-COOH
- Enkephalin - pain perception in mammals
H2N-Phe-Phe-Gly-Trp-Gly-NH2
- Insect Kinin - gut contraction in roaches/crickets/mosquitoes
Second, we can look at mutations that cause disease. For example, sickle-cell anemia is caused by the mutation of a single residue in hemoglobin (Hb), an oxygen-transport protein. The mutation replaces a glutamate (E) at position 6 of the Hb beta chain, which is 146 residues long, with a valine (V). This changes the total negative charge of the protein, and this change of charge and residue polarity causes a conformational change. The mutated protein aggregates into large clumps, and since there is a high concentration of Hb in the red cells, the cells change shape and come to resemble a sickle.
Healthy     H2N-VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYP...
Sickle-cell H2N-VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYP...
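The single difference between the two chains can be located mechanically. The following is just a sketch that compares the two N-terminal excerpts shown above, position by position. The helper name point_differences is ours:

```python
healthy = "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYP"
sickle = "VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYP"

def point_differences(a: str, b: str) -> list:
    """Return (position, residue_in_a, residue_in_b) for every mismatch.
    Positions are counted from 1, starting at the N-terminus."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

print(point_differences(healthy, sickle))  # [(6, 'E', 'V')]
```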
There are many important things that we can discover by looking exclusively at the primary structure of a protein, and they also indicate that sequence dictates function. We can isolate enzymes that perform the same tasks in very different organisms, and many times we will find that their primary structures are highly similar. In many cases, we will find stretches of amino acids that are conserved between the protein in one species and the protein that performs the same task in another species. These conserved amino acids are usually crucial for activity, and can point out regions of the protein that are involved in binding a substrate or carrying out a particular reaction. For example, we can compare the primary structure of cytochrome c, a component of the electron transport system found in the mitochondria, from human and pumpkin, and find several conserved regions:
Human   ...NPKKYIPGTKMIFVGIKKK...
Pumpkin ...NPXKYIPGTKMVFPGLXK...
Although some amino acids are not exactly the same, the replacements have the same polarity or type of side chain. Changes may be a Glu for an Asp, an Arg for a Lys, Val for Ile or Leu, etc. This is also a way of mapping the evolution of different organisms: organisms that are closer in the evolutionary pathway have more similarities than those that are very far apart.
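To make the comparison concrete, here is a rough sketch that classifies each aligned pair of residues in the two cytochrome c excerpts. It rests on several assumptions of ours: the excerpts are aligned residue by residue starting from the first one shown, X stands for a residue that was not identified, the extra trailing residue of the human excerpt is ignored, and the only "conservative" groups are the three quoted in the text:

```python
# Substitution groups quoted in the text (Glu/Asp, Arg/Lys, Val/Ile/Leu).
# This is an illustrative subset, not a complete similarity scheme.
CONSERVATIVE_GROUPS = [set("ED"), set("RK"), set("VIL")]

def classify(a: str, b: str) -> str:
    """Classify one aligned pair of residues."""
    if "X" in (a, b):
        return "unknown"
    if a == b:
        return "identical"
    if any(a in group and b in group for group in CONSERVATIVE_GROUPS):
        return "conservative"
    return "different"

def compare(seq_a: str, seq_b: str) -> dict:
    """Count each class of pair over the aligned part of two sequences."""
    counts = {"identical": 0, "conservative": 0, "different": 0, "unknown": 0}
    for a, b in zip(seq_a, seq_b):  # zip stops at the shorter sequence
        counts[classify(a, b)] += 1
    return counts

human = "NPKKYIPGTKMIFVGIKKK"
pumpkin = "NPXKYIPGTKMVFPGLXK"
print(compare(human, pumpkin))
```

Under those assumptions, 13 of the 18 compared positions are identical and two more are conservative replacements (Ile for Val, Ile for Leu).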
One thing that we briefly mentioned last time is the disulfide bridge. As we saw, two cysteine residues can be oxidized (by air, for example) to form a disulfide bond (a cystine residue). Disulfide bonds are extremely important, because they can bring regions of the polypeptide that are far from each other in the sequence very close in space. Disulfide bridges have very big effects on the overall fold (or shape) of our protein, and we will also have to deal with them when trying to determine primary structure sequences.
Sequence determination
For all the reasons mentioned above, studying the primary structure of proteins is extremely important. We already know how to determine the identity of the first amino acid (the N-terminal residue). We react our polypeptide with either 4-F-DNB, dansyl chloride, dabsyl chloride, or fluorescamine, digest the protein, separate the amino acids, and compare the derivative to the corresponding derivative of each of the 20 amino acids.
This is fine for the N-terminal amino acid, but in the process we blasted the whole protein to pieces. If we are lucky, we will be able to determine the amino acid composition (the percentage of each amino acid in the protein) from the amino acid mixture by HPLC or something like it, but not the order in the sequence, which as we saw is the important thing to know.
A process that does this is called the Edman degradation, in which the N-terminal residue reacts with phenylisothiocyanate (PITC) and is cleaved from the polypeptide chain, leaving the next amino acid in the sequence with a free amino group. We can then repeat the cycle, and after each cycle compare the phenylthiohydantoin (PTH) amino acid derivative with a standard. Since each cycle tells us the identity of the amino acid and its position relative to the N-terminus, we get the sequence:

The whole process is nowadays fully automated, and a machine called a sequenator ("...beam me up Scotty...") will do all the slicing-and-dicing plus the determination of the released N-terminal amino acids. Obviously, you punch some stuff into a computer keyboard, and let it go...
We have to remember, however, that each step does not have a 100% yield, and we will lose some starting material in each subsequent step of the Edman degradation. Therefore, we cannot sequence polypeptides of more than 50 to 100 residues using this technique. How do we get around this? What we have to do is find a way of partially chopping up our polypeptide. There are chemical and enzymatic methods to do this.
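To see why the losses add up so fast, here is a small sketch of the arithmetic. The 95% per-cycle yield is a made-up figure for illustration only, not a measured value, and the function names are ours. If each cycle keeps a fraction y of the material, after n cycles we are left with y^n:

```python
def remaining_fraction(cycles: int, yield_per_cycle: float) -> float:
    """Fraction of the starting peptide still usable after `cycles` cycles,
    assuming the same yield in every cycle."""
    return yield_per_cycle ** cycles

def max_cycles(yield_per_cycle: float, min_fraction: float) -> int:
    """Largest number of cycles that leaves at least `min_fraction`."""
    cycles, fraction = 0, 1.0
    while fraction * yield_per_cycle >= min_fraction:
        fraction *= yield_per_cycle
        cycles += 1
    return cycles

ASSUMED_YIELD = 0.95  # hypothetical per-cycle yield, for illustration only
for n in (10, 50, 100):
    print(n, round(remaining_fraction(n, ASSUMED_YIELD), 4))
print(max_cycles(ASSUMED_YIELD, 0.5))  # cycles before half the material is gone
```

With this assumed yield, less than 8% of the material is left after 50 cycles and well under 1% after 100, which is the kind of decay that sets the practical limit on read length.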
The simplest chemical method is a partial digestion with acid: instead of boiling the heck out of the protein in 6N HCl for a day, we do it under less aggressive conditions. This will not give fully digested protein (i.e., free amino acids), but chunks of different lengths. The problem is that it is not reproducible, and it will be very hard to find out where in the whole sequence the fragments came from.
A method that is a little better is the cyanogen bromide (CNBr) digestion. Cyanogen bromide will selectively cleave peptide bonds to the right of a methionine residue. Therefore, if we have a polypeptide with three Met residues, we end up with four smaller fragments. These fragments could then be used in an Edman degradation protocol.
This reaction can be summarized as follows:

Not only does it give us smaller fragments, but it indirectly tells us how many methionines we have in the sequence of our original polypeptide. Obviously, if we have no methionines we are screwed.
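The bookkeeping of a CNBr digestion is easy to sketch in code. The peptide below is hypothetical and the function name is ours. The sketch only tracks where the chain is cut, and ignores the fact that in the real reaction the methionine ending each fragment is chemically modified:

```python
def cnbr_fragments(sequence: str) -> list:
    """Cut a one-letter sequence after every Met (M).
    A Met at the very C-terminus has no bond to its right, so it is skipped."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue == "M" and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments

peptide = "GAMKLMPSTMQW"  # hypothetical peptide with three Met residues
print(cnbr_fragments(peptide))  # ['GAM', 'KLM', 'PSTM', 'QW']
```

Three internal methionines give four fragments, so counting fragments and subtracting one gives the number of cleaved methionines.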
Now, what if this fails too, and we still have some very long chunks of polypeptide? We turn to enzymatic methods. Peptidases are enzymes with high specificity for certain types of peptide bonds, so they can come to the rescue. There are two common ones called trypsin and chymotrypsin. Trypsin will cleave the peptide bond to the right of lysine or arginine residues:

The only exception is when the residue to the right of the peptide bond is a proline. Chymotrypsin will cleave to the right of bulky hydrophobic residues, like phenylalanine, tryptophan, and tyrosine:

We have many others that cleave other peptide bonds: endopeptidase V8 cleaves the peptide bond to the right of glutamate, elastase to the right of small neutral residues (Ala, Gly, Ser, Val), etc. Again, there are some exceptions to the rule, and the presence of certain residues to the right of the peptide bond targeted by the particular enzyme will make things go slower or not go at all.
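The cleavage rules for trypsin and chymotrypsin can be written down as a small lookup table. This is a simplified sketch: the peptide is hypothetical, the names are ours, and applying the proline exception to chymotrypsin as well as trypsin is an assumption of the sketch (the text states it explicitly only for trypsin). Real enzymes also cut some bonds slowly or incompletely, which is not modeled here:

```python
# Residues whose right-hand (C-terminal side) peptide bond is targeted.
CLEAVAGE_RULES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FWY"),
}

def digest(sequence: str, protease: str) -> list:
    """Cut to the right of each target residue, unless the next residue is Pro."""
    targets = CLEAVAGE_RULES[protease]
    fragments, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in targets and sequence[i + 1] != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments

peptide = "GAKSFRPLYE"  # hypothetical peptide
print(digest(peptide, "trypsin"))       # ['GAK', 'SFRPLYE']
print(digest(peptide, "chymotrypsin"))  # ['GAKSF', 'RPLY', 'E']
```

Note that the Arg-Pro bond survives the trypsin digestion, so the peptide gives two tryptic fragments rather than three.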
Strategies for determining polypeptide sequences
Now, in which order do we do all this stuff to pinpoint the amino acid sequence of our polypeptide? We have to take a series of well thought out steps to be successful:
1) First, we have to determine the size by SDS-PAGE. This will allow us to figure out if a single Edman degradation will do or if we will need partial digestions.
2) Second, we perform a full digestion of the polypeptide with 6N HCl to determine the amino acid composition. Before this, we couple the N-terminal amino acid with one of the reagents we saw last time (dansyl chloride, dabsyl chloride, fluorescamine, 4-F-DNB) to determine the identity of the first amino acid.
3) If we find cysteines in our amino acid determination, they may be forming disulfide bridges in our protein. This may keep us from using the enzymatic cleavages we may need to perform prior to Edman degradation, because the disulfide bridges may be holding the protein in a conformation in which the peptide bonds are not reachable by the peptidase. We therefore have to break the disulfide bridges.
The easiest way to do this is a reduction, normally done with β-mercaptoethanol, HS-CH2-CH2-OH:

Now, to prevent reformation of the disulfide bridge, and also to help identify the two cysteines that were part of it, we can derivatize the cysteines. One way is with iodoacetic acid, or iodoacetate. The sulfur atom attacks the carbon of the iodoacetate, displacing the iodide:

Another way of derivatizing a cystine residue is with performic acid, which oxidizes both sulfur atoms in the disulfide bridge to sulfonates (-SO3-). The resulting residues are called cysteic acids:

4) According to the size and the amino acid composition, we have to choose the appropriate peptidases to obtain fragments of lengths suitable for Edman degradation and automated sequencing.
5) Finally, we perform Edman degradations with the different peptides obtained from the different chemical and enzymatic digestions.
All these steps will give us the sequences of fragments of different lengths of our original protein. Now comes the part in which we start thinking again. We have to find the overlap between the different fragments. Hopefully, there are going to be regions that overlap between fragments from different digestions, so we can re-build our original protein sequence.
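The search for overlaps can be sketched as a suffix/prefix match between two fragments that come from different digestions. The fragments below are hypothetical, the function name is ours, and the minimum overlap of two residues is an arbitrary choice. Very short overlaps can occur by chance, so in practice they need to be backed up by other evidence such as the amino acid composition:

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 2):
    """Join two fragments on the longest suffix of `left` that is also a
    prefix of `right`. Returns None if the overlap is shorter than
    `min_overlap` residues."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None

# Hypothetical fragments from two different digestions of the same peptide.
print(merge_by_overlap("GASTKL", "KLMPW"))  # GASTKLMPW
print(merge_by_overlap("GASTKL", "MPW"))    # None
```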
As an example, let's say that we have a certain peptide of molecular weight 3,000 Da as per SDS-PAGE. This is around 25 amino acids.
The amino acid analysis gives us the following information:
i) A = 4; G = 5; T = 1; Y = 1; K = 1; H = 1; P = 3; V = 1; F = 1; M = 1; Q = 1; D = 1; L = 1; S = 2; E = 1
ii) The N-terminal residue is alanine from the dabsyl chloride reaction.
Although we have a small enough peptide to do a single Edman degradation, we hit it first with CNBr (we have a methionine), and we get two fragments. Edman degradation works fine for the small one, but bad for the bigger one, giving us partial information:
Partially read fragment: A-G-T-Y-K-H-G-P-P-F-A; Fully read fragment: Q-D-L-P-S-G-S-E-G
Treatment with chymotrypsin gives us three fragments (we have one tyrosine and one phenylalanine, and chymotrypsin cleaves to the right of each), which work OK on Edman degradation:
A-G-T-Y; K-H-G-P-P-F; A-V-G-A-A-M-Q-D-L-P-S-G-S-E-G
Now we can start piecing the puzzle together:
i) We have an alanine at the beginning.
ii) Since we have only one methionine, the CNBr digestion will give two peptides. The one starting with alanine is the N-terminal fragment:
A-G-T-Y-K-H-G-P-P-F-A
The other one is the C-terminal fragment.
iii) The first two fragments from the chymotrypsin digestion match the first ten residues of the partially read fragment from the CNBr digestion. The third one starts at the last residue we could read from that fragment (the alanine after the phenylalanine), runs through the methionine, and contains the whole C-terminal fragment. We can re-build the whole peptide:
A-G-T-Y-K-H-G-P-P-F-A
                    A-V-G-A-A-M-Q-D-L-P-S-G-S-E-G
                                Q-D-L-P-S-G-S-E-G
Now, the sequence of our mystery peptide is:
A-G-T-Y-K-H-G-P-P-F-A-V-G-A-A-M-Q-D-L-P-S-G-S-E-G
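It is always worth checking a rebuilt sequence against the data we started with. The sketch below compares the final sequence with the amino acid analysis and with the two CNBr reads. The variable and function names are ours:

```python
from collections import Counter

# Amino acid analysis reported for the example peptide.
reported = {"A": 4, "G": 5, "T": 1, "Y": 1, "K": 1, "H": 1, "P": 3, "V": 1,
            "F": 1, "M": 1, "Q": 1, "D": 1, "L": 1, "S": 2, "E": 1}

sequence = "AGTYKHGPPFAVGAAMQDLPSGSEG"  # rebuilt sequence, one-letter code

def composition(seq: str) -> dict:
    """Count how many times each residue appears in the sequence."""
    return dict(Counter(seq))

print(composition(sequence) == reported)   # True
print(len(sequence))                       # 25
# The two CNBr reads must sit at the N- and C-terminal ends.
print(sequence.startswith("AGTYKHGPPFA"))  # True
print(sequence.endswith("QDLPSGSEG"))      # True
```

All four checks pass, so the rebuilt sequence accounts for every residue in the analysis, has the expected 25 residues, and starts with the alanine found in the dabsyl chloride reaction.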
Next class we will start looking at secondary structure.
Prepared by Guillermo Moyna, 1999.