Proteins IV
Last time we discussed the terminology needed to analyze protein structure, we saw which angles of the backbone define the direction and shape of the polypeptide backbone, and we analyzed the different types of regular polypeptide secondary structure. In a nutshell, these were:
α helix: Residues have angles in the φ = -60°, ψ = -50° region. The helix has 3.6 residues per turn and a pitch of 5.4 Å. The helix is stabilized by h-bonds between the C=O of residue i and the N-H of residue i+4. Side chains poke outward from the axis of the helix, so certain amino acid residues stabilize the helix and others destabilize it.
β sheet: Residues in these motifs have <φ,ψ> combinations around φ = -120° and ψ = 140°. These angles correspond to a partially extended polypeptide backbone. β strands normally come together to form large layered structures known as β pleated sheets, or β sheets for short, which, depending on the direction of one strand with respect to the other, are classified as parallel or anti-parallel β sheets. In these sheets there are numerous stabilizing inter-strand h-bonds.
Polyproline II helix (a.k.a. collagen helix): Here φ = -70° and ψ = 160°. This corresponds to a structural element that is not as extended as the β strand, but not as curled up as the α helix. It is known as the collagen helix or polyproline II helix. A collagen helix is left handed, and has only three residues per turn. It is therefore more open than an α helix.
Turns: Polypeptide turns or hairpins introduce a sharp change in the direction of the polypeptide backbone, usually reversing its direction. The most important turns in polypeptide chains are β turns, which are formed by four residues of the polypeptide chain, stabilized by an h-bond between the C=O of residue i and the N-H of residue i+3.
Tertiary structure
Great. So how does all this stuff come together to form the tertiary structure of the protein? We first have to look at two big classes of protein tertiary structure. The first class is the fibrous proteins, and we will briefly mention their characteristics. The second is the globular proteins, and we will spend quite some time with them, as most enzymes are globular proteins.
Fibrous proteins, such as collagen, α-keratin, and elastin, are crucial in maintaining the structure of living organisms. Collagen, as we have said many times, forms cartilage and tendons. α-keratin is the main component of hair in mammals. In these tissues, the proteins pack into long, fiber-like arrangements. In collagen, we have fibrils made up of collagen triple-helices. In α-keratin, two α helices coil up around each other, and then form larger protofilaments.
In LNC there is a nice explanation as to why curly hair can be stretched in the hair salon that I will not go into here. However, these proteins are pretty dull when we think of their task: Their structure gives them a biological property, but they don't do much in terms of catalyzing reactions, transporting molecules, or passing chemical signals around the cell.
We will focus mainly on globular proteins. Most enzymes are globular proteins. By globular we mean that the backbone of the protein is tightly packed around itself. What is the purpose of packing all this stuff together? We have to remember that after we condensed the amino acids into the polypeptide, most of the charged groups are gone (no carboxylates or amines except for those on the side chains and at the two ends of the chain). This means that the protein will have many regions that are highly hydrophobic, and will want to be as far away from water as possible.
Therefore, a polypeptide chain will try to curl up over itself as much as possible, in a way that exposes its polar residues (Arg, Lys, Glu, Asp) to water and buries the non-polar, hydrophobic residues in the protein core. The elements of secondary structure we saw (α helices, β sheets, and turns) will combine in more or less defined arrangements to achieve this. These patterns are called supersecondary structure, or motifs, or simply folds.
For example, an α helix cannot go on indefinitely (although there are exceptions). It will have to turn around at some point, most likely using a β turn. After the turn, it can start another α helix, or part of a β strand. If it forms a β strand, another β strand will have to come to the rescue to stabilize the first one through h-bonding (remember that a single β strand is not stabilized by h-bonds). Fortunately for us (or maybe not for you at this point - you'll have to be able to recognize them...), there are a limited number of folds that are found in many proteins:
All-α motifs: As the name implies, these are proteins in which most of the secondary structure elements are α helices. The different helices in these folds will be connected by loops, which are combinations of turns and random coil segments. The most basic all-α motif is an α-α corner, in which two helices are linked together by a short loop. Many of these α-α corners can come together to form larger all-α motifs, such as an α/α toroid.
All-β motifs: Most of the polypeptide forms β sheets, either parallel or antiparallel (or both). One important thing is that when we have large β sheets, the sheet will gain stability if the β strands composing it have a slight right handed twist. This usually gives rise (when the β sheet is big enough) to larger all-β motifs, such as the β barrel, propellers, or β helices (not to be confused with the α helix: here the β sheet curls up into a helix...).
Then we have α/β and α + β motifs. In α/β motifs, segments with α helical secondary structure alternate with segments of the polypeptide with β sheet secondary structure. The simplest example is the β-α-β motif, in which an α helix is packed against a small segment of parallel β sheet. β-α-β motifs can come together and form α/β barrels. In α + β motifs, the α helical and β sheet segments don't intercalate and are more isolated from each other. Here is an example.
In all cases, we will see that all these motifs work towards a common goal: to minimize the surface area of hydrophobic residues exposed to solvent (or in other words, to maximize hydrophobic interactions in the core of the protein), and to maximize the interactions between polar or charged groups of the polypeptide and the solvent (water).
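To make the "hydrophobic in, polar out" idea a bit more concrete, here is a minimal sketch that scores short stretches of sequence by their average hydropathy. It uses the Kyte-Doolittle hydropathy scale, which is not part of these notes, and the function names and the zero threshold are arbitrary choices made only for illustration. Real burial depends on the 3D fold, not just on local sequence, so treat this as a rough indicator at best.

```python
# Kyte-Doolittle hydropathy values (assumption: this scale is brought in for
# illustration; positive = hydrophobic, negative = polar or charged).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a sequence written in one-letter codes."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


def classify_segment(sequence, threshold=0.0):
    """Very rough guess: hydrophobic stretches tend to end up in the core."""
    if mean_hydropathy(sequence) > threshold:
        return "core-like"
    return "surface-like"


# Hypothetical six-residue segments, made up for this example.
print(classify_segment("LIVFAL"))  # greasy side chains
print(classify_segment("KDERKE"))  # charged side chains
```

The first segment comes out as core-like and the second as surface-like, which is the same sorting the folded protein performs when it buries its greasy residues and leaves Lys, Arg, Glu, and Asp in contact with water.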
What about larger proteins that cannot form a single packed motif? In many cases, we will see that a polypeptide chain will fold into two or more domains. Each domain will adopt its own globular supersecondary structure. In some cases, the two domains may be distant or relatively 'un-related', and in other cases there may be a lot of non-bonded contacts between residues of both domains that will stabilize the whole protein in its native conformation.
Quaternary structure
Things may not stop after we get our polypeptides folded into their final, native, tertiary structure. Many times different folded polypeptides may come together to form complexes that range from simple dimers to huge multimers. The arrangement of different polypeptide units into larger multimeric proteins is called quaternary structure.
The forces that link different polypeptide subunits in a multimer are almost exclusively non-bonded in nature: salt bridges, h-bonds, hydrophobic interactions. Therefore, the formation of a multimer depends a lot on the conditions (pH, salt content, concentration of organic solvents, temperature).
Why do proteins arrange in some cases into large multimeric complexes? Many times, the function of a catalytic site in one of the subunits will be affected by the conformation of the other subunits. A certain subunit may bind a small molecule, resulting in a conformational change that in turn induces a conformational change in the subunit that catalyzes a certain reaction (or the subunit that binds a certain molecule). An excellent example of this is hemoglobin, a protein with which we are going to spend a reasonable amount of time in the next few lectures.
The different polypeptide subunits in a multimeric protein usually adopt a very defined orientation with respect to one another, which introduces symmetry to the complex: we can perform certain simple geometrical operations (translations, rotations, etc.) and we will obtain the same 3D picture, but looking at different units of the complex. We therefore classify multimeric proteins in terms of their symmetry.
- For example, if we have an axis around which we can rotate the complex and obtain the same picture, we have cyclic symmetry (C). Depending on the angle through which we have to rotate the complex (180°, 120°, 90°, etc.), we will have two-fold, three-fold, four-fold, etc., symmetry:

- If, in addition to a main rotation axis, we have two-fold axes perpendicular to it around which we can rotate the complex to obtain the same picture from looking at different subunits, we have dihedral symmetry (D). The main axis can have two-fold, three-fold, etc., symmetry:

Very large proteins, like the coat proteins of viruses, have many subunits, and many axes, each with different levels of rotational symmetry. Understanding this is very important when studying the structure of proteins by X-ray or NMR.
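The two symmetry classes above can be sketched numerically. In this illustrative example each subunit is reduced to a single point, the main rotation axis is taken to be z, and the perpendicular two-fold axis is taken to be x. These choices, the starting coordinates, and all function names are assumptions made for the sketch, not anything defined in these notes.

```python
import math


def rotate_about_z(point, angle_deg):
    """Rotate a 3D point about the z axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def flip_about_x(point):
    """Two-fold (180 degree) rotation about the x axis, perpendicular to z."""
    x, y, z = point
    return (x, -y, -z)


def cyclic_positions(n, start=(1.0, 0.3, 0.5)):
    """Subunit positions for C_n: n copies related by rotations of 360/n."""
    return [rotate_about_z(start, 360.0 * k / n) for k in range(n)]


def dihedral_positions(n, start=(1.0, 0.3, 0.5)):
    """Subunit positions for D_n: a C_n ring plus its copy flipped about x."""
    ring = cyclic_positions(n, start)
    return ring + [flip_about_x(p) for p in ring]


def same_arrangement(a, b, tol=1e-9):
    """True if the two point sets coincide, regardless of subunit order."""
    def covered(src, dst):
        return all(any(math.dist(p, q) < tol for q in dst) for p in src)
    return len(a) == len(b) and covered(a, b) and covered(b, a)


c3 = cyclic_positions(3)
d3 = dihedral_positions(3)
print(len(c3), len(d3))
print(same_arrangement(c3, [rotate_about_z(p, 120.0) for p in c3]))
print(same_arrangement(d3, [flip_about_x(p) for p in d3]))
```

A C_n complex built this way has n subunits and a D_n complex has 2n. Rotating the C3 set by 120° or flipping the D3 set about the two-fold axis gives back the same arrangement with the subunits exchanged, which is exactly the "same picture, different units" idea.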
Denaturation, re-naturation, and activity
When we have a protein folded in its native, active state, we can do things to it to destroy its native fold. These processes are known as denaturing the protein, and involve destroying the regular secondary structural elements in the polypeptide chain, turning it into a random-coil (that is where the term comes from). Remember, however, that there is nothing random about the random-coil, only that it does not have enough residues in a particular conformation to be called anything else.
For example, we can heat it up and destroy its activity: When we cook an egg, the egg albumin coagulates to a white solid. This case is an extreme, because we cannot redissolve the protein and recover its native conformation. What happens is that the thermal energy we put in shuffles all the weak non-bonded interactions, and new, non-specific interactions appear. These are 'bad', in the sense that the protein is not soluble anymore and falls out of solution.
There are chemical ways we can use to denature a protein that are reversible. For example, if we move the pH above or below the optimum at which a protein is active, we will mess up many of the non-bonded interactions that hold the protein in its native state: salt bridges will be destroyed, the charge on ionizable groups will change, etc.
Similarly, we can add an organic solvent, such as ethanol or acetone, and destroy the native structure of the protein. Instead of messing up ionic interactions, organic solvents can interact with the innards of the protein, affecting hydrophobic interactions. Once we break these types of interactions, the native fold of the protein will also be lost.
Another chemical way of denaturing a protein is using high concentrations of urea or guanidinium hydrochloride. Although the basis of action of these compounds is not fully understood, it clearly has to do with their ability to influence the non-covalent interactions that maintain the protein in its native fold.
When a protein becomes denatured, its secondary and tertiary structure become randomized. Since the native 3D fold is needed for activity, we will lose activity. However, we said before that there are reversible ways to denature a protein, particularly when the denaturing agent is a chemical. If we return the solution where we had our protein to its normal pH and salt concentration conditions, the polypeptide chain will be re-natured. That is, it will fold back to its native structure. How does the protein know how to get back? We have said a couple of times already that primary structure determines everything else. The experiments that proved this were carried out by Christian Anfinsen in the 50's. His experiment was on ribonuclease, a small globular protein with 124 amino acid residues and four disulfide bridges between eight cysteine residues.
Anfinsen first reduced the disulfide bridges with mercaptoethanol (HSCH2CH2OH), and then denatured the protein with 8M urea. At this point he had a completely random polypeptide chain with no disulfide bridges. He checked for activity and saw nothing.
In the first experiment, he oxidized the protein with air (O2) while it was still under denaturing conditions (8M urea). This caused the cysteines to oxidize and form disulfide bridges. He then removed the urea (dialysis) and measured activity. He saw zilch.
In the second experiment, he took the reduced, denatured (urea) ribonuclease, removed the urea (dialysis), and then let the cysteines oxidize to disulfide bridges by action of air (O2). He checked for activity, and found full restoration of activity.
How do we explain his results? In the first experiment, the oxidation of the cysteines under denaturing conditions caused the formation of random disulfide bridges. Instead of getting the disulfide bridges present in the native protein, any random combination was formed (8 cysteines can pair up into four disulfides in 105 different ways), resulting in different inactive conformations. In the second experiment, on the other hand, after the urea was removed the polypeptide chain was able to refold to its native conformation. Oxidation at this stage formed the right disulfide bridges, because the cysteines were in their proper locations. Since there is nothing else in solution but the protein, this means that primary structure defines secondary and tertiary structure.
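The figure of 105 can be checked by counting the ways of pairing up eight cysteines: the first cysteine has 7 possible partners, the next unpaired one has 5, then 3, then 1, so 7 x 5 x 3 x 1 = 105. The short sketch below does the count both with that product and by brute-force enumeration. The labels C1 to C8 are placeholders, not actual ribonuclease residue numbers, and the function names are made up for the example.

```python
def count_pairings(n_cysteines):
    """Ways to pair all cysteines into disulfides: (n-1) x (n-3) x ... x 1."""
    if n_cysteines % 2 != 0:
        raise ValueError("need an even number of cysteines to pair them all")
    count = 1
    for partners in range(n_cysteines - 1, 0, -2):
        count *= partners
    return count


def enumerate_pairings(cysteines):
    """Yield every complete set of disulfide pairs, one set at a time."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for other_pairs in enumerate_pairings(remaining):
            yield [(first, partner)] + other_pairs


labels = ["C%d" % k for k in range(1, 9)]  # placeholder labels C1..C8
all_pairings = list(enumerate_pairings(labels))
print(count_pairings(8), len(all_pairings))  # both give 105
print("chance of the native pairing: %.2f%%" % (100.0 / len(all_pairings)))
```

Only one of the 105 pairings is the native one, so if the disulfides formed purely at random, fewer than 1 in 100 chains would end up with the right set.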
Due to the implications of his findings, Anfinsen, a pretty sharp fellow, got the Nobel Prize in Chemistry.
Protein folding
So now we know the elements of secondary structure and how they come together to form tertiary and quaternary structures. We also know that, in principle, we only need the primary structure to get a functional, properly folded, native protein at physiological pH. How does all this happen? What are the driving forces that come into play for a protein to fold into its native tertiary/quaternary structure? How is it that, upon removal of 8M urea, ribonuclease refolds into its native conformation in milliseconds to seconds? This is a problem that clearly deals with the energies of the different conformers the protein can adopt, and the time the protein spends passing through them.
As we saw, each amino acid residue adds two rotatable bonds to the protein, and therefore the number of possible conformations increases geometrically with the number of residues. However, the conformation of the native protein is a single (or a very few closely related) combination of <φ,ψ,ω> angles for each residue. These <φ,ψ,ω> dihedrals usually place the protein at the lowest (or very close to the lowest) free energy it can have (i.e., the most favourable conformation).
How do we find the right combination of angles that gives the polypeptide its native 3D structure? We can first assume it is pure chance: Each rotatable bond will adopt a value randomly until all the <φ,ψ,ω> combinations corresponding to different conformers are sampled. The ones with the lowest energies will prevail over higher energy ones, giving at the end the protein with the proper fold.
This is a very neat and logical way of doing things, until we start considering the time that this would take. Say that we had a 100 residue polypeptide, and that each residue could adopt only 10 possible dihedral angle combinations. That gives 10^100 conformers. If it took us only 10^-13 seconds to achieve every different conformation, it would take something like 10^77 years (the figure usually quoted; the straight multiplication gives closer to 10^79 years) to sample all the possible conformers, and therefore achieve the native fold! Since there are bugs out there that have a lifespan of hours and need to have their properly folded proteins to work, we obviously have to find a better way to obtain the proper protein folding.
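The random-search estimate is easy to reproduce. The sketch below works with base-10 logarithms so that the huge numbers stay manageable. It assumes, as in the text, 10 conformations per residue, 100 residues, and 10^-13 seconds per conformation; the function name and the use of a 365.25-day year are choices made for this example. With these inputs the straight multiplication comes out near 10^79 years. Quoted versions of this estimate are often somewhat lower (around 10^77 years), and the exact exponent depends on the assumptions, but the conclusion does not change.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def log10_search_years(n_residues, states_per_residue, seconds_per_state):
    """log10 of the years needed to visit every conformation exactly once."""
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations + math.log10(seconds_per_state)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


exponent = log10_search_years(100, 10, 1e-13)
print("about 10^%.1f years" % exponent)

# Even a tiny 30-residue peptide would need far longer than a lifetime.
print("about 10^%.1f years" % log10_search_years(30, 10, 1e-13))
```

Because the number of conformations grows as a power of the chain length, shortening the chain or speeding up the sampling by a few orders of magnitude does not rescue the random search.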
The study of how proteins fold is by no means finished, and there are many theories out there. However, NMR, molecular modeling, and many other physical techniques have given us a lot of data on how the process happens. What we now know is that it is a stepwise process, in which the chain gradually approaches the properly folded, native protein:
1) We first have to remember the forces that will be acting on an extended (random-coil) polypeptide chain. The initial force driving the folding of the polypeptide will be the tendency of the hydrophobic groups of the protein to come together. Remember that the more hydrophobic groups we have in water, the more we have to order water around them, and we get no good enthalpic contributions in return (no polar interactions). Therefore, water will try to lump all the hydrophobic groups of the protein together so as to minimize the surface area of the grease. This phenomenon is known as the hydrophobic collapse of the polypeptide chain.
2) After the hydrophobic collapse, the polypeptide chain is in a partially folded state, called the molten globule. In this conformation there are recognizable elements of secondary structure, but they are still fairly disordered with respect to one another, and they are not fully formed. Interactions between different sections of the polypeptide have not been maximized, and the energy is still higher than in the final, native conformation.
We have to keep in mind that during this whole process the protein will become more and more ordered. Therefore, there will be a decrease in entropy for the protein, which is unfavourable. It will have to be counter-balanced by favourable enthalpic contributions. These come from the maximization of non-bonded interactions between the different segments of the protein: van der Waals interactions, h-bonds, and salt bridges.
Furthermore, we have to remember that the polar groups of the peptide backbone were h-bonded to water while in the extended (random-coil) conformation. When the polypeptide folds, these h-bonds to water have to be replaced with intramolecular h-bonds, such as those found in α helices or β sheets. There will be almost no gain in the total energy of the system, because we are replacing non-specific h-bonds to water with specific h-bonds in the folded polypeptide.
The molten globule state is when the polypeptide tries different arrangements of conformations. We have to realize that these are far fewer than if we tried every value of each rotatable bond. These states are relatively long-lived, and can be detected by certain experimental techniques, or in molecular modeling simulations.
3) After a while (a very short while), the best partially folded conformation sampled by the molten globule is found, and the last favourable interactions (h-bonds, van der Waals contacts) that define the native fold of the protein are achieved. This is the final, native, 3D structure of the active protein.
Now let's sit back a little and evaluate the things in favour of and against folding:
1) From the point of view of the polypeptide, folding is extremely entropically unfavourable: We are ordering the polypeptide, while it was disorganized before. However, if we consider the whole system, solvent and all, we are releasing ordered water molecules, because the surface area of exposed hydrophobic regions of the polypeptide is minimized upon folding.
2) From the point of view of the system, we are losing favourable enthalpic contributions by removing h-bonds and polar interactions between polar groups of the polypeptide and water. However, most of these interactions will be replaced by intramolecular interactions (h-bonds in α helices and β sheets) within the folded polypeptide.
Therefore, there is a decrease of entropy (bad) for the polypeptide accompanied by an increase of entropy for the solvent (good), and enthalpic contributions to the free energy that were present in the random-coil between the polypeptide and the solvent are replaced by favourable enthalpic interactions between groups in the polypeptide. The overall gain in free energy is pretty minuscule if we consider the incommensurable feat that protein folding represents. Proteins may be only 2 to 5 kcal/mol more favourable in their folded state than in their random-coil conformation. This very delicate balance is what gives proteins many of their interesting properties. One of them, as we will see later, is breathing: A protein is not stuck in a single conformation, but is constantly moving around it. This has very important implications for the activity of proteins.
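To get a feeling for what a stabilization of only 2 to 5 kcal/mol means, we can turn it into a folded-to-unfolded population ratio with K = exp(ΔG/RT), where ΔG here stands for how much more favourable the folded state is. The sketch below assumes a simple two-state (folded or random-coil) equilibrium at 25 °C, which is a simplification not made in these notes; the function name is also just for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def fraction_folded(stabilization_kcal_per_mol, temperature_k=298.15):
    """Fraction of molecules folded in a two-state model.

    stabilization_kcal_per_mol: how much lower the free energy of the folded
    state is than that of the random-coil (positive = folded is favoured).
    """
    k_fold = math.exp(stabilization_kcal_per_mol / (R_KCAL * temperature_k))
    return k_fold / (1.0 + k_fold)


for stabilization in (0.0, 2.0, 5.0):
    print("%.1f kcal/mol -> %.4f folded" % (stabilization,
                                            fraction_folded(stabilization)))
```

Under these assumptions, 2 kcal/mol already keeps roughly 97% of the molecules folded, and 5 kcal/mol more than 99.9%. At the same time the margin is only a few times RT (about 0.6 kcal/mol at room temperature), so modest changes in pH, temperature, or solvent can tip the balance, which fits with how easily proteins are denatured.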
Next time we will go over some of the techniques employed in the study of protein 3D structure, namely X-ray crystallography, NMR spectroscopy, and, maybe, CD (circular dichroism).
Prepared by Guillermo Moyna, 1999.