Proteins: definition, composition, and structure

The term “protein” derives from the Greek word “proteios”, which means primary or preeminent. It was first suggested by Jöns Jacob Berzelius, one of the fathers of modern chemistry, to his colleague Gerardus Johannes Mulder, who in 1839 was studying the chemical composition of albumins. On the basis of the formula that Mulder assigned to albumin, C40H62O12N10, which later proved to be wrong, Berzelius thought that proteins could be the most important biological substances.
Despite Mulder's mistake, Berzelius had a “prophetic intuition”.
Proteins are a class of molecules present in all living organisms and in all compartments of the cell; in animal cells they may constitute more than 50% of the dry weight.
Animal, plant, bacterial and viral proteins are linear polymers made up of subunits called amino acids. About 20 amino acids occur in proteins. According to the Fischer-Rosanoff convention, or D-L system, a way to describe the configuration of chiral molecules, they are present almost exclusively in the L form. They are joined by a covalent bond called the peptide bond, which is rigid and planar. The resulting chain of amino acids, whose sequence is encoded by a specific gene, is called a polypeptide chain or a protein. Each amino acid may occur many times along the chain.

Occasionally, D-amino acids have been found in certain bacterial proteins.
Proteins have very different structures, even within the same cell type, where hundreds of different proteins can be found, each performing a different function.
It should be noted that the peptide bond is very stable at physiological pH: in the absence of external intervention, such as enzymatic catalysis, its lifetime is estimated at about 1,100 years.


Structure of the proteins

Proteins are the most versatile molecules present in living organisms, where they perform functions essential for life. The great variety of functions they are capable of performing derives from the ability of the polypeptide chain to fold into specific three-dimensional structures, which allow it to bind different molecules and carry out different tasks.
In the description of how polypeptide chains fold into their three-dimensional structures, it is helpful to distinguish different levels of organization, which will be analyzed below.

Note: the levels of structure beyond the secondary one are present in globular proteins.

Protein primary structure

Bovine insulin was the first protein whose primary structure was determined, thanks to the work of Frederick Sanger in 1953.
The primary structure is the amino acid sequence of a protein, its lowest level of organization, and, as previously said, it is unique and genetically determined.
It may consist of 40 to over 4,000 amino acid residues and it determines the three-dimensional structure of the protein itself, which in turn determines its function.
The polypeptide chain has polarity because its two ends are different: one has a free amino group and is called the NH2-terminus or amino-terminus; the other has a free carboxyl group and is called the COOH-terminus or carboxyl-terminus. The two ends of the polypeptide chain are also known as the N-terminal end and the C-terminal end, to distinguish them from the carboxyl and amino groups present within the chain. By convention the N-terminal end is taken as the beginning of the amino acid chain, and is always written on the left.
The primary structure is also of interest because, by comparing the sequence of the same protein in different species, we can identify the variations that the corresponding gene has undergone, which are an indicator of the divergence of the species in the course of evolution.
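The comparison of primary structures described above can be sketched in a few lines of Python. This is a minimal illustration, not a real alignment method: it assumes the two sequences are already aligned and have the same length, and the two sequences shown are hypothetical toy data, not taken from any actual protein. The function names are also chosen only for this example.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two pre-aligned sequences differ.

    Sequences are written in one-letter code, N-terminus first.
    Assumption: both sequences are already aligned and equally long.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same amino acid."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    identical = len(seq_a) - count_differences(seq_a, seq_b)
    return 100.0 * identical / len(seq_a)


# Hypothetical toy sequences (NOT real protein data).
species_1 = "MKTAYIAKQR"
species_2 = "MKTAHIAKQK"

print(count_differences(species_1, species_2))  # 2
print(percent_identity(species_1, species_2))   # 80.0
```

The fewer the differences between two species, the more recently their sequences are assumed to have diverged; real analyses first align the sequences to account for insertions and deletions.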
The terms dipeptide, tripeptide, oligopeptide and polypeptide are used to indicate chains of different lengths, consisting respectively of 2, 3, fewer than 50, and more than 50 amino acids.
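The length-based terminology can be expressed as a small Python function. This is only a sketch: the text does not say how a chain of exactly 50 residues should be named, so the example assumes it counts as a polypeptide, and it returns the most specific term for chains of 2 or 3 residues. The function name is hypothetical.

```python
def classify_peptide(n_residues: int) -> str:
    """Name a chain according to its number of amino acid residues."""
    if n_residues < 2:
        raise ValueError("a peptide needs at least 2 amino acids")
    if n_residues == 2:
        return "dipeptide"
    if n_residues == 3:
        return "tripeptide"
    if n_residues < 50:
        return "oligopeptide"
    # Assumption: a chain of exactly 50 residues is treated as a polypeptide.
    return "polypeptide"


for length in (2, 3, 30, 153):
    print(length, classify_peptide(length))
```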

Protein secondary structure

The discovery of the secondary structure of proteins is due to the work of Linus Pauling and Robert Corey, who in 1951 proposed two structures called the α-helix and the β-sheet, or β-pleated sheet.
The secondary structure results from the formation of hydrogen bonds between contiguous parts of the polypeptide chain with particular amino acid sequences. It therefore describes the arrangement in space of amino acids that are not very far apart along the primary structure.
In addition to the structures mentioned above, others have been identified, such as β-turns (beta turns), γ-turns (gamma turns) and Ω-loops (omega loops), all belonging to the group called reverse turns. These structures are often found where the polypeptide chain reverses direction, and are typically located on the surface of the molecule.

Note: about 32-38% of the amino acids in globular proteins are found within α-helix structures.

Supersecondary structures or motifs

Supersecondary structures are combinations of secondary structures that form a region of the molecule with a particular three-dimensional structure and topology. They are connected to each other by loop regions with undefined structure.
Common motifs are:

  • the “zinc finger”, which is often found in proteins that bind RNA or DNA;
  • the β-α-β motif;
  • the Greek key, the β-meander, and the β-barrel.


The domains are the next level of organization. They are globular regions that result from the combination of motifs and that fold independently of the rest of the polypeptide chain to give a stable structure.
They consist of 40-400 amino acids, except for motor and kinase domains, which are formed by a much larger number of amino acids.
Domains are classified into three main groups, on the basis of the secondary structures and motifs they contain:

  • α-domains;
  • β-domains;
  • α/β-domains.

Over 1,000 domain families have been found. The members of each family, called “homologues”, appear to have evolved from a common ancestor.
Very often, each domain has a specific function, that is, it is a functional unit of the protein in which it is contained.
Proteins may consist of a single domain, as is typical of the smaller ones, or of several domains. For example, chymotrypsin consists of a single domain, papain of two domains.

Protein tertiary structure

The tertiary structure, also called “native structure”, is the three-dimensional structure of a protein. The first protein whose tertiary structure was determined was myoglobin, in 1958, thanks to the work of John Kendrew.
At this level, the folding of the chain brings into close contact amino acid residues that are far from each other along the primary structure; the tertiary structure therefore describes the three-dimensional arrangement of such distant residues.

Figure: the tertiary structure of oxy-myoglobin.

The tertiary structure of proteins, in particular of those consisting of more than 200 amino acid residues, is formed by different domains linked by short polypeptide segments. It is often stabilized by disulfide bridges between cysteine residues, which form after the molecule has attained its native conformation.
It should be noted that not all proteins have a defined tertiary structure.
Milk caseins are an example: their polypeptide chains assume a disordered three-dimensional conformation, also known as a random coil. The disordered structure makes them highly susceptible to the action of intestinal proteases, and therefore to the release of their constituent amino acids. This makes them highly suitable for their nutritional role.
Another example of a random-coil protein is elastin.

Protein quaternary structure

This additional level of structural organization describes how two or more polypeptide chains associate to form a single protein structure. It therefore refers to the spatial arrangement of the individual chains and to the nature of the forces that hold them together, such as:

  • the hydrophobic effect, which is the main driving force for protein folding;
  • hydrogen bonds;
  • van der Waals interactions;
  • ionic interactions;
  • covalent cross-links.

The resulting structure is called an oligomer (oligomeric protein), and the constituent polypeptides, which may be identical or different, are called monomers or simply subunits.
In general, most intracellular proteins are oligomers, unlike most extracellular ones. A classic example of a protein with quaternary structure is hemoglobin.
This level of structure is absent in globular proteins consisting of a single polypeptide chain, that is, in monomeric proteins.
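The distinction between monomeric proteins and oligomers made of identical or different subunits can be illustrated with a short Python sketch. The function name and the output wording are hypothetical, chosen only for this example; “homo-oligomer” and “hetero-oligomer” are the usual terms for oligomers made of identical and of different subunits, although they are not used elsewhere in this text. The example describes adult human hemoglobin, which is made of two α and two β chains.

```python
from collections import Counter


def describe_assembly(subunits: list) -> str:
    """Describe a protein from the list of its polypeptide chains.

    Each chain is given as a label; equal labels mean identical chains.
    """
    if not subunits:
        raise ValueError("at least one polypeptide chain is required")
    if len(subunits) == 1:
        return "monomeric protein (no quaternary structure)"
    counts = Counter(subunits)
    # One distinct chain type -> identical subunits; otherwise different ones.
    kind = "homo-oligomer" if len(counts) == 1 else "hetero-oligomer"
    composition = ", ".join(
        f"{count} x {name}" for name, count in sorted(counts.items())
    )
    return f"{kind} of {len(subunits)} subunits ({composition})"


# Adult human hemoglobin: two alpha chains and two beta chains.
print(describe_assembly(["alpha", "alpha", "beta", "beta"]))
# hetero-oligomer of 4 subunits (2 x alpha, 2 x beta)
```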
Proteins are also capable of interacting to form structures in which, acting synergistically, they perform functions that they would not be able to accomplish alone.
Examples are the “macromolecular machines” involved in the synthesis of DNA, RNA and proteins themselves, in muscle contraction, or in the transmission of signals between adjacent cells.


