Protein classification: a comprehensive guide

Protein classification is a useful approach for organizing the vast diversity of these molecules found in living organisms. Proteins differ widely in their chemical composition, structure, physicochemical properties, and biological roles, and these differences reflect the variety of functions they perform in cells and tissues.

Over time, different methods of protein classification have been proposed, each based on a specific characteristic of the protein molecule. Some classifications emphasize chemical composition, distinguishing between simple and conjugated proteins, whereas others focus on molecular shape, separating fibrous from globular proteins. Additional approaches classify proteins according to their biological functions, such as enzymatic activity, transport, structural support, or regulation. Finally, physicochemical properties, including solubility in different solvents, provide a practical criterion that is particularly useful in experimental protein chemistry.

None of these classification systems is universally applicable, as each highlights only certain aspects of protein structure and behaviour. For this reason, different classification schemes are often used in parallel. In the following sections, protein classification based on chemical composition, shape, biological function, and solubility is described and discussed.

Protein classification based on chemical composition

On the basis of their chemical composition, proteins may be classified according to whether they consist exclusively of amino acids or whether they also contain additional non-protein components. This distinction reflects important differences in structure, physicochemical properties, and biological function. According to this criterion, proteins are divided into two main classes: simple proteins and conjugated proteins.

Simple proteins

Simple proteins, also known as homoproteins, are composed exclusively of amino acids linked by peptide bonds and do not contain any non-protein components. Upon hydrolysis, they yield only amino acids. Despite their relatively simple chemical composition, these proteins may perform a wide range of structural, mechanical, and physiological functions in living organisms. Examples of simple proteins include plasma albumin, collagen, and keratin.

Conjugated proteins

Conjugated proteins, also referred to as heteroproteins, are proteins that contain, in addition to the polypeptide chain, a non-protein component known as a prosthetic group. This component is tightly bound to the protein (often covalently) and is essential for its biological activity. Depending on the nature of the prosthetic group, conjugated proteins may be involved in transport, structural organization, signalling, or catalytic processes. Important classes of conjugated proteins include glycoproteins, chromoproteins, and phosphoproteins.

Glycoproteins

Glycoproteins are proteins in which one or more carbohydrate units are covalently attached to the polypeptide chain. Typically, the carbohydrate chains consist of no more than 15–20 monosaccharide units, which may include arabinose, fucose (6-deoxygalactose), galactose, glucose, mannose, N-acetylglucosamine (GlcNAc or NAG), and N-acetylneuraminic acid (Neu5Ac or NANA).
Examples of glycoproteins include:

  • glycophorin, one of the best-known glycoproteins of the erythrocyte membrane;
  • fibronectin, which anchors cells to the extracellular matrix through interactions with collagen or other fibrous proteins on one side and with cell membranes on the other;
  • almost all blood plasma proteins, with albumin as the notable exception;
  • immunoglobulins, also known as antibodies.

Chromoproteins

Chromoproteins are proteins that contain coloured prosthetic groups responsible for their characteristic absorption of visible light.
Typical examples include:

  • haemoglobin and myoglobin, which bind four haem groups and one haem group, respectively;
  • chlorophyll-protein complexes, where the pigment has a porphyrin ring containing a magnesium atom at its centre;
  • rhodopsins, which bind retinal.

Phosphoproteins

Phosphoproteins are proteins in which phosphoric acid is covalently bound to serine, threonine, or tyrosine residues. They generally perform either structural functions, such as in tooth dentin, or reserve functions, such as in milk caseins (αs1-, αs2-, β-, and κ-caseins) and egg yolk phosvitin.
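
The composition-based scheme described above amounts to a simple lookup: a protein with no prosthetic group is a simple protein, and otherwise the kind of prosthetic group names the class. The following Python sketch illustrates this under stated assumptions. The function name `classify_by_composition`, the enum, and the group labels ("carbohydrate", "pigment", "phosphate") are hypothetical and chosen only for illustration; the example assignments follow the proteins named in the text.

```python
from enum import Enum


class CompositionClass(Enum):
    SIMPLE = "simple protein (homoprotein)"
    GLYCOPROTEIN = "glycoprotein"
    CHROMOPROTEIN = "chromoprotein"
    PHOSPHOPROTEIN = "phosphoprotein"
    OTHER_CONJUGATED = "conjugated protein (other prosthetic group)"


# Hypothetical labels for the kind of prosthetic group, mapped to the
# three classes of conjugated proteins discussed in the text.
_GROUP_TO_CLASS = {
    "carbohydrate": CompositionClass.GLYCOPROTEIN,
    "pigment": CompositionClass.CHROMOPROTEIN,
    "phosphate": CompositionClass.PHOSPHOPROTEIN,
}


def classify_by_composition(prosthetic_group=None):
    """Return the composition class of a protein.

    prosthetic_group is None when the protein consists only of amino
    acids; otherwise it is a short label for the non-protein component.
    """
    if prosthetic_group is None:
        return CompositionClass.SIMPLE
    # Any label not listed above falls back to a generic conjugated class.
    return _GROUP_TO_CLASS.get(
        prosthetic_group.lower(), CompositionClass.OTHER_CONJUGATED
    )


# Assignments taken from the examples given in the text.
examples = {
    "keratin": None,
    "glycophorin": "carbohydrate",
    "myoglobin": "pigment",
    "phosvitin": "phosphate",
}

for name, group in examples.items():
    print(f"{name}: {classify_by_composition(group).value}")
```

A lookup table keeps the sketch easy to extend: a further class of conjugated protein needs only one more enum member and one more dictionary entry.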

Protein classification based on shape

Protein classification based on shape considers the overall three-dimensional organization of the polypeptide chain and the way protein molecules are assembled in space. The molecular shape of a protein is closely related to its amino acid sequence, secondary and tertiary structures, and ultimately to its biological function. From this structural point of view, proteins exhibit markedly different physical properties, such as solubility, flexibility, and mechanical resistance.

On the basis of their shape, proteins may be divided into two main classes: fibrous proteins and globular proteins. These two groups differ not only in their overall conformation, but also in their physicochemical behaviour, biological roles, and nutritional properties.

Fibrous proteins

Fibrous proteins primarily perform mechanical and structural functions, providing support for cells as well as for the entire organism.

These proteins are insoluble in water because they contain a high proportion of hydrophobic amino acids, both within their core and on their surface. The presence of hydrophobic amino acids on the surface facilitates their assembly into highly ordered supramolecular structures. Their polypeptide chains form long filaments or sheets in which, in most cases, only one type of repeating secondary structure is present.

In vertebrates, fibrous proteins provide external protection, support, and shape. Owing to their structural properties, they ensure flexibility and/or mechanical strength.

Some fibrous proteins, such as α-keratins, are only partially hydrolysed in the intestine.

Representative examples of fibrous proteins are described below.

Fibroin

Fibroin is produced by spiders and insects. A well-known example is the fibroin synthesized by the silkworm Bombyx mori.

Collagen

The term collagen does not refer to a single protein, but to a family of structurally related proteins (at least 29 different types) that constitute the main protein component of connective tissue and, more generally, the extracellular scaffolding of multicellular organisms. In vertebrates, collagens account for approximately 25–30% of total protein content.

Collagens are found in various tissues and organs, including tendons and the organic matrix of bone, where they are present in very high proportions, as well as in cartilage and in the cornea of the eye.

In different tissues, collagens form distinct supramolecular organizations, each adapted to specific functional requirements. For example, in the cornea, collagen molecules are arranged in an almost crystalline array, rendering the tissue virtually transparent. In contrast, in the skin they form fibres that are less tightly interwoven and oriented in multiple directions, thereby ensuring the tensile strength of the tissue.

α-Keratins

α-Keratins constitute almost the entire dry weight of nails, claws, beaks, hooves, horns, hair, wool, and a large portion of the outer layer of the skin.

The different degrees of stiffness and flexibility of these structures depend on the number of disulfide bonds which, together with other intermolecular forces, contribute to the stabilization of the protein structure. For this reason, wool keratins, which contain relatively few disulfide bonds, are flexible, soft, and extensible, whereas keratins found in claws and beaks are rich in disulfide bonds and therefore much stiffer.

Elastin

Elastin is a fibrous protein that provides elasticity to the skin and blood vessels. This property derives from its randomly coiled structure, which differs markedly from the more ordered structures of α-keratins and collagens.

Note: the various types of collagen have low nutritional value because they are deficient in several essential amino acids. In particular, they lack tryptophan and contain low amounts of other essential amino acids. Gelatin used in food preparation is a derivative of collagen.
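
The statement that a protein is deficient in essential amino acids can be checked directly from its amino acid composition. The Python sketch below counts the nine amino acids essential for adult humans in a one-letter sequence and lists those that are absent. The function name `essential_profile` is hypothetical, and the sequence `toy_fragment` is an invented collagen-like fragment used purely for illustration; it is not a real collagen sequence.

```python
from collections import Counter

# One-letter codes of the nine amino acids essential for adult humans:
# His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val.
ESSENTIAL = set("HILKMFTWV")


def essential_profile(sequence):
    """Return (fractions, missing) for the essential amino acids.

    fractions maps each essential residue to its share of the sequence;
    missing lists the essential residues that do not occur at all.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    fractions = {aa: counts.get(aa, 0) / total for aa in sorted(ESSENTIAL)}
    missing = [aa for aa, fraction in fractions.items() if fraction == 0]
    return fractions, missing


# Invented glycine- and proline-rich fragment (assumption, illustration only).
toy_fragment = "GPPGPAGKPGEVGPPGLAGPR"

fractions, missing = essential_profile(toy_fragment)
print("Essential residues absent:", ", ".join(missing))
print("Tryptophan present:", "W" not in missing)
```

For a real assessment, the same function would be applied to a full sequence or to measured composition data rather than to a toy fragment.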

Globular proteins

Most proteins belong to this class.

Globular proteins possess a compact, approximately spherical structure that is more complex than that of fibrous proteins. In addition to secondary structure elements, they exhibit well-defined motifs and domains, as well as tertiary and, in many cases, quaternary levels of organization.

They are generally soluble in water; however, some globular proteins are embedded in biological membranes as transmembrane proteins and therefore operate in a hydrophobic environment.

Figure: quaternary structure of haemoglobin, illustrating the globular arrangement of four polypeptide chains.

Unlike fibrous proteins, which mainly perform structural and mechanical functions, globular proteins carry out a wide variety of biological roles, including:

  • enzymatic catalysis;
  • hormonal regulation;
  • membrane transport and receptor activity;
  • transport of lipids, such as triglycerides and fatty acids, and of oxygen in the blood;
  • immune defence, as in the case of immunoglobulins (antibodies);
  • nutrient storage in grains and legumes.

Examples of globular proteins include myoglobin, haemoglobin, and cytochrome c.

At the intestinal level, most globular proteins of animal origin are almost completely hydrolysed into their constituent amino acids.

Protein classification based on biological functions

The wide range of functions performed by proteins arises from both the folding of the polypeptide chain, which determines their three-dimensional structure, and the presence of numerous functional groups in the amino acid side chains, such as thiols, hydroxyl groups, thioethers, carboxamides, carboxyl groups, and various basic groups.

From a functional perspective, proteins may be classified into several groups.

Enzymes

In living organisms, almost all biochemical reactions are catalysed by specific proteins known as enzymes. These proteins exhibit very high catalytic efficiency, increasing the rate of the reactions in which they participate by at least a factor of 10⁶. Consequently, life as we know it could not exist without their catalytic activity.
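
To give a sense of scale, a rate enhancement of one million-fold (10^6) can be turned into time. The short Python sketch below assumes, as a simplification, that the time needed for a given conversion is inversely proportional to the reaction rate under otherwise identical conditions. The function name `uncatalysed_time` and the one-second starting value are illustrative assumptions.

```python
SECONDS_PER_DAY = 86_400


def uncatalysed_time(catalysed_seconds, rate_enhancement=1e6):
    """Time the same conversion would need without the enzyme.

    Assumes time is inversely proportional to rate, so slowing the
    reaction by `rate_enhancement` lengthens it by the same factor.
    """
    return catalysed_seconds * rate_enhancement


seconds = uncatalysed_time(1.0)
days = seconds / SECONDS_PER_DAY
print(f"{seconds:.0f} s = {days:.1f} days")  # 1000000 s = 11.6 days
```

Under this assumption, a conversion that an enzyme completes in one second would take more than eleven days without it, and the factor of 10^6 is only the lower bound quoted above.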

Almost all known enzymes, including the thousands present in the human body, are proteins, with the exception of certain catalytic RNA molecules known as ribozymes (ribonucleic acid enzymes).

Transport proteins

Many small organic and inorganic molecules are transported in the bloodstream and extracellular fluids, across cell membranes, and within cells from one compartment to another by specific transport proteins.

Examples include:

  • haemoglobin, which transports oxygen from the lungs to the peripheral tissues;
  • transferrin, which carries iron in the blood;
  • membrane carrier proteins;
  • fatty acid-binding proteins (FABPs), which are involved in the intracellular transport of fatty acids;
  • proteins of plasma lipoproteins, macromolecular complexes of proteins and lipids responsible for the transport of triglycerides, which are otherwise insoluble in water;
  • albumin, which transports free fatty acids, bilirubin, thyroid hormones, and certain drugs, such as aspirin and penicillin, in the blood.

Many of these proteins also play a protective role, since the bound molecules, such as fatty acids, may be harmful to the organism when present in free form.

Storage proteins

Examples of storage proteins include:

  • ferritin, which stores iron intracellularly in a non-toxic form;
  • milk caseins, which act as a reserve of amino acids in milk;
  • egg yolk phosvitin, which contains high amounts of phosphorus;
  • prolamins and glutelins, which are the main storage proteins of cereals.

Mechanical support

Proteins play a pivotal role in the stabilization of many biological structures. Examples include α-keratins, collagen, and elastin. The cytoskeleton itself, which forms the structural scaffold of the cell, is composed of proteins.

Generation of movement

Certain proteins are responsible for the generation of movement, including:

  • contraction of muscle fibres, of which actin and myosin are the major components;
  • propulsion of spermatozoa and microorganisms by means of flagella;
  • separation of chromosomes during mitosis.

Nerve transmission

Some proteins are involved in nerve transmission. An example is the acetylcholine receptor located at synapses.

Control of development and differentiation

Several proteins participate in the regulation of gene expression and cellular differentiation. An example is nerve growth factor (NGF), discovered by Rita Levi-Montalcini, which plays a key role in the formation of neural networks.

Hormones

Many hormones are proteins. These regulatory molecules control numerous cellular functions, ranging from metabolism to reproduction. Examples include insulin, glucagon, and thyrotropin or thyroid-stimulating hormone (TSH).

Protection against harmful agents

Antibodies are glycoproteins that recognize antigens expressed on the surface of viruses, bacteria, and other infectious agents. Interferons, fibrinogen, and other blood coagulation factors also belong to this functional group.

Energy storage

Proteins, and in particular the amino acids that constitute them, represent an important energy reserve, second in size only to adipose tissue. Under certain conditions, such as prolonged fasting, protein catabolism may become essential for survival. However, a reduction in body protein mass exceeding 30% leads to severe impairment of respiratory muscle contraction, immune function, and organ function, conditions that are incompatible with life. For this reason, proteins are a valuable energy source, but one whose depletion quickly becomes critical.

Protein classification based on solubility

Another criterion commonly used for protein classification is based on their solubility in different solvents. Protein solubility depends on several factors, including amino acid composition, three-dimensional structure, and interactions with water or other solvents. For this reason, proteins with different structural and chemical properties display markedly different solubility behaviour.

A classification based on solubility is particularly useful from an experimental point of view and has been widely applied, especially in the study of plant proteins. However, the same principles can be extended to proteins from other sources.

On the basis of their solubility, proteins may be divided into several groups.

Water-soluble proteins

These proteins are soluble in pure water or dilute aqueous solutions. They are typically globular proteins with a high proportion of polar and charged amino acids exposed on their surface. Many enzymes, hormones, and transport proteins belong to this group. Serum albumin is a well-known example of a water-soluble protein.

Salt-soluble proteins

Some proteins are insoluble in pure water but become soluble in dilute saline solutions. Their solubility is favoured by the presence of ions, which reduce electrostatic interactions between protein molecules. This group includes many globulins found in both animal and plant tissues.

Alcohol-soluble proteins

A further group of proteins is insoluble in water and saline solutions but soluble in aqueous alcohol solutions, typically at a concentration of about 70% ethanol. These proteins are generally rich in hydrophobic amino acids and are found mainly in plant seeds, where they function as storage proteins.

Proteins soluble in acidic or alkaline solutions

Certain proteins are insoluble in water, saline solutions, and aqueous alcohol but become soluble in dilute acidic or alkaline solutions. These proteins often possess strong intermolecular interactions that can be disrupted only under extreme pH conditions.

The Osborne classification of plant proteins

In his monograph The vegetable proteins (2nd edition, 1924), the American chemist Thomas Burr Osborne, who is considered the founder of plant protein chemistry, classified plant proteins according to their solubility in different solvents. This classification, which is still widely used, divides plant proteins into four families:

  • albumins, soluble in water;
  • globulins, soluble in dilute saline solutions (for example, avenalin from oat);
  • prolamins, soluble in aqueous alcohol solutions but insoluble in water and absolute alcohol; this group includes gliadins, which together with glutenins constitute gluten;
  • glutelins, soluble only in dilute acidic or alkaline solutions.

This classification remains particularly useful for the study of seed storage proteins.
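
The four Osborne families can be read as a sequential test: solvents are tried in a fixed order, and a protein is assigned to the family of the first solvent in which it dissolves. The Python sketch below expresses that reading. The solvent labels, the function name `osborne_family`, and the "unclassified" fallback are hypothetical names introduced for illustration, and the solubility sets in the usage lines are simplified assumptions rather than measured data.

```python
# Solvents in the order in which they are tried, with the family
# assigned to proteins that first dissolve at that step.
OSBORNE_STEPS = [
    ("water", "albumin"),
    ("dilute_salt", "globulin"),
    ("aqueous_alcohol", "prolamin"),
    ("dilute_acid_or_alkali", "glutelin"),
]


def osborne_family(soluble_in):
    """Return the Osborne family for a set of solvent labels.

    soluble_in is the set of solvents (labels from OSBORNE_STEPS) in
    which the protein dissolves; the first matching step decides.
    """
    for solvent, family in OSBORNE_STEPS:
        if solvent in soluble_in:
            return family
    return "unclassified"


# Simplified, assumed solubility profiles for illustration.
print(osborne_family({"water", "dilute_salt"}))    # albumin
print(osborne_family({"dilute_salt"}))             # globulin
print(osborne_family({"aqueous_alcohol"}))         # prolamin
print(osborne_family({"dilute_acid_or_alkali"}))   # glutelin
```

Ordering the steps matters: a protein that dissolves in water usually also dissolves in dilute saline, so testing water first keeps albumins from being counted as globulins.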

References

  • Alberts B., Johnson A., Lewis J., Morgan D., Raff M., Roberts K., Walter P. Molecular biology of the cell. 7th Edition. Garland Science, Taylor & Francis Group, 2022.
  • Berg J.M., Tymoczko J.L., Gatto G.J. Jr, Stryer L. Biochemistry. 9th Edition. W.H. Freeman and Company, 2019.
  • Branden C., Tooze J. Introduction to protein structure. 2nd Edition. Garland Science, 1999. doi:10.1201/9781136969898
  • Garrett R.H., Grisham C.M. Biochemistry. 6th Edition. Brooks/Cole, Cengage Learning, 2016.
  • Heilman D., Woski S., Voet D., Voet J.G., Pratt C.W. Fundamentals of biochemistry: life at the molecular level. 6th Edition. Wiley, 2023.
  • Nelson D.L., Cox M.M. Lehninger. Principles of biochemistry. 8th Edition. W.H. Freeman and Company, 2021.
  • Osborne T.B. The vegetable proteins. 2nd Edition. London: Longmans, Green and Co., 1924.
  • Petsko G.A., Ringe D. Protein structure and function. Oxford University Press, 2008.
  • Rodwell V.W., Bender D.A., Botham K.M., Kennelly P.J., Weil P.A. Harper’s illustrated biochemistry. 31st Edition. McGraw-Hill, 2018.
