Current strategies for interpreting pulmonary function tests (PFTs) are, in reality, guidelines for classifying them1. The verbs to classify and to interpret are not interchangeable. Interpretation implies assigning meaning to a result. Interpreting PFT results is the task of the clinician, who, in light of the medical history, physical examination, and other diagnostic studies, gives meaning to functional findings. Interpretation therefore implies integration of information. In contrast, classifying the results of functional studies is a simpler task, whose objective is to standardize the way we refer to physiological patterns. Standardized classification requires adherence to operational definitions, which inherently have limitations. Despite the clear differences between classification and interpretation, the terms are frequently used interchangeably, and this misuse underlies many of the clinical controversies surrounding PFT results.
Functional patterns derived from PFTs only increase or decrease the pretest probability of a specific diagnosis. Classification algorithms are based on mathematical models, which may be deterministic or probabilistic. Deterministic models include classical physical laws, in which all factors determining an outcome are known; for example, velocity is determined by distance and time. By contrast, models in clinical medicine are generally probabilistic: they estimate the probability that a phenomenon is present and thereby increase or decrease certainty regarding its existence. For example, if a subject has a forced vital capacity (FVC) of 0.8 L, there is a high probability (greater certainty) of fibrosing lung disease; however, other causes (eg, muscle weakness, air trapping) may explain the finding. Probability is therefore a measure of certainty and ranges from 0 to 1. The predictive performance of mathematical models used in respiratory physiology, even when incorporating multiple parameters, typically ranges between 0.4 and 0.6. The remaining difference from unity reflects uncertainty.
In clinical physiology, the comparator is usually a multiple regression equation (accounting for age, height, and sex) used to estimate expected lung function in a healthy individual. The data used to construct these equations are derived from individuals considered healthy. But what defines “healthy”? The stricter the criteria used to define health, the greater the selection bias. Functional values derived from exceptionally healthy individuals may be inappropriately high when used as comparators, thereby increasing false-positive rates2. This represents only one source of variability in reference equations. Currently, global reference equations constructed from data from thousands of healthy individuals across multiple countries are widely accepted3. Implementation of these equations has altered the prevalence of functional abnormalities and clinical diagnoses, with a significant increase in the frequency of respiratory disease, particularly among individuals of African ancestry3,4.
Even assuming that reference equations accurately represent healthy individuals in a population, another issue arises: the definition of normality. Normality has been defined statistically; however, what is frequent is not necessarily synonymous with healthy. For example, based on frequency alone, one might argue that in Mexico a body mass index (BMI) between 28 and 32 is “normal,” but although frequent, it is not healthy. “Normal” should not be conflated with “healthy.”
To define normality, we rely on data distribution. A standard normal distribution has a mean of 0 and a standard deviation (SD) of 1, with mean, median, and mode identical. This curve has been used as a reference for defining normality5. Returning to FVC: if an individual has an FVC of 4.9 L and the reference mean is 5 L, there is a high probability that the value is “normal,” since it deviates minimally from the mean. How close must a measurement be to the reference mean to be considered normal? The distance between the measured value and the reference mean may be expressed in original units (eg, liters) or standardized as SD units. The number of SDs represents a uniform and comparable measure of deviation from the group mean. SD units are synonymous with the Z score6. The Z score standardizes data dispersion6.
In a standard normal distribution, 90% of observations lie between –1.64 and +1.64 Z scores, 95% between –1.96 and +1.96, and 99% between –2.58 and +2.58. By convention, the lower limit of normal (LLN) corresponds to the 5th percentile (Z = –1.64). By construction, 5% of healthy individuals fall below this threshold and are therefore labeled abnormal, which makes the LLN an imperfect cutoff.
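The arithmetic behind the Z score and the LLN can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it treats the reference population as normally distributed with a single SD, and the SD of 0.6 L used for the FVC example is a hypothetical value chosen for the sketch, not one taken from any reference equation. The function names are likewise illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(measured, reference_mean, reference_sd):
    """Distance of a measurement from the reference mean, in SD units."""
    return (measured - reference_mean) / reference_sd


def central_fraction(z_limit):
    """Fraction of a standard normal population lying within +/- z_limit."""
    return STD_NORMAL.cdf(z_limit) - STD_NORMAL.cdf(-z_limit)


# FVC example from the text: measured 4.9 L, reference mean 5 L.
# ASSUMPTION: the reference SD of 0.6 L is hypothetical, for illustration only.
fvc_z = z_score(4.9, 5.0, 0.6)

# The LLN is the 5th percentile of the standard normal distribution.
lln_z = STD_NORMAL.inv_cdf(0.05)

# Fraction of a healthy (reference) population expected below the LLN.
healthy_below_lln = STD_NORMAL.cdf(lln_z)

print(f"FVC Z score: {fvc_z:.2f}")
print(f"LLN as a Z score: {lln_z:.3f}")
print(f"Within +/-1.96 SD: {central_fraction(1.96):.3f}")
print(f"Healthy individuals below the LLN: {healthy_below_lln:.0%}")
```

In this sketch the measured FVC lies well above the LLN, while the last line makes explicit that the cutoff flags a fixed share of the healthy reference population regardless of how the threshold is applied.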
To illustrate the limitations of cutoffs, consider the ratio of forced expiratory volume in 1 second (FEV1) to FVC, the spirometric parameter that defines obstruction. Most pulmonologists accept that FEV1/FVC < LLN indicates obstruction; consequently, values above the LLN are considered nonobstructive. However, when a less stringent definition is used (eg, FEV1/FVC < 0.7), functional abnormalities are detected even among individuals whose ratio remains above the LLN and who are therefore classified as normal by the stricter criterion. These abnormalities include reduced diffusing capacity for carbon monoxide (DLCO), impaired ventilatory efficiency during exercise, and lower oxygen consumption7.
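How the two operational definitions can disagree for the same measurement may be sketched as follows. The function name, the label strings, and the LLN of 0.65 in the example are hypothetical; in practice the LLN for FEV1/FVC comes from a reference equation and varies with age, height, and sex.

```python
def classify_ratio(fev1_fvc, lln, fixed_cutoff=0.70):
    """Label an FEV1/FVC ratio under the LLN and fixed-ratio definitions.

    `lln` is the individual's lower limit of normal for FEV1/FVC,
    supplied by the caller (ASSUMPTION: taken from a reference equation).
    """
    below_lln = fev1_fvc < lln
    below_fixed = fev1_fvc < fixed_cutoff

    if below_lln and below_fixed:
        return "obstructed by both criteria"
    if below_fixed:
        return "discordant: below fixed ratio, above LLN"
    if below_lln:
        return "discordant: below LLN, above fixed ratio"
    return "nonobstructed by both criteria"


# Hypothetical individual: ratio 0.68, individual LLN 0.65.
# The LLN criterion labels this person nonobstructed; the fixed ratio does not.
print(classify_ratio(0.68, lln=0.65))
```

The discordant groups are the ones in which the choice of cutoff, rather than the underlying physiology, determines the label.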
Even within the area under the curve considered normal, important differences exist. In terms of mortality, it is not equivalent to be classified as normal with an FEV1 at –1.5 Z score vs an FEV1 at +0.5 Z score. Both values fall within the normal range; however, the former is associated with higher mortality than the latter. This mortality gradient among individuals considered healthy was recently demonstrated in a US study that included 2 cohorts followed for 20 years8. These findings underscore that nature does not recognize arbitrary cutoff points. Lung function is a continuous variable, and categorizing individuals into discrete functional intervals is therefore somewhat artificial and potentially misleading. A recent British study including more than 300,000 participants demonstrated—consistent with the observations of John Hutchinson in 1846—that lung function measured in its original units (liters of FVC and FEV1), without covariate adjustment, predicts 12-year mortality9,10.
Given the many limitations of the current definition of normality—some of which have been discussed above—it is reasonable to ask how we might achieve a more meaningful and efficient interpretation of PFT results. Concluding that a given range of FEV1 or FVC values is normal or abnormal, and further categorizing them as mild, moderate, or severe, clearly oversimplifies functional findings.
One way to address this limitation is to perform repeated measurements of lung function, thereby describing the temporal course of functional parameters, also referred to as functional trajectories. Numerous prenatal and postnatal factors influence lung development during childhood and, consequently, the level of lung function achieved in adulthood11. Lung function attained between 20 and 25 years of age ultimately reflects both genetic and environmental influences. The clinical and epidemiologic challenge is to identify and modify risk factors to alter the trajectory of lung function so that individuals reach optimal pulmonary capacity. Otherwise, individuals may enter adulthood with reduced lung function and subsequently experience progressive decline associated with aging, environmental exposures, or chronic disease. Thus, it may be advisable to move beyond purely cross-sectional analysis of lung function toward a longitudinal approach, in which the individual serves as their own comparator. Several methods exist to evaluate longitudinal changes in both children and adults1. The FEV1Q method is among the most robust approaches for this purpose in adults12. FEV1Q is calculated by dividing the measured FEV1 (in liters) by 0.5 in men and 0.4 in women; these denominators correspond to the first percentile of FEV1 in a healthy population for each sex. As the ratio (FEV1Q) approaches 1, respiratory function is increasingly impaired. Under normal conditions, a decline of one unit in this ratio occurs over approximately 18 years. If such a decline occurs over a shorter period (eg, 10 years), this suggests accelerated loss of lung function. For nonspirometric functional parameters, however, longitudinal data remain limited and represent an important area for future research. Most functional trajectories have been derived from cross-sectional data; therefore, additional long-term cohort studies are needed to reduce sources of variability.
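The FEV1Q calculation and the comparison against the expected rate of decline can be sketched as follows. The denominators (0.5 and 0.4) and the figure of roughly 18 years per unit are those given in the text; the function names, the dictionary keys, and the example FEV1 values are hypothetical and serve only to illustrate the arithmetic.

```python
# Sex-specific denominators in liters, as stated in the text.
FEV1Q_DENOMINATOR_L = {"male": 0.5, "female": 0.4}

# Approximate time for a one-unit FEV1Q decline under normal conditions (text).
EXPECTED_YEARS_PER_UNIT = 18.0


def fev1q(fev1_liters, sex):
    """FEV1 divided by the sex-specific denominator."""
    return fev1_liters / FEV1Q_DENOMINATOR_L[sex]


def years_per_unit_decline(fev1q_start, fev1q_end, years_elapsed):
    """Years taken to lose one FEV1Q unit; None if no decline was observed."""
    units_lost = fev1q_start - fev1q_end
    if units_lost <= 0:
        return None
    return years_elapsed / units_lost


# Hypothetical man: FEV1 of 3.0 L at baseline and 2.5 L ten years later.
baseline = fev1q(3.0, "male")
follow_up = fev1q(2.5, "male")
pace = years_per_unit_decline(baseline, follow_up, years_elapsed=10)

print(f"FEV1Q at baseline: {baseline:.1f}, at follow-up: {follow_up:.1f}")
if pace is not None and pace < EXPECTED_YEARS_PER_UNIT:
    print(f"One unit lost every {pace:.0f} years: faster than expected")
```

Here the individual serves as their own comparator: the same two measurements that might both be classified as normal cross-sectionally reveal a one-unit loss in 10 years rather than 18.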
Another alternative in the interpretation of PFTs, beyond cross-sectional and longitudinal analysis, is systems medicine and artificial intelligence (AI)13. Biomedical research has evolved substantially. Historically, medicine was reactive, focusing on addressing emerging health problems through clinical descriptions, case series, or relatively simple clinical trials. Subsequently, research efforts shifted toward phenotype identification, particularly through omics-based approaches. Multiomics, which examines interactions among omic layers, has improved understanding of systems biology, in which the interactions among components are more important than isolated elements. This paradigm has advanced the development and implementation of precision medicine14. In this context, PFTs can be integrated as functional aggregates into large-scale data networks, enabling the construction of complex biological models capable of predicting therapeutic response and individual prognosis with minimal error. Under such circumstances, pulmonary function assessment would acquire greater clinical meaning. Until then, the interpretation of PFTs will remain a study in which n = 1, as Dr Alberto Neder has noted15.
By way of corollary, several conclusions may be drawn. First, cross-sectional PFT results inherently generate classification errors depending on the cutoff used. Second, statistical normality does not necessarily equate to clinical health. Third, individualized analysis of longitudinal functional trajectories appears to be a more efficient strategy for interpreting PFTs. Fourth, incorporation of lung function into complex AI-based models may improve the identification of health priorities.
Funding
This study received no specific funding from public, commercial, or not-for-profit agencies.
Conflicts of interest
The author declared no conflicts of interest whatsoever.
Ethical considerations
Protection of humans and animals. No experiments involving humans or animals were performed.
Confidentiality and informed consent. The study does not involve personal patient data and did not require ethics approval. SAGER guidelines do not apply.
Declaration on the use of artificial intelligence (AI). The authors declare that no generative artificial intelligence was used in the writing or creation of the content of this manuscript.
