
Department of Gastroenterology, VA Greater Los Angeles Healthcare System; Division of Digestive Diseases, David Geffen School of Medicine at UCLA; Department of Health Services, UCLA School of Public Health; and UCLA/VA Center for Outcomes Research and Education, Los Angeles, California, USA.
Patients typically seek health care because they experience symptoms. Health care providers must elicit, measure and interpret patient symptoms as part of the clinical evaluation. Patient-generated reports, also known as Patient-Reported Outcomes (PROs), capture the patients' illness experience in a structured format and may help bridge the gap between patients and providers. The United States Food and Drug Administration (FDA) defines a PRO as "any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else."1
PROs measure any aspect of patient-reported health (e.g., physical, emotional or social symptoms) and can help to direct care and improve clinical outcomes. When clinicians systematically collect patient-reported data in the right place at the right time, PRO measurement can effectively aid in detection and management of conditions,2,3 improve satisfaction with care4 and enhance the patient-provider relationship.4-8
In addition to their use in clinical practice, PROs also play an important role in clinical trials and other research endeavors. For example, health-related quality of life (HRQOL), a sub-type of PRO that measures biopsychosocial health, has gained traction as an outcome in clinical research, including clinical trials. HRQOL measures can document patient improvement or decrement over time, and help to estimate the benefits of clinical interventions. In addition, the FDA now considers the patient report in drug approval, and has developed guidance for use of PROs in clinical trials.1 The National Institutes of Health (NIH) has also supported a major PRO initiative, called the Patient-Reported Outcomes Measurement Information System (PROMIS®; www.nihpromis.org), designed to develop and evaluate several HRQOL domains.9,10 I will discuss PROMIS in detail later in this article because it is now becoming the new gold standard of PRO measurement. Finally, the rising prominence of the Chronic Care Model, which emphasizes the centrality of the provider-patient relationship in clinical decision making,2,3 places the patient report front-and-center. In short, there is a confluence of scientific, regulatory and political factors that amplify the importance of PRO research.
Gastrointestinal (GI) illnesses can lead to physical, mental and social distress.11 For this reason, patients, providers, investigators and regulators are interested in using PROs to guide clinical decision-making, conduct clinical research and achieve drug approval in GI disorders. Over the last 2 decades, investigators have developed PROs that measure a range of GI symptoms, including physical, emotional and social features of digestive disorders. It is important for GI providers and researchers to be aware of existing PROs, their limitations and how they can be used.
The purpose of this article is to help guide clinicians and investigators by reviewing PRO measurement in gastroenterology. Because much of the work in PROs focuses on functional gastrointestinal disorders (FGIDs), like irritable bowel syndrome (IBS), I will intermittently focus on FGIDs as a model for how to measure and employ PROs for clinical and research purposes.
There are at least four reasons why measuring PROs is relevant in clinical practice. First, any experienced provider knows that traditionally measured biomarkers often fail to correspond with how a patient is actually feeling. For example, in diabetes, clinicians usually measure the hemoglobin A1C level. This value is often used to make treatment decisions, like how aggressively to treat the diabetes, or whether to change to a new medication. But the problem is that some patients may have a low hemoglobin A1C but still feel listless or depressed despite their favorable laboratory values. In contrast, others with unfavorable levels may nonetheless feel upbeat and vigorous. Similarly, an IBS patient with 6 daily bowel movements may share the same problems at work (like difficulty getting the job done because of the constant bother of symptoms) as another IBS patient with 3 daily bowel movements. In both examples the traditional outcome measured by healthcare providers (e.g., hemoglobin A1C levels, bowel movement frequency) fails to capture other aspects of health. In other words, just asking about bowel movement frequency in IBS (for example) is not enough to fully understand how the disease is affecting a patient; it is only part of the picture.
A second reason to measure PROs is that patients rarely value traditional biomarkers in the same manner as providers. For example, patients with hypertension often fail to share the same enthusiasm as their providers in achieving specific blood pressure goals, but are quick to comply with therapy when their hypertension leads to headaches or dizziness. Similarly, some patients with chronic constipation could not care less if their therapy allows them to achieve an increase of 1 or 2 more bowel movements per week, but care greatly if the improvement also allows them to eat dinner without worrying about the consequences of their food selection. Measuring HRQOL, in particular, directly acknowledges that patients often value different outcomes than their providers.
Third, PROs provide a key component to understanding the true burden of any disease. Traditional measures of disease burden include the prevalence of a disease, direct and indirect expenditures of a disease, and the worker productivity decrements related to a disease. However, in order to fully appreciate the true burden of a disease, it is also important to appreciate the HRQOL decrement engendered by the disease. The notion of "weighting" diseases not only by their cost and prevalence but also by their HRQOL decrement has an innate sense of fairness and is a fundamental principle of health economics.12 This is important, because it suggests that society and health providers are willing to spend more money to cure diseases that severely impact HRQOL than diseases that only moderately impact HRQOL. For this reason alone it is critical to carefully understand the HRQOL decrement of various diseases, because that information may have policy implications when it comes to developing a healthcare budget.
Fourth, PROs are especially important in diseases that are marked by morbidity, but not mortality. If we only emphasized the care of diseases that are mortal, then most of the world would be without treatment, since, thankfully, many diseases do not cause death. Fortunately, the FGIDs, like IBS, are not mortal diseases. But that does not mean that they are not important, or that they are just a "nuisance" that should be disregarded. To the contrary, FGIDs can severely affect HRQOL. Without measuring PROs like HRQOL, it would be unfair to conclude that FGIDs are simply a "nuisance" because they are not mortal diseases like cancer. This is distinguished from diseases where HRQOL really has no important role, such as in patients with an acute gallstone, or patients with an exsanguinating peptic ulcer. These are conditions where acute treatment must be rendered, regardless of how the HRQOL is affected at the moment of illness. We do not administer "feeling thermometers" or HRQOL questionnaires in patients with an active ulcer bleed or a gallstone obstruction - we manage the disease immediately regardless of precise PRO data. In contrast, PROs have large clinical relevance in patients with disorders like chronic migraine headache, fibromyalgia or depression. The FGIDs share characteristics with these latter conditions, since they uniformly affect quality but not quantity of life. Thus, failing to acknowledge PROs in chronic conditions like the FGIDs is a dangerous approach, because it tends to minimize the importance of these conditions.
PROs are usually measured with patient questionnaires, or so-called "instruments." PRO instruments may collect data across several areas of health, including physical, psychological and social functioning. PRO instruments are generally classified as either "generic" or "disease-targeted."12 Generic instruments are questionnaires that were developed to measure PROs across many different conditions. In contrast, disease-targeted instruments are designed to target one or more specific conditions. Examples of the former include the short form-36 health survey (SF-36)13 and the sickness impact profile.14 As generic instruments, the SF-36 and sickness impact profile were designed to measure health across a range of conditions rather than specific individual conditions. For example, the SF-36 has been used to measure health in over 100 conditions ranging from hepatitis C to colon cancer to IBS. As with other balanced HRQOL instruments, the SF-36 measures health across several areas, including mental, physical and social health. The instrument includes 36 items (thus the "36" of "SF-36") that are organized into 8 discrete scales, such as "physical functioning," "vitality," or "bodily pain." Each scale can be scored from 0 to 100, with higher scores indicating a better HRQOL. In this manner, any patient can be assigned a set of values to measure their HRQOL, much like a vital sign. The SF-36 is particularly useful in FGIDs because it captures areas that are deemed important by patients with FGIDs, including bodily pain, energy/fatigue and social functioning. In particular, the SF-36 contains several items pertaining to "vital exhaustion," including the degree of feeling "full of life," feeling "worn out" and feeling "tired." Because vital exhaustion is thought to be a critical component of the biology of FGIDs like IBS,15 the SF-36 is an example of an appropriate generic instrument to use in measuring PROs in FGIDs.
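As a rough illustration of how a scale can be placed on a 0 to 100 range, the sketch below linearly rescales summed item responses. It is not the published SF-36 scoring algorithm, which has its own item recoding and scale-specific rules; the function name and the 1 to 5 response range used in the example are assumptions made only for illustration.

```python
def rescale_to_0_100(item_responses, item_min, item_max):
    """Linearly rescale summed item responses to a 0-100 range.

    Assumes every item shares the same response range and is already
    coded so that a higher response means better health.
    """
    if not item_responses:
        raise ValueError("at least one item response is required")
    if item_max <= item_min:
        raise ValueError("item_max must be greater than item_min")
    raw = sum(item_responses)
    lowest = item_min * len(item_responses)    # worst possible raw sum
    highest = item_max * len(item_responses)   # best possible raw sum
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical three-item scale answered on a 1-5 response range.
example_score = rescale_to_0_100([3, 3, 3], item_min=1, item_max=5)
```

A respondent who selects the lowest option on every item lands at 0, and one who selects the highest option on every item lands at 100, which matches the "higher is better" convention described above.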
Beyond generic PRO instruments, there have been over 110 disease-targeted PROs developed in gastroenterology. The PROs cover a range of conditions, including achalasia,16 celiac sprue,17,18 dyspepsia,19-36 eosinophilic esophagitis,37 fecal incontinence,38-53 functional GI disorders,18,34,42,46,53-79 GERD,23,31-33,36,80-102 GI malignancies,41,70,103-107 post gastrectomy,105,108 ileal conduit diversion,109 ileostomy,110 inflammatory bowel disease,111-119 pregnancy-related GI symptoms,120-122 systemic sclerosis123 and radiation enteritis,50 among others. Thirteen of the PROs apply to the pediatric population,37,38,53,70,73,90,93,95,107,115,116,124,125 and 6 apply specifically to women.49,120-122,126,127
There have been several PROs designed for the FGIDs.128 These instruments vary in content, length and degree of data supporting their validity for use in clinical practice. Of the multiple disease-targeted instruments in FGIDs, the IBS quality of life (IBS-QOL) questionnaire,66 originally developed by Patrick and Drossman, has the most extensive data supporting its validity. In particular, the IBS-QOL can be used to measure patient HRQOL over time, and changes in IBS-QOL scores often correlate with other important features of health. In this manner, providers could use the IBS-QOL to follow their patients longitudinally to complement traditionally measured parameters. In addition, the IBS-QOL has been used in clinical treatment trials to help determine whether specific therapies improve HRQOL. This type of information is critical, because treatments that improve only 1 or 2 symptoms might not be useful if they cannot simultaneously improve HRQOL.
Despite all the academic and practical reasons to measure PROs in clinical practice, most providers do not routinely administer PRO instruments. There are several reasons why. First, many providers find it burdensome. In order to accurately measure PROs, providers must perform a thorough evaluation of multiple physical, psychological and social areas of health. Validated instruments like the SF-36 or IBS-QOL are designed to capture information from each of these areas of health in order to establish a broad and balanced portrait of a patient's unique health status. However, these measures are primarily designed for research purposes, and clinicians rarely have the time or inclination to assess PROs with this degree of detail. Second, most healthcare providers, including gastroenterologists, have not received training in how to perform a complete "biopsychosocial" evaluation,129 in which they take the time not only to ask about individual physical symptoms, but also to determine whether and how the disease causes emotional or social distress.11 Indeed, studies indicate that this type of history taking is often not performed. Third, many providers question whether PRO data are clinically "actionable." Without knowing how to employ PRO data, some clinicians question the value of routinely collecting PRO information as part of everyday care.
In light of this reality, it is important to arm providers with a concise list of questions that can help them rapidly understand their patients' PRO status. Moreover, it is important to show how knowing this information can help drive treatment decisions by allowing providers to gain better insight into their patients' overall health. This leads to a discussion of the NIH PROMIS system - a novel approach to measuring PROs for research and clinical practice.
In 2002, the NIH made a renewed commitment to chart a "roadmap" for medical research into the 21st century. Through a series of planning meetings with multiple stakeholders, the NIH identified gaps in biomedical science and research, and prioritized major areas of inquiry that cut across all of human health. The idea was to develop a mechanism to support research that no single NIH institute could handle on its own. The ultimate goal of the roadmap initiative was to jump-start the biomedical research enterprise in the U.S., re-engineer it to help meet the modern realities of healthcare, and ultimately deliver a set of tools that can improve individual and public health in tangible and measurable ways.
PROMIS was a major program that emerged from the roadmap initiative.10 Sponsored in 2004, PROMIS was launched with the goal of building, validating and disseminating a toolbox of publicly available item banks capable of measuring PROs across the breadth and depth of the human illness experience. To accomplish this, the NIH sponsored a consortium of centers to work collaboratively to develop PROs for public use. To ensure harmonization of efforts, all the PROMIS sites were mandated to employ the same package of rigorous methodologies while building their PRO item banks. Moreover, the resulting PROs would be different from traditional paper-and-pencil questionnaires. Instead, the PROMIS item banks were intended from the start to be administered electronically, and to employ item response theory (IRT) with computerized adaptive testing (CAT).130,131 The PROMIS vision was to use these modern techniques to create highly efficient and very short questionnaires that could be easily implemented in busy clinical systems while preserving reliability and validity (more on IRT and CAT, below).
There are currently over 10 primary research centers that comprise the PROMIS network. Our unit at the University of California, Los Angeles, is the PROMIS site focused on measuring GI-related symptoms. Each PROMIS site is unified by the same "PROMIS standards" of methodology. A statistical coordinating center, based at Northwestern University in Chicago, supports and administers the online "Assessment Center" program that holds the PROMIS item banks. Through Assessment Center, PROMIS investigators, other researchers, industry and clinical sites can use the existing PROMIS item banks for their clinical and research purposes. Refer to the PROMIS website at www.nihpromis.gov for more information on the full set of freely-available item banks, the conceptual framework of all PROMIS domains (including completed domains and those under construction) and for more information on Assessment Center and its functions.
The PROMIS effort was born from the realization that patients are the ultimate consumers of healthcare, and are the final judges of whether our healthcare is working. Moreover, patients seek healthcare because of symptoms, as noted earlier in this article. Healthcare providers, in turn, are charged with eliciting, measuring, and interpreting these symptoms to direct clinical decision-making. But this is easier said than done.132 Although understanding the patient report serves a vital role in directing care, most providers have opted for ad hoc and informal measurement of symptoms and function. Yet there have been extensive efforts to develop and apply reliable and valid PRO measures across diseases, with a special focus on chronic illnesses.4,9,133
Although there have been many efforts to bring PRO data to the clinical setting, as reviewed extensively elsewhere,4,132-136 a confluence of events, coupled with advancements in PRO measurement techniques, justifies re-evaluating the role of PROs in clinical practice. In particular, advances in computing and information technology permit inexpensive and seamless data collection and processing, facilitating both previously unimaginable individualization and efficient data delivery to healthcare providers. Systematic collection of patient reported data promises to inform other aspects of the evolving Chronic Care Model, such as routine database driven practice audits. With increasing development and dissemination of electronic medical record systems, there are more opportunities than ever to integrate PRO data into everyday clinical practice.
In this context, the NIH launched PROMIS to develop and validate a toolbox of PROs spanning most illness domains, as noted above.9,10 Using modern psychometric techniques, such as IRT and CAT,130,131 PROMIS is an example system that offers the potential for addressing modern PRO measurement needs, establishing common-language benchmarks for symptoms across conditions, and identifying clinical thresholds for action and meaningful improvement or decline. The FDA and NIH are currently in discussions to evaluate whether and how PROMIS item banks can help fit regulatory objectives in drug approval. In short, there is now a confluence of scientific, regulatory, and political factors that amplify the importance of PRO research, and PROMIS finds itself in the midst of this confluence with rigorously developed PROs to assist end-users (most importantly, patients themselves).
Despite the unprecedented opportunities for realizing the vision of collecting PRO data in routine clinical care, we must again recognize the practical challenges to this vision; Table lists several of these well-acknowledged challenges, described in previous reviews.132,134,135 However, NIH PROMIS investigators believe these challenges can now be adequately met, and that the time has come to parlay modern advancements in PRO science to overcome these obstacles. I have coupled each challenge in Table with possible solutions, along with providing examples of how the challenges are being overcome.
The success of PROMIS is firmly rooted in its use of IRT and CAT. A full review of these techniques is well beyond the scope of this brief overview - for a detailed discussion, refer to these citations.130,131 In short, IRT is often referred to as "modern psychometric theory," in contrast to "classic test theory," or CTT. The basic idea behind both IRT and CTT is that there is some latent construct, or "trait," underlying an illness experience. This construct cannot be directly measured, but can be indirectly measured by creating items that are scaled and scored. For example, "fatigue," "pain," "GI distress" or even "happiness" are latent constructs - we cannot take a picture of these things, run a blood test for these things, or obtain an X-ray to view these things. But we know they exist. People can experience more or less of these constructs, which can be measured along a continuous scale. We can infer the amount of a latent construct by measuring it indirectly with individual items. These items, in turn, can be rolled into scores using a variety of algorithms.
Although both IRT and CTT assume the presence of an underlying, unobservable, latent trait, the techniques diverge when it comes to how the trait is measured. The main difference between IRT and CTT is that the former can support CAT, whereas the latter cannot; this is the beauty of IRT.
In a CAT-administered item bank, patients all view the same initial item. However, depending on the respondent's answer to the first item, the computer selects a tailored second item - i.e., it adapts based on the respondent's input. This process continues, based on an underlying algorithm, until the computer is satisfied that it has a good sense of the amount of some underlying construct (e.g., fatigue). The computer is usually able to figure this out quickly, and the patient can typically answer far fewer questions than would be necessary with CTT, in which a full questionnaire is administered from start to finish, no matter what, and the score is based on the full set of items.
So how does CAT work? It can only work if there is an underlying IRT algorithm. Whereas CTT asks: "Given a person's total score on a questionnaire, what is the respondent's level on the trait being measured?", IRT asks: "Given what is known about the unique set of items viewed and individual responses to those items, what is the respondent's most likely level of the trait being measured?" So, CTT deals with total scores based on all items, whereas IRT deals with individual item responses. Furthermore, IRT employs those responses to estimate a likely score without having to use all the items in the full questionnaire.
The mathematics of IRT is fairly complex, and reviewed elsewhere.130 But the basic idea is that IRT assumes there is a natural order of difficulty of items in an item bank. This is not difficulty in the usual sense, like one examination question being "harder" than another. Here, difficulty refers to the likelihood of reaching a certain level of illness severity. For example, walking 10 feet is less difficult than walking 10 miles - those have an obvious order. In GI, having reflux symptoms alone would be ordered lower than having reflux with dysphagia. And bowel urgency alone would be ordered lower than urgency with bowel incontinence. Similarly, nausea with vomiting would be ranked higher than mere "queasiness" alone, although these symptoms are arranged along a spectrum. So, the idea is that items can be rank-ordered along a continuum of difficulty, and this ordering is fundamental to IRT.
In order for IRT to work, each item is assigned a variety of parameters. One parameter, already discussed above, is the difficulty of the item. Another important concept is the item discrimination parameter. This parameter models the rate of increase in the probability of endorsing an item as the amount of underlying trait increases; it indicates the strength of association between an individual item and the overall trait being measured. Highly discriminating items can reliably identify patients with small but measurable differences in a trait along a continuum.
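The article does not state which IRT model underlies the item banks, so the following is only one common way to write the two parameters down: a two-parameter logistic curve for an item answered yes or no. The symbols are assumptions introduced here: \theta is the respondent's trait level, b_i is the difficulty of item i, and a_i is its discrimination.

```latex
P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Under this form, a respondent whose trait level equals the item difficulty (\theta = b_i) has a 50% chance of endorsing the item, and the slope of the curve at that point is a_i/4, so a larger a_i means the endorsement probability rises more steeply as the trait increases. Questionnaires with several ordered response options generally require a multi-category extension of this curve.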
Using the difficulty and discrimination parameters of items in a bank, CAT can pick and choose items that a patient will view, and quickly home in on a trait-level estimate. With just a few steps, the IRT algorithm can employ CAT to predict, with a high degree of accuracy, what the patient would have scored had they completed the entire questionnaire.
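The adaptive loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the PROMIS or Assessment Center algorithm: items are yes/no and follow a two-parameter logistic curve, the trait estimate is a posterior mean under a standard normal prior evaluated on a coarse grid, the next item is the unanswered one with the most statistical information at the current estimate, and the test stops once the posterior standard deviation falls below a target or an item cap is reached. All function names and default values are hypothetical.

```python
import math

# Coarse grid of candidate trait levels from -4 to 4 (assumed scale).
GRID = [g / 10.0 for g in range(-40, 41)]


def prob_endorse(theta, a, b):
    """Two-parameter logistic probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information a yes/no item provides at trait level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses, bank):
    """Posterior mean and SD of the trait under a standard normal prior.

    responses maps item index -> 0/1 answer; bank is a list of
    (discrimination, difficulty) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for idx, answer in responses.items():
            a, b = bank[idx]
            p = prob_endorse(theta, a, b)
            w *= p if answer == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer_fn, se_target=0.4, max_items=10):
    """Administer items adaptively; return (estimate, SD, item order)."""
    responses = {}
    theta, se = eap_estimate(responses, bank)
    while len(responses) < min(max_items, len(bank)) and se > se_target:
        remaining = [i for i in range(len(bank)) if i not in responses]
        # Pick the unanswered item that is most informative right now.
        nxt = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        responses[nxt] = answer_fn(nxt)
        theta, se = eap_estimate(responses, bank)
    return theta, se, list(responses)
```

Because every respondent starts at the same prior estimate, everyone sees the same first item, and the sequence diverges only after the first answer, as described above. To try it, pass `run_cat` a list of (discrimination, difficulty) pairs and a callback that returns 1 or 0 for a given item index.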
Our group at UCLA is developing the GI Symptom item bank for PROMIS. The final product will yield a set of publicly available, CAT-administered item banks capable of measuring GI symptoms across several domains (Fig. 1). We are evaluating the performance of these item banks using both cross-sectional and longitudinal cohorts; the latter is important to evaluate whether the item banks can detect meaningful change over time. We are working to capture the breadth and depth of physical symptoms associated with GI involvement, including the multiple dimensions of those symptoms (e.g., frequency, severity, bothersomeness, predictability and duration). This item bank, due for completion in mid-2013, will be applicable both to the general population and to patients with a defined GI illness. It will be a system-targeted item bank that measures physical symptoms of the GI tract; it will not be a disease-targeted item bank. This is an important distinction of PROMIS, because disease-targeted item banks are not useful across the population as a whole. PROMIS aims to support item banks that are applicable to all comers.
The conceptual framework in Figure 1 represents our current understanding of GI symptoms. This model has been principally informed by structured cognitive interviews we conducted in over 120 patients with IBS, published elsewhere,137 coupled with systematic literature searches and expert opinion. In our PROMIS work we found that this model applies across all conditions marked by GI symptoms - not just IBS.
The current GI PROMIS framework posits that GI Symptoms are captured by 8 domains: (1) Belly pain, (2) Bloat/gas, (3) Diarrhea, (4) Constipation, (5) Bowel incontinence/soilage, (6) Heartburn/reflux, (7) Swallowing and (8) Nausea/vomiting.
These PROMIS item banks may ultimately be employed in several settings, including clinical practice, research and clinical trials. The item banks will be especially useful in clinical practice given their highly efficient nature, and the ability to place the CAT-administered questionnaires on web-based electronic media, such as laptops, smart phones or tablets.
These electronic patient-provider portals (P3s) are changing how we deliver healthcare.138 Using P3s, patients interact with their providers within the comfort of their own home; this expands the care model beyond face-to-face visits, and allows patients to self-report symptoms and other illness experiences (e.g., "psychomarkers") to complement traditional biomarkers. Providers review the reports through an electronic health record, and then collaborate with patients to make shared health decisions. P3s can also use algorithms to prepare tailored "educational prescriptions" by drawing from a library of online resources. This process can be user-friendly and empowering to patients, and can ultimately improve patient-provider communication.
Figure 2 shows a sample report developed with input from GI physicians. Patients will complete the PROMIS GI item banks using a P3 system, and then the data will be converted to a report for providers to review directly from the electronic health record. This type of interface can change how we monitor patients in the context of everyday clinical practice, and translate academic theory into tools we can employ to improve patient outcomes.