
Small intestinal bacterial overgrowth (SIBO) is usually diagnosed with non-invasive breath tests (BTs), namely the lactulose BT (LBT) and the glucose BT (GBT). However, divergent opinions and poorly standardized test parameters remain a source of controversy. We aimed to perform a meta-analysis of the diagnostic performance of LBT and GBT for SIBO.
We searched the main literature databases for articles in which SIBO was diagnosed by LBT/GBT and compared with jejunal aspirate culture (the reference gold standard). We calculated pooled sensitivity, specificity, positive and negative likelihood ratios, and diagnostic odds ratios. Summary receiver operating characteristic curves were drawn and pooled areas under the curve were calculated.
We selected 14 studies. Pooled sensitivity of LBT and GBT was 42.0% and 54.5%, respectively. Pooled specificity of LBT and GBT was 70.6% and 83.2%, respectively. When a delta over baseline cut-off of > 20 H2 parts per million (ppm) was used, GBT sensitivity and specificity were 47.3% and 80.9%; when a lower cut-off (> 10, > 12, or > 15 ppm) was used, sensitivity and specificity were 61.7% and 86.0%. In patients with a history of abdominal surgery, pooled GBT sensitivity and specificity appeared better (81.7% and 78.8%) than in subjects without any SIBO predisposing condition (sensitivity = 40.6% and specificity = 84.0%).
GBT seems to work better than LBT. A delta H2 cut-off lower than > 20 ppm performs slightly better than > 20 ppm. BTs are most effective in patients with surgical reconstruction of the gastrointestinal tract.
Small intestinal bacterial overgrowth (SIBO) is a disease characterized by an increased concentration of bacteria in the small bowel.1 In healthy subjects, less than 10³ organisms/mL are found in the upper small intestine, and the majority of these are Gram-positive organisms. In addition to the absolute number of organisms, the type of microbial flora seems to play an important role in the appearance of signs and symptoms.2 Gram-negative bacteria may produce toxins that damage the intestinal mucosa, inhibiting the absorptive function.3
SIBO develops when the normal mechanisms that control the growth of enteric bacteria are compromised. Several processes predispose to bacterial overgrowth such as anatomical/structural changes of the small intestine (previous gastrointestinal surgery), motility disorders (such as gastroparesis), metabolic disorders (gastric hypochlorhydria and diabetes), organ system dysfunctions (cirrhosis, renal failure, chronic pancreatitis, Crohn’s disease, and celiac disease), medications (prolonged use of proton pump inhibitors and antibiotics), and irritable bowel syndrome (IBS).4–10 The most common symptoms are diarrhea, abdominal pain and bloating, but weight loss, malnutrition, and deficiency of vitamins (B12, D, A, and E) and minerals (iron and calcium) are possible.1
The flora of SIBO patients is mainly characterized by the prevalence of coliform bacteria and anaerobes, which cause fermentation of carbohydrates, compete with vitamin and micronutrient absorption and engender microscopic mucosal inflammation, thus leading to the above-described symptoms.1,11,12 Most experts suggest that jejunal aspirate culture (with a bacterial colony count ≥ 10⁵ colony-forming units [CFU]/mL) is the gold standard for the diagnosis of SIBO.1,13 However, culture has several drawbacks, the most important one being the invasiveness of the procedure. Consequently, other non-invasive tests have been advocated for the diagnosis of SIBO. Hydrogen breath tests (H2BT) have gained growing consensus for this purpose, and the most commonly employed in clinical practice, as well as in the literature, are the lactulose breath test (LBT) and the glucose breath test (GBT).1,13
Nevertheless, divergent opinions and poorly standardized test parameters remain a source of controversy. Although the Rome Consensus Conference on Hydrogen Breath Tests endorsed the use of GBT over LBT for SIBO diagnosis,14 LBT is still used in clinical practice. The main reason for preferring LBT to GBT is the presumption that GBT is unable to detect distal SIBO, since glucose is rapidly absorbed in the proximal small bowel.15 Conversely, lactulose is a non-absorbable sugar that passes through the entire small bowel and could be more appropriate for the diagnosis of distal SIBO. However, LBT results are often affected by gut motility, especially in patients with diarrhea, thus hampering its widespread use.16
On these bases, we performed a systematic review with meta-analysis to investigate the diagnostic yield (sensitivity and specificity) of LBT and GBT in comparison to the recognized gold standard, ie, jejunal culture. This was the first meta-analysis on the topic, to the best of our knowledge.
Methods of analysis and inclusion criteria were based on “Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA)” recommendations,17 and its extension for diagnostic test accuracy (PRISMA-DTA) was taken into account.18 A PRISMA-DTA checklist is provided in the Supplementary Table 1. We excluded review articles, experimental in vitro studies and single case reports. In cases of studies analyzing overlapping periods from the same registry/database, we considered only the studies that examined the longest period and the largest number of patients.
A literature search was performed and last updated on January 31, 2019. Relevant publications were identified by a search of PubMed, Web of Science, and Scopus. Only full-text papers were selected; abstracts and conference proceedings were therefore excluded. The search terms were small intestinal bacterial overgrowth, SIBO, breath test, lactulose, and glucose. We used the following string, with the Boolean operators AND/OR: (Small intestinal bacterial overgrowth OR SIBO) AND (culture OR breath test OR lactulose OR glucose). We selected only studies in which a breath test (glucose or lactulose) was compared to jejunal aspirate culture in the same group of patients. Titles and abstracts of papers were screened by 2 reviewers (G.L. and E.I.). Subsequently, data were extracted from the relevant studies by one reviewer, checked by a second reviewer, and entered into dedicated tables. A third reviewer (F.P.) resolved any disagreements.
Reviewers independently extracted from each paper the following data: (1) publication year; (2) country; (3) single- or multi-center study; (4) study design; (5) number of patients included; (6) patients’ age, sex, and main characteristics/symptoms; (7) cutoffs and protocols used for culture and BT; and (8) number of true positive/negative and false positive/negative results. If the study did not provide sufficient data to extract true positive/negative and false positive/negative outcomes, it was excluded from the final analysis.
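As an illustration of how the extracted true/false positive and negative counts translate into per-study accuracy measures, the following minimal Python sketch computes sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio from a single 2 × 2 table. The class and function names, the example counts, and the 0.5 continuity correction for zero cells are assumptions made for illustration; they are not taken from the included studies or from the software used in this meta-analysis.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one study: breath test (index) versus culture (reference)."""
    tp: int  # breath test positive, culture positive
    fp: int  # breath test positive, culture negative
    fn: int  # breath test negative, culture positive
    tn: int  # breath test negative, culture negative


def accuracy_measures(t: TwoByTwo, correction: float = 0.5) -> dict:
    """Return sensitivity, specificity, PLR, NLR and DOR for one study."""
    cells = [t.tp, t.fp, t.fn, t.tn]
    # Assumption: add a continuity correction only when a cell is zero,
    # so that the ratios below stay finite.
    if 0 in cells:
        cells = [c + correction for c in cells]
    tp, fp, fn, tn = cells
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
    }


# Hypothetical counts, for illustration only.
study = TwoByTwo(tp=12, fp=5, fn=8, tn=25)
measures = accuracy_measures(study)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

A study that does not report enough information to fill all four cells cannot be passed through such a calculation, which is why those studies were excluded.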
The end-point was to estimate the pooled weighted sensitivity, specificity, likelihood ratio for positive and negative tests (PLR and NLR, respectively), and diagnostic odds ratio (DOR) of GBT and LBT in comparison to culture. Summary receiver operating characteristic (SROC) curves were drawn and pooled areas under the curve (AUC) were calculated. A random-effects model was used in all analyses. Indeed, according to the Cochrane handbook Chapter 9.5.4, the choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity and one model (most often, the random-effects model) should be chosen a priori for all analyses.33 We assessed heterogeneity using the χ2 test and, if statistically significant, the I2 statistic was computed. If necessary, a subgroup analysis was performed. The data were expressed as proportions/percentages, and 95% confidence intervals (CI) were calculated. A
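To make the random-effects pooling and the heterogeneity statistics more concrete, the sketch below pools a proportion (for example, sensitivity) across studies with the DerSimonian-Laird estimator on the logit scale and reports Cochran's Q (the χ2 heterogeneity statistic) and I2. This is one common approach, shown under stated assumptions; the source does not specify the exact estimator or software, so the function name, the logit transformation, and the continuity correction are illustrative choices rather than the authors' method.

```python
import math


def pool_random_effects(events, totals):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    events[i] / totals[i] is, e.g., TP / (TP + FN) for study i.
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # Assumption: a 0.5 continuity correction keeps logits finite.
        e_c, n_c = e + 0.5, n + 1.0
        p = e_c / n_c
        logits.append(math.log(p / (1 - p)))
        variances.append(1 / e_c + 1 / (n_c - e_c))

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    df = len(logits) - 1

    # Between-study variance (tau^2) and I^2 (percentage).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return {
        "pooled": inv_logit(pooled),
        "ci95": (inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)),
        "Q": q,
        "df": df,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical counts, for illustration only.
homogeneous = pool_random_effects([10, 10, 10], [20, 20, 20])
heterogeneous = pool_random_effects([2, 18], [20, 20])
print(homogeneous["pooled"], homogeneous["I2"])
print(heterogeneous["pooled"], heterogeneous["I2"])
```

When studies agree closely, Q stays near its degrees of freedom and I2 falls to zero; widely scattered study estimates inflate both, which is the situation that motivates subgroup analysis.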
Two reviewers (G.L. and G.Le.) independently assessed the quality of the included studies using the Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) instrument.36 This tool is designed to assess the quality of primary diagnostic accuracy studies for inclusion in a systematic review. Publication bias was assessed by the Deeks’ funnel plot asymmetry test, with
Fourteen studies, listed in Table 1,19–32 were selected out of 2123 articles found after the literature search. Further details about the process of article selection are reported in the PRISMA flowchart in Figure 1. The PRISMA-DTA checklist is provided as Supplementary Table 1. In all studies, GBT was used and, in 4 of them, both LBT and GBT were considered.20,23,24,27 In all studies but one,29 an adult population was recruited. A total of 757 subjects were selected across all studies.
The quality of studies is reported in Supplementary Table 2. All studies achieved a good score, except in the applicability-concern domain for the “reference standard.” Indeed, as an invasive test, jejunal culture cannot always be performed routinely, partly for ethical reasons.
In all the 14 studies,19–32 GBT was used in comparison to jejunal culture. In total, 668 patients were considered. The pooled sensitivity was 54.5% (95% CI, 48.20–60.70), and heterogeneity was present (χ2 = 61.52,
The Deeks’ funnel plot for this analysis, reported in Supplementary Figure 1, showed a symmetrical distribution of studies, confirmed by a corresponding test for the slope, with
A comparison between LBT and jejunal culture was possible only in 4 studies,20,23,24,27 enrolling 214 patients overall. The pooled sensitivity was 42.0% (95% CI, 31.6–53.0), with presence of heterogeneity (χ2 = 12.34,
We did not detect publication bias for this analysis, since the test for funnel plot (Supplementary Fig. 2) provided a
A sample size sufficient to perform sub-analysis was possible only for GBT. Indeed, only 4 studies were focused on LBT, so this sub-analysis would have low statistical power with such a small number of articles.
The sub-category of patients who had undergone previous gastrointestinal surgery was considered in 6 studies.20,26,28,30,31 The most common surgical procedures were partial gastrectomy and colectomy. We were able to extract sufficient data for sub-group meta-analysis from only 3 studies (93 subjects in total). Pooled sensitivity and specificity were 81.7% and 78.8%, respectively. PLR was 3.23 and NLR was 0.20; overall DOR was 18.58. Finally, AUC was 0.86 ± 0.09. Further details of this sub-analysis are reported in Table 2.
In the group of patients without any predisposing conditions, we included subjects lacking factors predisposing to SIBO, such as abdominal surgery, celiac disease, connective tissue disorders, and the others listed above.4–10 This sub-category was considered in 6 studies.21,22,24,28,30,32 Three hundred and forty patients were enrolled in this subgroup, which consisted mainly of healthy controls, elderly patients with diarrhea, and IBS patients. Pooled sensitivity, specificity, PLR, and NLR were 40.6%, 84.0%, 1.64, and 0.84, respectively. Pooled DOR was 2.32 and AUC was 0.59 ± 0.14. When we selected only asymptomatic healthy controls, only 30 subjects could be included, providing a pooled sensitivity of 9.1%, a specificity of 66.9%, PLR = 0.86, NLR = 1.02, and DOR = 0.01. Further details of these sub-analyses are reported in Table 2.
We did not find publication bias in any of the sub-analyses, since the test for Deeks’ funnel plot was not statistically significant, as reported in Table 2.
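For readers unfamiliar with the Deeks’ funnel plot asymmetry test, the sketch below shows its core computation: a regression of the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighted by ESS, where a slope far from zero suggests small-study effects. The function name, the continuity correction, and the example counts are assumptions for illustration; the P value for the slope, which would come from a t-test on the regression coefficient, is deliberately omitted here.

```python
import math


def deeks_regression(studies):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS).

    studies: list of (tp, fp, fn, tn) tuples, one per study.
    Returns (intercept, slope); weights are the effective sample sizes.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        cells = [tp, fp, fn, tn]
        # Assumption: 0.5 continuity correction only when a cell is zero.
        if 0 in cells:
            cells = [c + 0.5 for c in cells]
        a, b, c, d = cells
        n_diseased, n_healthy = a + c, b + d
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)

    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("need at least two studies with different ESS")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical studies of different size with identical DOR (= 1):
# the regression line is flat, i.e. no funnel plot asymmetry.
flat = deeks_regression([(10, 10, 10, 10), (20, 20, 20, 20), (40, 40, 40, 40)])
print(flat)
```

With only a handful of studies per subgroup, the slope is estimated from very few points, so a non-significant result has limited power to exclude publication bias.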
Only 2 studies chose a value of > 10³ CFU/mL.22,30 The pooled sensitivity and specificity were 40.7% and 84.0%, respectively. Additionally, we found PLR = 2.54, NLR = 0.71, and DOR = 3.59. It was not possible to draw an SROC curve from only 2 studies.
A cut-off of > 10⁵ CFU/mL was reported in most of the studies and provided an overall sensitivity of 55.3% and a specificity of 83.9%. PLR, NLR, and DOR were 2.61, 0.59, and 5.88, respectively. AUC was 0.77 ± 0.08 (Table 2).
Three studies used a value of > 10⁶ CFU/mL.19,20,27 In this case we found a pooled sensitivity and specificity of 62.5% and 77.4%, respectively. PLR was 2.74, NLR was 0.54, and DOR was 5.35. AUC was 0.37 ± 0.14.
A delta value of > 20 ppm was used in 7 studies, with 333 patients,19,21,22,25,27,30,32 while a lower value (> 10, > 12, or > 15 ppm) was used in 7 papers (306 patients).20,23,24,26,28,29,31
A cut-off of > 20 ppm provided a pooled sensitivity of 47.3% and specificity of 80.9%. Overall PLR, NLR, and DOR were 1.95, 0.66, and 3.35, respectively. AUC of SROC was 0.70 ± 0.23.
On the other hand, a cut-off lower than > 20 ppm showed a pooled sensitivity and specificity of 61.7% and 86.0%, respectively. Pooled PLR was 3.2, NLR was 0.54, and DOR was 8.11. SROC curve analysis demonstrated an AUC = 0.79 ± 0.07.
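The effect of the delta-over-baseline cut-off on a single test can be illustrated with a short Python sketch. It assumes that the first reading is the fasting baseline and that the delta is the highest later reading minus that baseline; the readings, function names, and this definition of baseline are hypothetical, and actual protocols in the included studies differ in glucose dose, sampling interval, and test duration.

```python
def delta_over_baseline(readings_ppm):
    """Peak rise in breath H2 (ppm) over the first (baseline) reading."""
    if len(readings_ppm) < 2:
        raise ValueError("need a baseline and at least one later reading")
    baseline = readings_ppm[0]
    return max(readings_ppm[1:]) - baseline


def gbt_positive(readings_ppm, cutoff_ppm):
    """True when the rise over baseline exceeds the chosen cut-off."""
    return delta_over_baseline(readings_ppm) > cutoff_ppm


# Hypothetical H2 readings (ppm) at baseline and after glucose ingestion.
readings = [4, 6, 11, 19, 17, 9]

delta = delta_over_baseline(readings)
print(f"delta over baseline: {delta} ppm")
for cutoff in (12, 20):
    print(f"cut-off > {cutoff} ppm: positive = {gbt_positive(readings, cutoff)}")
```

In this example the same patient would be classified as positive with a > 12 ppm cut-off and negative with > 20 ppm, which is how a lower threshold can raise sensitivity.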
Only one study considered the “double peak” as a diagnostic criterion, therefore a sub-analysis was not possible.27 No study considered high basal hydrogen levels as a SIBO diagnostic criterion.
We did not find publication bias in any sub-analysis, since the test for Deeks’ funnel plot was not statistically significant.
Further details about all above mentioned sub-analyses are shown in Table 2.
SIBO is a condition characterized by an abnormal colonization of the small bowel by colonic bacteria. The most commonly used diagnostic tools in clinical practice are H2BTs; however, their validation against jejunal aspirate culture has shown several pitfalls. Only one systematic review investigated this topic, showing that LBT exhibited a sensitivity ranging from 31% to 68% and a specificity of 44–100%, while GBT showed a sensitivity of 20–93% and a specificity of 30–86% across cited studies.38 However, a quantitative examination with a pooled analysis of such data was lacking until now. Therefore, to the best of our knowledge, the present study is the first meta-analysis regarding the performance of H2BT for SIBO diagnosis.
Our first aim was to ascertain whether GBT and LBT give similar diagnostic accuracy. Indeed, although the Rome consensus discouraged the use of LBT,14 some authors disagree because LBT could be more effective than GBT in the case of “distal SIBO.” Overall, our results showed that GBT had higher sensitivity and specificity than LBT, and a better AUC (0.74 versus 0.56). Therefore, our results seem to confirm the statement of the Rome consensus. The poorer performance of LBT may have different explanations. First, the results of LBT are highly influenced by the bowel transit time.39 Since lactulose is not absorbed in the small bowel, it may rapidly pass into the colon, where it can be degraded by colonic bacteria with rapid hydrogen production,40 thus leading to false positivity. Additionally, patients with fast transit will display an early peak that may be misinterpreted as SIBO. These factors may explain the low LBT specificity (70.6%) that we found compared with that of GBT (83.2%). False negative H2BT results may also be found in non-hydrogen producers: in this case, additional measurement of methane in the breath may improve the diagnosis.41 However, although 3 of the 14 studies measured methane,22,30,32 only one reported results in a form suitable for data extraction; therefore, a meta-analysis was not possible. We believe that this is a relevant limitation of our analysis, since we were not able to assess the impact of methane detection on BT performance.
Recently, the North American consensus on BT established that a rise of > 20 ppm from baseline should be the optimal cut-off for GBT.42 In our analysis, 7 studies used this cut-off, providing an overall sensitivity of 47.3% and a specificity of 80.9%, with AUC = 0.70. However, we found that the performance of GBT was slightly superior, with a sensitivity and specificity of 61.7% and 86.0%, respectively, and AUC = 0.79, when a lower cut-off (delta lower than > 20 ppm) was used. The DOR was also much higher (8.11 versus 3.35) in this case. Therefore, our results suggest that a cut-off value lower than > 20 ppm, as suggested by the Rome consensus, may provide better sensitivity and specificity than the cut-off value of > 20 ppm proposed by the North American consensus.
Another strongly debated point concerns the culture of jejunal aspirate, which has so far been the gold standard test for SIBO diagnosis. It is obtained by means of patient intubation and aspiration at multiple intestinal sites, more rarely during enteroscopy. The amount of liquid, the site of collection (traditionally beyond the ligament of Treitz), the technical details of the microbiological tests (for both aerobic and anaerobic bacteria), and the cut-off value for the definition of SIBO are not yet standardized, although many studies use a value of > 10⁵ CFU/mL. Indeed, even though culture has been considered the gold standard for SIBO diagnosis, recent evidence and opinions have questioned this status. First, the cut-off value of > 10⁵ CFU/mL, which was the most widespread in the literature, is no longer considered reliable, and the North American consensus has proposed a level of > 10³ CFU/mL. In our systematic review, only 2 studies employed this cut-off; while the specificity (84.0%) was high and comparable to that of other cut-offs, the sensitivity was very low (40.7%), but the small number of patients enrolled in this sub-analysis (only 197) limits the discussion of this result. Most studies used a value of 10⁵ CFU/mL and 3 reports used a value of 10⁶ CFU/mL. As reported in more detail in Table 2, a value of 10⁵ CFU/mL provided a slightly higher sensitivity, but lower specificity, while the DOR, PLR, and NLR were comparable. Therefore, it seems that these 2 cut-off values could be interchangeable, but further studies using the 10³ CFU/mL value are necessary to ascertain the suitability of the North American consensus recommendation. On the other hand, the clinical practice of jejunal culture has several drawbacks. First, it is an invasive procedure that requires a catheter to be placed under endoscopic or fluoroscopic guidance. Therefore, patient compliance with this cumbersome procedure, aimed at diagnosing a hypothetical benign disease, is presumably very low. Moreover, it is necessary to use a sterilized catheter and to avoid contact with the oral mucosa, which could cause contamination by oral flora. The limitations of culture for SIBO have been highlighted by Sundin et al,32 who showed that cultures may frequently underestimate the real bacterial load and that the correlation between microbiological results and BT results is weak. Moreover, metagenomic studies have shown that jejunal culture may fail to detect up to 80.0% of bacterial species, since some strains are not culturable.43
Another interesting finding of our work was that, in patients with surgical reconstruction of the gastrointestinal tract, sensitivity was higher (81.7% versus 40.6%) than in subjects without a predisposing condition. The AUC was also much better (0.86 versus 0.59), indicating an overall better performance of GBT in this group of patients. A possible explanation is that the prevalence of SIBO may reach 61.6% in operated patients, who are likely to suffer from failure of the anatomical or chemical mechanisms that hamper the overgrowth of bacteria in the small bowel.44,45 For instance, in gastrectomy patients, reduction of acid secretion is a predisposing factor. Additionally, the lack of a “gastric brake” may lead to early stomach emptying, which is the basis of the dumping syndrome as well. Therefore, if the small bowel is rapidly filled by a hyperosmotic fluid, the excessive sugar content will be degraded by bowel flora, thus leading to bacterial overgrowth. In this perspective, it has been observed that SIBO might be one of the most important causes of post-gastrectomy syndrome associated with intestinal symptoms and late hypoglycemia, and that gas-related symptoms (bloating and abdominal fullness) occur simultaneously with the GBT hydrogen peak, suggesting that they could originate from small intestinal bacteria.46 Similarly, patients with multiple ileal resections or ileo-cecal valve resection have an increased SIBO risk because of the absence of the ileo-cecal valve, which is a mechanical structure hindering the reflux of colonic bacteria into the small bowel.47 Furthermore, we found a quite low BT performance in patients without predisposing conditions. In all the studies we selected, this group of patients was mainly represented by IBS patients. In this regard, despite various meta-analyses showing an increased risk of SIBO in IBS,7 recent evidence has shown that BTs have low sensitivity and specificity (around 60.0%) to diagnose SIBO in IBS patients, which is in agreement with our results.48,49
Some limitations of our study need to be mentioned. For example, differences in protocols and glucose dose may be an important source of heterogeneity among studies that could not be resolved by sub-analysis grouping. Furthermore, sub-analyses often grouped a small number of studies; for instance, the analysis of patients with surgical interventions included only 3 studies. Additionally, a head-to-head comparison of GBT and LBT is lacking, and this may be an obstacle to concluding whether GBT is really superior to LBT, despite the results of our analysis. Finally, it is important to be aware that when the number of studies is low, the power of the test for publication bias is low, as was the case in several subgroup analyses: this underlines the need for additional studies to overcome this drawback.
In conclusion, our meta-analysis suggests that GBT has a higher diagnostic yield than LBT and, therefore, should be preferred, as suggested by guidelines.14,42 However, few studies have proposed a head-to-head comparison of these 2 tests;20,23,24 this direct comparison therefore calls for further investigation and represents an additional limitation. Additionally, a GBT delta cut-off lower than > 20 ppm seems to work better than the value proposed by the North American consensus, and this will be relevant food for thought in future investigations. Finally, the microbiological diagnosis of SIBO is still an unanswered question: a quantitative cut-off is not sufficient, while a qualitative analysis of the microbiota would be preferable, because SIBO is a dynamic syndrome with many differences due to several distinct triggering pathological conditions.50,51 For such reasons, BTs52 remain the mainstay for the non-invasive diagnosis of SIBO, even if some limitations should be taken into account. Notably, the performance of BTs is optimal only when they are performed in selected populations using the best available protocol.53,54
We would like to thank Maria Benedetta Lorusso for linguistic revision and Vito Bellomo for graphical help.
Note: To access the supplementary tables and figures mentioned in this article, visit the online version of