Study aim, design, and setting
Safety and efficacy of infant formula with L. reuteri DSM 17938 and 2’FL was evaluated in a double-blind RCT of healthy term infants conducted between March 2017 and May 2019 in five centers in Belgium (KidZ Health Castle, Brussels; Clinique et Maternité Sainte-Elisabeth, Namur, Algemeen Stedelijk Ziekenhuis Aalst, Aalst; Centre Hospitalier de Wallonie Picarde, Tournai; Jessa Hospital, Hasselt) and two in Italy (University of Palermo, Palermo; University of Milan, Milan). Two randomized formula-fed groups and a non-randomized breastfed reference group (BF) were included. Formula-fed infants at ≤ 14 days of age were randomized to either the experimental group (EG) or control group (CG) in a 1:1 allocation ratio, stratified by center, sex, and mode of delivery (Cesarean- or vaginal-born). Randomization was completed using Medidata Balance with the dynamic allocation algorithm. Investigators, study staff, and parents/caregivers (hereafter, “parents”) were blinded to the study formulas. Formulas were coded by the manufacturer (Nestle Product Technology Center, Konolfingen, Switzerland) using three meaningless codes within each formula group (i.e., a total of 6 meaningless codes).
CG received a standard bovine milk-based whey predominant formula with an energy density of 670 kcal/L containing 75 g/L lactose, 34 g/L fat, 14 g/L protein (60:40 whey:casein ratio), and L. reuteri DSM 17938 at 1 × 107 CFU/g (translating to approximately 1 × 109 CFU/day in 4 months old infants). EG received the same formula supplemented with 1 g/L 2’FL which is within the range of 2'FL concentrations generally observed in breastmilk [7,8,9,10,11]. The intervention period was approximately 180 days. Formula-fed infants were required to exclusively consume the study formula until at least 4 months of age, after which progressive introduction of complementary foods or liquids was allowed. Mothers of infants enrolled in BF were asked to continue exclusive breastfeeding up to at least 4 months.
Infants completed study visits at baseline (≤ 14 days of age) and then at 1, 2, 3, 4, and 6 months of age. At baseline, demographic characteristics were collected and parents completed a 1-day GI symptom diary to retrospectively document stool characteristics, GI tolerance and associated behaviors, and formula intake for the day before the baseline visit. At each visit, anthropometrics (weight, length, and head circumference) were collected. Parents completed a 3-day GI symptom diary at home to prospectively document stool characteristics, GI tolerance and associated behaviors as well as feeding information for the 3 consecutive days prior to each post-baseline visits. Fecal samples were collected at baseline (before the intervention started) and at 1, 2, and 3 months of age. Parent-reported and physician-confirmed adverse events (AEs) were recorded throughout the study.
Participant characteristics
Healthy, term (37–42 weeks gestation) infants aged ≤ 14 days at enrollment with a birth weight between 2500 and 4500 g whose parents signed the informed consent form were included in the study. At the time of enrollment, infants in the formula groups and BF were required to be exclusively formula-fed or breastfed, respectively. Infants who received complementary foods or liquids, participated in other clinical trials, had a medical condition that could increase risks associated with study participation or interfere with interpretation of results (such as major congenital malformations, congenital infections, history of neonatal intensive care unit admission, or other severe medical or laboratory abnormality), and infants who received medications prior to enrollment that may affect study results (growth, fat digestion, absorption, and/or metabolism; stool characteristics and gastric acid secretion) were excluded.
Outcome measures
The primary outcome was weight gain between baseline and age 4 months in the formula-fed infants as recommended in guidelines from the American Academy of Pediatrics Task Force on Clinical Testing of Infant Formulas [30]. Weight gain (g/day) was calculated as the difference in infant weight between the baseline and 4 months visits divided by the number of days between the two visits. Secondary outcomes included anthropometric z-scores, stool characteristics, GI tolerance and associated behaviors, fecal microbiota, fecal metabolism, fecal markers of gut immunity and gut health, and parent-reported and physician-confirmed AEs.
Infant weight was measured without clothing or diaper on a calibrated electronic scale to the nearest 10 g. Recumbent length was recorded using a calibrated length board to the nearest 0.1 cm, and head circumference using a standard non-elastic measuring tape to the nearest 0.1 cm. Corresponding z-scores for weight-for-age, length-for-age, weight-for-length, and head circumference-for-age were calculated using the World Health Organization (WHO) Child Growth Standards [31]. Stool characteristics and GI tolerance and associated behaviors were captured in the study diaries. Stool frequency was assessed as the number of stools per day, difficulty in passing stool as number of stools difficult to pass per day, and stool consistency was recorded using a validated 5-point scale (1 = watery, 2 = runny, 3 = mushy soft, 4 = formed, 5 = hard) provided in a pictorial representation to parents [32]. The frequency of spitting-up/vomiting and flatulence episodes per day were recorded on a categorical scale (1 time; 2–3 times; 4–6 times; > 7 times). Categorical scales were also used to assess the daily durations of crying or fussing (< 10 min; 10–30 min; > 30 min to 1 h; > 1–2 h; > 2–3 h > 3 h), sleeping (0–8 8–12, 12–16, 16–20, 20–24 h) or severity of spitting-up/vomiting (1 teaspoon or less; 1 tablespoon; 2 tablespoons; about half of the feeding; more than half of the feeding). AEs were recorded during clinic visits and phone calls in between the visits throughout the study, and 14 days after completion of the study feeding. All parent-reported and physician-confirmed AEs were categorized using the Medical Dictionary for Regulatory Activities (MedDRA) preferred terms.
Fecal DNA extraction and 16S rRNA gene sequencing
DNA isolation, including vigorous bead-beating steps, was performed as described previously [33]. Barcoded amplicons from the V3–V4 region of 16S rRNA genes were generated using a 2-step polymerase chain reaction (PCR) and according to previously described methods [33]. For the library PCR step in combination with sample-specific barcoded primers, purified PCR products were shipped to BaseClear BV (Leiden, The Netherlands). PCR products were checked on a Bioanalyzer (Agilent) and quantified. This was followed by multiplexing, clustering and sequencing on an Illumina MiSeq with the paired-end (2x) 300 bp protocol and indexing. The sequencing run was analyzed with the Illumina CASAVA pipeline (v1.8.3) with de-multiplexing based on sample-specific barcodes. Sequence reads of too low quality (only “passing filter” reads were selected) and reads containing adaptor sequences or PhiX control were discarded from the raw sequencing data. On the remaining reads, a quality assessment was performed using FastQC version 0.10.0. (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Sequences of the 16S rRNA gene were analyzed using a workflow based on Qiime 1.8 [34]. On average, 29,570 (range 3,308 – 148,882) 16S rRNA gene sequences per sample were analyzed. We performed operational taxonomic unit (OTU) clustering (open reference), taxonomic assignment and reference alignment with the pick_open_reference_otus.py workflow script of Qiime, using uclust as clustering method (97% identity) and GreenGenes v13.8 as reference database for taxonomic assignment. Reference-based chimera removal was done with Uchime [35]. The RDP classifier version 2.2 was performed for taxonomic classification [36].
Pathogenic bacteria species by quantitative PCR (qPCR)
Detection and quantification of selected genes of opportunistic pathogenic bacteria species was done with isolated DNA from fecal samples using validated commercial Genesig® qPCR kits from Primerdesign Ltd™ (Klebsiella (K.). pneumonia, Salmonella species, Campylobacter (C.) jejuni, and C. coli) or based on previously described methods (Clostridioides (C.) difficile [37], Clostridium (C.) perfringens [38], Enteropathogenic Escherichia coli (EPEC), Enterotoxigenic Escherichia coli (ETEC) heat-labile toxin and ETEC heat-stable toxin [39]).
Fecal pH, organic acid and biomarker analysis
Fecal pH was assessed using an electrode-fitted pH meter after suspending 0.5 g (fresh weight) of fecal sample in 2 mL milliQ water. Organic acids (lactate, acetate, propionate, butyrate, isobutyrate, valerate, isovalerate) were determined by high performance anion‐exchange chromatography with UV and refractive index detection according to a modified and previously described method [40].
Commercially available ELISA kits were used to analyze fecal markers of intestinal immunity and health including secretory immunoglobulin A (sIgA), myeloperoxidase, calprotectin, human beta defensin (all Immundiagnostik AG, Bensheim, Germany) and neopterin (IBL, Hamburg, Germany).
Statistics
Sample size was calculated considering the primary (weight gain) and key secondary endpoints (Bifidobacterium and Peptostreptococcaceae abundance) using a hierarchical approach to control for multiplicity. A non-inferiority margin of -3 g/day was used to demonstrate non-inferiority in weight gain according to guidelines from the American Academy of Pediatrics [30] and a SD of 6.0 was assumed [18, 41]. Based on a previous study [19], superior Bifidobacterium abundance in EG vs. CG was assumed at age 3 months (difference of 0.48 in the logit of the proportion of Bifidobacterium with a SD of 1.06). Similarly, inferior abundance in EG vs. CG was assumed at age 3 months for Peptostreptococcaceae, a family to which opportunistic pathogens, such as C. difficile belong [42] (difference of -0.55 in the logit of the proportion of Peptostreptococcaceae with a SD of 1.15). The smallest effect to be demonstrated was inferiority of Peptostreptococcaceae, requiring a sample size of 210 formula-fed infants (105/formula group) to reach a power of 80% at α = 0.05. An a priori power calculation indicated 95% power to detect non-inferior weight gain and 86% power to detect superiority of Bifidobacterium with 105 completed infants per formula-fed group. Assuming 35% loss to follow-up, approximately 280 formula-fed infants were enrolled. Sample size of BF (n = 60) was not based on statistical consideration but instead determined by practical and logistical feasibility.
The primary endpoint of weight gain between baseline and 4 months of age was analyzed using analysis of covariance (ANCOVA) adjusted for baseline weight, sex, mode of delivery, and study center. Non-inferiority was determined if the lower bound of the 95% CIs for the intervention difference was above -3 g/day. The primary endpoint was analyzed in the full-analysis set (FAS) and per-protocol (PP) populations. The FAS population included all formula-fed infants randomly assigned to CG or EG who took at least one feeding of the assigned formula and who had weight measurements available at baseline and age 4 months. The PP population consisted of all infants included in the FAS that were compliant with the feeding regimen on ≥ 80% of the days from baseline until age 4 months. A compliant day was defined as a day on which only the study formula was exclusively fed (i.e. no other formulas, breastmilk, complementary foods or liquids, such as water or tea).
Secondary endpoints were analyzed in the intention-to-treat (ITT) population except for the gut microbiota or AEs. The ITT population was defined as all infants randomly assigned to EG or CG, or infants enrolled in BF, independently from the actual feeding. Gut microbiota data (16S rRNA) was analyzed in the infants who provided stool samples and were compliant with the study feeding regimen on ≥ 80% of the days until the study visit at age 3 months. AEs were analyzed in the safety analysis set which included all randomized formula-fed infants or enrolled BF infants with at least one documented feeding of the randomly assigned study formula or breastmilk, respectively. A robust ANCOVA adjusted for baseline value, mode of delivery, sex, study center, and visit was used to compare the changes from baseline in the anthropometric z-scores between the feeding regimens. A Mixed Model Repeated Measures (MMRM) adjusted for the same variables as the ANCOVA was used to compare stool consistency and frequency, spitting-up/vomiting and flatulence episodes, crying and sleep duration. Difficulty in passing stool (as number of infants having at least one stool difficult to pass over the 3-day collection period) and dichotomized severity of spitting-up/vomiting and duration of fussiness were analyzed using a logistic regression model adjusted for the aforementioned variables. A scoring approach was used for outcomes for which data was collected on a categorical scale and scores were compared between the feeding groups. Incidence of AEs and use of concomitant medications were compared between formula groups using Fisher’s exact test.
qPCR targets were analyzed using log-transformed data in a MMRM adjusted for baseline concentration, sex, mode of delivery, antibiotic use, study center, and visit. Fecal pH, acetate, butyrate, lactate, and propionate were analyzed using log-transformed data in a linear mixed model adjusted for the same variables as the qPCR targets. Due to the low number of infants with detectable concentration of valerate, isovalerate and isobutyrate, odds ratios of the presence of these fecal organic acids were calculated using a logistic regression adjusted for the aforementioned variables. Fecal biomarkers were evaluated using log-transformed data in a linear mixed model adjusted for baseline concentration, sex, mode of delivery, study center and visit. All aforementioned analyses were conducted using R version 3.2.3 (2015–12-10).
Statistical tests for the 16S rRNA gene sequences were performed as implemented in SciPy (http://www.scipy.org/), downstream of the Qiime-based workflow. We tested for between-group differences per time point in alpha phylogenetic diversity (Faith’s index, PD_whole tree metric) with the non-parametric Kruskal–Wallis test and Dunn’s posthoc test, as implemented in Graphpad Prism 5.01 (San Diego, CA, USA). Beta diversity (weighted UniFrac; for each infant in a group the average distance to all infants in another group was calculated) per time point was compared with the non-parametric Mann–Whitney U test (one-tailed), as implemented in Graphpad Prism 5.01 (San Diego, CA, USA). Between group-differences of pre-selected single taxa of importance in the studied age range, were assessed per time point using the non-parametric Kruskal–Wallis test with Dunn’s posthoc test.
We performed multivariate redundancy analyses (RDA) on the gut microbiota composition as assessed by 16S rRNA gene sequencing in Canoco version 5.12 using default settings of the analysis type “Constrained” [43]. Relative abundance values of OTUs were used as response data, and metadata as explanatory variable. For visualization purposes, families or genera, rather than OTUs, were plotted as supplementary variables. Variation explained by the explanatory variables corresponds to the classical coefficient of determination (R2) and was adjusted for degrees of freedom (for explanatory variables) and the number of cases. Canoco determines RDA significance by permutating (Monte Carlo) the sample status. Per time point and sample set, confounding factors were first identified by RDA. Statistically significant confounders were included as covariates in subsequent analyses. Hence, partial RDA was employed to correct for covariance where relevant, covariates were first fitted by regression and then “partialled out” (removed) from the ordination. All tests were performed using a significance level of 5% with a two-sided p-value (except for weighted UniFrac analysis for which a one-sided p-value was used).