Application of ordinal logistic regression analysis in determining risk factors of child malnutrition in Bangladesh

Background The study develops an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of the traditional binary logistic regression (BLR) model, using data from the Bangladesh Demographic and Health Survey 2004. Methods Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥ -2.0). Since nutrition status is ordinal, an OLR model, the proportional odds model (POM), can be developed instead of two separate BLR models to find predictors of both malnutrition and severe malnutrition, provided the proportional odds assumption is satisfied. The overall test does not reject the assumption, but its p-value is low (0.144) because one covariate violates the assumption. A partial proportional odds model (PPOM) and two BLR models have therefore also been developed to check the applicability of the OLR model. A graphical test has also been used to check the proportional odds assumption. Results All the models identify age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI and diarrhoea as significant predictors of child malnutrition; however, the results of the PPOM were more precise than those of the other models. Conclusion These findings justify the use of OLR models (POM and PPOM), instead of BLR models, to find predictors of malnutrition.


Background
Malnutrition is one of the most important causes of impaired physical and mental development in children. Child malnutrition remains a public health problem in developing countries like Bangladesh [1,2]. It is an underlying cause of child morbidity and mortality. Two-thirds of childhood deaths in Bangladesh occurred due to malnutrition [3]. According to the Bangladesh Demographic and Health Survey (BDHS) 2007, 43% of children are stunted and 41% are underweight in Bangladesh [4]. According to WHO, these levels of stunting and underweight are above the threshold of "very high" prevalence [5]. The level of wasting (17%) also places children in Bangladesh in the "serious severity" category [4,5]. Using BDHS 2004 data, a study observed that nearly three-fifths of children were malnourished: stunted, wasted or underweight [6]. Identifying the factors behind child malnutrition is still of interest to many researchers, and various methods have been applied to uncover them. Among these, logistic regression analysis has been the most preferred in previous studies [7][8][9][10]. In most of these studies the response variable was treated as binary (nourished and undernourished), and consequently the binary logistic regression model was applied. However, the nutrition status of a child is usually classified as nourished, moderately malnourished and severely malnourished. When researchers are interested in the determinants of both malnutrition and severe malnutrition, two separate binary logistic regression (BLR) models must be developed by grouping the response variable into two categories [7]. This task is tedious and cumbersome because more parameters must be estimated and interpreted. Alternatively, the researcher may treat the response variable as ordinal and apply an ordinal logistic regression model for the same purpose.
A few studies have used the ordinal logistic regression (OLR) model to identify the predictors of child undernutrition [11]. In many epidemiological and medical studies, the OLR model is frequently used when the response variable is ordinal in nature [12][13][14][15][16][17]. This study attempts to identify the predictors of child malnutrition as well as severe malnutrition among Bangladeshi children under five by developing an ordinal logistic regression model.

Ordinal Logistic Regression Model
There are several occasions when the outcome variable is polychotomous. Such an outcome variable can be classified into two types: multinomial and ordinal. When the categories of the dependent variable are ordered by magnitude, the multinomial logistic regression model is not appropriate. A number of logistic regression models have been developed for analyzing ordinal response variables [12][18][19][20][21][22][23][24]. Moreover, when several factors need to be taken into consideration, special multivariate analysis for ordinal data is the natural alternative. There are various approaches, such as mixed models or another class of models (probit, for example), but ordinal logistic regression models have been widely used in most previous research [18][19][25][26][27][28][29][30][31][32][33]. There are several ordinal logistic regression models, such as the proportional odds model (POM), two versions of the partial proportional odds model, without restrictions (PPOM-UR) and with restrictions (PPOM-R), the continuous ratio model (CRM), and the stereotype model (SM). The most frequently used ordinal logistic regression model in practice is the constrained cumulative logit model called the proportional odds model [18][33][34][35].
The POM is the most widely used model in epidemiological and biomedical applications, but it relies on strong assumptions, and interpretations may be incorrect if these assumptions are violated [28]. If the data fail to satisfy the proportional odds assumption, a valid solution is to fit a partial proportional odds model [36]. Another simple and valid approach is to dichotomize the ordinal response variable at several cut-off points and fit a separate binary logistic regression model for each dichotomous response variable [37]. However, Gameroff suggested that the second procedure should be avoided if possible because of the loss in statistical power and the reduced generality of the analytical solution [17].
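The constrained cumulative logit model described above can be written compactly. The following is a sketch in an assumed notation (three ordered categories coded 0, 1, 2 and logits formed for the worse categories); the coding, symbols and sign convention are assumptions made here for illustration and may differ from the forms given in Table 1.

```latex
% Assumed coding: Y = 0 (nourished), 1 (moderately undernourished),
% 2 (severely undernourished); x is the covariate vector.
\[
\operatorname{logit} P(Y \ge j \mid \mathbf{x})
  = \ln \frac{P(Y \ge j \mid \mathbf{x})}{P(Y < j \mid \mathbf{x})}
  = \alpha_j + \mathbf{x}^{\top}\boldsymbol{\beta},
  \qquad j = 1, 2 .
\]
% Proportional odds: only the intercepts alpha_j depend on the cut-off j,
% so the odds ratio for covariate x_k is the same at both cut-offs.
\[
\mathrm{OR}_k = \exp(\beta_k) \quad \text{for } j = 1 \text{ and } j = 2 .
\]
```

In this form, the proportional odds assumption is the statement that a single coefficient vector serves both cumulative logits.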

Data and Variables
The study used the nationwide BDHS 2004 data, in which complete and plausible anthropometric data were available for 6005 (weighted) children [38]. The weight-for-age anthropometric index is an excellent overall indicator of a population's nutritional health status. Moreover, weight-for-age is a composite index of weight-for-height and height-for-age [4]. The study therefore considered only the weight-for-age index, rather than weight-for-height and height-for-age, to measure children's nutrition status. Child nutrition status was categorized into three groups: severely undernourished (< -3.0 Z-score), moderately undernourished (-3.0 to -2.01 Z-score) and nourished (≥ -2.0 Z-score). Thus nutrition status is an ordinal response variable grouped from a continuous variable.
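The grouping of the continuous Z-score into three ordered categories can be sketched as follows. The function name is hypothetical, and the sketch assumes Z-scores recorded to two decimal places, so that "below -2.0" is the same as "-2.01 or lower".

```python
def classify_nutrition_status(waz):
    """Map a weight-for-age Z-score to one of the three ordered categories."""
    if waz < -3.0:
        return "severely undernourished"
    if waz < -2.0:
        # Covers -3.0 to -2.01 when Z-scores have two decimals (assumption).
        return "moderately undernourished"
    return "nourished"


# Illustrative Z-scores only; not survey data.
for z in (-3.4, -3.0, -2.01, -2.0, 0.3):
    print(z, "->", classify_nutrition_status(z))
```

The boundary values -3.0 and -2.0 fall in the moderately undernourished and nourished groups respectively, matching the cut-offs stated above.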
Several socio-economic and demographic characteristics, maternal health and nutritional information, and incidence of child diseases are considered as independent variables in developing the POM, PPOM, and separate BLR models. Mathematical forms of the models, with some indication of their application, are shown in Table 1. Age of children, birth interval, mothers' educational status, household wealth status, child feeding status, mothers' antenatal-postnatal care status, and incidence of diarrhoea, ARI, and fever are considered as the independent variables in the study. These independent variables were found to be significant predictors of child undernutrition in several previous studies [7][8][9][10][39][40][41][42][43]. Household wealth status is evaluated from the household wealth index constructed by NIPORT et al. [38]. Child feeding status and mothers' antenatal-postnatal care status are evaluated by constructing a child feeding index and an antenatal-postnatal care index respectively. Both indices are constructed following previous studies by Das et al. [9,10]. The construction procedure is not shown in this paper.
The DHS authority maintains ethical standards and procedures for the survey and obtains informed consent from the survey respondents before data collection. In addition, we obtained approval from DHS, through the DHS website, to use the data. No ethical approval from any other institution was therefore needed for the study.

Model Fitting
Since the response variable "nutrition status" is ordinal in nature (grouped from a continuous variable, the weight-for-age anthropometric index), the POM was first fitted without a careful assessment of model adequacy. The chi-squared score test for the proportional odds assumption [18,36] was then used to see whether the main model assumption was violated. As the score test is often anticonservative (i.e., the resulting P-values are far too small) [13,24,36], we used other techniques to investigate the proportional odds assumption. We calculated single score tests for each covariate to check whether the proportional odds assumption is violated [24]. A graphical method was also employed to check the parallel slopes assumption for all covariates. In addition, separate binary logistic regression analyses were conducted as a basis for more careful analysis [26]. We dichotomized the response variable, taking account of the ordering, by using cumulative probabilities. The response variable is dichotomized as "at least moderate undernutrition", with categories '0' = no undernutrition and '1' = at least moderate undernutrition, and as "at least severe undernutrition", with categories '0' = no undernutrition or moderate undernutrition and '1' = at least severe undernutrition. The overall goodness-of-fit of the separate BLR models was assessed by the Hosmer and Lemeshow test [33,44,45].
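The two cumulative dichotomizations described above can be sketched as follows. The integer coding (0 = nourished, 1 = moderately undernourished, 2 = severely undernourished) and the function name are assumptions made for illustration.

```python
def dichotomize(status_code):
    """Return the two binary responses used by the separate BLR models.

    status_code: 0 = nourished, 1 = moderately undernourished,
                 2 = severely undernourished (assumed coding).
    """
    at_least_moderate = int(status_code >= 1)  # 0 = no undernutrition
    at_least_severe = int(status_code >= 2)    # 0 = none or moderate
    return at_least_moderate, at_least_severe


for code in (0, 1, 2):
    print(code, "->", dichotomize(code))
```

Because both indicators are cut from the same ordered scale, a child coded 1 on "at least severe undernutrition" is always coded 1 on "at least moderate undernutrition" as well.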
Though the POM is suitable for analyzing ordinal variables arising from a continuous variable, the proportional odds assumption is seldom satisfied in practice. When this assumption is violated, a valid alternative is to develop a PPOM, which models the covariates that satisfy the proportional odds assumption with a common coefficient, while each covariate that fails the assumption is augmented by a coefficient (γ), the effect associated with the j-th cumulative logit, adjusted for the other covariates [33]. Thus, the PPOM relaxes the constraint of having a common parameter across the response logits for all the predictors considered in the model [17]. Since both the PPOM and the separate binary logistic regression approach are based on cumulative logits, the PPOM is directly comparable with the separate BLR models [37]. In the same way, the logit functions in the POM and PPOM are formulated identically (i.e., nourished vs. moderately and severely undernourished; nourished and moderately undernourished vs. severely undernourished), so the overall fit of these two models is comparable [28]. The study therefore compared the results of the separate BLR models with those of the PPOM, and also compared the POM with the PPOM. The study fitted the unrestricted PPOM. The STATA procedure OLOGIT and the SPSS procedure PLUM with the TPARALLEL option were used for the POM, the SPSS procedure LOGISTIC REGRESSION for the separate BLR models [46], and the STATA procedure GOLOGIT2 with the AUTOFIT option for the PPOM [47].
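The role of the additional coefficient γ can be made explicit with a short formula. This is a sketch in an assumed notation (categories coded 0, 1, 2; logits formed for the worse categories; t denotes the subset of covariates that violate the assumption); the symbols other than γ are assumptions and may differ from those in Table 1.

```latex
% Unrestricted partial proportional odds model, assumed notation.
% x: all covariates; t: the covariates for which proportional odds fails.
\[
\operatorname{logit} P(Y \ge j \mid \mathbf{x})
  = \alpha_j + \mathbf{x}^{\top}\boldsymbol{\beta}
    + \mathbf{t}^{\top}\boldsymbol{\gamma}_j,
  \qquad j = 1, 2, \qquad \boldsymbol{\gamma}_1 = \mathbf{0} .
\]
% Odds ratio of a covariate x_k in t at cut-off j, and of any other covariate:
\[
\mathrm{OR}_{jk} = \exp(\beta_k + \gamma_{jk}) \ \ (x_k \in \mathbf{t}),
\qquad
\mathrm{OR}_{k} = \exp(\beta_k) \ \ (x_k \notin \mathbf{t}) .
\]
```

Setting every γ to zero recovers the POM, which is why the two models share the same logit formulation and can be compared directly.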

Results
The proportion of undernourished children was 48%, with 13% severely undernourished, in 2004. Though both levels had fallen by 2007 (43%, with 12% severely underweight), they are still very high [4]. The prevalence of child malnutrition according to selected background characteristics is shown in Table 2.
Table 1 (indications for use, in the order recovered from the source): originally continuous response variable, subsequently grouped, and valid proportional odds assumption; proportional odds assumption not valid; proportional odds assumption not valid, and linear relationship for odds ratio (OR) between a covariate and the response variable.
To identify the risk factors of child malnutrition, the study fitted the POM, separate BLR models, and the PPOM. The adequacy of the models is described first, and then the results of the models are interpreted.

Proportional Odds Model
The results of the multiple POM are given in Table 3. All the variables considered in the POM are found significant. The score test of the proportional odds assumption is not significant at the 5% level, indicating that the data satisfy the proportional odds assumption. However, the p-value of the score test is small (0.144). To confirm the conclusion regarding the assumption of the POM, single score tests of the proportional odds assumption were conducted for each covariate. The p-values of the single score tests are shown in the last column of Table 3. The tests were not significant for any variable except age of children (p-value < 0.005); that is, all other variables satisfy the proportional odds assumption. Without making a final decision, we proceeded to analyze the data using separate binary logistic regressions for the dichotomized response. Such an analysis is required to assess the correct functional form of the covariates and to build models with adequate goodness-of-fit.

Separate Binary Logistic Regressions
Results of the two separate binary logistic regression models are shown in Table 4. The Hosmer-Lemeshow test indicates that neither model shows lack of fit (p-value > 0.58). The regression coefficients and odds ratios in the two separate models are found to be homogeneous for all the categories of each covariate. The age of children, which fails to satisfy the proportional odds assumption, has a significant influence in both models. However, significance levels varied for some covariates between the two BLR models. In the first BLR model, with response variable "at least moderate undernutrition", all the variables are found significant. On the other hand, mothers' antenatal-postnatal care and incidence of ARI and diarrhoea are not significant in the other BLR model, with response variable "at least severe undernutrition". Thus the covariates show satisfactory results, with some differences in significance level. Since these regression models do not take the ordinal restriction of the response into account and involve more parameters, we proceeded to construct the PPOM, which represents a joint model of the response categories [22] and is a powerful method based on maximum likelihood procedures for ordinal responses [23].

Partial Proportional Odds Model
The results of the default GOLOGIT2 procedure of STATA are similar to those of the series of binary logistic regressions and can be interpreted in the same way. The main problem with both approaches is that they include many more parameters than the POM. These methods free all the variables from the parallel-lines constraint, even though the assumption may be violated by only one or a few of them. The study therefore used the AUTOFIT option with GOLOGIT2 to fit a partial proportional odds model, in which the parallel-lines constraint is relaxed only for those variables that violate the assumption and is retained for the variables that satisfy it [46]. The results are shown in Table 5, with the Wald test of the parallel-lines assumption. The global Wald test indicates that the final model does not violate the proportional odds assumption (p-value = 0.7943). From Table 4 and Table 5 it is clear that only 23 unique β coefficients or odds ratios need to be explained in the PPOM, compared with the 42 coefficients produced by the separate BLR models.
Results of the PPOM show that all the covariates have a significant influence on the response variable in both comparisons. In addition, the deviance (defined as the difference in the likelihood ratios between POM and PPOM) is chi-square = 15.03 (941.55 - 926.52) with 2 d.f., favouring the PPOM as a better fit to the data than the POM [28]. The pseudo R² of the POM (0.1029) and PPOM (0.1046) reflect the same result.
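The arithmetic behind this comparison can be checked with a few lines. The sketch below uses only the figures quoted above; the variable names are hypothetical, and the reading of the 2 d.f. as the two extra coefficients of the PPOM assumes that both BLR models contain the same set of coefficients.

```python
import math

# Figures quoted in the text.
lr_first, lr_second = 941.55, 926.52
df = 2

# Difference between the two likelihood ratio statistics.
lr_stat = lr_first - lr_second

# For a chi-square variable with 2 d.f. the upper tail is exp(-x / 2).
p_value = math.exp(-lr_stat / 2.0)

# Coefficient counts: 42 across two BLR models, 23 in the PPOM.
per_blr_model = 42 // 2                    # slope coefficients per BLR model
extra_in_ppom = 23 - per_blr_model         # coefficients added to a common set

print(f"chi-square = {lr_stat:.2f}, df = {df}, p = {p_value:.5f}")
print(f"extra PPOM coefficients = {extra_in_ppom}")
```

Under that assumption, the two additional coefficients match the 2 d.f. of the test, which is consistent with the constraint being relaxed for a single covariate with two non-reference categories.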

Graphical Test of Proportional Odds Assumption
The line diagrams for all the explanatory variables are shown in Figure 1. The graphical test of the proportional odds assumption indicates that the estimated average logits across the categories of each variable are almost parallel, except for the variable "age of children". The average logits for the different age categories did not support the parallel-lines assumption of the POM. The individual score tests give the same picture.
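One common way to build such a graphical check is to compute the empirical cumulative logits at each level of a covariate and see whether the two curves stay a constant distance apart. The sketch below illustrates that idea only; it is not necessarily the quantity plotted in Figure 1, the function names are hypothetical, and the counts are made up for illustration and are not BDHS data.

```python
import math


def cumulative_logits(counts):
    """Return (logit P(Y >= 1), logit P(Y >= 2)) from category counts.

    counts = (nourished, moderately undernourished, severely undernourished)
    """
    nourished, moderate, severe = counts
    logit_at_least_moderate = math.log((moderate + severe) / nourished)
    logit_at_least_severe = math.log(severe / (nourished + moderate))
    return logit_at_least_moderate, logit_at_least_severe


def logit_gap(counts):
    """Vertical distance between the two cumulative logit curves."""
    first, second = cumulative_logits(counts)
    return first - second


# Made-up counts for three levels of a hypothetical covariate.
hypothetical_counts = {
    "level A": (60, 30, 10),
    "level B": (40, 40, 20),
    "level C": (20, 60, 20),
}

gaps = {level: logit_gap(c) for level, c in hypothetical_counts.items()}
for level, gap in gaps.items():
    print(f"{level}: gap between cumulative logits = {gap:.3f}")
```

In this made-up example, levels A and B give the same gap, so the two curves are parallel between them, while level C gives a larger gap, which is the kind of departure from parallel lines the graphical test is meant to reveal.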

Determinants of Child Undernutrition
In the POM and PPOM, all the variables considered are found to be significant predictors of child malnutrition, as in previous studies. The covariates were also significant in both separate BLR models, except antenatal-postnatal care status and incidence of diarrhoea and ARI in the second BLR model, with the response variable "at least severe undernutrition". These results support the use of POM and PPOM instead of BLR models to determine the predictors of child undernutrition as well as severe undernutrition.
The results of the POM reveal that the risk of having worse nutrition status was 6.53 and 5.15 times higher among children aged 12-23 and 24+ months respectively, compared with infants (Table 3). Since this variable violated the proportional odds assumption, this interpretation may be invalid. However, from the separate BLR models and the PPOM it is clear that the odds ratios for children aged 12-23 months and 24+ months compared with infants were about 6.9 and 5.4 respectively when the no-undernutrition state is compared with the moderate and severe undernutrition states (Tables 4 and 5). When the no-undernutrition and moderate undernutrition states are compared with the severe undernutrition state, the odds ratios for the same two age groups were about 4.2 and 3.3 respectively. Since none of the other covariates violated the proportional odds assumption, and the PPOM performed better than the POM as well as the separate BLR models, the results for the other covariates are described from Table 5. Children with birth intervals of < 24 and 24-47 months had 1.6 and 1.5 times greater risk of having worse nutrition status than children with a birth interval of 48+ months (Table 5). The risk of having worse nutrition status was highest for children of mothers with no education (about 3.0 times) compared with children of highly educated mothers. Compared with children of the richest households, the chance of having worse nutrition status increased as household wealth decreased (2.03 for children of the poorest households and 1.32 for those of richer households). The risk of poor nutrition status was significantly higher for children with poor feeding practices than for those with better feeding practices.
Mothers who received no antenatal-postnatal care had 1.62 times greater risk of having malnourished children compared with the reference category. Children of acutely malnourished mothers had 1.74 times greater risk of being undernourished compared with those of nourished mothers. Children who had experienced ARI, fever, and diarrhoea within the two weeks before the survey had 1.22, 1.27 and 1.28 times higher risk of being undernourished respectively, compared with children having no such problems (Table 5).
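The age odds ratios reported above at the two cut-offs can be used to back out the size of the additional PPOM coefficient for age. This is a rough sketch: it uses the rounded odds ratios quoted in the text (about 6.9 and 5.4 at the first cut-off, about 4.2 and 3.3 at the second), assumes that the odds ratio at the second cut-off equals exp(β + γ), and uses hypothetical variable names.

```python
import math

# Rounded odds ratios quoted in the text (age group vs. infants).
or_at_least_moderate = {"12-23 months": 6.9, "24+ months": 5.4}
or_at_least_severe = {"12-23 months": 4.2, "24+ months": 3.3}

# If OR_1 = exp(beta) and OR_2 = exp(beta + gamma), then
# gamma = ln(OR_2 / OR_1).
gamma = {
    group: math.log(or_at_least_severe[group] / or_at_least_moderate[group])
    for group in or_at_least_moderate
}

for group, value in gamma.items():
    print(f"{group}: implied gamma = {value:.2f}")
```

Both implied values are negative and of similar size, reflecting that the age effect is weaker for severe undernutrition than for undernutrition overall; because the inputs are rounded, the values are approximate.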

Discussion
At first sight the POM seems to be an appropriate model for the data, since the p-value of the chi-squared score test for the overall model is not significant at the 5% level, indicating that the proportional odds assumption is not violated. All the variables considered were significant in the POM. However, the p-value of the score test for the overall model was quite small, which prompted single score tests for each covariate. These tests show that only 'age of children' violates the key assumption of the POM, which may lead to invalid results. The separate BLR models also indicate that the coefficients and odds ratios for each age category varied between the models. The graphical test of the proportional odds assumption reveals the same result.
For all other variables, the coefficients and odds ratios are not identical between the models but are close. In the PPOM, the coefficients and odds ratios for 'age of children' are almost the same as those of the BLR models. However, the coefficients and odds ratios for the other covariates in the PPOM differ slightly from those of the separate binary logistic regression models but are almost identical to those of the POM. Moreover, all the variables are significant in the PPOM, whereas a few are not significant in the separate binary logistic regression models.

Conclusion
Despite some differences in the results of the fitted models, the results of the POM and PPOM are reasonably comparable with those of the BLR models. The POM and PPOM proved adequate for analyzing child nutritional status, owing to the nature of the response variable (a grouped continuous variable) as well as their parsimony and ease of interpretation. Furthermore, the PPOM fits the data better than the POM. From the results of the POM and PPOM it is clear that all the variables considered in the study are significant predictors of child malnutrition, as in previous research. Moreover, these findings justify the use of OLR models (POM and PPOM), instead of two separate binary logistic regression models, to find predictors of malnutrition as well as severe undernutrition.