Empirical Forecasting of Slow-Onset Disasters for Improved Emergency Response: An Application to Kenya's Arid North

Mitigating the negative welfare consequences of crises such as droughts, floods, and disease outbreaks is a major challenge in many areas of the world, especially in highly vulnerable areas insufficiently equipped to prevent food and livelihood security crises in the face of adverse shocks. Given the finite resources allocated for emergency response, and the expected increase in the incidence of humanitarian catastrophes due to changing climate patterns, there is a need for rigorous and efficient methods of early warning and emergency needs assessment. In this paper we develop an empirical model, based on a relatively parsimonious set of regularly measured variables from communities in Kenya’s arid north, that generates remarkably accurate forecasts of the likelihood of famine with at least three months' lead time. Such a forecasting model is a potentially valuable tool for enhancing early warning capacity.


INTRODUCTION
The ability to forecast, with reasonable accuracy, the onset, duration and severity of events such as droughts, floods, and disease outbreaks, especially in terms of their prospective human welfare effects, is critical to the design of rapid, timely and cost-effective emergency response systems that can minimize the suffering of populations adversely affected by such disasters. As the consensus on climate change and its consequences grows, there is increasing concern that climate variability is growing and that the frequency of such disasters will continue to rise (IPCC, 2001). Incidences of humanitarian catastrophe and the ensuing demand for emergency response are therefore widely expected to increase.
Given the finite resources allocated for emergency response initiatives, there is growing demand worldwide for the development of rigorous, efficient and practical methods of early warning and emergency needs assessment. The Famine Early Warning Systems Network (FEWS-NET), funded by USAID, is a comprehensive effort to provide timely assessments of impending food insecurity threats in 20 African countries, as well as in Afghanistan and Haiti. It is also tasked with strengthening regional and national early warning and food security organizations through networking and capacity development. The USAID Global Livestock Collaborative Research Support Program (GL-CRSP) Livestock Early Warning System (LEWS) project and its successor, the Livestock Information Network and Knowledge System (LINKS) project, represent a complementary effort to provide pastoralists in East Africa with tools to track climate and forage conditions so as to inform herd management decisions and mitigate the consequences of impending and current drought. Meanwhile, the World Food Programme's emerging project on Strengthening Emergency Needs Assessment Capacity (SENAC) is currently engaged in identifying methods and tools for emergency assessments and for evaluating the effects of different types and combinations of shocks on various livelihood groups. Such early warning systems are increasingly recognized as essential to effective emergency response (Barrett and Maxwell, 2005).
There could be great humanitarian and economic value in developing accurate forecasting tools aimed at the needs of populations vulnerable to slow-onset disasters such as drought (which are quite distinct from the rapid-onset events associated with many other natural disasters, including earthquakes, hurricanes and tsunamis), and in identifying where and when to intervene and at what scale. In this paper, we aim to develop such a forecasting tool and demonstrate its practical value.
We focus on the arid lands of northern Kenya, largely populated by nomadic pastoralists and particularly vulnerable to covariate shocks in the form of droughts and floods.
Our primary objective is to make use of household data collected over several years by the Arid Lands Resource Management Project (ALRMP) of the Government of Kenya, and the spatially explicit data on forage conditions, rainfall and the normalized difference vegetation index [2] (NDVI) generated by the LEWS/LINKS team, to develop an empirical forecasting model that can predict the expected human impact of covariate shocks and thereby provide a useful statistical method for early warning and emergency needs assessment in the arid and semi-arid lands (ASAL) of northern Kenya. We explore the village-level joint dynamics of: (i) herd size, pastoralists' primary asset (Lybbert et al., 2004; McPeak, 2004); (ii) lactation rates, which drive pastoralists' primary income source (McPeak, 2004); (iii) climate and forage conditions, the key exogenous environmental determinants of productivity dynamics in pastoralist systems (Coppock et al., 1986; Ellis, 1994; Ellis and Swift, 1988); and (iv) child nutritional status, a key outcome variable of humanitarian and economic interest. The idea is to investigate whether patterns in climate variables, herd size and other key system variables can predict changes in village-level child nutritional indicators. We expect the model we develop to contribute to improved early warning capacity and accuracy, especially by targeting areas for emergency assistance more precisely and with sufficient lead time to mount an appropriate response.
The potential value of such a model is best recognized in light of the deficiencies and constraints of current systems. First, the resources available to respond to emergencies are extremely scarce. As Barrett and Maxwell (2006) report, only 16% of consolidated appeals for emergencies in Sub-Saharan Africa are funded. Given such scarcity, available resources need to be used wisely. Second, even when response is initiated, numerous bureaucratic and logistical obstacles cause significant delays in emergency food aid delivery that can substantially reduce its effectiveness. Third, while existing targeting of food aid in Kenya has been closely related to rainfall, Lentz and Barrett (2005) show that rainfall shocks are surprisingly weakly correlated with household-level shocks to income and assets.
Given tight resource constraints, inevitable response delays and the limited effectiveness of rainfall-based targeting, there is much potential value in improved multi-month early warning methods that offer greater accuracy and a longer forecasting horizon. By forecasting a key outcome variable, the nutritional health of children in the affected population, our model improves on existing early warning systems that merely forecast the likelihood of a drought event, which is imperfectly correlated with the human outcomes of interest to policy makers and relief agencies. It is important to clarify that the clientele for such a forecasting tool is not the subject populations, who often have their own traditional methods of forecasting droughts and initiating personal and community-level response strategies (Luseno et al., 2003), but rather government officials, operational agency managers and others tasked with responding to prospective food crises with external resources. Our objective is to illustrate how a relatively parsimonious set of regularly measured variables can generate reasonably accurate forecasts with at least three months' lead time and relatively low rates of Type I errors (i.e., false negatives, in this setting implying that the model predicts no crisis but one nonetheless occurs).
The rest of the paper is structured as follows. Section 2 summarizes the data we use. In Section 3, we briefly describe the methodology we apply to develop our forecasting model. Descriptive statistics of key model variables are presented and discussed in Section 4. In Section 5 we present forecasting results and test the performance of our forecasts. Section 6 concludes by exploring the potential uses of the model and its policy implications.

DATA
The World Bank-funded ALRMP seeks to address the vulnerability of populations living in Kenya's arid lands and to improve their ability to manage the risks they often face (World Bank Group, 2003). As part of the project, repeated cross-sectional data have been collected in various communities across Kenya's arid districts since 1996. Data relevant for our objective are sourced from household-level surveys that contain detailed information on livestock, including herd sizes, mortality rates, lactation rates, and managed off-take rates. Critically, child nutritional data in the form of mid-upper arm circumference (MUAC) were also collected.
While data have been collected monthly from 1996 and across 10 different districts, poor data organization and storage unfortunately resulted in significant quantities of lost data that rendered many areas too patchy for any rigorous analysis.
Consequently, our effective data set, while still substantial, constitutes a mere subset of what was collected.
Furthermore, no authoritative document on the collection procedures and sampling methodologies employed exists. We know that community sites were purposively selected to take into account population density and spatial distribution across a district, and that enumerators were asked to randomly select 30 households per community; however, the method of randomization used, and whether it was enforced, are unclear. Although this forces us to be cautious about (mis)representing our results as statistically representative of any single place, the data seem sufficiently rich to shed important light on dynamic processes that are as yet not well understood and for which quantitative evidence of any sort is distressingly scarce. In Figure 1, we show the four districts, Baringo, Marsabit, Samburu and Turkana, as well as the sample communities within each district, for which a sufficient number of continuous observations were available and which were therefore used to estimate our models.
[ Figure 1 about here]

Although surveyed households were in principle revisited each month for a year before a new sample was drawn, insufficient record keeping prevents individual households from being linked across periods. We therefore create a community-level pseudo-panel by generating community-level summary statistics of the pertinent variables. Despite the unavoidable loss of information, such a pseudo-panel is well suited to investigating the impacts of covariate, community-level shocks, whose dynamics are more relevant for external emergency interventions than are idiosyncratic, household-level shocks (Deaton, 1985; McKenzie, 2004).
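Constructing the pseudo-panel amounts to collapsing household records into community-month cells. A minimal sketch in Python follows. It assumes each household record is a dictionary with the hypothetical keys `community`, `year` and `month` plus numeric survey variables. These field names are illustrative and are not the ALRMP's actual variable names.

```python
from collections import defaultdict
from statistics import mean

# Identifier fields (hypothetical names) that define a pseudo-panel cell.
ID_KEYS = ("community", "year", "month")


def build_pseudo_panel(records):
    """Collapse household records into community-month means.

    Returns {(community, year, month): {variable: mean, "n_households": count}}.
    Missing values (None) are skipped; a variable missing for every household
    in a cell is reported as None.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in ID_KEYS)].append(rec)

    panel = {}
    for key, recs in groups.items():
        variables = {k for r in recs for k in r if k not in ID_KEYS}
        cell = {"n_households": len(recs)}
        for var in sorted(variables):
            values = [r[var] for r in recs if r.get(var) is not None]
            cell[var] = mean(values) if values else None
        panel[key] = cell
    return panel
```

Keeping `n_households` alongside each mean makes it easy to drop thin cells. This matters because the number of surveyed households per community-month varies across an unbalanced panel.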
We supplement the ALRMP data with a rich source of climate and forage availability data collected and produced by LEWS/LINKS researchers. LEWS/LINKS has developed a set of technologies and models that provide high-resolution, high-frequency estimates of livestock forage availability in the pastoralist-dense regions of East Africa. These data provide key variables for our model. Most major covariate shocks that hit pastoralist communities are a function of adverse forage and water conditions associated with climate fluctuations (Ellis and Swift, 1988; Ellis, 1994; Galvin et al., 2001). As changes in livestock fertility, mortality and lactation are closely related to forage and water quality and availability, access to the detailed dynamics of these variables greatly enhances the value and precision of models built to investigate and predict the human impact of the climate shocks that frequently destabilize pastoralist communities. To the best of our knowledge, no available forecasting tools integrate these biophysical, socioeconomic and anthropometric data for early warning purposes.

METHODOLOGY
As adverse covariate shocks come in various forms and have varied effects on the affected populations, emergency assessments often focus on the response, or predicted response, of critical indicator variables. Proxies of food insecurity such as the real price of key staples or the level of food production, availability, or expenditure can yield estimates of the population's vulnerability to hunger and starvation. ALRMP data include sample readings of the Mid-Upper Arm Circumference (MUAC) for selected community children. Among indicators of nutritional wellbeing, MUAC is particularly well-suited for our purposes. As a measure of wasting, MUAC is capable of capturing short-term fluctuations in the presence of nutritional stress and can thus serve as a gauge of the human impact of various shocks. Furthermore, MUAC is easier and less costly to collect than weight-for-height (W/H), the most commonly used and most documented anthropometric measure for wasting. Indeed, several studies have shown MUAC to be a far better predictor of child mortality than W/H (Chen et al. 1980, Alam et al. 1989, Vella et al. 1994). As such, MUAC is a particularly appropriate indicator of the welfare impacts of a humanitarian crisis.
As we are primarily concerned with generating the most precise forecasts possible, our objective is to maximize the forecasting accuracy of the model. While estimating unbiased regression coefficients would allow us to reasonably infer the relationship between key model variables and MUAC, attempting to correct for endogeneity by using instruments, or imposing limits on the error structure, would necessarily reduce efficiency and thus the accuracy of forecasts. Consequently, we leave the potentially interesting inference-based estimation for further study and in this paper focus exclusively on optimizing the forecast capabilities of our model.
As MUAC trends are likely to display a high degree of persistence (that is, current values are highly correlated with recent historical values), it is logical to include lagged values of MUAC as explanatory variables to capture the inevitable dynamic component. We therefore estimate a dynamic model for our unbalanced panel [3] that takes the following form:

$$y_{it} = \mu_i + \sum_{s=1}^{S} \gamma_s \, y_{i,t-s} + \sum_{s=1}^{S} \beta_s' \, x_{i,t-s} + v_{it}$$

where $y_{it}$ is the MUAC indicator for community $i$ in month $t$, $x_{i,t-s}$ is the vector of explanatory variables lagged $s$ months, $\mu_i$ is a community fixed effect, and $v_{it}$ is the unobserved error term. The dynamic model characterized above is quite general and, as implied by the unspecified upper limit $S$ on the lag index $s$, can accommodate multiple lags of both the dependent and independent variables. Lag structure is generally chosen based on the assumed data generating process, the estimators chosen and the researcher's objective. As our objective is to maximize forecasting efficiency, we use the Root Mean Square Error (RMSE) criterion to select the optimal lag structure.
Our exclusive focus on forecasting performance considerably simplifies our estimation. One key problem with estimating a dynamic panel model is that the presence of the lagged dependent variable introduces endogeneity that causes the least squares dummy variable (LSDV) estimator to be biased and inconsistent. [4] While several estimators have been developed to deal with this problem (Anderson and Hsiao, 1982; Arellano and Bond, 1991; Kiviet, 1995; Hansen, 2001), they trade off efficiency for unbiasedness and have greater variance than the LSDV estimator. The inclusion of other predetermined or endogenous explanatory variables, such as food aid, would also result in biased coefficients if not appropriately instrumented. As we are not interested in the coefficient estimates themselves, we can abstract away from these concerns and simply use the LSDV estimator, which is well known to have lower variance than these alternatives.
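The LSDV estimator gives the same slope estimates as OLS on data demeaned within each community (the "within" transformation). This equivalence avoids building one dummy column per community. The sketch below uses numpy and assumes the lagged dependent variable and the other lagged regressors have already been stacked as columns of `X`. The function name `lsdv_fit` is hypothetical.

```python
import numpy as np


def lsdv_fit(y, X, groups):
    """Fixed-effects (LSDV / within) OLS for a panel.

    y: (n,) outcome, e.g. community mean MUAC z-score in month t.
    X: (n, k) regressors, e.g. lagged MUAC and lagged covariates.
    groups: (n,) community identifiers.
    Returns slope coefficients, a dict of community intercepts, fitted values.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(groups):
        m = groups == g
        y_dm[m] -= y[m].mean()            # remove community mean of y
        X_dm[m] -= X[m].mean(axis=0)      # remove community means of X
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    # Recover each community's intercept: mu_i = ybar_i - xbar_i @ beta
    mus = {g: y[groups == g].mean() - X[groups == g].mean(axis=0) @ beta
           for g in np.unique(groups)}
    fitted = np.array([mus[g] for g in groups]) + X @ beta
    return beta, mus, fitted
```

The lagged MUAC term enters `X` like any other column, so the estimator inherits the bias discussed above. The sketch makes no attempt to correct for it, in keeping with the focus on forecast variance rather than coefficient bias.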

DESCRIPTIVE STATISTICS
In what follows we briefly describe the data used to estimate our models and present graphical representations of the trends of key variables. The data structure, … To get a sense of the dynamics of key variables in our model, we estimated kernel regressions of the variables across time. Because MUAC was collected for children aged 6-59 months residing within sample households or in close proximity (for a maximum of 5 children per household), and because MUAC varies with a child's age, we transformed our MUAC data into standardized Z-scores. [6] We used the internationally recognized 1978 CDC/WHO growth chart, whose reference population is American children sampled in the 1977 National Center for Health Statistics survey. Figure 2 depicts the trend of one of our dependent variables, village-level mean child MUAC Z-score, across our sample districts.
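The standardization step can be written as one line of arithmetic. The sketch assumes the reference distribution for the child's age (and sex) is summarized by a median and a standard deviation supplied by the caller. That is a simplification: published reference tables may treat the two tails of the distribution differently, and the function name is hypothetical.

```python
def muac_z_score(muac_cm, ref_median_cm, ref_sd_cm):
    """Standardize a MUAC reading against a reference distribution.

    ref_median_cm and ref_sd_cm are the reference-population median and
    standard deviation for the child's age group, looked up by the caller.
    """
    if ref_sd_cm <= 0:
        raise ValueError("reference SD must be positive")
    return (muac_cm - ref_median_cm) / ref_sd_cm


# Illustrative numbers only, not actual reference values:
# a 12.5 cm reading against a median of 15.0 cm and SD of 1.25 cm.
example = muac_z_score(12.5, 15.0, 1.25)   # -2.0
```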
[ Figure 2 about here]

The generally low mean Z-scores show that even during relatively good periods, the average MUAC of the sample population is well below that of the reference population. Nonetheless, within our sample, significant temporal fluctuations in the Z-scores suggest the influence that exogenous shocks exert on nutritional wellbeing.

[ Figure 3 about here]

The increasing incidence of drought and subsequent widespread severe wasting in Kenya's arid north over the past decade or so has catalyzed a significant food aid response whose intensity varies over time, depending on the severity of the crisis affecting recipient populations as well as on other socioeconomic and political factors. Figure 4 shows the trends in the fraction of households that received Unimix (a micronutrient …).

[ Figure 4 about here]

Livestock represent the main stock of assets and a significant source of nutrition for pastoralists. As such, herd dynamics may also contain important information about the health and wellbeing of pastoralist populations. For example, declining lactation rates would limit the availability of a key nutrient source among pastoralists, which, if severe enough, would translate into lower MUAC. Increasing incidence of sales or slaughter, on the other hand, may signal coping behaviors triggered by negative shocks. In Figure 5, we present herd sizes aggregated into Total Livestock Units (TLU) and compare their dynamics across districts. [7]

[ Figure 5 about here]

While there is a general upward trend in herd sizes across the sample period, the drought of 2000 resulted in a noticeable decline, as can be clearly seen in the TLU trend.
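The text does not state which kernel or bandwidth was used for the trend figures. The following is therefore only a sketch of one common choice: a Nadaraya-Watson estimator with a Gaussian kernel, evaluated on a time grid (for example, months since the start of the survey). The function name and the bandwidth value are assumptions.

```python
import math


def nadaraya_watson(t_obs, y_obs, t_grid, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | t] at each grid point.

    t_obs, y_obs: observation times (e.g. months) and values (e.g. MUAC z-scores).
    Each smoothed value is a kernel-weighted average of all observations.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    smoothed = []
    for t0 in t_grid:
        weights = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in t_obs]
        total = sum(weights)  # can underflow to 0 if t0 is far from every t
        smoothed.append(sum(w * y for w, y in zip(weights, y_obs)) / total)
    return smoothed
```

A wider bandwidth smooths away short-lived fluctuations such as the 2000 drought dip. The choice of bandwidth therefore shapes how visible shocks appear in such trend plots.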
TLU trends follow a similar pattern across districts, with the exception of Baringo, where a significant fall in herd sizes occurs toward the end of 2002. Much of this decline can be explained by the rate of managed off-take (sales and slaughter) in Baringo, which was significantly higher than in the rest of the sample at this time. In Figure 6, we show the trends in lactation, mortality, sales and slaughter rates. As expected, the drought period in 2000 is characterized by high mortality across all four species and significantly lower lactation rates among cattle and camels. [8] Trends in sales and slaughter rates do not seem to follow a pattern that can be readily interpreted as either post-shock coping behavior or ex-ante risk mitigation. Nonetheless, sales and slaughter, especially of small stock, are noticeably higher during the drought period. [9]

[ Figure 6 about here]

[7] TLUs allow for a comparison of livestock quantities across species. One TLU is equivalent to 1 head of cattle, 0.7 camels, 10 goats or 11 sheep.

[8] Lactation rates are calculated as daily means per herd and thus include both females and males of the species as well as young and old animals. This could partly explain the generally low lactation rates reported.
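The TLU aggregation in footnote 7 is a weighted sum of head counts. One TLU equals 1 head of cattle, 0.7 camels, 10 goats or 11 sheep, so a single camel counts as 1/0.7 TLU. A minimal sketch; the species keys and the function name are hypothetical:

```python
# Weights implied by footnote 7: 1 TLU = 1 cattle = 0.7 camels = 10 goats = 11 sheep.
TLU_PER_HEAD = {
    "cattle": 1.0,
    "camel": 1.0 / 0.7,
    "goat": 1.0 / 10.0,
    "sheep": 1.0 / 11.0,
}


def herd_tlu(counts):
    """Convert a {species: head count} mapping into Total Livestock Units."""
    unknown = set(counts) - set(TLU_PER_HEAD)
    if unknown:
        raise KeyError(f"no TLU weight for: {sorted(unknown)}")
    return sum(TLU_PER_HEAD[species] * n for species, n in counts.items())
```

For instance, a herd of 2 cattle, 7 camels, 10 goats and 11 sheep comes to 2 + 10 + 1 + 1 = 14 TLU, up to floating-point rounding.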

Estimating the Forecasting Model
Armed with a sense of the dynamics of key model variables, we now move on to estimate our model. Recall that our primary objective is to estimate a model that generates the most accurate forecasts of MUAC levels possible. As we are not interested in making inferences on the relationship between our explanatory variables and MUAC levels, we do not concern ourselves with estimating unbiased, consistent coefficients.
Instead, we focus exclusively on trying to maximize the predictive efficiency of our model. As such, we include as many explanatory variables as possible, the only condition being that they contribute to a reduction in the (degrees-of-freedom-adjusted) Root Mean Squared Error (RMSE). [10] We use the LSDV estimator, which, as discussed in Section 3, is the most efficient estimator for this purpose. After estimating numerous specifications with various combinations of the variables, including squared terms, cross-products, and varying lag lengths, we retained the specification with the lowest RMSE, shown in Table 2.
[ Table 2 about here]

[9] As previously mentioned, the large increase in sales rates starting from the end of 2002 is largely driven by sales in Baringo. One possible explanation is that the exceptionally favorable forage availability in Baringo in early to mid-2002 created a particularly healthy stock of animals that fetched relatively high prices at market. The El Niño floods that then hit Baringo especially hard in late 2002 may also explain the spike in sales, as the destruction caused by the floods increased the demand for cash to cope with the unexpected catastrophe. Because increased livestock mortality due to floods would also shrink the denominator of the sales rate variable, it would further increase sales relative to livestock holdings.

[10] RMSE is the most commonly used criterion for comparing the performance of forecasting models. Other popular criteria include the asymptotically efficient Akaike Information Criterion (AIC) and the asymptotically consistent Schwarz Information Criterion (SIC). The main difference among these three model selection criteria is how heavily they penalize the loss of degrees of freedom. For the various specifications we estimated, we also calculated the resulting AIC and SIC. In no case did the three criteria disagree as to which model was best, probably because degrees of freedom represent a large fraction of the total available observations. For simplicity, we therefore present only the RMSE.
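The three selection criteria in footnote 10 can be computed from a specification's residuals and its parameter count. The sketch uses common textbook forms (with T observations, k parameters and SSR the sum of squared residuals). The paper does not spell out its exact formulas, so these forms, and the helper names, are assumptions.

```python
import math


def selection_criteria(residuals, n_params):
    """Degrees-of-freedom-adjusted RMSE, AIC and SIC for one specification.

      RMSE_adj = sqrt(SSR / (T - k))
      AIC      = ln(SSR / T) + 2k / T
      SIC      = ln(SSR / T) + k ln(T) / T
    Requires SSR > 0 (the log is undefined for a perfect fit).
    """
    T, k = len(residuals), n_params
    if T <= k:
        raise ValueError("need more observations than parameters")
    ssr = sum(e * e for e in residuals)
    return {
        "rmse_adj": math.sqrt(ssr / (T - k)),
        "aic": math.log(ssr / T) + 2 * k / T,
        "sic": math.log(ssr / T) + k * math.log(T) / T,
    }


def best_spec(candidates, criterion="rmse_adj"):
    """candidates: {name: (residuals, n_params)}. Lower is better for all three."""
    scores = {name: selection_criteria(res, k)[criterion]
              for name, (res, k) in candidates.items()}
    return min(scores, key=scores.get), scores
```

Running `best_spec` once per criterion and comparing the winners reproduces the kind of cross-check described in footnote 10.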
The first set of explanatory variables in Table 2 … As Figure 6 shows, large stock trends generally follow a similar pattern across locations, while small stock trends often move in tandem. As food aid receipts within sample households are quite considerable, and as they may affect MUAC levels, we include four food aid-related variables: the fraction of households receiving UNIMIX, the fraction receiving regular cereals, the amount (in kilograms) of UNIMIX received per recipient household, and the amount of cereal received per recipient household. Both the herd dynamics variables and the food aid variables were introduced with one- and two-month lags.
The biophysical variables (mean monthly rainfall in mm, forage availability in kg/hectare, and NDVI) were the only variables introduced with squared terms, in order to capture potential nonlinearities in their effect on MUAC. In addition, each was included with five lags (two through six months). [12] To account for other sources of seasonal variation not captured by these biophysical variables, we include monthly dummy variables.
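The regressors just described can be assembled per community as follows. The sketch builds lagged values (optionally with their squares) from a monthly series, plus calendar-month dummies. The helper names are hypothetical. `shift_lags` encodes the rule the paper applies to its three-month model, where every lag is pushed back by two further months. More generally, an h-month-ahead model adds h - 1 months to each lag.

```python
def lagged_features(series, lags, squared=False):
    """Lagged regressors for one community's monthly series.

    series: values ordered by month (None marks a missing month).
    lags: e.g. range(2, 7) for the two- through six-month lags.
    Row t holds series[t - L] (and its square if requested) for each lag L;
    rows needing a lag before the series starts, or a missing month, are None.
    """
    rows = []
    for t in range(len(series)):
        row, ok = [], True
        for L in lags:
            if t - L < 0 or series[t - L] is None:
                ok = False
                break
            row.append(series[t - L])
            if squared:
                row.append(series[t - L] ** 2)
        rows.append(row if ok else None)
    return rows


def month_dummies(month, base=1):
    """Eleven 0/1 indicators for calendar month 1-12, omitting the base month."""
    return [1.0 if month == m else 0.0 for m in range(1, 13) if m != base]


def shift_lags(lags, horizon):
    """Lags for an h-month-ahead model: add (horizon - 1) to every lag."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [L + horizon - 1 for L in lags]
```

With `shift_lags(range(2, 7), 3)` the biophysical lags become 4 through 8. The same function turns one-month MUAC lags of 1 to 3 into 3 to 5.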
Note that we do not introduce any of the explanatory variables contemporaneously. Although omitting them is likely to increase RMSE, we leave out contemporaneous variables in order to allow for a legitimate forecast. This specification enables one-month-ahead forecasts of MUAC means. Given the lags in emergency food aid response caused by bureaucratic and other constraints (Barrett and Maxwell, 2005), a one-month forecast is admittedly short, leaving little leeway for aid workers to make effective use of the forecasts. As readers will note, the inverse relationship between forecasting horizon and forecast precision involves a delicate tradeoff between timely provision of the forecast and improved accuracy of forecasts, both of which determine the usefulness of the forecasting model to potential users of this information.

[11] Including separate variables for each of the four animal species reduced the RMSE.

[12] Starting from a one-month lag, or extending to a seven-month lag, does not improve results. As the effect of biophysical variables on MUAC likely occurs with a lag, the fact that including a one-month lag did not improve forecast accuracy suggests that the effect of these variables on MUAC is felt with more than a month's lag. Conversely, the effect is likely to wear off after six months.
As an alternative, we also estimate a three-month-ahead forecasting model. We use the same specification presented in Table 2, the difference being that each variable is lagged by an additional two months in this case. [13] We present key statistics from both the one-month-ahead and the three-month-ahead forecast models in Table 3, along with a few statistics on the dependent variable. However, as both models include a considerable number of explanatory variables, and as we are not focusing on inference or the precision of estimated coefficients, we do not report coefficient estimates or t-statistics. [14]

[ Table 3 about here]

The results are quite encouraging. The RMSEs are 0.1666 and 0.1986 for the one- and three-month models, respectively. These RMSEs, which measure the typical deviation of within-sample forecasts from actual values, are relatively small (roughly 0.42 and 0.50 of a standard deviation), especially given that the variable being forecast has a sample standard deviation of 0.40. This suggests that out-of-sample forecast performance should be quite good.
Furthermore, the models fit the data surprisingly well, generating high R-squared values for both within-community temporal variation and between-community spatial variation.
The between R-squares are strikingly high, at 0.98 and 0.88 for the one- and three-month models respectively. While the within R-squares, which are also relatively impressive at 0.53 and 0.34, are more relevant for forecasting, the extremely high between R-squares suggest that our explanatory variables explain the bulk of spatial variation in community-level MUAC means across the sample area for a given time period. This may also explain the good model fit despite the highly unbalanced nature of the panel. If spatial variation is so well captured by the explanatory variables, the absence of observations for certain communities will not have as much of a negative effect on model fit and forecasting performance as missing a full set of observations for particular time periods.

[13] So, for example, the lag structure for the MUAC Moments variables, including the lags of the dependent variable, now becomes L(3,4,5). For the biophysical variables the lag structure now becomes L(4,5,6,7,8). This is necessary to allow for legitimate three-month forecasts.

[14] Interested readers can obtain the full set of results from the authors.
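The paper does not say how its within and between R-squares were computed. The sketch below follows the convention reported by Stata's `xtreg, fe`, which is an assumption. In that convention, the within R-square is the squared correlation of community-demeaned actual and predicted values. The between R-square is the squared correlation of community means of actual and predicted values. The predictions `xb` must exclude the community intercepts, otherwise the between R-square is trivially one. The function names are hypothetical.

```python
from statistics import mean


def _squared_corr(a, b):
    """Squared Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov * cov / (var_a * var_b)  # assumes neither series is constant


def within_between_r2(y, xb, groups):
    """Within (temporal) and between (spatial) R-squares for a panel fit.

    y: actual values; xb: predictions X @ beta WITHOUT community intercepts;
    groups: community identifiers aligned with y and xb.
    """
    by_group = {}
    for yi, fi, g in zip(y, xb, groups):
        by_group.setdefault(g, ([], []))
        by_group[g][0].append(yi)
        by_group[g][1].append(fi)
    ybar = {g: mean(v[0]) for g, v in by_group.items()}
    fbar = {g: mean(v[1]) for g, v in by_group.items()}
    y_dm = [yi - ybar[g] for yi, g in zip(y, groups)]
    f_dm = [fi - fbar[g] for fi, g in zip(xb, groups)]
    keys = list(by_group)
    within = _squared_corr(y_dm, f_dm)
    between = _squared_corr([ybar[g] for g in keys], [fbar[g] for g in keys])
    return within, between
```

The two numbers can diverge sharply. A model can rank communities almost perfectly (high between R-square) while tracking each community's month-to-month movements only moderately (lower within R-square), as in the results above.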
As Table 1 shows, the imbalance is fortunately more a product of a small and changing set of communities missing across numerous time periods than of whole sets of communities missing at a given time.
The next step is to operationalize the model and test its out-of-sample forecast performance. This allows us to ask how food security managers or the relevant policy makers can best make use of the forecasts. Note that the forecast is a point estimate of predicted community MUAC means. Given a forecast estimate, one might want to know, for example, what the resulting 90% confidence interval for the true value is.
Alternatively, one might want to know the degree of confidence with which a person using this forecast can claim that the true value lies below a certain critical threshold of policy interest.
In either case, if the objective is to gauge the intensity or magnitude of food insecurity, or to estimate the severity of child malnutrition, a potential weakness in our forecast is revealed. Our model forecasts the mean MUAC Z-score for sample community children. The distribution of MUAC is likely to reflect within- and between-community inequalities in calorie and nutrient intake and may therefore vary between and within communities. MUAC distributions could also vary over time due to differential capacities of households to support their children's nutritional wellbeing during times of food stress. Children of particularly vulnerable households are likely to suffer higher …

We therefore re-estimate our model with an alternative dependent variable that more precisely captures the left tail of the MUAC distribution: the fraction of children in each community who have MUAC Z-scores below -2. We set -2 as our threshold to be consistent with the benchmark often employed by key emergency relief agencies to define various levels of food stress and famine (Howe and Devereux, 2004; UN Sub-Committee on Nutrition, 1999; World Food Programme, 2000). [15] Figure 7 presents full-sample and district-level kernel regressions of the fraction of children per community with MUAC Z-scores below -2 across time. [16]

[15] Howe and Devereux (2004) offer a useful framework by which to define the intensity and magnitude of famine. The scaling system they develop uses various levels of wasting, defined by the proportion of children with Z-scores less than -2, as a key indicator variable (though the anthropometric measure they use is weight-for-height, standardizing into Z-scores should make MUAC a valid substitute). Such a scaling system would be useful in conjunction with forecast estimates as a means to classify the estimated severity of food insecurity.

[16] Note that the figures display averages over communities.
During 2000, 35 of our 56 sample communities would have been classified as experiencing severe famine conditions for one or more months. Fourteen of these suffered severe famine for six or more months of 2000. … the sample who fall below a critically defined threshold, rather than on mean MUAC, may be of greater use. Table 4 presents key statistics for the forecasting model with the proportion of MUAC Z-scores < -2 as its dependent variable. Other than substituting the relevant dependent variable and its lags, the specification we employed in this case is identical to that presented in Table 2. Again, we employed an LSDV estimator for both a one-month-ahead and a three-month-ahead forecast. [17] The overall R-squares for both the one-month and the three-month forecasts are slightly lower for the proportion model, and RMSE, the more relevant within-sample proxy for out-of-sample forecast performance, also favors the mean forecast.

[ Table 4 about here]
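Constructing the alternative dependent variable is a simple count once Z-scores are available. The sketch assumes each community-month's child Z-scores arrive as a list, with missing measurements as `None`. The 20% cutoff in `famine_flag` is the famine threshold the paper uses when posing the food security manager's question. Both function names are hypothetical.

```python
def share_below(z_scores, threshold=-2.0):
    """Fraction of measured children whose MUAC Z-score is below threshold."""
    measured = [z for z in z_scores if z is not None]
    if not measured:
        return None  # no children measured in this community-month
    return sum(z < threshold for z in measured) / len(measured)


def famine_flag(share, cutoff=0.20):
    """True when the share of wasted children reaches the famine cutoff."""
    return share is not None and share >= cutoff
```

For instance, Z-scores of -2.5, -1.0, -2.1 and 0.3 (plus one missing reading) give a share of 0.5, which would be flagged.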

Testing Forecast Performance
RMSE is not the only way to evaluate the forecast performance of our models. To compare and estimate their forecast performance more precisely, we generate a series of rolling one- and three-month-ahead forecasts for both the mean and proportion models. Here, we use lagged information to generate a forecast of a future event in a given time period, then assess how accurate the forecast turned out to be using information in the data set on what actually happened in the period for which the prediction was generated. We begin by generating forecasts for January 2004. As such, one-month-ahead forecasts are … few months before correcting. This is significant, as it suggests that forecasting accuracy may diminish at particularly critical periods, for example when a severe shock hits and causes a rapid deterioration in food security, as was the case in early 2000.

[17] Note that in theory the dependent variable, as a proportion, is doubly censored: at zero from below and at one from above. In practice we find that left censoring is particularly prevalent, with 27% zero observations (less than 1% of the observations are right-censored). This suggests using a panel tobit estimator. We fit a random-effects tobit model to our data (no sufficient statistic exists that would allow the fixed effects to be conditioned out of the likelihood), but the resulting RMSE was larger. We thus opted for the more efficient LSDV estimator.
For emergency response and food security managers, such a time is when accurate information would be most valuable.
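A rolling (expanding-window) evaluation of this kind can be sketched as follows, under stated assumptions. Rows are hypothetical `(period, community, x, y)` tuples whose regressors `x` are already lagged for the chosen horizon. `fit` and `predict` are placeholders for whatever estimator is used, such as the LSDV model. The key constraint is that a forecast for period t may only be trained on outcomes observed by the forecast origin, t - horizon.

```python
def rolling_origin_forecasts(rows, first_target, last_target, horizon, fit, predict):
    """Expanding-window out-of-sample forecasts for a panel.

    rows: list of (period, community, x, y), with x already lagged for horizon.
    For each target period t, the model is re-fit on rows with
    period <= t - horizon, then used to predict every row with period == t.
    Returns a list of (period, community, forecast, actual).
    """
    results = []
    for t in range(first_target, last_target + 1):
        train = [r for r in rows if r[0] <= t - horizon]
        test = [r for r in rows if r[0] == t]
        if not train or not test:
            continue
        model = fit([r[2] for r in train], [r[3] for r in train])
        for period, community, x, y in test:
            results.append((period, community, predict(model, x), y))
    return results


def rmse(results):
    """Root mean squared out-of-sample forecast error."""
    errors = [actual - fc for _, _, fc, actual in results]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5
```

Re-fitting at every origin is what allows a model to "learn" as data bracketing sharp changes are folded in. It also means that early forecasts rest on the least information.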
We examine how serious this problem may be by estimating forecast bias. This is done by regressing the actual, observed value of the dependent variable on the forecast value in a simple linear model of the form

$$X^{a}_{it} = \alpha + \beta X^{f}_{it} + \varepsilon_{it}$$

where $X^{a}$ is the actual value and $X^{f}$ is the forecast value. Testing the null hypotheses $H_0: \alpha = 0$ and $H_0: \beta = 1$ is thus a test for an unbiased forecast; rejection of the null implies bias. The results for the mean and proportion models for both forecasting horizons are presented in Table 5.
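The bias regression above can be sketched with numpy. The function returns OLS estimates of α and β and t-statistics for α = 0 and β = 1. It also returns a Wald F-statistic for the joint null (α, β) = (0, 1). Whether the paper tested the restrictions separately or jointly is not stated, so both are shown. The function name is hypothetical.

```python
import numpy as np


def forecast_bias_test(actual, forecast):
    """Regress actual on forecast values: X_a = alpha + beta * X_f + e.

    Returns estimates, t-statistics for alpha = 0 and beta = 1, and the
    Wald F statistic for the joint null (alpha, beta) = (0, 1).
    """
    xa = np.asarray(actual, dtype=float)
    xf = np.asarray(forecast, dtype=float)
    n = xa.size
    Z = np.column_stack([np.ones(n), xf])          # intercept + forecast
    coef, *_ = np.linalg.lstsq(Z, xa, rcond=None)
    resid = xa - Z @ coef
    s2 = resid @ resid / (n - 2)                   # residual variance
    cov = s2 * np.linalg.inv(Z.T @ Z)              # OLS covariance matrix
    se = np.sqrt(np.diag(cov))
    diff = coef - np.array([0.0, 1.0])             # distance from unbiased values
    t_stats = diff / se
    f_stat = float(diff @ np.linalg.solve(cov, diff)) / 2.0
    return {"alpha": coef[0], "beta": coef[1],
            "t_alpha": t_stats[0], "t_beta": t_stats[1], "F": f_stat}
```

The F-statistic is compared against an F(2, n - 2) distribution. If SciPy is available, `scipy.stats.f.sf(F, 2, n - 2)` gives the p-value.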
[Table 6 about here]

We can reject the null of an unbiased forecast only for the 3-month means model. It is the only model for which both α and β are estimated precisely (narrowing the 95% confidence intervals) and both differ significantly from the values consistent with an unbiased forecast (α = 0, β = 1). This implies that, overall, the forecasting models perform quite well as unbiased and reasonably accurate predictors of both mean MUAC levels (for the one-month forecast) and the proportion of children suffering severe wasting (at both forecasting horizons).
As there have been only a few abrupt changes in MUAC trends over the sample period, and the sample period is relatively short, the model is likely to 'learn' and improve as more data are folded into it, especially data bracketing relatively sharp changes in MUAC. Indeed, the second salient point revealed in Figures 7 and 8 is that the models seem to learn quite fast as more data are folded into them. In both cases, and for both forecast horizons, the forecasts clearly track actual trends more closely as we roll forward in time. While this could reflect the reasonably stable trends across the latter part of the forecasting period, part of the improvement in the forecasts may be explained by learning. This highlights the value not only of adding more data to the model, but also of ensuring that data are continuous, as the highly unbalanced nature of the ALRMP panel as it currently stands undoubtedly reduces the model's performance.
Another revealing feature of these forecasts is that although the RMSE (when normalized for units) suggests that the means model has superior forecast performance relative to the proportions model, this does not come across in comparing

Consequently, we further test the models by gauging how likely they are to give correct answers to the questions they might be called upon to answer.
For practical purposes, suppose that a food security manager wants to know the likelihood that a condition of 'famine' will prevail in a particular area. Given a particular forecast, he or she would then need to know the confidence with which the actual proportion of children with MUAC Z-scores < -2 will be greater than or equal to 20%. 19 We generate such confidence levels for the one- and three-month forecasts of both the proportions and the means models. As none of the forecasting residuals for any of the models were normally distributed, 20 we use percentile counts of the in-sample distribution of forecast residuals to generate our confidence levels. So, for example, if 10% of the in-sample residuals for the one-month proportions forecast exceeded 0.15 (i.e., 15 percentage points), and the forecasted one-month-ahead proportion was 0.05, then the confidence with which one could claim that in one month's time 20% or more of children will register MUAC Z-scores < -2 would be only 10%.
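The percentile-count procedure can be sketched in a few lines. This is an illustration under assumptions, with a hypothetical function name: residuals are taken as actual minus forecast from the in-sample fit, and the realized value is approximated by adding each empirical residual to the new forecast, so no normality assumption is needed. The `famine_if_above` flag switches between the proportions-model definition of famine (proportion ≥ 0.20) and the means-model definition (mean MUAC Z-score < -1.8) given in footnote 19.

```python
import numpy as np


def famine_confidence(forecast, residuals, threshold=0.20, famine_if_above=True):
    """Empirical probability that the realized value crosses the famine threshold.

    residuals: in-sample forecast errors (actual - forecast).
    The realized value is approximated as forecast + e, with e drawn from
    the empirical residual distribution rather than a normal one.
    """
    residuals = np.asarray(residuals, dtype=float)
    realized = forecast + residuals
    if famine_if_above:
        # Proportions model: famine if >= 20% of children have MUAC Z < -2
        return float(np.mean(realized >= threshold))
    # Means model: famine if the mean MUAC Z-score falls below the threshold
    return float(np.mean(realized < threshold))
```

With a forecast of 0.05 and residuals of which 10% exceed 0.15, `famine_confidence(0.05, residuals)` returns 0.10, matching the worked example above.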
Extracting confidence levels allows for defining trigger points that, when crossed, could set in motion a series of predetermined response mechanisms (Barrett, 1997).

19 Following Howe and Devereux (2004), we define famine as the condition in which twenty percent or more of children have MUAC Z-scores < -2. For purposes of testing the performance of the means model, we employ an alternative, MUAC means-based definition of famine that we arbitrarily set as the condition in which the mean MUAC Z-score is less than -1.8. The model's performance should be robust to the threshold chosen to define famine.

20 Shapiro-Wilk tests rejected normality of the forecasting residuals in each of the four cases.
Thus, for example, one could imagine a set of rules such that if famine is expected with 66% confidence, food aid and other forms of emergency support are immediately mobilized for the affected areas. Since such a coordinated response inevitably takes time while logistical, financial and bureaucratic concerns are sorted out, a dual-trigger system could be devised in which readiness measures are taken when a lower confidence threshold (say 33%) is crossed, in preparation for deployment should the second trigger be crossed.
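The dual-trigger rule described above amounts to a simple mapping from a forecast confidence level to a response stage. The sketch below uses the illustrative 33% and 66% triggers from the text; the function name and the stage labels ("monitor", "prepare", "mobilize") are hypothetical.

```python
def dual_trigger_action(confidence, prepare_at=0.33, mobilize_at=0.66):
    """Map a famine confidence level to a response stage (hypothetical labels)."""
    if not prepare_at <= mobilize_at:
        raise ValueError("readiness trigger must not exceed mobilization trigger")
    if confidence >= mobilize_at:
        return "mobilize"  # deploy food aid and other emergency support
    if confidence >= prepare_at:
        return "prepare"   # readiness measures: logistics, financing, approvals
    return "monitor"       # no action beyond routine surveillance
```

Applied month by month to the rolling confidence levels, this yields a sequence of stages in which preparation can begin before the mobilization trigger is crossed.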
We use such a hypothetical decision-making system to offer another measure of the performance of the forecasts. First, we generate confidence levels for each forecasting horizon for both the means and proportions models. We then set three arbitrary trigger points, at the 75%, 66% and 50% confidence levels. The trigger points can be thought of as the minimum confidence level that a policy maker requires to initiate a famine emergency response. Defining an emergency response when there actually is a famine, or no response when there is none, as a 'correct' decision, we calculate the fraction of correct decisions that would result from using each model. Table 5 presents the results. We also calculate the fraction of wrong decisions that are Type 1 errors: the proportion of all wrong decisions that result from failing to respond when a famine actually occurs.
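The scoring rule above can be sketched as follows, assuming one confidence level and one observed famine outcome per area-month (the function name is hypothetical). A response is triggered when confidence meets or exceeds the trigger; a decision is correct when response and famine coincide; and a Type 1 error, as defined in the text, is a failure to respond when famine occurs.

```python
import numpy as np


def evaluate_trigger(confidence, famine_occurred, trigger):
    """Return (fraction of correct decisions, Type 1 share of wrong decisions)."""
    confidence = np.asarray(confidence, dtype=float)
    famine = np.asarray(famine_occurred, dtype=bool)
    respond = confidence >= trigger
    correct = respond == famine
    wrong = ~correct
    type1 = ~respond & famine  # famine occurred but no response was triggered
    frac_correct = float(correct.mean())
    # With no wrong decisions there are no Type 1 errors to count
    type1_share = float(type1.sum() / wrong.sum()) if wrong.any() else 0.0
    return frac_correct, type1_share
```

Looping over triggers of 0.75, 0.66 and 0.50 for each model and horizon produces one row per cell of a decision table of the kind reported here.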
[Table 5 about here]

The results clearly reveal the superiority of the proportions model in generating correct decisions, and the differences in performance are striking.
For all confidence thresholds and both forecasting horizons, decisions based on the proportions model are about 15% more likely to be correct. Interestingly, the three-month proportions forecast also outperforms the one-month means forecast by around 8% under each confidence threshold.
Further underscoring the results from Figures 8 and 9, the fairly small decline in performance as we lengthen the forecasting horizon is promising, and shows that the models can give policy makers a reasonably accurate three-month early warning window in which to mitigate the consequences of impending shocks.
The two-month gain in early warning is arguably worth the modest loss of accuracy.
Requiring higher degrees of confidence to guide the release of emergency resources for famine support does not necessarily result in significantly more correct decisions. Though higher confidence thresholds do reduce the likelihood of taking costly measures to mitigate the consequences of a famine that does not materialize, they increase the frequency of the arguably more costly mistake of failing to initiate the requisite response when famine actually occurs. As can be seen in Table 5, which presents the fraction of total error (wrong decisions) that is Type 1 (no action is initiated when a famine event actually occurs), all four forecasting regimes err on the side of responding to a wrongly declared famine (a Type 2 error) rather than failing to act when famine is real.
Deciding how to distribute incorrect decisions between Type 1 and Type 2 errors is a normative judgment best left to policymakers. As this is nonetheless an important decision, requiring a crucial tradeoff between misallocating scarce resources (committing a Type 2 error) and failing to offer emergency respite to disaster-stricken populations (a Type 1 error), we investigate this issue further. In Figure 10, we graph the fraction of correct decisions and the expected fraction of Type 1 errors against the desired confidence threshold.
[Figure 10 about here]

The key feature of Figure 10 is that the fraction of correct decisions is relatively stable, and highest, over a wide range of confidence thresholds. What changes most, especially for the proportions model, is the relationship between expected Type 1 errors and the confidence threshold: lower confidence thresholds for decision making necessarily result in a smaller fraction of Type 1 errors. That the proportions model yields substantially more correct decisions over the entire range of confidence thresholds supports its use over the means model. One decision-making strategy might be to choose the trigger confidence threshold that yields the maximum fraction of Type 1 errors that emergency responders are willing to accept; another might be to choose the threshold that maximizes the difference between the proportion of correct decisions and the fraction of Type 1 errors; yet another might be the threshold that maximizes the proportion of correct decisions. Such normative decisions are the domain of policymakers.
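The three threshold-selection strategies listed above can be made concrete with a sweep over candidate triggers. This is a sketch under assumptions: the function names and rule labels are hypothetical, the Type 1 measure is the share of wrong decisions that are Type 1 (as in the text), and for the tolerance rule we take the highest threshold whose Type 1 share stays within the tolerance, since that share need not be strictly monotone in a finite sample.

```python
import numpy as np


def sweep_thresholds(confidence, famine_occurred, thresholds):
    """For each trigger threshold, return (threshold, frac_correct, type1_share)."""
    confidence = np.asarray(confidence, dtype=float)
    famine = np.asarray(famine_occurred, dtype=bool)
    rows = []
    for t in thresholds:
        respond = confidence >= t
        wrong = respond != famine
        type1 = ~respond & famine
        frac_correct = 1.0 - wrong.mean()
        # With no wrong decisions there are no Type 1 errors to count
        type1_share = type1.sum() / wrong.sum() if wrong.any() else 0.0
        rows.append((float(t), float(frac_correct), float(type1_share)))
    return rows


def choose_threshold(rows, rule, max_type1_share=None):
    """Pick a trigger threshold under one of three hypothetical rules.

    Ties are broken in favour of the first row, i.e. the lowest threshold
    when rows are ordered by ascending threshold.
    """
    if rule == "max_correct":
        return max(rows, key=lambda r: r[1])[0]
    if rule == "max_correct_minus_type1":
        return max(rows, key=lambda r: r[1] - r[2])[0]
    if rule == "type1_tolerance":
        if max_type1_share is None:
            raise ValueError("type1_tolerance rule needs max_type1_share")
        # Highest threshold whose Type 1 share stays within the tolerance
        feasible = [r for r in rows if r[2] <= max_type1_share]
        if not feasible:
            raise ValueError("no threshold satisfies the Type 1 tolerance")
        return max(feasible, key=lambda r: r[0])[0]
    raise ValueError(f"unknown rule: {rule}")
```

Plotting the second and third elements of each row against the threshold reproduces the two curves of a figure like Figure 10; the choice of rule remains the policymaker's normative call.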

CONCLUSION AND POLICY IMPLICATIONS
Using a pseudo-panel of community-level cohorts drawn from primarily pastoralist communities across four districts in Kenya's arid north, we set out to develop an empirical forecasting model that can predict, with reasonable accuracy, the expected welfare impact of impending drought. We find that the joint dynamics of herd composition and herd management, climate and forage availability, and food aid flows are able to forecast child nutritional status dynamics with impressive precision. Forecasting the proportion of children that fall below a critical nutritional threshold (proxied by a threshold MUAC value, a relevant parameter for assessments of the intensity and severity of a humanitarian crisis) yielded particularly accurate predictions. Moreover, offering policymakers more response leeway by forecasting three months into the future only marginally reduced forecast performance relative to a one-month-ahead forecast.
The policy implications are immediately clear. It is possible to generate accurate forecasts of a key human welfare indicator that is particularly sensitive to the shocks that occasionally affect large numbers of the target population simultaneously. Moreover, the forecasts were generated from a relatively small set of variables that are not overly difficult or costly to collect. In addition, reasonable accuracy is achieved despite deficiencies in the data used. Improvements to data collection through a more systematic sampling procedure and more careful data handling would likely further improve forecast performance. This underscores the need to develop standardized data collection procedures and failsafe methods for entering, identifying and storing data.
While several early warning and emergency needs assessment guides exist, our empirical forecasting method has the advantage of demonstrable statistical rigor and out-of-sample performance. Once developed, the model can be easily and regularly updated with new information, each time quickly re-estimating the relevant parameters in a learning process that results in improved performance. Such a forecasting model is an invaluable tool for emergency awareness and response needs, offering rigorous, cost-effective and practical early warning capacity.