Tuesday, October 26, 2010

Scrap the National Household Survey?

I left off in my last post suggesting that if the voluntary National Household Survey is to go forward, it should be completely decoupled from the Census. (The recent court challenge brought by the Canadian Council for Social Development provides faint hope for another possible outcome, and I will comment on this in a future post.)

But, assuming that the content previously collected in the census long form is now to be collected on a voluntary basis, what would be the best way to proceed if this proposition were to be considered coolly and rationally, instead of in the rushed, under-the-gun, last-minute manner that played out after Statistics Canada was punk'd by the government last spring? The NHS as currently planned has problems numerous and deep, to the point of being fatally flawed. Running it as part of the census operation, in the weeks immediately following the census, puts the census itself at risk. I am convinced that it will be much more difficult this time to get the cooperation of Canadians even for the mandatory census. I think we can safely predict that thousands will refuse to fill out the Census, even though it is mandatory, and openly challenge the government to coerce them to do so, under the threat of fines or jail, knowing full well that the Prime Minister and his cabinet support their resistance. It will take all of Statistics Canada's skill and energy, over a much longer period than usual, to get a good outcome just for the basic Census. Throwing the voluntary NHS into the mix as part of the Census operation will only compound the problem and risk a meltdown of the whole system. (As an aside, when Statistics Canada was ordered to bring forth voluntary options for the long form content, I imagine they thought that this one, the NHS, was too outlandish ever to be selected by Cabinet: more response burden, less reliable data, more expensive - a loser all down the line.)

So, move the NHS as far away from the Census as possible for the sake of preserving the integrity of the census itself. There is no deadline for the NHS, and, unlike the census, no requirement that it be held at a given time. Since 1986, post-censal surveys, linked to the Census but operationally distinct from it, have been conducted many months after the census. For example, in the case of the 1991 Health and Activity Limitation Survey, data were collected from August to October 1991, several months after the 1991 Census Day, which was June 4, 1991. More recently, data collection for the 2006 Survey on the Vitality of Official-Language Minorities took place from October 2006 to January 2007, nearly six months after the 2006 Census. So, there should be no problem shifting the NHS back in time so that it is well clear of Census data collection. Even though there may be a lag in reference dates between the Census and the follow-up survey, this has not caused any quality issues for post-censal surveys in the past, nor should it pose any problem for the NHS now.

The second problem will be getting any cooperation at all from Canadians to fill out this voluntary survey. Moving it several months after the census will help, but all those who resisted filling out the census proper will almost surely refuse to complete the NHS. Many of those who agree with the government's point of view that these questions are intrusive or outright silly will gladly take advantage of its voluntary nature and refuse to respond. Among those who disagree with the government's decision to discontinue the mandatory long form, a not uncommon reaction will be to show their displeasure by boycotting the NHS. And those who simply find it burdensome, who are too busy, or who feel that they have already done their civic duty by completing the mandatory census, will just let it slide. So, I believe the controversy surrounding this decision, and the confused and mixed messages coming from the government, have poisoned the well and made it nearly impossible to achieve even the modest 50% response rate that Statistics Canada now expects.

The third problem is the sheer size of the thing. One third of households: over 4 million questionnaires! That's crazy. You can get reliable estimates of social characteristics of the population for all Census Metropolitan Areas, representing over 85% of the Canadian population, with a sample of 25,000 (see the General Social Survey). The Labour Force Survey, which provides estimates of employment for all CMAs, economic regions and EI regions, uses a sample of around 54,000 households. The Canadian Community Health Survey uses a sample of 65,000 respondents annually to produce detailed health variables for 121 subprovincial health regions. How can they do it with such small numbers, orders of magnitude smaller than the planned sample size for the NHS?

Three reasons. First, the sample sizes for these surveys do not support estimates for small areas, such as city blocks, census tracts (which are like neighbourhoods) and small rural communities. The expectation, or rather the hope, is that the very large sample size of the NHS will support the production of this type of small area data (which is the main strength of the mandatory census). Second, these surveys achieve higher response rates than what is anticipated for the NHS. The sample for the voluntary NHS consists of one in three households because a response rate of no better than 50% is expected, yielding about the same number of usable responses as the 1 in 5 sample did for the mandatory census. Third, these surveys do not fear non-response bias (i.e., where the characteristics of non-respondents are systematically different from those of respondents, thus skewing estimates to represent only respondents and not the whole population). This is because, up until now, they have been able to compare their estimates of these characteristics to a benchmark, the mandatory census, and to correct for any biases they find. This is what the NHS will not be able to do, regardless of its sample size. So in summary, the very large sample size was chosen so as to support the production of small area data, and it was bumped up from one in five households to one in three to account for high expected non-response, but in the end the resulting data will nonetheless contain unmeasurable biases that will make them suspect. Why try to produce small area data if you can't stand behind the results? Actually, why try to produce estimates for any area if you can't stand behind the results? It makes no sense.
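To make the arithmetic behind the second reason concrete, here is a rough sketch. The household total is my own round-number assumption (chosen so that one third of it is a little over 4 million), and the 97% rate for the mandatory long form is borrowed from the census response rates mentioned in this post, so treat the outputs as illustrative only.

```python
def expected_responses(households, sampling_fraction, response_rate):
    """Expected number of completed questionnaires for a given design."""
    return households * sampling_fraction * response_rate

# Assumption: a round figure for the number of Canadian households,
# not an official count.
HOUSEHOLDS = 13_000_000

# Voluntary NHS: one household in three, 50% expected response.
nhs_responses = expected_responses(HOUSEHOLDS, 1 / 3, 0.50)

# Mandatory long form: one household in five, assumed 97% response.
long_form_responses = expected_responses(HOUSEHOLDS, 1 / 5, 0.97)

print(f"NHS (1 in 3, 50% response):       {nhs_responses:,.0f}")
print(f"Long form (1 in 5, 97% response): {long_form_responses:,.0f}")
```

Under these assumptions the two designs land in the same ballpark of usable responses, which is the point: the bigger sample buys back the count of responses, not their representativeness.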

And that takes us to the last, and most serious, problem with the NHS as currently planned: non-response bias. All surveys are subject to non-response bias. In the case of sample surveys like those mentioned above (GSS, LFS, CCHS), the presence of bias can be detected by comparing their estimates to the census estimates, which are taken as an accurate benchmark, and correcting for any bias detected. How do we know that the Census itself does not contain non-response bias? Actually we don't, but because it is mandatory and response rates of 97% or 98% are achieved, any bias is so small as to have no practical effect on the estimates. So the census can confidently be used as a benchmark to detect and correct for biases in other sample surveys. As previously mentioned, this cannot be done for the NHS. With response rates of 50% or less, the potential for non-response bias is huge, but without a benchmark against which to compare, such as the mandatory census, the biases cannot be measured or corrected. The only remedy for this would be to have a data source with a 98% response rate, for the same population and the same reference period, a practical impossibility. (In addition, the NHS estimates, containing unknown and unmeasurable bias, cannot serve as a benchmark for other sample surveys, as the census did, leaving surveys such as the GSS, LFS and CCHS, and all other household sample surveys, high and dry.) So this leads to a major impasse. Why conduct a massive survey, burdening one third of Canadian households and costing $110 million, to produce data no one can trust?
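For readers who want to see why the response rate matters so much here, the sketch below uses the standard identity for the bias of a respondent-only average: the bias equals the non-response rate times the gap between respondents and non-respondents. The 30% and 20% figures are invented purely for illustration; in a real voluntary survey the non-respondent value is exactly what you cannot observe.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent-only mean relative to the full-population mean.

    Follows from: population mean = r * resp + (1 - r) * nonresp,
    so resp - population mean = (1 - r) * (resp - nonresp).
    """
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)

# Hypothetical characteristic: 30% among respondents, 20% among
# non-respondents (made-up numbers, not from any survey).
RESP, NONRESP = 0.30, 0.20

bias_mandatory = nonresponse_bias(0.98, RESP, NONRESP)  # census-like rate
bias_voluntary = nonresponse_bias(0.50, RESP, NONRESP)  # expected NHS rate

print(f"Bias at 98% response: {bias_mandatory:.3f}")
print(f"Bias at 50% response: {bias_voluntary:.3f}")
```

Note that sample size appears nowhere in the formula. Mailing out more questionnaires shrinks sampling error, but it does nothing to this term.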

So what's to be done? The long form variables are very valuable to data users, as demonstrated by the near unanimous outcry against the government's decision. So it's not a question of just scrapping the NHS and leaving it at that. If I were blue-skying about what to do for a replacement, I think I would give up on the small area data. That is specifically the price of abandoning the mandatory nature of the census. It is what the mandatory census can give you that no other vehicle can. If this was not made clear when the decision was taken, it should have been. But what is done is done and there is no way to make it better. So aim for a sample design to support quality data at a higher level of geography. Maybe all CMAs and CAs and some broad rural areas per province. Then, I would try to reduce expected non-response. Divorce it from the census and from the name "National Household Survey", which is now like response kryptonite. Break it down into manageable, less burdensome content modules and spread it out over time, thus exercising the longer-term strategy of using the census infrastructure for sample surveys. Then, I would try to deal with non-response bias by running a split panel for each content module, with a mandatory and a voluntary component, allowing the survey to essentially benchmark itself.
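As a very rough sketch of what "benchmarking itself" could look like for one yes/no question: treat the mandatory panel as the benchmark and read the gap between the two panels as an estimate of the voluntary panel's non-response bias. The function name and the tiny data lists are hypothetical, and a real design would also need weights and variance estimates, which are left out here.

```python
from statistics import mean

def self_benchmark(mandatory_answers, voluntary_answers):
    """Compare a voluntary panel to its mandatory counterpart.

    Each argument is a list of 0/1 answers to the same question.
    The mandatory panel's proportion serves as the benchmark.
    """
    benchmark = mean(mandatory_answers)
    voluntary = mean(voluntary_answers)
    return {
        "benchmark": benchmark,
        "voluntary": voluntary,
        "estimated_bias": voluntary - benchmark,
    }

# Made-up answers: the mandatory panel gets (nearly) everyone,
# the voluntary panel only hears from those who choose to reply.
mandatory_panel = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
voluntary_panel = [1, 1, 0, 1, 0]

result = self_benchmark(mandatory_panel, voluntary_panel)
print(result)
```

The estimated bias could then be used to adjust the voluntary component, in the same spirit as other surveys have been adjusted against the mandatory census.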

In the end, while still not as useful as the mandatory census, such an approach would provide more useful, usable data than the NHS as currently planned, for less money, less response burden and less damage to the national statistical system.
