Abstract
Background
Population size estimation is critical for planning public health programmes for injection drug users. Estimation is difficult, as these populations are considered 'hidden' or 'hard to reach'. The currently accepted population size estimate for greater Victoria, Canada is between 1,500 and 2,000 individuals; it dates from before the year 2000 and is likely an underestimate.
Methods
We used three mark-recapture methods (the Lincoln-Petersen estimator, Huggins' model, and Pledger's model) to estimate population size using cross-sectional survey data collected in 2003 and 2005. Data come from a closed population with two time-ordered samples from the same source. We compare our estimates with the currently accepted estimate that is based on the registry of a Victoria needle exchange.
Results
All methods provided population size estimates that were higher than the currently accepted estimate. Huggins' method produced wider confidence intervals. Point estimates of population size from the three methods ranged from 3,329 to 3,342.
Conclusions
Our estimates will aid health authorities in planning for harm reduction programmes. Repeating the methods as further phases of I-Track data become available will ensure that the population estimates remain up to date.
Keywords:
Injection drug user; Public health; Capture-recapture; Population size
Background
Prevalence of HIV and hepatitis C in many injection drug user (IDU) populations is higher than in the general population; the same can be said for the injection drug user population of greater Victoria, British Columbia, Canada (City of Victoria and the 12 other members of the Capital Regional District) where both open and hidden use are known to occur [1]. IDUs are faced with many other challenges to their wellbeing, and public health authorities are charged with the duty of providing various harm prevention services from basic health care, addictions treatment, and counselling, to harm reduction education. Knowledge of the number of injection drug users within a population would aid both health authorities and community organisations in assessing coverage of existing programmes and in the planning and delivery of a range of public health services.
AIDS Vancouver Island (AVI)'s needle exchange programme was established in 1988, providing clean syringes for IDU residents of Victoria and surrounding areas including the Gulf Islands. The client load of AVI's needle exchange programme [2] was used to produce the only estimate available for the number of IDUs in the Capital Health Region. This estimate, published in 2000, was 1,500–2,000 individuals [1]; however, there are no specific details on how it was determined. In 2008, the fixed-site needle exchange location in Victoria was closed, and needle exchange services are now provided on a mobile basis. Other agencies have also started offering clean supplies to IDU clients in Victoria since the client load estimate was generated. It is therefore unknown how reliable the current needle exchange programme registry is for assessing the size of the IDU population in greater Victoria. An accurate estimate of the number of IDUs is vital to the planning of health services for this population.
To track changes in the prevalence of HIV and hepatitis C as well as risk behaviours, the Public Health Agency of Canada in collaboration with regional health authorities developed the national, cross-sectional I-Track survey [3]. Phase I and phase II of the I-Track survey were completed in Victoria in 2003 and 2005, respectively. With only two samples from the I-Track survey (phase I and phase II), a closed population model must be implemented, as three or more samples are required to implement open population models. We use three closed population mark-recapture models to estimate the number of IDUs in greater Victoria, BC and compare the estimates obtained.
Mark-recapture models
Mark-recapture or capture-recapture models come from the desire to estimate demographic parameters of wildlife populations. Their use in epidemiology is most prevalent from a multilist standpoint where data from several sources are combined to serve as samples from a population of interest [4]. Multiple data lists are typically collected over the same time frame but from different sources. For example, Hickman et al. studied injection drug use in Brighton, Liverpool, and London from five sources, namely arrest referrals, drug treatment reports, syringe exchange programmes, accident and emergency records, and a community recruitment survey [5]. For studies with two data sources, samples may be dependent, and there is no means to test for independence unless three data sources are obtained. This is the advantage of time-ordered samples: one can model dependence through the behaviour of the injection drug users (see discussion on trap-happy or trap-shy behaviour). Hook and Regal provide an overview of the use of mark-recapture multilist methods [6]. In multilist studies, there is no natural time ordering to the lists; thus, not all wildlife estimation techniques are valid [7]. It is less common to see epidemiological studies that sample the population over time, likely due to the logistics and resources required for such an undertaking (see [8,9] for examples). However, if done, the time ordering of samples offers an opportunity to use different estimation procedures than in multilist studies.
In wildlife studies, individuals are captured, marked with a unique identifier, and returned to mix back into the population. In subsequent samples, marked individuals are identified (recaptured) and unmarked individuals are given marks before release. Thus, an animal's capture history is recorded and is represented by a sequence of 0's (not captured) and 1's (captured) for each sample occasion. For example, in a two-sample study, an animal with a history of {11} was caught at time 1, tagged, and released back into the population and was recaptured at time 2. In contrast, an animal with a capture history of {10} was caught at time 1, tagged and released, and was not seen again.
In studies of human populations, individuals are contacted (captured), and unique identifiers are obtained (marks). Here unique identifiers could be some combination of a person's date of birth, initials, age, etc. In subsequent samples, individuals are again contacted, and unique identifiers are obtained. Individuals whose identifiers match those from the first sample are considered to be resampled (recaptured). Once more, a capture history is developed for each individual in the study. For example, in a two-sample study, an individual with a capture history of {11} was contacted in the first sample, marks were obtained, and the individual was contacted in the second sample. An individual with a capture history of {01} was only contacted in the second sample. For the purposes of this paper, the mark-recapture terminology used in wildlife models will be used to refer to human populations.
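The construction of capture histories described above can be sketched in a few lines of Python. This is only an illustration: the function names and the identifiers in the example are hypothetical and are not taken from the study data.

```python
def capture_histories(sample1_ids, sample2_ids):
    """Map each identifier seen in either sample to a two-character
    capture history: '11', '10' or '01'."""
    first, second = set(sample1_ids), set(sample2_ids)
    histories = {}
    for ident in first | second:
        in_first = "1" if ident in first else "0"
        in_second = "1" if ident in second else "0"
        histories[ident] = in_first + in_second
    return histories


def tabulate(histories):
    """Count how many individuals share each capture history."""
    counts = {"11": 0, "10": 0, "01": 0}
    for history in histories.values():
        counts[history] += 1
    return counts


# Hypothetical identifiers, for illustration only.
phase1 = ["a1", "b2", "c3"]
phase2 = ["c3", "d4"]
counts = tabulate(capture_histories(phase1, phase2))
```

Individuals never contacted would have history {00}; they do not appear in either list, which is why their number has to be estimated.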
The three estimators we implemented were the Lincoln-Petersen estimator [10], a conditional likelihood estimator [11,12], and a maximum likelihood estimator with finite mixtures [13]. The Lincoln-Petersen (LP) estimator (see [10]) is widely used in epidemiological two-sample studies (for example, see [8,9]). Its limitations are largely due to model assumptions, which are similar for the other methods that we explored and are as follows:
1. The population is closed (no births or deaths, immigrations or emigrations).
2. The probability of capture is the same for each individual in the population within a sample.
3. Samples are independent.
4. Marks are not lost.
The other two estimators share these assumptions but provide methods to relax assumption 2.
Assumption 2 leads to the assumption that samples are independent. Chao describes causes for dependent samples, which include behavioural responses (e.g. trap-happy or trap-shy; see discussion) and heterogeneity in capture probabilities [7]. Incorporation of dependence among samples can be done by relaxing assumption 2 [7], implementing methods reviewed by Otis et al. [14].
The LP estimator violates assumption 2 when behaviour and/or heterogeneity affects the probability of capture. One method of dealing with heterogeneity in the data is to incorporate covariates into the estimation procedure. To do so, Huggins introduced a conditional likelihood procedure where capture probabilities can vary according to age, sex, or other factors [11]. Because the covariates for uncaptured individuals are unknown, Huggins constructed a likelihood conditional on the captured individuals so that characteristics of uncaptured individuals are not required [11]. Huggins' method also allows capture probabilities to depend on an individual's prior capture history [11]. The population size is then estimated indirectly using the capture probability estimates.
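The indirect estimation step can be illustrated with a short sketch. It assumes the standard Horvitz-Thompson form of Huggins' estimator, in which each captured individual contributes the reciprocal of its estimated probability of being caught at least once. The function name and the probabilities in the example are hypothetical; in practice the capture probabilities come from the fitted conditional likelihood model.

```python
def huggins_abundance(capture_probs):
    """Horvitz-Thompson style abundance estimate.

    capture_probs holds, for each captured individual, a sequence of
    estimated capture probabilities, one per sampling occasion.
    """
    total = 0.0
    for probs in capture_probs:
        p_never_caught = 1.0
        for p in probs:
            p_never_caught *= 1.0 - p
        # Each captured individual stands in for 1 / P(caught at least once).
        total += 1.0 / (1.0 - p_never_caught)
    return total


# Hypothetical example: 190 captured individuals, two occasions,
# constant capture probability of 0.1 on each occasion.
n_hat = huggins_abundance([[0.1, 0.1]] * 190)
```

With these illustrative values, each individual is caught at least once with probability 0.19, so 190 captured individuals correspond to an estimated population of about 1,000.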
Another estimation procedure that models capture probabilities dependent on time, behaviour, and/or heterogeneity was proposed by Pledger, introducing finite mixture models to partition the individuals into two or more groups with relatively homogeneous capture probabilities [13]. Pledger's method relaxes assumption 2 but does not condition on captured individuals [13]. Rather, the likelihood models both captured and noncaptured individuals, allowing the size of the population (N) to be a parameter that is estimated directly. Xu and Cowen detail these three methods [15].
Methods
I-Track survey
The I-Track survey in Victoria is thoroughly described elsewhere [3]. Briefly, consenting participants were recruited in the downtown core of Victoria through a needle exchange programme run by AVI and at shelter services run by the Victoria Cool Aid Society. Additional recruitment used posters, flyers, word of mouth, and contact with Vancouver Island Health Authority staff. Participants were not required to have a residence in Victoria or to have resided in Victoria for any specific period of time. Monetary compensation ($20.00) was provided for answering a questionnaire and providing a blood spot sample. Demographic and risk behaviour statistics resulting from these surveys are reported elsewhere [3]. Phase I, completed in November 2003, had 254 participants, while phase II, completed in June 2005, had 250 participants.
Eligibility criteria included being at least 15 years of age, being capable of informed consent, having an understanding of English or French, having injected nontherapeutic drugs in the past 6 months, and participation only once per phase. Parental consent was not needed, as it is possible to have mature minor consent in British Columbia.
Survey participants were asked to provide their initials, gender, and birth date (no proof of identification was required for I-Track participation). A computer encryption program used these inputs to create a unique identifier that would be replicated if the same data were entered again in a future phase of the study. This allowed the subjects to be linked between different study phases and preserved anonymity. This identifier (analogous to a unique tag in a wildlife study) is the tool that allows for a mark-recapture study, resulting in the estimation of the number of injection drug users in greater Victoria, BC.
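The idea of a reproducible anonymous identifier can be sketched as follows. The program actually used for the survey is not described here, so everything in this block is an assumption: the function name, the normalisation rules, and the use of a SHA-256 hash (a one-way hash, not encryption in the strict sense) are chosen only to show how the same inputs yield the same key in a later phase.

```python
import hashlib


def make_identifier(initials, gender, birth_date):
    """Hypothetical linkage key built from the three survey inputs.

    Inputs are trimmed and upper-cased so that trivial differences in
    data entry do not change the resulting key.
    """
    parts = (initials, gender, birth_date)
    normalised = "|".join(part.strip().upper() for part in parts)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# The same person entered in two phases, with minor entry differences.
id_phase1 = make_identifier("jd", "M", "1970-01-31")
id_phase2 = make_identifier(" JD ", "m", "1970-01-31")
# A different person.
id_other = make_identifier("JD", "F", "1970-01-31")
```

A key of this kind links records without storing names, but it changes whenever any input changes, which is the mechanism behind the tag loss discussed later in the paper.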
To establish that respondents were injection drug users, subjects were recruited only after an exchange of needles had taken place at the needle exchange. In other locations, screening questions were used (e.g. Where on your body do you inject? Where do you get your rigs? What size needle do you use? When did you last inject?). If during the interview the subjects' responses suggested a lack of familiarity with terms, their eligibility would be questioned.
Statistical analysis
We discuss the details of the statistical models in the Appendix. We implemented models in Program MARK [16]. Model selection was done by forming a set of plausible models and using Akaike's information criterion corrected for small sample sizes (AICc) to choose a model from among this candidate set [17]. Goodness of fit for closed population models has not yet been resolved [18] (see the Appendix for a discussion). However, we did compare observed with expected counts of each capture history in the form of Pearson chi-square residuals (i.e., X^{2} = (observed - expected)^{2}/expected) [19].
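The two quantities used here can be written out as a minimal sketch. The AICc expression is the usual small-sample correction, -2 ln L + 2k + 2k(k + 1)/(n - k - 1); the function names and example numbers are illustrative, and the choice of sample size n made by Program MARK is not specified in this paper.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for a model with k parameters fitted
    to a sample of size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def pearson_residual(observed, expected):
    """Pearson chi-square contribution of one capture history."""
    return (observed - expected) ** 2 / expected


# Illustrative values only, not results from the study.
example_aicc = aicc(log_likelihood=-100.0, k=2, n=50)
example_residual = pearson_residual(observed=19, expected=20)
```

Among candidate models fitted to the same data, the one with the smallest AICc is preferred; large Pearson contributions flag capture histories that the selected model reproduces poorly.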
Results
Table 1 provides basic demographic characteristics of the two phases of I-Track data.
Table 1. Demographics of phases I and II of the I-Track survey
The model selection process for each method is described in detail by Xu and Cowen [15]. Briefly, we examined the eight standard closed population models outlined by Otis et al. [14]. These models allow capture probabilities to vary by time, behaviour, and/or heterogeneity. The Lincoln-Petersen estimator corresponds to the model where capture probabilities vary by sample time. Of the 254 individuals sampled in phase I and 250 individuals sampled in phase II, there were 19 individuals in both samples. The population size estimate is 3,329 individuals using the Lincoln-Petersen estimator (Table 2). For Huggins' method, both the individual's sex and previous capture history were used as covariates for modelling the heterogeneity in capture probability. We also examined models to see if there was additional group heterogeneity. However, AICc chose the model with constant capture probabilities and no group heterogeneity. Similar results occurred with Pledger's method; there were no time, behaviour, or heterogeneity effects in the capture probabilities. The model with constant capture probabilities had the lowest AICc value.
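As a rough check on the scale of these estimates, the textbook closed form of the Lincoln-Petersen estimator, n1 × n2 / m, can be applied to the counts given above. This is a sketch rather than the fitting procedure used in the paper: the function name is illustrative, and the value reported in Table 2 comes from the model fit, so it need not match the closed form exactly.

```python
def lincoln_petersen(n1, n2, m):
    """Closed-form Lincoln-Petersen estimate of population size.

    n1: individuals in the first sample
    n2: individuals in the second sample
    m:  individuals appearing in both samples (recaptures)
    """
    if m <= 0:
        raise ValueError("at least one recapture is required")
    return n1 * n2 / m


# Counts reported for phases I and II of the survey.
n_lp = lincoln_petersen(n1=254, n2=250, m=19)
```

The closed form gives roughly 3,342, which lies at the upper end of the range of point estimates reported in the abstract (3,329 to 3,342).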
Table 2. The estimated number of injection drug users in greater Victoria, BC
Table 2 compares the estimation results from all three estimation methods. In terms of the point estimate for population size and the confidence intervals, all three methods produced similar results; however, none of the confidence intervals contains the upper bound estimate of 2,000 provided by Stajduhar et al. [1]. Further, the estimated standard errors for all methods were also similar.
Pearson chi-square residuals for the model with constant capture probability (Pledger's model) are provided in Table 3. Based on these results, we find no evidence for outliers or concerns with fit of the model. Similar results were seen for residuals of Huggins' model and the LP estimator.
Table 3. Observed count, expected count, and Pearson chi-square residual for the model with constant capture probability
Discussion
There was some concern that the number of recaptures in our study was lower than expected, resulting in population estimates that were higher than the currently accepted estimate of 1,500–2,000 individuals. The I-Track survey aimed to recruit from a broad spectrum of user groups. Forty percent of the I-Track participants were recruited at locations other than the needle exchange. An IDU population estimate based on the needle exchange programme registry prior to the year 2000 (the 1,500–2,000 estimate) would miss people who were not clients of the needle exchange programme and is therefore likely an underestimate. Our estimate represents approximately 0.9% of the greater Victoria population, whereas the proportion of IDUs is approximately 0.2%–0.9% nationwide [20,21]. However, Victoria has a comparably mild climate that may attract street-involved people from other areas. We therefore argue that it is reasonable for our estimate to be at the upper end of national estimates. Because the national estimate is based on a population survey that covers both urban and rural locations, it is not directly comparable to our estimate.
A lowered recapture rate could also be the result of a 'trap-shy' response where individuals from the first survey avoid being captured in the second survey. For the Lincoln-Petersen estimate, this would have resulted in an overestimate of population size [7]. However, because behaviour was modelled in Huggins' and Pledger's models, a trap-shy response would have produced lower population estimates under those models; this was not the case.
To explore this issue further, we varied the number of recaptures in the Lincoln-Petersen estimator to see how this affected the population size estimate (Figure 1). To get a Lincoln-Petersen estimate of around 2,000 individuals, the number of recaptures would have to be at least 32 individuals. Similarly, having 43 recaptures would produce an estimate of around 1,500 individuals.
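The sensitivity of the estimate to the number of recaptures can be reproduced approximately with the closed-form estimator. The sketch below assumes the simple n1 × n2 / m form with the phase I and phase II sample sizes held fixed; the helper names are illustrative.

```python
def lincoln_petersen(n1, n2, m):
    """Closed-form Lincoln-Petersen estimate for m recaptures."""
    return n1 * n2 / m


def min_recaptures_for(target, n1, n2):
    """Smallest number of recaptures whose estimate is at or below target."""
    m = 1
    # m cannot exceed the smaller sample, which also bounds the loop.
    while m < min(n1, n2) and lincoln_petersen(n1, n2, m) > target:
        m += 1
    return m


# Sample sizes from phases I and II of the survey.
m_for_2000 = min_recaptures_for(2000, n1=254, n2=250)
m_for_1500 = min_recaptures_for(1500, n1=254, n2=250)
```

Under these assumptions the thresholds are 32 and 43 recaptures, compared with the 19 recaptures actually observed.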
Figure 1. Estimated population size using the Lincoln-Petersen estimator varying the number of recaptures. Error bars represent the point estimate ± two estimated standard errors.
The closure assumption is likely violated for the I-Track data. Deaths could have occurred between the two I-Track surveys, people could have moved into or out of the region, and initiation or cessation of injection could have occurred between samples. To look at the stability of injection, we define the average number of years of injection as average age minus average age of first injection (Table 1); we find this to be 11.6 and 16.0 years for phases I and II, respectively. The stability of the client groups associated with recruitment sites is unknown and may have had some impact on the closure assumption.
Violation of the closure assumption can result in biased estimates, with the bias increasing as mobility into and out of the population increases [22]. Kendall studied the effect of closure violations on closed population models from the viewpoint of individuals in the population being a subset of a superpopulation [23]. For situations where individuals are able to move randomly in and out of the study area throughout the study, Kendall considered each of the survey samples to be random samples from a superpopulation of size N^{0} [23]. Individuals in the study area are drawn from the superpopulation with probability τ_{j} and captured with probability p_{ij} on occasion j. The closed population estimators are biased for the group of individuals in the study area on occasion j, but unbiased for the superpopulation. Arguably, the superpopulation is of more interest than the number of individuals in the study area at a particular occasion. The superpopulation for our study would be all individuals that entered the study area between 2003 and 2005.
The assumption of homogeneity of capture probabilities is rarely met in epidemiological studies [24]. This can be affected by the behaviour of an individual. For example, in animal studies, an animal that enjoyed the experience of being caught can become 'trap-happy'. Similarly, if an IDU enjoyed the experience of the first I-Track study or was positively impacted by the $20.00 remuneration, the person might have looked for opportunities to participate in the second. On the other hand, if an individual did not have a good experience with the first I-Track survey, the person might avoid the second survey ('trap-shy'). Further, different individuals could have intrinsically different capture probabilities, causing heterogeneity. Otis et al. specified models that incorporated potential sources of variation by modelling capture probabilities as dependent on time, behaviour, and/or heterogeneity [14]. All of the models we used were based on Otis et al.'s work [14]. The conditional model approach modelled capture probabilities dependent on the sex of the individual, thereby having the potential to further reduce heterogeneity. These models cannot account for individuals who have a null probability of being captured. If such individuals exist in the population, then our estimates would be considered conservative.
As mentioned, the assumption of independent samples can be relaxed and modelled through incorporation of behavioural effects or heterogeneity in capture probabilities. This assumption was not likely violated, as models that included behaviour or heterogeneity effects were not selected.
As no formal identification is required to participate in the study, it is possible for unique identifiers to change from one survey to the next, violating the assumption of no tag loss. This could happen if an individual forgot the information that results in their unique identifier or if an individual's unique identifier changed between survey phases due to unusual cases such as a name being changed (due to marriage, for example). If a subject provided different identifiers, it would not be possible to link them. We argue that this would be rare and would result in a reduced number of recaptured individuals, producing overestimates of population size (see Figure 1).
When the data from the latest I-Track survey are available, we would like to use an open population Jolly-Seber model to remove the closure assumption altogether in future work [25,26].
As our estimate is quickly becoming a decade old, further estimates to determine if the population size remained the same over the last 10 years would be beneficial. Moving into an open population framework with more data would also allow us to assess whether the population size has changed over time. Once established, application of this model to future phases of the data would be relatively straightforward.
Conclusions
For the Vancouver Island Health Authority, our population estimates will be helpful in the planning of services to meet the health care needs of the IDU population. When harm reduction programmes such as fixed-site needle exchanges are implemented to help control the transmission of HIV and hepatitis C, knowing the number of potential clients will aid in programme development.
Local experience in Victoria has demonstrated that when services are insufficient to meet demand, higher risk drug use practices may take place, including needle sharing. These higher risk practices may result in threats to health such as bloodborne pathogen infections, abscesses, and overdoses.
Improved estimates of the population size will assist in securing resources required to meet service demands and planning the mix of services that may best meet these needs. This could include adjustments to number and types of locations providing harm reduction services, hours of operation, and numbers of staff. Improved estimates will also better enable an assessment of the impact of programmes and policies for this population.
Appendix
Statistical model details
As the I-Track data have a natural time ordering, we were not limited to multilist models. We were able to relax the constant probability assumption using Huggins' model with capture probabilities dependent on the covariate sex [11]. For our model, the capture probabilities were modelled using a linear logistic formulation in the covariates sex_{i} and z_{ij},
where p_{ij} denotes the probability that individual i is captured at occasion j, sex_{i} is an indicator variable for the sex of individual i, and z_{ij} is equal to 1 if individual i was captured before occasion j and 0 otherwise. Thus, covariates for sex and previous capture history (behaviour of the individual) were introduced into the model.
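The equation itself is not reproduced in this text. A linear logistic form consistent with the definitions above would be the following, where the coefficient labels β_{0}, β_{1}, and β_{2} are assumptions introduced here for illustration and may differ from the notation of the original:

```latex
\[
\operatorname{logit}(p_{ij})
  = \log\!\left(\frac{p_{ij}}{1 - p_{ij}}\right)
  = \beta_{0} + \beta_{1}\,\mathrm{sex}_{i} + \beta_{2}\,z_{ij}
\]
```

Setting β_{1} = β_{2} = 0 recovers the constant capture probability model that AICc selected.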
Using Pledger's method, capture probabilities were modelled dependent on time, behaviour, and/or heterogeneity. The capture probabilities were modelled with a linear logistic formulation in the following terms:
θ_{jba} is the probability of capture for individual i at occasion j with behaviour b in group a; b = b_{ij} is equal to 1 if individual i was not caught before occasion j and 2 otherwise; τ_{j} is the effect of time for occasion j; β_{b} is the effect of behaviour for an individual with behaviour b; η_{a} is the effect of heterogeneity for an individual in group a = 1, 2,…, A with probability π_{1}, π_{2},…, π_{A}; and μ is a constant unknown parameter.
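The equation is not reproduced in this text. A main-effects form consistent with the parameters defined above would be the following; it is offered as a sketch, and any interaction terms that the original formulation may include are omitted:

```latex
\[
\operatorname{logit}(\theta_{jba})
  = \log\!\left(\frac{\theta_{jba}}{1 - \theta_{jba}}\right)
  = \mu + \tau_{j} + \beta_{b} + \eta_{a}
\]
```

Dropping τ_{j}, β_{b}, and η_{a} leaves only μ, which corresponds to the constant capture probability model that had the lowest AICc.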
Goodness of fit
Goodness of fit in closed population models is problematic and is still a current statistical issue [18]. One of the main problems is that when heterogeneity is considered in the capture probabilities, there is an infinite number of saturated models due to the fact that individuals that are not captured cannot have their covariates measured; in other words, the saturated model is not uniquely determined due to the missing covariates. This is a problem for a formal goodness-of-fit test based on the deviance, which requires a uniquely specified saturated model. A goodness-of-fit test based on the conditional distribution of the observed data does not suffer from this problem. However, Link pointed out that very different capture probability models can give rise to an identical conditional distribution [27], rendering any goodness-of-fit test based on the conditional distribution powerless in distinguishing these capture probability models.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
YX and LLEC developed the study design, analysed the data, and wrote the first draft of the manuscript. LW and MF provided insight into the interpretation of the results and edited the manuscript. All authors contributed to and have approved the final manuscript.
Acknowledgements
The Vancouver Island Health Authority coordinated the data collection for the Victoria site of the I-Track surveys. E. Roth provided helpful comments that improved the manuscript. The University of Victoria (UVic) provided funding to LLEC to establish collaboration with the Vancouver Island Health Authority. The Public Health Agency of Canada (PHAC) funded the I-Track surveys.
References

1. Stajduhar K, Poffenroth L, Wong E, Archibald C, Sutherland D, Rekart M: Missed opportunities: injection drug use and HIV/AIDS in Victoria, Canada. Int J Drug Policy 2004, 15:171-181.
2. Stajduhar KI, Poffenroth L, Wong E: Missed Opportunities: Putting a Face on Injection Drug Use and HIV/AIDS in the Capital Health Region. Vancouver, BC: Centre for Health Evaluation and Outcome Sciences (CHÉOS) Scientific Monograph; 2002. Monograph 10.
3. Vancouver Island Health Authority: I-Track survey: enhanced surveillance of risk behaviours and prevalence of HIV and hepatitis C among people who inject drugs. http://www.viha.ca/NR/rdonlyres/5E14E20553984267AA6134F9D671F22B/0/Final_ITRACK_Report_Victoria_20060605.pdf
4. Domingo-Salvany A, Hartnoll RL, Maguire A, Brugal MT, Albertin PA, Caylà JA, Casabona J, Suelves JM: Analytical considerations in the use of capture-recapture to estimate prevalence: case studies of the estimation of opiate use in the metropolitan area of Barcelona, Spain. Am J Epidemiol 1998, 148:732-740.
5. Hickman M, Higgins V, Hope V, Bellis M, Tilling K, Walker A, Henry J: Injecting drug use in Brighton, Liverpool, and London: best estimates of prevalence and coverage of public health indicators. J Epidemiol Community Health 2004, 58:766-771.
6. Hook EB, Regal RR: Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995, 17:243-264.
7. Chao A: An overview of closed population capture-recapture models. J Agric Biol Environ Stat 2001, 6:158-175.
8. Khan SI, Bhuiy A, Uddin ASMJ: Application of the capture-recapture method for estimating number of mobile male sex workers in a port city of Bangladesh. J Health Popul Nutr 2004, 22:19-26.
9. Minh TT, Nhan DT, West GR, Durant TM, Jenkins RA, Huong PT, Valdiserri RO: Sex workers in Vietnam: how many, how risky? AIDS Educ Prev 2004, 16:389-404.
10. Seber GAF: The Estimation of Animal Abundance. 2nd edition. London: Griffin; 1982.
11. Huggins RM: On the statistical analysis of capture experiments. Biometrika 1989, 76:133-140.
12. Huggins RM: Some practical aspects of a conditional likelihood approach to capture experiments. Biometrics 1991, 47:725-732.
13. Pledger S: Unified maximum likelihood estimates for closed capture-recapture models using mixtures. Biometrics 2000, 56:434-442.
14. Otis DL, Burnham KP, White GC, Anderson D: Statistical inference from capture data on closed animal populations.
15. Xu Y, Cowen L: Use of closed population models to estimate the number of injection drug users in Victoria, B.C. University of Victoria, Victoria, B.C: Department of Mathematics and Statistics; [Mathematics and Statistics Technical Report #DMS865IR]. https://dspace.library.uvic.ca:8443//handle/1828/3361
16. White GC, Burnham KP: Program MARK: survival estimation from populations of marked animals.
17. Burnham KP, Anderson DR: Model Selection and Multimodel Inference: A Practical Information-Theoretical Approach. 2nd edition. New York: Springer; 2002.
18. Lukacs P: Closed population capture-recapture models. In Program MARK: A Gentle Introduction. 9th edition. Edited by Cooch E, White G. 2011, 138. http://www.phidot.org/software/mark/docs/book/
19. Williams BK, Nichols JD, Conroy MJ: Analysis and Management of Animal Populations. Modeling, Estimation, and Decision Making. San Diego: Academic; 2002.
20. Health Canada: Canadian alcohol and drug use monitoring survey. 2011. [Drug and Alcohol Use Statistics] http://www.hcsc.gc.ca/hcps/drugsdrogues/stat/_2011/summarysommaireeng.php
21. Canadian Centre on Substance Abuse (CCSA): Injection drug users overview. 2011. [Canadian Centre on Substance Abuse] http://www.ccsa.ca/Eng/Pages/default.aspx. Accessed in August 2011.
22. Larson A, Bammer G: Why? Who? How? Estimating numbers of illicit drug users: lessons from a case study from the Australian Capital Territory. Aust N Z J Public Health 1996, 20:493-499.
23. Kendall WL: Robustness of closed capture-recapture methods to violations of the closure assumption.
24. Stephen C: Capture-recapture methods in epidemiological studies. Infect Control Hosp Epidemiol 1996, 17:262-266.
25. Jolly GM: Explicit estimates from capture-recapture data with both death and immigration-stochastic model. Biometrika 1965, 52:225-247.
26. Seber GAF: A note on the multiple recapture census. Biometrika 1965, 52:249-259.
27. Link WA: Nonidentifiability of population size from capture-recapture data with heterogeneous detection probabilities. Biometrics 2003, 59:1123-1130.