For each study we aim to analyze all sampling-estimation method pairs that are possible given the sampling strategies used. The table below lists every sampling-estimation pair considered in the meta-analysis. Note that the first three pairs estimate the prevalence of the hidden group within the broader population directly, while the remaining methods estimate the size of the hidden group; a size estimate can be converted into a prevalence estimate under the assumption that there is no uncertainty about the size of the population within which the prevalence is being estimated (see the sketch after the table).
| Sampling | Estimation | Method references | R package |
|---|---|---|---|
| **Prevalence estimation** | | | |
| PPS | HT: Horvitz-Thompson estimator of prevalence with re-scaled bootstrap standard errors | Rust and Rao (1996) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| TLS | HT: Horvitz-Thompson estimator of prevalence with re-scaled bootstrap standard errors | Rust and Rao (1996) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| RDS | SS: Sequential sampling estimator of prevalence | Gile (2011) | RDS (Mark S. Handcock et al. 2023) |
| **Population size estimation** | | | |
| PPS | NSUM: Network scale-up estimator of population size with re-scaled bootstrap standard errors | Killworth et al. (1998); Rust and Rao (1996) | networkreporting (Feehan and Salganik 2016) + surveybootstrap (Feehan and Salganik 2023) |
| TLS | NSUM: Network scale-up estimator of population size with re-scaled bootstrap standard errors | Killworth et al. (1998); Rust and Rao (1996) | networkreporting (Feehan and Salganik 2016) + surveybootstrap (Feehan and Salganik 2023) |
| TLS | RECAP *: Mark-recapture estimator of population size with parametric standard errors | Hickman et al. (2006) | Rcapture (Baillargeon and Rivest 2007) |
| RDS | SSPSE: Bayesian sequential sampling model of population size | Mark S. Handcock, Gile, and Mar (2014) | sspse (Mark S. Handcock et al. 2022) |
| RDS | CHORDS: Epidemiological model for population size estimation | Berchenko, Rosenblatt, and Frost (2017) | chords (Berchenko, Rosenblatt, and Frost 2017) |
| RDS | MULTIPLIER *: Service-multiplier population size estimator with bootstrap re-sampling standard errors | Hickman et al. (2006); Salganik (2006) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| LTS * | LINK: Link-tracing population size estimator based on a special type of RDS | Vincent and Thompson (2017); Vincent and Thompson (2022) | Self-coded based on Vincent and Thompson (2022) |
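As noted above, a population size estimate can be rescaled into a prevalence estimate when the size of the broader population is treated as known without uncertainty. A minimal R sketch of this conversion, with purely hypothetical numbers (`size_est`, `size_se`, and `pop_total` are illustrative, not taken from any study):

```r
# Convert a hidden-population size estimate into a prevalence estimate,
# assuming the size of the broader population is known without uncertainty.
size_est  <- 12000    # hypothetical estimated size of the hidden group
size_se   <- 1500     # hypothetical standard error of that size estimate
pop_total <- 800000   # assumed known size of the broader population

prev_est <- size_est / pop_total  # prevalence point estimate
prev_se  <- size_se / pop_total   # standard error rescales by the same constant

c(prevalence = prev_est, se = prev_se)
```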
In the meta-analysis we focus on comparing the prevalence estimates produced by all the methods listed in the table. The sampling and estimation methods marked with * require additional data or a special type of sampling. Specifically, TLS-RECAP estimation requires recapture data to be collected beyond the standard NSUM and direct indicators. RDS-MULTIPLIER estimation requires data from a provider of a service that is likely to be used by the hidden group of interest, together with direct questions about use of that service asked of the hidden-group members in the RDS sample. Finally, LTS-LINK estimation requires a special type of RDS sample, a link-tracing sample, with more initial seeds but fewer waves, together with the collection of data on recaptures within the RDS network.
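To make the data requirements of the multiplier approach concrete, the sketch below shows the basic service-multiplier calculation: the count of hidden-group members known to a service provider is divided by the estimated proportion of the hidden group reporting use of that service (Hickman et al. 2006; Salganik 2006). The numbers and the unweighted proportion are illustrative only; the RDS-MULTIPLIER estimator in the table obtains standard errors by bootstrap re-sampling, and in practice the proportion would typically be estimated with RDS weights rather than a simple sample mean.

```r
# Basic service-multiplier logic with hypothetical inputs.
service_count <- 350  # hypothetical count of hidden-group members recorded by the provider

# Hypothetical RDS responses to a direct question about use of that service (1 = used it).
used_service <- c(rep(1, 40), rep(0, 160))

p_service <- mean(used_service)          # proportion of the RDS sample reporting service use
size_est  <- service_count / p_service   # multiplier estimate of the hidden-group size
size_est
```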
We use a Bayesian model to assess the relative bias of the methods. We describe a model for \(N\) sites and \(K\) estimation methods, each of which could be used at each site. We are interested in the true prevalences, an \(N\)-dimensional vector \(\alpha\), and in the bias of each method's estimates relative to the benchmark procedure, a \(K\)-dimensional vector \(\beta\) (with the first element constrained to unity).
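Concretely, writing \(\hat{p}_j\) for the \(j\)-th reported estimate, \(\sigma_j\) for its reported standard error, and \(s_j\) and \(k_j\) for the site and method that produced it (notation introduced here only for exposition), the model assumes

\[
\hat{p}_j \sim \mathcal{N}\!\left(\beta_{k_j}\,\alpha_{s_j},\; \sigma_j\right),
\]

with independent normal priors on the elements of \(\alpha\) and lognormal priors on the elements of \(\beta\), whose means and standard deviations are supplied as data.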
The following code block shows the Stan model used to estimate \(\alpha\) and \(\beta\) given the findings across studies.
```r
model_code <- "
data {
  int<lower=0> N;                     // number of sites
  int<lower=0> K;                     // number of estimation methods
  int<lower=0> n_ests;                // number of reported estimates
  real<lower=0> alpha_prior_mean[N];  // prior means for site-level prevalences
  real<lower=0> alpha_prior_sd[N];    // prior sds for site-level prevalences
  real<lower=0> bias_prior_mean[K];   // prior means for method biases
  real<lower=0> bias_prior_sd[K];     // prior sds for method biases
  int<lower=0> site[n_ests];          // site index of each estimate
  int<lower=0> design[n_ests];        // method index of each estimate
  vector<lower=0>[n_ests] ests;       // reported point estimates
  vector<lower=0>[n_ests] ses;        // reported standard errors
}
parameters {
  vector<lower=0.01>[K] bias;         // relative bias of each method
  vector<lower=1>[N] alpha;           // true prevalence at each site (on the scale of the estimates)
}
model {
  // Likelihood: each estimate is centered on (method bias) x (site prevalence),
  // with its reported standard error as the sampling standard deviation.
  target += normal_lpdf(ests | bias[design].*alpha[site], ses);
  // Priors on the site prevalences and the method biases.
  alpha ~ normal(alpha_prior_mean, alpha_prior_sd);
  bias ~ lognormal(log(bias_prior_mean), bias_prior_sd);
}
"
```
The model block has three main elements:

- the likelihood, which treats each reported estimate as normally distributed around the product of the relevant method's bias and the relevant site's prevalence, with the reported standard error as its standard deviation;
- a normal prior on the site prevalences \(\alpha\); and
- a lognormal prior on the method biases \(\beta\).
The model thus builds in a critical assumption: the relative bias of a method is constant across sites, and it is this bias that is estimated. To be clear, the assumption is not that one method will always produce a higher or lower estimate than another, but only that it may be biased, in the sense that in expectation it produces a higher or lower estimate than another. Bias is measured relative to the benchmark method, and relative to the truth only to the extent that the benchmark method is itself unbiased.
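For completeness, the sketch below shows one way the model could be fit from R with rstan. All numeric values, the number of sites and methods, and the sampler settings are hypothetical; in particular, the near-degenerate prior on the first element of `bias` is used here as one possible way to encode the benchmark constraint mentioned above, and is an assumption rather than a description of the actual analysis.

```r
library(rstan)

# Hypothetical inputs: 2 sites, 2 methods, 4 estimates (one per site-method pair).
# Element names follow the data block of the Stan model above.
stan_data <- list(
  N = 2, K = 2, n_ests = 4,
  alpha_prior_mean = c(5, 8),       # illustrative prior means for site prevalences
  alpha_prior_sd   = c(3, 3),
  bias_prior_mean  = c(1, 1),       # method 1 is treated as the benchmark
  bias_prior_sd    = c(0.01, 0.5),  # tight prior pins the benchmark bias near 1 (assumption)
  site   = c(1, 1, 2, 2),           # site that produced each estimate
  design = c(1, 2, 1, 2),           # method that produced each estimate
  ests = c(5.2, 6.1, 7.9, 9.4),     # reported point estimates
  ses  = c(0.8, 1.0, 1.1, 1.3)      # reported standard errors
)

fit <- stan(model_code = model_code, data = stan_data,
            chains = 4, iter = 2000, seed = 1)

print(fit, pars = c("alpha", "bias"))
```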