Individual studies

For each study, we aim to analyze all possible sampling-estimation method pairs given the sampling strategies used. The table below lists all sampling-estimation pairs considered in the meta-analysis. Note that the first three pairs directly estimate the prevalence of the hidden group within the broader population. The remaining methods estimate the size of the hidden group, which can be transformed into a prevalence estimate under the assumption that the size of the population within which prevalence is being estimated is known without uncertainty.
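As a minimal sketch of that transformation, assuming the total population size is known exactly, both the size estimate and its standard error are divided by the population size. The function name `size_to_prevalence` and the numbers are hypothetical.

```r
# Convert a hidden-population size estimate into a prevalence estimate,
# treating the total population size as known without error (assumption).
size_to_prevalence <- function(size_est, size_se, pop_size) {
  stopifnot(pop_size > 0)
  # Dividing by a constant scales the standard error by the same constant
  c(prevalence = size_est / pop_size, se = size_se / pop_size)
}

# Hypothetical numbers: 2,500 members (SE 400) in a population of 100,000
size_to_prevalence(2500, 400, 100000)
```

If the population size were itself uncertain, this simple division would understate the standard error of the prevalence.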

Sampling-estimation pairs

| Sampling | Estimation | Method | References | R package |
|---|---|---|---|---|
| Prevalence estimation | | | | |
| PPS | HT | Horvitz-Thompson estimator of prevalence with re-scaled bootstrap standard errors | Rust and Rao (1996) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| TLS | HT | Horvitz-Thompson estimator of prevalence with re-scaled bootstrap standard errors | Rust and Rao (1996) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| RDS | SS | Sequential sampling estimator of prevalence | Gile (2011) | RDS (Mark S. Handcock et al. 2023) |
| Population size estimation | | | | |
| PPS | NSUM | Network scale-up estimator of population size with re-scaled bootstrap standard errors | Killworth et al. (1998); Rust and Rao (1996) | networkreporting (Feehan and Salganik 2016) + surveybootstrap (Feehan and Salganik 2023) |
| TLS | NSUM | Network scale-up estimator of population size with re-scaled bootstrap standard errors | Killworth et al. (1998); Rust and Rao (1996) | networkreporting (Feehan and Salganik 2016) + surveybootstrap (Feehan and Salganik 2023) |
| TLS | RECAP* | Mark-recapture estimator of population size with parametric standard errors | Hickman et al. (2006) | Rcapture (Baillargeon and Rivest 2007) |
| RDS | SSPSE | Bayesian sequential sampling model of population size | Mark S. Handcock, Gile, and Mar (2014) | sspse (Mark S. Handcock et al. 2022) |
| RDS | CHORDS | Epidemiological model for population size estimation | Berchenko, Rosenblatt, and Frost (2017) | chords (Berchenko, Rosenblatt, and Frost 2017) |
| RDS | MULTIPLIER* | Service-multiplier population size estimator with bootstrap re-sampling standard errors | Hickman et al. (2006); Salganik (2006) | Self-coded + surveybootstrap (Feehan and Salganik 2023) |
| LTS* | LINK | Link-tracing population size estimator based on a special type of RDS | Vincent and Thompson (2017); Vincent and Thompson (2022) | Self-coded based on Vincent and Thompson (2022) |
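The HT and NSUM rows share the same variance machinery, so a rough base-R sketch may help fix ideas. It assumes a single stratum in which each respondent is its own primary sampling unit, uses the ratio (Hájek) form of the Horvitz-Thompson prevalence estimator, the basic Killworth et al. (1998) scale-up form, and the common m = n - 1 variant of the rescaled bootstrap. All function names and data are hypothetical; this is not the self-coded implementation or the surveybootstrap/networkreporting API.

```r
# Ratio (Hajek) form of the Horvitz-Thompson prevalence estimator
ht_prevalence <- function(y, w) sum(w * y) / sum(w)

# Basic network scale-up estimator:
# N_H = N * sum(w * known_hidden) / sum(w * degree)
nsum_size <- function(known_hidden, degree, w, pop_size) {
  pop_size * sum(w * known_hidden) / sum(w * degree)
}

# Rescaled bootstrap weights with m = n - 1 draws: each weight is multiplied
# by n / (n - 1) times the number of times its unit was drawn.
rescaled_boot_weights <- function(w) {
  n <- length(w)
  draws <- sample.int(n, n - 1, replace = TRUE)
  counts <- tabulate(draws, nbins = n)
  w * counts * n / (n - 1)
}

# Bootstrap standard error of any statistic written as a function of weights
boot_se <- function(stat_fn, w, n_boot = 1000) {
  sd(replicate(n_boot, stat_fn(rescaled_boot_weights(w))))
}

# Hypothetical survey data
set.seed(1)
n <- 200
w <- runif(n, 50, 150)             # sampling weights
y <- rbinom(n, 1, 0.1)             # direct indicator of hidden-group membership
degree <- rpois(n, 100) + 1        # estimated personal network size
known_hidden <- rpois(n, 3)        # hidden-group members known to respondent

p_hat <- ht_prevalence(y, w)
p_se <- boot_se(function(wb) ht_prevalence(y, wb), w)
size_hat <- nsum_size(known_hidden, degree, w, pop_size = 1e5)
size_se <- boot_se(function(wb) nsum_size(known_hidden, degree, wb, 1e5), w)
```

Writing each estimator as a function of the weights lets one bootstrap routine serve both rows; clustered or stratified designs would resample clusters within strata instead.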

In the meta-analysis, we compare the prevalence estimates produced by all of the methods listed in the table. Sampling and estimation methods marked with * in the table require additional data or a special type of sampling. Specifically, TLS-RECAP estimation requires recapture data to be collected beyond the standard NSUM questions and direct indicators. RDS-MULTIPLIER estimation requires data from a provider of a service likely to be used by the hidden group of interest, as well as direct questions about use of that service asked of hidden-group members in the RDS sample. Finally, LTS-LINK estimation requires a special type of RDS sample, the link-tracing sample, with more initial seeds but fewer waves, together with data on recaptures within the RDS network.
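To illustrate what the extra data buy, here is a sketch of the simplest textbook forms of the two starred estimators: the service multiplier N = M / p and the two-sample Lincoln-Petersen capture-recapture estimator N = n1 n2 / m. These are assumptions for illustration only; Rcapture fits more general loglinear models, and the function names and numbers are hypothetical.

```r
# Service-multiplier estimator: N = M / p, where M is the number of
# hidden-population members recorded by a service provider and p is the
# (weighted) share of the RDS sample reporting use of that service.
multiplier_size <- function(service_count, used_service,
                            w = rep(1, length(used_service))) {
  p_hat <- sum(w * used_service) / sum(w)
  stopifnot(p_hat > 0)
  service_count / p_hat
}

# Two-sample Lincoln-Petersen estimator: N = n1 * n2 / m, where m is the
# number of people caught in both samples.
lincoln_petersen <- function(n1, n2, m) {
  stopifnot(m > 0)
  n1 * n2 / m
}

# Hypothetical inputs: 600 clients, 3 of 10 respondents report using the service
multiplier_size(600, c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0))
lincoln_petersen(n1 = 300, n2 = 250, m = 50)
```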

Meta level

We use a Bayesian model to assess the relative bias of the methods. We describe the model for \(N\) sites and \(K\) estimation methods, any of which could be used at each site.

We are interested in the true prevalence at each site, an \(N\)-dimensional vector \(\alpha\), and in the bias of each method relative to the benchmark procedure, a \(K\)-dimensional vector \(\beta\) (with the first element, corresponding to the benchmark, constrained to unity).
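Why one element of \(\beta\) has to be fixed follows directly from the likelihood of the model: the data inform \(\alpha\) and \(\beta\) only through their product, so rescaling them in opposite directions leaves every expected estimate unchanged. Setting \(\beta_1 = 1\) removes this freedom and turns each \(\beta_j\) into a ratio of expectations relative to the benchmark:

```latex
\mathbb{E}[\hat{\alpha}_{ij}] = \beta_j \alpha_i = (c\,\beta_j)\left(\frac{\alpha_i}{c}\right) \quad \text{for any } c > 0,
\qquad
\beta_1 = 1 \;\Rightarrow\; \frac{\mathbb{E}[\hat{\alpha}_{ij}]}{\mathbb{E}[\hat{\alpha}_{i1}]} = \frac{\beta_j \alpha_i}{\beta_1 \alpha_i} = \beta_j .
```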

The following code block gives the Stan model used to estimate \(\alpha\) and \(\beta\) given the findings across studies.

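model_code <- "

data {
  int<lower=0> N;                              // number of sites
  int<lower=0> K;                              // number of estimation methods
  int<lower=0> n_ests;                         // number of site-method estimates
  array[N] real<lower=0> alpha_prior_mean;     // prior means for the truth, by site
  array[N] real<lower=0> alpha_prior_sd;
  array[K] real<lower=0> bias_prior_mean;      // prior medians for bias, by method
  array[K] real<lower=0> bias_prior_sd;
  array[n_ests] int<lower=1, upper=N> site;    // site index of each estimate
  array[n_ests] int<lower=1, upper=K> design;  // method index of each estimate
  vector<lower=0>[n_ests] ests;                // point estimates
  vector<lower=0>[n_ests] ses;                 // their standard errors
}
parameters {
  vector<lower=0.01>[K] bias;  // relative bias of each method (beta)
  vector<lower=1>[N] alpha;    // true value at each site (alpha)
}

model {
  // Likelihood: each estimate is centered on bias times truth
  target += normal_lpdf(ests | bias[design] .* alpha[site], ses);
  // Independent priors on the truth and on the biases
  alpha ~ normal(alpha_prior_mean, alpha_prior_sd);
  bias ~ lognormal(log(bias_prior_mean), bias_prior_sd);
}
"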
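To make the data block concrete, the sketch below assembles a data list with the field names the model expects from a long table of site-method estimates. The numbers, the prior values, and the choice of a tight prior standard deviation for the benchmark are all hypothetical.

```r
# Hypothetical example: 3 sites, 2 methods (method 1 is the benchmark).
# Each row of est_table is one site-method estimate.
est_table <- data.frame(
  site   = c(1, 1, 2, 2, 3),
  design = c(1, 2, 1, 2, 1),
  est    = c(12, 15, 30, 33, 8),
  se     = c(2, 3, 5, 6, 1.5)
)

stan_data <- list(
  N = 3L,
  K = 2L,
  n_ests = nrow(est_table),
  alpha_prior_mean = c(10, 25, 10),    # hypothetical team-supplied guesses
  alpha_prior_sd = rep(100, 3),        # wide priors
  bias_prior_mean = c(1, 1),
  bias_prior_sd = c(0.01, 1),          # hypothetical: tight around 1 for the benchmark
  site = as.integer(est_table$site),   # Stan expects integer indices
  design = as.integer(est_table$design),
  ests = est_table$est,
  ses = est_table$se
)
```

Assuming rstan is installed and the model compiles under the local Stan version, `fit <- rstan::stan(model_code = model_code, data = stan_data)` would then sample from the posterior.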

The model block has three main elements:

  • We assume that the likelihood of a given estimate \(\hat{\alpha}_{ij}\) of estimand \(\alpha_i\), produced by method \(j\) at site \(i\), is \(\phi(\hat{\alpha}_{ij} \mid \beta_j\alpha_{i}, \hat{\sigma}_{ij})\), where \(\phi\) is the normal density and \(\hat{\sigma}_{ij}\) is the reported standard error of the estimate.
  • We place a (wide) prior on \(\alpha\): \(\alpha_i \sim f(\mu_{\alpha, i}, \sigma_{\alpha, i})\), where \(\mu_{\alpha, i}\) is provided by the study teams, \(\sigma_{\alpha, i}\) is large, and \(f\) denotes the normal distribution. Note that we assume no relation between different elements of \(\alpha\); that is, we do not use a prior distribution on prevalence “in general.”
  • We place a (wide) prior on \(\beta_j\): \(\beta_j \sim g(\mu_{\beta_j}, \sigma_{\beta_j})\), where \(\mu_{\beta_j}\) is provided by the study teams, \(\sigma_{\beta_j}\) is large, and \(g\) denotes the log-normal distribution, parameterized (as in the code) by location \(\log \mu_{\beta_j}\) and log-scale standard deviation \(\sigma_{\beta_j}\). Note that we assume no relation between different elements of \(\beta\); that is, we do not use a prior distribution on the biases of different methods “in general.”
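The three elements can be mirrored in a few lines of base R, which may be handy for checking a fit or for building intuition. This is a sketch: it returns the log posterior only up to an additive constant, ignores the Jacobian adjustment Stan applies to bounded parameters, and the function name and example data are hypothetical.

```r
# Unnormalized log posterior of the meta-level model, mirroring the Stan
# model block; parameter bounds are checked explicitly.
log_posterior <- function(alpha, bias, d) {
  if (any(alpha < 1) || any(bias < 0.01)) return(-Inf)
  mu <- bias[d$design] * alpha[d$site]                          # beta_j * alpha_i
  sum(dnorm(d$ests, mean = mu, sd = d$ses, log = TRUE)) +       # likelihood
    sum(dnorm(alpha, d$alpha_prior_mean, d$alpha_prior_sd, log = TRUE)) +
    sum(dlnorm(bias, log(d$bias_prior_mean), d$bias_prior_sd, log = TRUE))
}

# Hypothetical data: 2 sites, 2 methods
d <- list(
  site = c(1L, 1L, 2L), design = c(1L, 2L, 1L),
  ests = c(10, 13, 20), ses = c(2, 2, 3),
  alpha_prior_mean = c(10, 20), alpha_prior_sd = c(100, 100),
  bias_prior_mean = c(1, 1), bias_prior_sd = c(0.1, 1)
)

lp_close <- log_posterior(alpha = c(10, 20), bias = c(1, 1.3), d)
lp_far <- log_posterior(alpha = c(10, 20), bias = c(1, 3), d)
```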

The model thus builds in a critical assumption: the relative bias of a method is multiplicative and constant across sites, and it is this bias that is estimated. To be clear, the assumption is not that one method always produces a higher or lower estimate than another, but only that a method may be biased in the sense that, in expectation, it produces a higher or lower estimate than another. Bias is measured relative to the benchmark method, and relative to the truth only to the extent that the benchmark method is itself unbiased.
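A small simulation shows the distinction between bias in expectation and always producing a higher estimate. It assumes, hypothetically, a benchmark with \(\beta_1 = 1\), a second method with \(\beta_2 = 1.2\), and noise proportional to the true value. Under these assumptions the second method falls below the benchmark at roughly 29% of sites in expectation, since \(\Phi(-0.2 / (0.25\sqrt{2})) \approx 0.29\), while its ratio of means to the benchmark stays near 1.2.

```r
# Hypothetical simulation of constant multiplicative bias across sites
set.seed(42)
n_sites <- 2000
alpha <- runif(n_sites, 5, 50)                                  # true values
bench <- rnorm(n_sites, mean = 1.0 * alpha, sd = 0.25 * alpha)  # beta_1 = 1
other <- rnorm(n_sites, mean = 1.2 * alpha, sd = 0.25 * alpha)  # beta_2 = 1.2

# Share of sites where the upward-biased method is nonetheless lower
share_below <- mean(other < bench)
# Relative bias recovered on average: close to beta_2 / beta_1 = 1.2
ratio_of_means <- mean(other / alpha) / mean(bench / alpha)
```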

Baillargeon, Sophie, and Louis-Paul Rivest. 2007. “Rcapture: Loglinear Models for Capture-Recapture in R.” Journal of Statistical Software 19: 1–31.
Berchenko, Yakir, Jonathan D. Rosenblatt, and Simon D. W. Frost. 2017. “Modeling and Analyzing Respondent-Driven Sampling as a Counting Process.” Biometrics 73 (4): 1189–98. https://doi.org/10.1111/biom.12678.
Feehan, D. M., and M. J. Salganik. 2016. “networkreporting: Tools for Using Network Reporting Estimators [R Package Version 0.1.1].”
———. 2023. “surveybootstrap: Tools for the Bootstrap with Survey Data [R Package Version 0.0.3].”
Gile, Krista J. 2011. “Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation.” Journal of the American Statistical Association 106 (493): 135–46.
Handcock, Mark S., Krista J. Gile, and Corinne M. Mar. 2014. “Estimating Hidden Population Size Using Respondent-Driven Sampling Data.” Electronic Journal of Statistics 8 (1): 1491–521. https://doi.org/10.1214/14-EJS923.
Handcock, Mark S., Krista J. Gile, Ian E. Fellows, and W. Whipple Neely. 2023. “RDS: Respondent-Driven Sampling Package.”
Handcock, Mark S., Krista J. Gile, Brian Kim, and Katherine R. McLaughlin. 2022. “sspse: Estimating Hidden Population Size Using Respondent Driven Sampling.”
Hickman, Matthew, Vivian Hope, Lucy Platt, Vanessa Higgins, Mark Bellis, Tim Rhodes, Colin Taylor, and Kate Tilling. 2006. “Estimating Prevalence of Injecting Drug Use: A Comparison of Multiplier and Capture-Recapture Methods in Cities in England and Russia.” Drug and Alcohol Review 25 (2): 131–40.
Killworth, Peter D, Eugene C Johnsen, Christopher McCarty, Gene Ann Shelley, and H Russell Bernard. 1998. “A Social Network Approach to Estimating Seroprevalence in the United States.” Social Networks 20 (1): 23–50.
Rust, Keith F, and JNK Rao. 1996. “Variance Estimation for Complex Surveys Using Replication Techniques.” Statistical Methods in Medical Research 5 (3): 283–310.
Salganik, Matthew J. 2006. “Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling.” Journal of Urban Health 83 (Suppl 1): 98.
Vincent, Kyle, and Steve Thompson. 2017. “Estimating Population Size with Link-Tracing Sampling.” Journal of the American Statistical Association 112 (519): 1286–95.
———. 2022. “Estimating the Size and Distribution of Networked Populations with Snowball Sampling.” Journal of Survey Statistics and Methodology 10 (2): 397–418.