RDS/RDS+ methods

  • Use the coupons, the timing of surveys and the structure of the recruitment network to approximate the size of the population

  • Rely on network connectedness: if the seeds lie within a “small world”, the estimate will likely depart significantly from the truth

  • Do not rely on proportional sampling and are likely cheaper

Crawford, Wu, and Heimer (2018)

  • Use data from RDS \(\mathbf{Y} = (G_{R}, \mathbf{d}, \mathbf{t}, \mathbf{C})\), where:
    • \(G_{R}\) – Directly observed recruitment graph with \(n\) (out of \(N\)) vertices (not the same as \(G_{S}\), the subgraph of the hidden population graph \(G\) induced by the recruited vertices)
    • \(\mathbf{d}\) – vector of survey-reported degrees, i.e. the number of connections in the hidden population graph \(G\) of each recruited respondent
    • \(\mathbf{t}\) – directly observed vector of recruitment timings
    • \(\mathbf{C}\) – coupon matrix in which \(C_{ij} = 1\) iff subject \(i\) has at least one coupon just before the \(j\)-th subject recruitment event, and zero otherwise
  • Using these data, we estimate \(N\) from the likelihood of \(N\) given \(G_{S}\), marginalizing over all \(G_{S}\) consistent with \(G_{R}\)
  • Assumptions:
    1. \(G\) is finite graph with no self-loops
    2. Use Erdős-Rényi model assumptions for the hidden population graph \(G\): assume \(d_{i} \sim \mathrm{B} (N-1, p)\) (similar to (Killworth et al. 1998) and thus to (Maltiel et al. 2015))
    3. The edges connecting the newly recruited respondent \(i\) to other unrecruited vertices do not affect \(i\)’s probability of being recruited [this is a very strong assumption]
    4. Vertices become recruiters immediately upon entering the study and receiving one or more coupons. They remain recruiters until their coupons or susceptible neighbors are depleted, whichever happens first [this is a very strong assumption]
    5. When a susceptible neighbor \(j\) of a recruiter \(i\) is recruited by any recruiter, the edge connecting \(i\) and \(j\) immediately stops being susceptible [this is a very strong assumption]
    6. The time to recruitment along an edge connecting a recruiter to a susceptible neighbor has exponential distribution with rate \(\lambda\), independent of the identity of the recruiter, neighbor, and all other waiting times
  • Assumptions 2 and 3 imply that the “unrecruited degree” of the subject recruited \(i\)-th can be represented as \(d^{u}_{i} \sim \mathrm{B} (N - i, p)\)
  • This in turn allows us to write \(\mathcal{L} (N, p \,|\, G_{S}, \mathbf{Y}) = \prod_{i=1}^{n} {N - i \choose d^{u}_{i}} p^{d^{u}_{i}} (1-p)^{N-i-d^{u}_{i}}\)
  • Assumptions 4-6 imply \(\mathcal{L} (G_{S}, \lambda \,|\, \mathbf{Y}) = \left( \prod_{j \notin M} \lambda \mathbf{s}_{j} \right) \exp (-\lambda \mathbf{s}' \mathbf{w})\), where \(\mathbf{s} = \mathrm{lowerTri} (\mathbf{AC})'\mathbb{1} + \mathbf{C}'\mathbf{u}\) is the vector of the number of susceptible edges just before each recruitment event, \(\mathbf{A}\) is the adjacency matrix of \(G_{S}\), \(\mathbf{u}\) is the vector of unrecruited degrees and \(\mathbf{w}\) is the vector of waiting times between recruitment events
  • The appendix also contains useful ideas on simulating network structure with a block approach similar to (Feehan and Salganik 2016)
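The binomial likelihood for \(N\) and \(p\) above can be evaluated directly once the unrecruited degrees are fixed. Below is a minimal Python sketch, assuming the \(d^{u}_{i}\) are already known (in the paper they depend on the unobserved \(G_{S}\), which is marginalized over); the function names `log_lik_np` and `profile_p` and the toy degree vector are hypothetical.

```python
import math


def log_lik_np(N, p, du):
    """Log of prod_{i=1}^n C(N-i, du_i) p^du_i (1-p)^(N-i-du_i), for 0 < p < 1.

    du[i-1] is the unrecruited degree of the i-th recruited subject.
    Returns -inf when N is too small to be consistent with du.
    """
    total = 0.0
    for i, d in enumerate(du, start=1):
        m = N - i  # number of binomial trials for subject i
        if d > m:
            return -math.inf
        total += (math.lgamma(m + 1) - math.lgamma(d + 1) - math.lgamma(m - d + 1)
                  + d * math.log(p) + (m - d) * math.log1p(-p))
    return total


def profile_p(N, du):
    """MLE of p for fixed N: total 'successes' over total trials."""
    trials = sum(N - i for i in range(1, len(du) + 1))
    return sum(du) / trials


# Hypothetical unrecruited degrees for n = 8 recruits.
du = [5, 3, 4, 2, 2, 1, 0, 1]
N_min = max(i + d for i, d in enumerate(du, start=1))  # smallest feasible N
for N in (N_min, 50, 200, 1000):
    p_hat = profile_p(N, du)
    print(N, round(p_hat, 4), round(log_lik_np(N, p_hat, du), 2))
```

With few recruits the profile over \(N\) can be very flat, so a grid like this is only a rough diagnostic rather than an estimator.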

Handcock, Gile, and Mar (2014)

  • Uses the same RDS data (no need for supplementary data from the general population)
  • Models the degrees of unsampled vertices as independent draws from a pre-specified parametric distribution
  • Does not impose strict topological constraints on \(G_{S}\) such as those in (Crawford, Wu, and Heimer 2018)
  • This lack of graphical constraints in the SS-size (successive sampling) model suggests a view of RDS recruitment that is not network-based: subjects’ reported degrees can be regarded as surrogate measures of “visibility” in the population, which connects this approach to (Maltiel et al. 2015) and to the standard HT estimator
  • The SS-size model also assumes that the degrees of recruited subjects decrease over time as the sample accrues (Johnston et al. 2017)
  • We can fit this approach and the one in (Crawford, Wu, and Heimer 2018) side by side: although the models are very different, the data they use are very similar. The NSUM methods, and specifically the (Maltiel et al. 2015) model, seem more “compatible” with (Handcock, Gile, and Mar 2014)
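The declining-degree pattern behind the SS-size model can be seen in a small simulation of successive sampling: units are drawn without replacement, each time with probability proportional to degree. This is a sketch only; the uniform degree distribution, population size and the function name `successive_sample` are hypothetical choices for illustration, not part of the paper.

```python
import random


def successive_sample(degrees, n, rng):
    """Successive sampling: draw n units without replacement, each time
    choosing among the remaining units with probability proportional to
    their degree (degrees must be positive)."""
    remaining = list(range(len(degrees)))
    order = []
    for _ in range(n):
        weights = [degrees[u] for u in remaining]
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(pos))
    return order


rng = random.Random(1)
degrees = [rng.randint(1, 50) for _ in range(1000)]  # hypothetical population
order = successive_sample(degrees, 750, rng)
early = sum(degrees[u] for u in order[:100]) / 100
late = sum(degrees[u] for u in order[-100:]) / 100
print(f"mean degree of first 100 recruits: {early:.1f}, last 100: {late:.1f}")
```

High-degree units are depleted first, so later recruits have lower degrees on average; the rate of this decline is what carries information about the population size.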

Berchenko, Rosenblatt, and Frost (2017)

  • Current RDS estimators model recruitment as a homogeneous random walk, which culminates in the assumption that the sampling probability is proportional to degree, that is, \(\pi_{k} \propto k\), which in turn allows the use of the HT estimator
  • Uses the same RDS data (no need for supplementary data from the general population) within the epidemiological modeling framework of (Andersson and Britton 2000)
  • Assumptions:
    1. Sampling is done without replacement, with \(n_{k,t}\) being the (right-continuous) counting process representing the number of people with degree \(k\) recruited by time \(t\).
    2. Between times \(t\) and \(t + \Delta t\), an individual with degree \(k\) is sampled with probability \[\lambda_{k,t} \Delta t + o (\Delta t), \qquad \lambda_{k,t} = \frac{\beta_{k}}{N} I_{t} (N_{k} - n_{k,t}),\] where \(I_{t}\) is the number of people actively trying to recruit new individuals, \(N_{k}\) is the number of people with degree \(k\) in the population, and the constant \(\beta_{k}\) is a degree-dependent recruitment rate
    3. The multivariate counting process \(n_{t} \equiv (n_{1,t}, \dots, n_{k_{\max}, t})\) has intensity \[\lambda_{t} = N^{-1} \left( \beta_{1} I^{-}_{t} (N_{1} - n^{-}_{1,t} ), \dots, \beta_{k_{\max}} I^{-}_{t} (N_{k_{\max}} - n^{-}_{k_{\max},t} ) \right)\] (the superscript \(-\) denotes left limits), such that \(m_{t} \equiv n_{t} - \int_{0}^{t} \lambda_s \mathrm{d} s\) is a multivariate martingale.
  • We are interested in the prevalence estimator \(\hat{P}_{CP} = \sum_{k \geq 1} \hat{f}_{k} \hat{p}_{k}\), where \(\hat{f}_{k}\) is the estimated proportion of people with degree \(k\) in the population and \(\hat{p}_{k}\) is the estimated prevalence in the degree-\(k\) group
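Once \(\hat{f}_{k}\) and \(\hat{p}_{k}\) are available, \(\hat{P}_{CP}\) is a simple weighted sum; for contrast, the HT-type estimator implied by \(\pi_{k} \propto k\) weights each respondent by \(1/\text{degree}\). The sketch below assumes both ingredients are already estimated (the counting-process step that produces \(\hat{f}_{k}\) in the paper is not shown); the function names and inputs are hypothetical.

```python
def cp_prevalence(f_hat, p_hat):
    """P_CP = sum_k f_k * p_k, with dicts keyed by degree k."""
    return sum(f * p_hat.get(k, 0.0) for k, f in f_hat.items())


def inverse_degree_prevalence(degrees, outcomes):
    """Ratio (Hajek) form of HT when pi_k is proportional to k:
    each respondent is weighted by 1 / degree."""
    num = sum(y / d for d, y in zip(degrees, outcomes))
    den = sum(1.0 / d for d in degrees)
    return num / den


# Hypothetical inputs: estimated degree distribution and per-degree prevalence.
f_hat = {1: 0.5, 2: 0.3, 3: 0.2}
p_hat = {1: 0.1, 2: 0.2, 3: 0.4}
print(cp_prevalence(f_hat, p_hat))                      # approximately 0.19
print(inverse_degree_prevalence([1, 2, 4], [1, 0, 1]))  # 1.25 / 1.75, i.e. 5/7
```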

TLS methods

  • Karon and Wejnert (2012)
  • Raymond et al. (2007)
  • Muhib et al. (2001)
  • MacKellar et al. (1996)

PPS methods

Horvitz-Thompson

NSUM

  • The NSUM approach is based on the idea that for all individuals, the probability that someone they know is in a given subpopulation is the size of that subpopulation divided by the overall population size. The estimate of the subpopulation size is then based on the share of the subpopulation in the respondent’s network. For example, if a respondent knows \(100\) people total and knows \(2\) intravenous drug users, then it is inferred that \(2\%\) of the total population are intravenous drug users. The number of people in a given subpopulation that the respondent knows is assumed to follow a binomial distribution.
  • However, the total number of people known by a respondent, also called his or her degree or personal network size, also needs to be estimated. [why can’t we ask about this directly?] A respondent’s degree is estimated by asking about the number of contacts he or she has in several subpopulations of known size, such as twins, people named Nicole, or women over \(70\), under the same assumption that an individual should know roughly their degree times the proportion of people in a given subpopulation. [why would asking about the hidden population directly work?]
  • …Does not take account of the different propensities of people to know people in different groups, such as people’s tendency to know people like themselves; these are called barrier effects.
  • Transmission bias arises when a respondent does not count his or her contact as being in the group of interest, for example because the respondent does not know that the contact belongs to the group. This bias may be particularly large when a group is stigmatized, as is the case of most of the key affected populations in which we are interested.
  • Recall bias refers to the tendency for people to underestimate the number of people they know in larger groups because they forget some of these contacts, and to overestimate the number of people they know in small or unusual groups.

G-NSUM (Killworth et al. (1998))

Basic setup:

  • \(y_{ik}\) – the number of people known by individual \(i\), \(i = 1, \dots, n\), in group \(k\), \(k = 1, \dots, K\)
  • Groups \(1, \dots, K - 1\) are of known size and group \(K\) is of unknown size (in general, there can be more than one group of unknown size)
  • \(N_{k}\) – size of group \(k\); \(N\) – total population size, which is assumed to be known
  • \(d_{i}\) – number of people that respondent \(i\) knows, also called his or her degree or personal network size
  • Core assumption: \(y_{ik} \sim \mathrm{B} (d_{i}, \frac{N_{k}}{N})\)
  • Main result: \(\widehat{d_{i}} = N \frac{\sum_{k = 1}^{K-1} y_{ik}}{\sum_{k = 1}^{K-1} N_{k}}\) (only the groups of known size enter), which then leads to \(\widehat{N_{K}} = N \frac{\sum_{i = 1}^{n} y_{iK}}{\sum_{i = 1}^{n} \widehat{d_{i}}}\)

G-NSUM (Maltiel et al. (2015))

Additional assumptions (on top of basic setup above):

  • The number of people respondent \(i\) knows is log-normally distributed: \(d_{i} \sim \textrm{Lognormal} (\mu, \sigma^{2})\) (Salganik et al. 2011)
  • Priors are \(\pi (N_{K}) \propto \frac{\mathbb{1}_{N_{K} < N}}{N_{K}}\); \(\mu \sim \mathrm{U} (3,8)\); \(\sigma \sim \mathrm{U} (\frac{1}{4}, 2)\)
  • The existence of barrier effects implies that \(\frac{N_{k}}{N}\) can be over- or underestimated by respondents depending on the group \(k\), so instead of using this term in the binomial distribution they assume:
    • \(y_{ik} \sim \mathrm{B} (d_{i}, q_{ik})\) and \(q_{ik} \sim \textrm{Beta} (m_{k}, \rho_{k})\) (parametrized by the mean \(m_{k}\) and an overdispersion parameter \(\rho_{k}\)), where \(m_{k} = \mathbb{E} [q_{ik}] \equiv \frac{N_{k}}{N}\)
    • priors are \(\pi (m_{k}) \propto \frac{1}{m_{k}}\) and \(\rho_{k} \sim \mathrm{U} (0,1)\)
  • The existence of transmission bias implies that, for the group of interest \(K\), respondents might underestimate their number of connections due to stigma. Thus they assume:
    • \(y_{ik} \sim \mathrm{B} (d_{i}, \tau_{k}\frac{N_{k}}{N})\), \(\tau_{K} \sim \textrm{Beta} (\eta_{K}, \nu_{K})\) and \(\forall k\neq K:\: \tau_{k} \equiv 1\)
    • standard priors on \(\eta_{K}\) and \(\nu_{K}\)
  • Recall bias usually implies that respondents under-report large groups and over-report small groups. They correct for this bias with an adjustment method rather than by building it into the model parametrisation.
  • Combining the two gives:
    • \(y_{ik} \sim \mathrm{B} (d_{i}, \tau_{k} q_{ik})\),
    • \(d_{i} \sim \textrm{Lognormal} (\mu, \sigma^{2})\),
    • \(q_{ik} \sim \textrm{Beta} (m_{k}, \rho_{k})\),
    • \(\tau_{K} \sim \textrm{Beta} (\eta_{K}, \nu_{K})\),
    • \(\mu \sim \mathrm{U} (3,8);\; \sigma \sim \mathrm{U} (\frac{1}{4}, 2);\; \rho_{k} \sim \mathrm{U} (0,1)\)
    • \(\pi (N_{K}) \propto \frac{\mathbb{1}_{N_{K} < N}}{N_{K}} ;\; \pi (m_{k}) \propto \frac{1}{m_{k}}\)
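One way to see why the combined model matters is to simulate forward from it and apply the basic estimator. The Python sketch below does this; it assumes the Beta mean-overdispersion mapping \(\alpha = m(1/\rho - 1)\), \(\beta = (1-m)(1/\rho - 1)\) (so the variance is \(m(1-m)\rho\)), and the parameter values, group sizes and function names are hypothetical. It is a simulation check, not the paper's Bayesian fitting procedure.

```python
import random


def beta_mean_overdispersion(rng, m, rho):
    # ASSUMED parametrization: mean m, variance m(1-m)rho,
    # i.e. alpha = m(1/rho - 1), beta = (1-m)(1/rho - 1).
    s = 1.0 / rho - 1.0
    return rng.betavariate(m * s, (1.0 - m) * s)


def binomial(rng, n, p):
    return sum(rng.random() < p for _ in range(n))


def simulate_reports(rng, n_resp, N, sizes, mu, sigma, rho, tau_hidden):
    """sizes[-1] is the hidden group K; tau applies to group K only."""
    K = len(sizes)
    data = []
    for _ in range(n_resp):
        d = max(1, round(rng.lognormvariate(mu, sigma)))  # d_i ~ Lognormal
        row = []
        for k, N_k in enumerate(sizes):
            q = beta_mean_overdispersion(rng, N_k / N, rho[k])  # barrier effect
            tau = tau_hidden if k == K - 1 else 1.0  # transmission bias
            row.append(binomial(rng, d, tau * q))
        data.append(row)
    return data


def basic_nsum(data, sizes, N):
    """Killworth-style estimate of N_K ignoring barrier/transmission effects."""
    d_hat = [N * sum(row[:-1]) / sum(sizes[:-1]) for row in data]
    return N * sum(row[-1] for row in data) / sum(d_hat)


N = 1_000_000
sizes = [20_000, 30_000, 50_000, 10_000]  # last entry: true N_K
rho = [0.001] * len(sizes)
rng = random.Random(42)
est_no_bias = basic_nsum(simulate_reports(rng, 400, N, sizes, 5.0, 0.5, rho, 1.0), sizes, N)
est_tau_half = basic_nsum(simulate_reports(rng, 400, N, sizes, 5.0, 0.5, rho, 0.5), sizes, N)
print(round(est_no_bias), round(est_tau_half))
```

With \(\tau_{K} = 0.5\) the basic estimate should land near half the true \(N_{K}\), which is the kind of distortion the \(\tau_{K}\) term is meant to absorb.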

G-NSUM (Feehan and Salganik (2016))

  • Focus on in-reports and out-reports:
    • two people are connected by a directed edge \(i \rightarrow j\) if person \(i\) would count person \(j\) as a member of the group of interest (e.g. drug injectors). Whenever \(i \rightarrow j\), we say that \(i\) makes an out-report about \(j\) and that \(j\) receives an in-report from \(i\).
    • \(K\) - target group, \(U\) - whole population, \(F\) - frame population (actually surveyed)
    • Requires a relative probability sample from the hidden population, with questions about groups of known size, e.g. “How many widowers do you know?” combined with “How many of these widowers are aware that you inject drugs?”
  • The total number of out-reports from \(i\) to group \(K\) is \(y_{iK}\); the total number of in-reports about \(i\) from the whole population is \(\nu_{iU}\), and from the frame population \(\nu_{iF}\)
  • Total out-reports = total in-reports: \(\sum_{i \in F} y_{iK} = \sum_{i \in U} \nu_{iF} \Leftrightarrow N_{K} = \frac{\sum_{i \in F} y_{iK}}{\sum_{i \in U} \nu_{iF} \big/ N_{K}}\)
  • If we assume that the out-reports from people in the frame population only include people in the hidden population, then the visibility of everyone not in the hidden population must be 0: \(\forall i \notin K:\: \nu_{iF} \equiv 0\)
    • then \(N_{K} = \frac{\sum_{i \in F} y_{iK}}{\sum_{i \in K} \nu_{iF} \big/ N_{K}} = \frac{y_{FK}}{ \overline{\nu_{KF}}}\)
  • Estimate \(y_{FK}\) with the HT estimator, using the known sampling probabilities from \(F\)
  • Estimate \(\overline{\nu_{KF}}\) using the probe alters \(\mathcal{A}\), groups of known size about which relational data are collected from the hidden population sample. Using these data the estimator is \(\widehat{\overline{\nu_{KF}}} = \frac{N}{N_{\mathcal{A}}} \frac{\sum_{i \in s_{K}} \sum_{j} \tilde{\nu}_{i, A_{j}} \big/ (c \pi_{i} )}{\sum_{i \in s_{K}} 1 \big/ (c \pi_{i})}\), where \(\tilde{\nu}_{i, A_{j}}\) is the self-reported visibility of hidden population member \(i\) to the known-size group \(A_{j}\), and the inclusion probabilities \(c \pi_{i}\) need only be known up to the constant \(c\), which cancels.
  • Use bootstrap methods to estimate the standard errors/CIs
  • Very useful: they compare the estimator with the basic NSUM method (Killworth et al. 1998) and show that \(N_{K} = \underbrace{\left( \frac{y_{FK}}{\overline{d}_{UF}} \right)}_{\text{standard NSUM estimand}} \times \frac{1}{\phi_{F} \delta_{F} \tau_{F}}\), where \(\phi_{F}\) – frame ratio (average connections within \(F\) relative to connections from \(U\) to \(F\)), \(\delta_{F}\) – degree ratio (average connections from \(K\) to \(F\) relative to connections within \(F\)), and \(\tau_{F}\) – true positive rate (in-reports from \(F\) to \(K\) relative to edges connecting \(F\) and \(K\))
  • Appendix G of the paper also contains useful ideas on simulating network structure with a block approach, where the chance of a link between two individuals depends on their membership in \(F\) and \(K\) (no code is provided, but it seems straightforward to implement)
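The point estimate \(\widehat{N_{K}} = \widehat{y_{FK}} / \widehat{\overline{\nu_{KF}}}\) combines the two estimators described above. The Python sketch below follows those formulas with hypothetical inputs and function names; the bootstrap for standard errors is not shown.

```python
def ht_total(values, pi):
    """Horvitz-Thompson estimate of a population total from a frame sample."""
    return sum(v / p for v, p in zip(values, pi))


def visibility_estimate(nu_tilde, pi_rel, N, N_A):
    """Estimate of the average visibility nu_KF from the hidden-population sample.

    nu_tilde[i]: reported visibilities of respondent i to each probe group A_j.
    pi_rel[i]:   inclusion probability known only up to a constant c, which
                 cancels between numerator and denominator.
    """
    num = sum(sum(row) / p for row, p in zip(nu_tilde, pi_rel))
    den = sum(1.0 / p for p in pi_rel)
    return (N / N_A) * num / den


# Hypothetical frame sample: out-reports y_iK with known inclusion probabilities.
y_FK_hat = ht_total([2, 0, 1, 3], [1e-4] * 4)  # about 60,000
# Hypothetical hidden-population sample, two probe groups with N_A = 20,000.
nu_KF_hat = visibility_estimate([[1, 2], [0, 1], [2, 2]], [1, 2, 1], N=1_000_000, N_A=20_000)
N_K_hat = y_FK_hat / nu_KF_hat
print(round(N_K_hat))  # 400
```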

Vincent and Thompson (2019)

The Link-Tracing method combines ideas from RDS and capture-recapture estimation to approximate population size. Applying a multi-wave snowball sampling design to the population, it constructs a Gibbs sampling estimator of population size given the observed network structure. The central idea behind this estimator is that the links between the units sampled in all waves except the last one and all other units can be regarded as iid Bernoulli-type outcomes, so the occurrences of network links represent binomial-type experiments.

Population and Assumptions

  • the population is represented by a stochastic block model: each unit in \(U = \{1,2,\dots,N\}\) is assigned to one of \(G\) strata with probabilities \(\underline{\lambda} = (\lambda_1, \lambda_2,\dots,\lambda_G)\), and \(C_i\) denotes the stratum to which unit \(i\) belongs (known if the unit is observed and latent otherwise).
  • strata may be considered groupings or factors that affect (explain or predict) the probability of links arising between units.
  • all links between units are reciprocated, yielding a symmetric adjacency matrix \(Y\) of the population, where for all \(i,j = 1,2,\dots,N\), \(Y_{ij} = Y_{ji} = 1\) if a link is present between units \(i\) and \(j\) and 0 otherwise.
  • conditional on stratum membership, links occur independently between all pairs of units, such that for any \(i,j = 1,2,\dots,N\), if \(i \neq j\) then \(P(Y_{ij} = 1 \mid \underline{C}) = P(Y_{ij} = 1 \mid C_i,C_j) = \beta_{C_i,C_j}\), and if \(i = j\) then \(P(Y_{ii} = 1) = 0\).
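A population satisfying these assumptions can be generated directly, which is handy for testing the estimator. A minimal Python sketch (the function name, parameter values and the neighbor-set representation of \(Y\) are choices made for illustration):

```python
import random


def simulate_sbm(N, lam, beta, rng):
    """Stochastic block model: unit i gets stratum C_i ~ Categorical(lam);
    each pair i < j is linked independently with probability beta[C_i][C_j]
    (beta must be symmetric). Links are undirected and there are no self-loops."""
    G = len(lam)
    C = rng.choices(range(G), weights=lam, k=N)
    adj = [set() for _ in range(N)]  # symmetric adjacency stored as neighbor sets
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < beta[C[i]][C[j]]:
                adj[i].add(j)
                adj[j].add(i)
    return C, adj


rng = random.Random(0)
C, adj = simulate_sbm(200, [0.7, 0.3], [[0.05, 0.01], [0.01, 0.10]], rng)
n_links = sum(len(nbrs) for nbrs in adj) // 2
print(C.count(0), C.count(1), n_links)
```

Neighbor sets are used instead of a dense \(N \times N\) matrix because sparse links are cheaper to trace during snowball sampling.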

Sampling

  • sampling commences with the selection of an initial sample/wave \(S_0\) from which all nominated links are traced.
  • nominations outside of the initial sample comprise the first wave \(S_1\).
  • this sampling procedure is continued until \(W\) waves are reached (units added in wave \(w\) are thus \(S_w\)).
  • \(S = \bigcup_{w = 0}^{W}S_w\) is thus the final sample, and \(S \setminus S_W = \bigcup_{w = 0}^{W-1}S_w\) is the final sample minus the units selected in the last wave. \(\overline{S} = U \setminus S\) denotes the set of units not included in the final sample.
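The wave structure can be sketched as a breadth-first expansion in which each wave contains only units not already sampled. The Python example below assumes full tracing of all links from each wave; the toy graph and the function name are hypothetical.

```python
def snowball_waves(adj, S0, W):
    """Return [S_0, S_1, ..., S_W]: wave w+1 contains the units linked to
    wave w that are not already in the sample."""
    waves = [set(S0)]
    sampled = set(S0)
    for _ in range(W):
        nxt = set()
        for i in waves[-1]:
            nxt |= adj[i] - sampled
        waves.append(nxt)
        sampled |= nxt
    return waves


# Toy population: path 0-1-2-3-4-5 plus a link 1-4 (hypothetical).
adj = [{1}, {0, 2, 4}, {1, 3}, {2, 4}, {1, 3, 5}, {4}]
waves = snowball_waves(adj, {0}, 2)
S = set().union(*waves)
S_bar = set(range(len(adj))) - S
print(waves, S_bar)  # [{0}, {1}, {2, 4}] {3, 5}
```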

Required Data

The following data needs to be collected for estimation:

  • stratum assignment of each unit
  • sampling wave assignment of each unit
  • adjacency matrix

We denote the observed data \(d_{I_{gn}}\)

Estimation

Given that \(n = n_0 + n_1 + \dots + n_W\) and that the population can be partitioned into \(W + 2\) sets (\(S_0,S_1,\dots,S_W\) and \(\overline{S}\), i.e. the initial sample, the subsequent waves, and the non-sampled units) of sizes \(n_0,n_1,\dots,n_W\) and \(N-n\) in \({N \choose n_0,n_1,\dots,n_W,N-n}\) ways, the likelihood function for the parameters of interest can be written as:

\(\displaystyle L(N,\underline{\lambda},\underline{\beta} \mid d_{I_{gn}},|S_0|) = {N \choose n_0,n_1,\dots,n_W,N-n} \times \prod_{i=1}^{n-n_W}\lambda_{C_i} \times \prod_{i,j = 1,\dots,n-n_W:\, i<j}\left(\beta_{C_i,C_j}^{Y_{ij}}(1-\beta_{C_i,C_j})^{1-Y_{ij}}\right) \times \left[\prod_{j = n-n_W+1}^{n}\left(\lambda_{C_j} \prod_{i = 1}^{n - n_W}\beta_{C_i,C_j}^{Y_{ij}}(1-\beta_{C_i,C_j})^{1-Y_{ij}}\right)\right] \times \left[\sum_{k = 1}^{G} \lambda_k \prod_{i = 1}^{n-n_W}(1 - \beta_{C_i,k})\right]^{N-n}\)

Where:

  • the first component corresponds to reorderings of the data
  • the second component corresponds to the stratum assignments of the units in waves \(0, \dots, W-1\) (i.e. \(S \setminus S_W\)) of the reordered data
  • the third component corresponds to the occurrence of links among the units in waves \(0, \dots, W-1\) of the reordered data
  • the fourth component corresponds to the stratum assignments of the units in wave \(W\) and the links between the units in wave \(W\) and those in waves \(0, \dots, W-1\) of the reordered data
  • the fifth component corresponds to the unobserved stratum memberships of units outside the final sample and the observed absence of links between units inside and outside of the final sample.
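The five components can be accumulated on the log scale. The Python sketch below is one possible implementation under stated assumptions: the sampled units are ordered with \(S \setminus S_W\) first and \(S_W\) last, \(0 < \beta < 1\), and the function names are hypothetical.

```python
import math


def log_multinomial(N, parts):
    """log of N! / (parts[0]! ... parts[-1]! (N - sum(parts))!)."""
    rest = N - sum(parts)
    return (math.lgamma(N + 1) - sum(math.lgamma(k + 1) for k in parts)
            - math.lgamma(rest + 1))


def log_bern(y, b):
    """log of b^y (1 - b)^(1 - y), assuming 0 < b < 1."""
    return math.log(b) if y else math.log1p(-b)


def log_lik(N, lam, beta, C, Y, wave_sizes):
    """Log-likelihood for the reordered sample.

    C[i] is the stratum of sampled unit i and Y[i][j] its 0/1 link indicator;
    units are ordered so that the last wave_sizes[-1] of them form S_W.
    """
    n = len(C)
    if N < n:
        return -math.inf
    m = n - wave_sizes[-1]  # |S \ S_W|
    ll = log_multinomial(N, wave_sizes)                    # reorderings
    ll += sum(math.log(lam[C[i]]) for i in range(m))       # strata in S \ S_W
    ll += sum(log_bern(Y[i][j], beta[C[i]][C[j]])          # links within S \ S_W
              for i in range(m) for j in range(i + 1, m))
    for j in range(m, n):                                  # final wave S_W
        ll += math.log(lam[C[j]])
        ll += sum(log_bern(Y[i][j], beta[C[i]][C[j]]) for i in range(m))
    one_minus_p = sum(lam[k] * math.prod(1 - beta[C[i]][k] for i in range(m))
                      for k in range(len(lam)))            # non-sampled units
    return ll + (N - n) * math.log(one_minus_p)


C = [0, 1, 0]                          # S_0 = {0}, S_1 = {1}, S_2 = {2}
Y = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(log_lik(5, [0.5, 0.5], [[0.2, 0.3], [0.3, 0.4]], C, Y, [1, 1, 1]))
```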

As the likelihood involves several parameters jointly, Link-tracing estimates them with a Gibbs sampler, iteratively sampling from each parameter’s conditional distribution. Estimation proceeds as follows:

  1. initialize values for \(N,\underline{\lambda}\) and \(\underline{\beta}\)
  2. draw from \(P(N|d_{I_{gn}},\underline{\lambda},\underline{\beta})\) given the values of \(\underline{\lambda}\) and \(\underline{\beta}\) from the previous step
  3. draw from \(P(\underline{\lambda}|N,\underline{\beta})\) given the values of \(N\) and \(\underline{\beta}\) from the previous steps
  4. draw from \(P(\underline{\beta}|N,\underline{\lambda})\) given the values of \(N\) and \(\underline{\lambda}\) from the previous steps
  5. repeat steps 2-4

Population Size:
  • Prior distribution of \(N\) defined as \(\pi(N) \propto N^{-a}, a = 0,1,2,...\)

  • \(P(N \mid d_{I_{gn}},\underline{\lambda},\underline{\beta}) = \frac{P(d_{I_{gn}} \mid N,\underline{\lambda},\underline{\beta})\,\pi(N)}{P(d_{I_{gn}} \mid \underline{\lambda},\underline{\beta})} \propto \frac{{N - n_0 \choose n_1,\dots,n_W,N-n}(1-p)^{N - n} \frac{1}{N^a}}{\sum_{N' \geq n} {N' - n_0 \choose n_1,\dots,n_W,N'-n}(1-p)^{N'-n} \frac{1}{N'^a}} \propto {N - n_0 \choose n_1,\dots,n_W,N-n}(1-p)^{N-n} \frac{1}{N^a}\)

  • where \(1-p = \sum_{k=1}^{G}\left(\lambda_k \prod_{i=1}^{n - n_W}(1 - \beta_{C_i,k})\right)\)
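Because \(N\) is a single integer, its conditional can be sampled by evaluating the unnormalized log-weights on a grid and drawing from the resulting discrete distribution. This Python sketch assumes the support is truncated at a hypothetical upper bound `N_max` (the exact support is unbounded); the function name and inputs are illustrative.

```python
import math
import random


def draw_N(n0, n, one_minus_p, a, N_max, rng):
    """Draw N with P(N) proportional to
    (N - n0)! / (N - n)! * (1-p)^(N-n) * N^(-a),  n <= N <= N_max.

    The factor 1 / (n_1! ... n_W!) does not depend on N and is dropped.
    Truncating the support at N_max is an assumption of this sketch.
    """
    support = list(range(n, N_max + 1))
    logw = [math.lgamma(N - n0 + 1) - math.lgamma(N - n + 1)
            + (N - n) * math.log(one_minus_p) - a * math.log(N)
            for N in support]
    top = max(logw)
    weights = [math.exp(x - top) for x in logw]  # stabilized before exp
    return rng.choices(support, weights=weights, k=1)[0]


rng = random.Random(0)
draws = [draw_N(n0=10, n=50, one_minus_p=0.9, a=1, N_max=1500, rng=rng) for _ in range(200)]
print(sum(draws) / len(draws))
```

`N_max` should be chosen well beyond where the weights become negligible; when \(1-p\) is close to 1 the tail is heavy and the truncation matters.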

Stratum Memberships:
  • Prior distribution of \(\underline{\lambda}\) defined as \(\pi(\underline{\lambda}) \propto \prod_{k = 1}^{G}\lambda_{k}^{a_k-1}\)

  • Given the observed data Link-tracing imputes the missing stratum assignments by sampling from the following distribution:

  • \(P(C_i = k|S,\underline{C}_{S \setminus S_W}, Y_{S \setminus S_W,i}) = \frac{\lambda_k \prod_{j=1}^{n-n_W}(1 - \beta_{C_j,k})}{\sum_{l = 1}^{G}\left(\lambda_l \prod_{j=1}^{n-n_W}(1-\beta_{C_j,l})\right)}\)

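  • New \(\underline{\lambda}\) values are then drawn from \(\mathrm{Dirichlet}(N_1 + 1,\dots,N_G + 1)\), where \(N_k\) is the number of units (observed or imputed) assigned to stratum \(k\); this corresponds to the prior above with \(a_k = 1\)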
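The imputation and \(\underline{\lambda}\) update can be sketched as follows. This assumes the imputed units are those outside the final sample (so they have no links to \(S \setminus S_W\)), uses the standard construction of a Dirichlet draw from normalized Gamma variates, and all names and values are hypothetical.

```python
import math
import random


def impute_stratum(lam, beta, C_inner, rng):
    """Draw C_i for a unit outside the final sample, which has no links to the
    units in S \\ S_W (strata C_inner):
    P(C_i = k) proportional to lam_k * prod_j (1 - beta[C_j][k])."""
    logw = [math.log(lam[k]) + sum(math.log1p(-beta[c][k]) for c in C_inner)
            for k in range(len(lam))]
    top = max(logw)
    return rng.choices(range(len(lam)), weights=[math.exp(x - top) for x in logw], k=1)[0]


def draw_dirichlet(alphas, rng):
    """Dirichlet draw via normalized Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


rng = random.Random(0)
C_inner = [0, 0, 1]               # hypothetical strata of S \ S_W
C_sampled_all = C_inner + [1, 0]  # plus the observed strata of S_W
lam, beta = [0.6, 0.4], [[0.3, 0.05], [0.05, 0.2]]
C_missing = [impute_stratum(lam, beta, C_inner, rng) for _ in range(20)]  # N - n = 20
counts = [sum(1 for c in C_sampled_all + C_missing if c == k) for k in range(2)]
new_lam = draw_dirichlet([c + 1 for c in counts], rng)
print(counts, new_lam)
```

Working on the log scale avoids underflow of the product \(\prod_j (1-\beta_{C_j,k})\) when \(S \setminus S_W\) is large.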

Link Probabilities:
  • Prior distribution of \(\underline{\beta}\) defined as \(\pi(\underline{\beta}) \propto \prod_{k,l = 1: k \leq l}^{G}\left(\beta_{k,l}^{\gamma_1 - 1}(1- \beta_{k,l})^{\gamma_2-1}\right)\)

  • The imputed stratum assignments and population size from the previous steps turn the observed data into a hypothetical graph realization with known stratum membership for every unit, \(d_1 = \{S,\underline{C},Y_{S \setminus S_W,U}\}\), where \(U\) now denotes the population of imputed size \(N\). For any \(i,j \in S_W \cup \overline{S}\) with \(i \ne j\), links arise independently with \(P(Y_{ij} = 1|d_1) = \beta_{C_i,C_j}\) and can be imputed by sampling from this distribution. This yields a hypothetical full graph realization (stratum membership and links known for all units) \(d = \{\underline{C},Y\}\).

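  • New \(\underline{\beta}\) values for between-stratum links \(\beta_{k,l}\), \(k,l = 1,2,\dots,G\), \(k \ne l\), are drawn from \(\mathrm{Beta}(M_{k,l} + 1, N_k N_l - M_{k,l} + 1)\), where \(M_{k,l}\) is the number of links between strata \(k\) and \(l\) in the full graph realization \(d\) and \(N_k\) is the number of units in stratum \(k\).

  • New \(\underline{\beta}\) values for within-stratum links \(\beta_{k,k}\), \(k = 1,2,\dots,G\), are drawn from \(\mathrm{Beta}(M_{k,k} + 1,{N_k \choose 2} - M_{k,k} + 1)\). Both updates correspond to the prior above with \(\gamma_1 = \gamma_2 = 1\).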
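The link-imputation and \(\underline{\beta}\) steps can be sketched together in Python. This assumes the observed adjacency contains only links with at least one endpoint in \(S \setminus S_W\), that all strata (observed and imputed) are given, and that the function names and toy inputs are hypothetical.

```python
import random
from itertools import combinations


def impute_links(C, adj, unobserved, beta, rng):
    """Copy the observed adjacency (links with an endpoint in S \\ S_W) and
    draw each link among `unobserved` units (S_W plus non-sampled units)
    independently with probability beta[C_i][C_j]."""
    full = [set(nbrs) for nbrs in adj]
    for i, j in combinations(sorted(unobserved), 2):
        if rng.random() < beta[C[i]][C[j]]:
            full[i].add(j)
            full[j].add(i)
    return full


def draw_link_probs(C, full, G, rng):
    """Beta(M + 1, pairs - M + 1) draws for every stratum pair k <= l."""
    N_k = [C.count(k) for k in range(G)]
    M = [[0] * G for _ in range(G)]
    for i, nbrs in enumerate(full):
        for j in nbrs:
            if i < j:
                k, l = sorted((C[i], C[j]))
                M[k][l] += 1
    new = [[0.0] * G for _ in range(G)]
    for k in range(G):
        for l in range(k, G):
            pairs = N_k[k] * (N_k[k] - 1) // 2 if k == l else N_k[k] * N_k[l]
            new[k][l] = new[l][k] = rng.betavariate(M[k][l] + 1, pairs - M[k][l] + 1)
    return new


rng = random.Random(0)
C = [0, 0, 1, 1, 0, 1]                          # strata after imputation (hypothetical)
adj = [{1, 2}, {0, 3}, {0}, {1}, set(), set()]  # observed links; S \ S_W = {0, 1}
full = impute_links(C, adj, {2, 3, 4, 5}, [[0.2, 0.05], [0.05, 0.3]], rng)
print(draw_link_probs(C, full, 2, rng))
```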

Andersson, Håkan, and Tom Britton. 2000. “Stochastic Epidemics in Dynamic Populations: Quasi-Stationarity and Extinction.” Journal of Mathematical Biology 41 (6): 559–80. https://doi.org/10.1007/s002850000060.
Berchenko, Yakir, Jonathan D. Rosenblatt, and Simon D. W. Frost. 2017. “Modeling and Analyzing Respondent-Driven Sampling as a Counting Process.” Biometrics 73 (4): 1189–98. https://doi.org/10.1111/biom.12678.
Crawford, Forrest W., Jiacheng Wu, and Robert Heimer. 2018. “Hidden Population Size Estimation from Respondent-Driven Sampling: A Network Approach.” Journal of the American Statistical Association 113 (522): 755–66. https://doi.org/10.1080/01621459.2017.1285775.
Feehan, Dennis M., and Matthew J. Salganik. 2016. “Generalizing the Network Scale-up Method: A New Estimator for the Size of Hidden Populations.” Sociological Methodology 46 (1): 153–86. https://doi.org/10.1177/0081175016665425.
Handcock, Mark S., Krista J. Gile, and Corinne M. Mar. 2014. “Estimating Hidden Population Size Using Respondent-Driven Sampling Data.” Electronic Journal of Statistics 8 (1): 1491–521. https://doi.org/10.1214/14-EJS923.
Johnston, Lisa G., Katherine R. McLaughlin, Shada A. Rouhani, and Susan A. Bartels. 2017. “Measuring a Hidden Population: A Novel Technique to Estimate the Population Size of Women with Sexual Violence-Related Pregnancies in South Kivu Province, Democratic Republic of Congo.” Journal of Epidemiology and Global Health 7 (1): 45–53. https://doi.org/10.1016/j.jegh.2016.08.003.
Karon, John M, and Cyprian Wejnert. 2012. “Statistical Methods for the Analysis of Time–Location Sampling Data.” Journal of Urban Health 89 (3): 565–86. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3368048&tool=pmcentrez&rendertype=abstract.
Killworth, Peter D., Eugene C. Johnsen, Christopher McCarty, Gene Ann Shelley, and H. Russell Bernard. 1998. “A Social Network Approach to Estimating Seroprevalence in the United States.” Social Networks 20 (1): 23–50. https://doi.org/10.1016/S0378-8733(96)00305-X.
MacKellar, Duncan, Linda Valleroy, John Karon, George Lemp, and Robert Janssen. 1996. “The Young Men’s Survey: Methods for Estimating HIV Seroprevalence and Risk Factors Among Young Men Who Have Sex with Men.” Public Health Reports 111 (Suppl 1): 138. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1382056&tool=pmcentrez&rendertype=abstract.
Maltiel, Rachael, Adrian E. Raftery, Tyler H. McCormick, and Aaron J. Baraff. 2015. “Estimating Population Size Using the Network Scale-up Method.” The Annals of Applied Statistics 9 (3): 1247–77. https://www.jstor.org/stable/43826420.
Muhib, F. B., L. S. Lin, A. Stueve, R. L. Miller, W. L. Ford, W. D. Johnson, P. J. Smith, and Community Intervention Trial for Youth Study Team. 2001. “A Venue-Based Method for Sampling Hard-to-Reach Populations.” Public Health Rep, 216–22. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1913675&tool=pmcentrez&rendertype=abstract.
Raymond, H Fisher, Theresa Ick, Michael Grasso, Jason Vaudrey, and Willi McFarland. 2007. “Resource Guide: Time Location Sampling (TLS).” San Francisco: San Francisco Department of Public Health HIV Epidemiology Section, Behavioral Surveillance Unit. http://globalhealthsciences.ucsf.edu/sites/default/files/content/pphg/surveillance/modules/global-trainings/tls-res-guide-2nd-edition.pdf.
Salganik, Matthew J., Dimitri Fazito, Neilane Bertoni, Alexandre H. Abdo, Maeve B. Mello, and Francisco I. Bastos. 2011. “Assessing Network Scale-up Estimates for Groups Most at Risk of HIV/AIDS: Evidence from a Multiple-Method Study of Heavy Drug Users in Curitiba, Brazil.” American Journal of Epidemiology 174 (10): 1190–96.
Vincent, Kyle, and Steve Thompson. 2019. “Estimating the Size and Distribution of Networked Populations with Snowball Sampling.” https://arxiv.org/abs/1402.4372.