Use the coupons, the timing of surveys, and the structure of the recruitment network to approximate the size of the population
Relies on network connectedness, i.e. if the seeds all sit within a “small world”, the estimate will likely depart significantly from the truth
Do not rely on proportional sampling and are likely cheaper
Basic setup:
Additional assumptions (on top of basic setup above):
The Link-Tracing method combines ideas from RDS and capture-recapture estimation to approximate population size. Applying a multi-wave snowball sampling frame over the population, it constructs a Gibbs sampling estimator of population size given an observed network structure. The central idea behind this estimator is that links between units sampled in the waves up to but not including the last wave and all other units can be regarded as iid Bernoulli-type outcomes. The occurrences of network links thus represent binomial-type experiments.
The following data needs to be collected for estimation:
We denote the observed data by \(d_{I_{gn}}\).
Given that \(n = n_0 + n_1 + ... + n_w\) and that the population can be partitioned into \(w + 2\) sets (\(S_0,S_1,...,S_w\) and \(\overline{S}\), i.e. the initial sample, all waves, and the non-sampled units) of sizes \(n_0,n_1,...,n_w\) and \(N-n\) in \({N \choose n_0,n_1,...,n_w,N-n}\) ways, the likelihood function for the parameters of interest can be written as:
\(\displaystyle L(N,\underline{\lambda},\underline{\beta}|d_{I_{gn}},|S_0|)\) \(\displaystyle = {N \choose n_0,n_1,...,n_w,N-n} \times \prod_{i=1}^{n-n_w}\lambda_{C_i} \times \prod_{i,j = 1,2,...,n-n_w:i<j}\left(\beta_{C_i,C_j}^{Y_{ij}}(1-\beta_{C_i,C_j})^{(1-Y_{ij})}\right) \times\) \(\displaystyle \left[\prod_{j = n-n_w+1}^n\left(\lambda_{C_j} \prod_{i = 1}^{n - n_w}\left( \beta_{C_i,C_j}^{Y_{ij}}(1-\beta_{C_i,C_j})^{(1-Y_{ij})}\right) \right)\right] \times \left[\sum_{k = 1}^G \left(\lambda_k \prod_{i = 1}^{n-n_w}(1 - \beta_{C_i,k})\right)\right]^{N-n}\)
Where:
As the given likelihood corresponds to a multivariate distribution, Link-tracing employs a Gibbs sampling procedure for parameter estimation. That is, it iteratively samples from each parameter's conditional distribution. Estimation thus proceeds as follows:
Prior distribution of \(N\) defined as \(\pi(N) \propto N^{-a}, a = 0,1,2,...\)
\(P(N|d_{I_{gn}},\underline{\lambda},\underline{\beta}) = \frac{P(d_{I_{gn}}|N,\underline{\lambda},\underline{\beta})\pi(N)}{P(d_{I_{gn}}|\underline{\lambda},\underline{\beta})} \propto \frac{{N - n_0 \choose n_1,...,n_w,N-n}(1-p)^{N - n} \frac{1}{N^a}}{\sum_{N' \geq n} \left({N' - n_0 \choose n_1,...,n_w,N'-n}(1-p)^{N'-n} \frac{1}{N'^a}\right)}\) \(\propto {N - n_0 \choose n_1,...,n_w,N-n}(1-p)^{N-n} \frac{1}{N^a}\)
where \(1-p = \sum_{k=1}^{G}\left(\lambda_k \prod_{i=1}^{n - n_w}(1 - \beta_{C_i,k})\right)\)
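A minimal sketch of this step, assuming strata are coded as integers \(0,...,G-1\) and \(\underline{\beta}\) is stored as a symmetric nested list. The function names, the truncation point `N_max`, and the data layout are illustrative assumptions rather than part of the method's definition. Only the factors of the multinomial coefficient that depend on \(N\), namely \((N-n_0)!/(N-n)!\), are kept, since the remaining factorials cancel under normalization.

```python
import math
import random

def prob_unlinked(lam, beta, sampled_strata):
    """Return 1 - p: the probability that a randomly chosen unit has no link
    to any unit sampled before the last wave (strata given in sampled_strata)."""
    total = 0.0
    for k, lam_k in enumerate(lam):
        no_link = 1.0
        for c in sampled_strata:
            no_link *= 1.0 - beta[c][k]
        total += lam_k * no_link
    return total

def sample_N(wave_sizes, one_minus_p, a=1, N_max=5000, rng=random):
    """Draw N from its conditional posterior on the truncated support n..N_max.

    wave_sizes = [n_0, n_1, ..., n_w]; N_max is a hypothetical upper bound
    introduced here only to make the support finite. Requires n >= 1 and
    0 < one_minus_p < 1.
    """
    n0 = wave_sizes[0]
    n = sum(wave_sizes)
    support = range(n, N_max + 1)
    # log of (N - n0)! / (N - n)! * (1 - p)^(N - n) * N^(-a)
    log_w = [
        math.lgamma(N - n0 + 1) - math.lgamma(N - n + 1)
        + (N - n) * math.log(one_minus_p) - a * math.log(N)
        for N in support
    ]
    m = max(log_w)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(lw - m) for lw in log_w]
    return rng.choices(support, weights=weights, k=1)[0]
```

Working on the log scale avoids overflow in the factorial terms; the truncation is harmless as long as `N_max` lies well beyond the region where the weights are non-negligible.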
Prior distribution of \(\underline{\lambda}\) defined as \(\pi(\underline{\lambda}) \propto \prod_{k = 1}^{G}\lambda_{k}^{a_k-1}\)
Given the observed data, Link-tracing imputes the missing stratum assignments by sampling from the following distribution:
\(P(C_i = k|S,\underline{C}_{S \setminus S_W}, Y_{S \setminus S_W,i}) = \frac{\lambda_k \prod_{j=1}^{n-n_W}(1 - \beta_{C_j,k})}{\sum_{l = 1}^{G}\left(\lambda_l \prod_{j=1}^{n-n_W}(1-\beta_{C_j,l})\right)}\)
New \(\underline{\lambda}\) values are then drawn from \(Dirichlet(N_1 + 1,...,N_G + 1)\)
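The imputation and the Dirichlet draw can be sketched as follows. This is an illustration under stated assumptions: strata are coded \(0,...,G-1\), `observed_counts` holds the number of sampled units per stratum, `n_unsampled` stands for the current \(N - n\), and all function names are hypothetical. The Dirichlet draw uses the standard construction from normalized Gamma variates.

```python
import random

def stratum_probs(lam, beta, sampled_strata):
    """P(C_i = k | ...) for a unit with no links to units sampled before the last wave."""
    weights = []
    for k, lam_k in enumerate(lam):
        w = lam_k
        for c in sampled_strata:
            w *= 1.0 - beta[c][k]
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def draw_lambda(stratum_counts, rng=random):
    """Draw from Dirichlet(N_1 + 1, ..., N_G + 1) via normalized Gamma variates."""
    g = [rng.gammavariate(N_k + 1, 1.0) for N_k in stratum_counts]
    s = sum(g)
    return [x / s for x in g]

def impute_and_update(lam, beta, sampled_strata, observed_counts, n_unsampled, rng=random):
    """Impute strata for the unsampled units, then draw new lambda values."""
    probs = stratum_probs(lam, beta, sampled_strata)
    counts = list(observed_counts)
    # Each unsampled unit is assigned a stratum independently
    for k in rng.choices(range(len(lam)), weights=probs, k=n_unsampled):
        counts[k] += 1
    return counts, draw_lambda(counts, rng)
```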
Prior distribution of \(\underline{\beta}\) defined as \(\pi(\underline{\beta}) \propto \prod_{k,l = 1: k \leq l}^{G}\left(\beta_{k,l}^{\gamma_1 - 1}(1- \beta_{k,l})^{\gamma_2-1}\right)\)
The imputed stratum assignments and population size from the previous steps update the observed data to a hypothetical full graph realization (knowledge about stratum assignment for each unit) \(d_1 = \{S,\underline{C},Y_{S \setminus S_W,U}\}\) where \(U\) is the population size estimate. For any \(i,j \in S_W \cup \overline{S}\) where \(i \ne j\) links arise independently via \(P(Y_{ij} = 1|d_1) = \beta_{C_i,C_j}\) and can be imputed through sampling from this. This process yields a hypothetical full graph realization (knowledge about stratum membership and links for all units) \(d = \{\underline{C},Y\}\).
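As a rough illustration of the link imputation described above, the sketch below draws each unobserved link as an independent Bernoulli outcome and tallies links by stratum pair. It covers only the imputed pairs among units in \(S_W \cup \overline{S}\); observed links would have to be added to the tallies separately. The function names and the integer coding of strata are assumptions made for the example.

```python
import random

def impute_links(strata, beta, rng=random):
    """Draw links for all pairs i < j with P(Y_ij = 1) = beta[C_i][C_j].

    strata lists the stratum of each unit whose links are unobserved.
    Returns the set of linked pairs (i, j) with i < j.
    """
    links = set()
    for i in range(len(strata)):
        for j in range(i + 1, len(strata)):
            if rng.random() < beta[strata[i]][strata[j]]:
                links.add((i, j))
    return links

def count_links(links, strata, G):
    """Tally links by stratum pair into a symmetric G x G matrix."""
    M = [[0] * G for _ in range(G)]
    for i, j in links:
        k, l = sorted((strata[i], strata[j]))
        M[k][l] += 1
        if k != l:
            M[l][k] += 1
    return M
```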
New \(\underline{\beta}\) values for between-stratum links \(\beta_{k,l}\) for \(k,l = 1,2,...,G,k \ne l\) are drawn from \(Beta(M_{k,l} + 1, N_k N_l - M_{k,l} + 1)\).
New \(\underline{\beta}\) values for within-stratum links \(\beta_{k,k}\) for \(k = 1,2,...,G\) are drawn from \(Beta(M_{k,k} + 1,{N_k \choose 2} - M_{k,k} + 1)\).
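The two Beta draws differ only in the number of possible pairs, which the sketch below makes explicit. It assumes that `M[k][l]` holds the number of links between strata \(k\) and \(l\) and `N_counts[k]` the number of units in stratum \(k\) in the current full graph realization; both names and the function itself are illustrative.

```python
import random
from math import comb

def draw_beta(M, N_counts, rng=random):
    """Draw a symmetric matrix of link probabilities from the Beta conditionals."""
    G = len(N_counts)
    beta = [[0.0] * G for _ in range(G)]
    for k in range(G):
        for l in range(k, G):
            # Possible pairs: N_k choose 2 within a stratum, N_k * N_l between strata
            pairs = comb(N_counts[k], 2) if k == l else N_counts[k] * N_counts[l]
            b = rng.betavariate(M[k][l] + 1, pairs - M[k][l] + 1)
            beta[k][l] = beta[l][k] = b
    return beta
```

The link counts must not exceed the number of possible pairs, otherwise the second Beta parameter would fall below 1 and eventually become invalid.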