the prevalence package
tools for prevalence assessment studies.

## Estimating true prevalence based on a single test

To assess the importance of pork as a carrier of zoonotic agents, we performed a cross-sectional study in the Kathmandu Valley of Nepal, in which we serologically determined the infection status of slaughtered pigs with regard to Taenia solium and other parasites transmitted through pork consumption.

Serum samples were tested for the presence of circulating Taenia spp. cysticercal ES antigens, using the B158/B60 Ag-ELISA. This test shows a good performance for the detection of viable Taenia spp. cysticerci, but is not able to differentiate the different porcine Taenia species, i.e., T. solium, T. hydatigena, and T. asiatica.

### Data

From 2007 to 2010, 742 pigs were sampled at slaughter, of which 142 tested positive on Ag-ELISA.

Because no confirmation test was available to assess the true infection status, an estimate of the true prevalence was obtained by combining the apparent prevalence with prior information on the test characteristics in a Bayesian model. Two cases were considered, i.e., the performance of the test for detecting viable Taenia spp. cysticerci, and the performance for detecting T. solium cysticerci. In the first case, test sensitivity was modeled as a Uniform(0.60, 1.00) distribution, and test specificity as a Uniform(0.90, 1.00) distribution. In the second case, the specificity was modeled as a Uniform distribution ranging from 75% to 100%, reflecting the potentially high proportion of positive test results due to infection with T. hydatigena or T. asiatica, which are both known to occur in Nepal.

### Apparent prevalence

We start our analysis by calculating the apparent prevalence (i.e., not corrected for imperfect test characteristics). We can use the propCI() function from the prevalence package to obtain confidence intervals for the apparent prevalence:

propCI(142, 742)
#>     x   n         p        method level     lower     upper
#> 1 142 742 0.1913747 agresti.coull  0.95 0.1646432 0.2212853
#> 2 142 742 0.1913747         exact  0.95 0.1636684 0.2215588
#> 3 142 742 0.1913747      jeffreys  0.95 0.1643017 0.2208498
#> 4 142 742 0.1913747          wald  0.95 0.1630697 0.2196796
#> 5 142 742 0.1913747        wilson  0.95 0.1646876 0.2212409

Different methods exist for estimating the confidence interval of a proportion. Function propCI() provides the five most common methods. I typically report the exact binomial confidence intervals, because this method never leads to lower bounds < 0 or upper bounds > 1.

### True prevalence (scenario 1)

We will now estimate the true (or “informed”) prevalence, taking into account assumptions on the test sensitivity and specificity. In a first scenario, we will model test sensitivity as a Uniform(0.60, 1.00) distribution, and test specificity as a Uniform(0.90, 1.00) distribution:

cysti <-
truePrev(x = 142, n = 742,
SE = ~dunif(0.60, 1.00),
SP = ~dunif(0.90, 1.00))

print(cysti)
#>     mean median  mode    sd  2.5% 97.5%
#> TP 0.195  0.192 0.190 0.050 0.109 0.299
#> SE 0.785  0.777 0.631 0.116 0.607 0.988
#> SP 0.948  0.948 0.917 0.029 0.902 0.997
#>
#> Multivariate BGR statistic = 1.0007
#> BGR values substantially above 1 indicate lack of convergence

The true prevalence is estimated as 19%, close to the apparent prevalence, but this is mere coincidence. The 95% uncertainty interval on the true prevalence ranges from 11% to 30%.

By default, the truePrev function uses 2 chains of 20,000 samples, of which the first 10,000 are discarded as burnin. It is important to assess whether or not the chains have converged. The Multivariate Brooks-Gelman-Rubin statistic gives an indication that this was indeed the case.

Further insights can be gained from looking at diagnostic plots:

par(mfrow = c(2, 2))
plot(cysti)

The density plot does not show any signs of multimodality; the trace plot shows proper mixing of the two chains.

### True prevalence (scenario 2)

In a second scenario, we will again model test sensitivity as a Uniform(0.60, 1.00) distribution, but now model test specificity as a Uniform(0.75, 1.00) distribution, taking into account the possible cross-reactions:

cysti2 <-
truePrev(x = 142, n = 742,
SE = ~dunif(0.60, 1.00),
SP = ~dunif(0.75, 1.00))  # note the larger uncertainty

print(cysti2)
#>     mean median  mode    sd  2.5% 97.5%
#> TP 0.139  0.141 0.168 0.079 0.007 0.288
#> SE 0.778  0.768 0.639 0.115 0.607 0.986
#> SP 0.903  0.904 0.919 0.057 0.804 0.995
#>
#> Multivariate BGR statistic = 1.0009
#> BGR values substantially above 1 indicate lack of convergence

The true prevalence is now estimated as 14%, lower than in the first scenario. This is logical, as we accounted for a higher possibility of false positives. Because of the additional uncertainty, the 95% uncertainty interval on the true prevalence estimate has become wider, i.e., 1% to 29%.

The same diagnostic plots:

par(mfrow = c(2, 2))
plot(cysti2)

Again, this indicates proper mixing of the two chains. Both scenarios could thus be equally valid. Indeed, the DIC values of both models do not differ substantially:

cysti@diagnostics$DIC #> Mean deviance: 7.566 #> penalty 0.995 #> Penalized deviance: 8.561 cysti2@diagnostics$DIC
#> Mean deviance:  7.596
#> penalty 1.004
#> Penalized deviance: 8.6

### R session

#> R version 3.2.3 (2015-12-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7601) Service Pack 1
#>
#> locale:
#> [1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252
#> [3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
#> [5] LC_TIME=Dutch_Belgium.1252
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] prevalence_0.4.0 rjags_3-13       coda_0.16-1      lattice_0.20-33
#>
#> loaded via a namespace (and not attached):
#>  [1] formatR_0.10    tools_3.2.3     htmltools_0.2.6 yaml_2.1.13
#>  [5] bd_0.0.7        rmarkdown_0.9.2 grid_3.2.3      knitr_1.12.3
#>  [9] stringr_0.6.2   digest_0.6.3    evaluate_0.8