the **prevalence** package

tools for prevalence assessment studies.

tools for prevalence assessment studies.

This tutorial is based on Devleesschauwer et al. (2013), *Seroprevalence of zoonotic parasites in pigs slaughtered in the Kathmandu Valley of Nepal*.

To assess the importance of pork as a carrier of zoonotic agents, we performed a cross-sectional study in the Kathmandu Valley of Nepal, in which we serologically determined the infection status of slaughtered pigs with regard to *Taenia solium* and other parasites transmitted through pork consumption.

Serum samples were tested for the presence of circulating *Taenia* spp. cysticercal ES antigens, using the B158/B60 Ag-ELISA. This test shows a good performance for the detection of viable *Taenia* spp. cysticerci, but is not able to differentiate the different porcine *Taenia* species, i.e., *T. solium*, *T. hydatigena*, and *T. asiatica*.

From 2007 to 2010, 742 pigs were sampled at slaughter, of which 142 tested positive on Ag-ELISA.

Because no confirmation test was available to assess the true infection status, an estimate of the true prevalence was obtained by combining the apparent prevalence with prior information on the test characteristics in a Bayesian model. Two cases were considered, i.e., the performance of the test for detecting viable *Taenia* spp. cysticerci, and the performance for detecting *T. solium* cysticerci. In the first case, test sensitivity was modeled as a `Uniform(0.60, 1.00)`

distribution, and test specificity as a `Uniform(0.90, 1.00)`

distribution. In the second case, the specificity was modeled as a Uniform distribution ranging from 75% to 100%, reflecting the potentially high proportion of positive test results due to infection with *T. hydatigena* or *T. asiatica*, which are both known to occur in Nepal.

We start our analysis by calculating the apparent prevalence (i.e., not corrected for imperfect test characteristics). We can use the `propCI()`

function from the **prevalence** package to obtain confidence intervals for the apparent prevalence:

```
propCI(142, 742)
#> x n p method level lower upper
#> 1 142 742 0.1913747 agresti.coull 0.95 0.1646432 0.2212853
#> 2 142 742 0.1913747 exact 0.95 0.1636684 0.2215588
#> 3 142 742 0.1913747 jeffreys 0.95 0.1643017 0.2208498
#> 4 142 742 0.1913747 wald 0.95 0.1630697 0.2196796
#> 5 142 742 0.1913747 wilson 0.95 0.1646876 0.2212409
```

Different methods exist for estimating the confidence interval of a proportion. Function `propCI()`

provides the five most common methods. I typically report the exact binomial confidence intervals, because this method never leads to lower bounds < 0 or upper bounds > 1.

We will now estimate the true (or “informed”) prevalence, taking into account assumptions on the test sensitivity and specificity. In a first scenario, we will model test sensitivity as a `Uniform(0.60, 1.00)`

distribution, and test specificity as a `Uniform(0.90, 1.00)`

distribution:

```
cysti <-
truePrev(x = 142, n = 742,
SE = ~dunif(0.60, 1.00),
SP = ~dunif(0.90, 1.00))
print(cysti)
#> mean median mode sd 2.5% 97.5%
#> TP 0.195 0.192 0.190 0.050 0.109 0.299
#> SE 0.785 0.777 0.631 0.116 0.607 0.988
#> SP 0.948 0.948 0.917 0.029 0.902 0.997
#>
#> Multivariate BGR statistic = 1.0007
#> BGR values substantially above 1 indicate lack of convergence
```

The true prevalence is estimated as 19%, close to the apparent prevalence, but this is mere coincidence. The 95% uncertainty interval on the true prevalence ranges from 11% to 30%.

By default, the truePrev function uses 2 chains of 20,000 samples, of which the first 10,000 are discarded as burnin. It is important to assess whether or not the chains have converged. The Multivariate Brooks-Gelman-Rubin statistic gives an indication that this was indeed the case.

Further insights can be gained from looking at diagnostic plots:

```
par(mfrow = c(2, 2))
plot(cysti)
```

The density plot does not show any signs of multimodality; the trace plot shows proper mixing of the two chains.

In a second scenario, we will again model test sensitivity as a `Uniform(0.60, 1.00)`

distribution, but now model test specificity as a `Uniform(0.75, 1.00)`

distribution, taking into account the possible cross-reactions:

```
cysti2 <-
truePrev(x = 142, n = 742,
SE = ~dunif(0.60, 1.00),
SP = ~dunif(0.75, 1.00)) # note the larger uncertainty
print(cysti2)
#> mean median mode sd 2.5% 97.5%
#> TP 0.139 0.141 0.168 0.079 0.007 0.288
#> SE 0.778 0.768 0.639 0.115 0.607 0.986
#> SP 0.903 0.904 0.919 0.057 0.804 0.995
#>
#> Multivariate BGR statistic = 1.0009
#> BGR values substantially above 1 indicate lack of convergence
```

The true prevalence is now estimated as 14%, lower than in the first scenario. This is logical, as we accounted for a higher possibility of false positives. Because of the additional uncertainty, the 95% uncertainty interval on the true prevalence estimate has become wider, i.e., 1% to 29%.

The same diagnostic plots:

```
par(mfrow = c(2, 2))
plot(cysti2)
```

Again, this indicates proper mixing of the two chains. Both scenarios could thus be equally valid. Indeed, the DIC values of both models do not differ substantially:

```
cysti@diagnostics$DIC
#> Mean deviance: 7.566
#> penalty 0.995
#> Penalized deviance: 8.561
cysti2@diagnostics$DIC
#> Mean deviance: 7.596
#> penalty 1.004
#> Penalized deviance: 8.6
```

```
#> R version 3.2.3 (2015-12-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7601) Service Pack 1
#>
#> locale:
#> [1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252
#> [3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
#> [5] LC_TIME=Dutch_Belgium.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] prevalence_0.4.0 rjags_3-13 coda_0.16-1 lattice_0.20-33
#>
#> loaded via a namespace (and not attached):
#> [1] formatR_0.10 tools_3.2.3 htmltools_0.2.6 yaml_2.1.13
#> [5] bd_0.0.7 rmarkdown_0.9.2 grid_3.2.3 knitr_1.12.3
#> [9] stringr_0.6.2 digest_0.6.3 evaluate_0.8
```