Journal article
Nature Conservation, vol. 35, 2019, pp. 97-116
APA
Wunderlich, R., Lin, Y.-P., Anthony, J., & Petway, J. R. (2019). Two alternative evaluation metrics to replace the true skill statistic in the assessment of species distribution models. Nature Conservation, 35, 97–116. https://doi.org/10.3897/natureconservation.35.33918
Chicago/Turabian
Wunderlich, R., Yu-Pin Lin, Johnathen Anthony, and Joy R. Petway. “Two Alternative Evaluation Metrics to Replace the True Skill Statistic in the Assessment of Species Distribution Models.” Nature Conservation 35 (2019): 97–116.
MLA
Wunderlich, R., et al. “Two Alternative Evaluation Metrics to Replace the True Skill Statistic in the Assessment of Species Distribution Models.” Nature Conservation, vol. 35, 2019, pp. 97–116, doi:10.3897/natureconservation.35.33918.
BibTeX
@article{wunderlich2019two,
  title   = {Two alternative evaluation metrics to replace the true skill statistic in the assessment of species distribution models},
  year    = {2019},
  journal = {Nature Conservation},
  pages   = {97--116},
  volume  = {35},
  doi     = {10.3897/natureconservation.35.33918},
  author  = {Wunderlich, R. and Lin, Yu-Pin and Anthony, Johnathen and Petway, Joy R.}
}
Model evaluation metrics play a critical role in the selection of adequate species distribution models for conservation and for any application of species distribution modelling (SDM) in general. The responses of these metrics to modelling conditions, however, are rarely taken into account. This leads to inadequate model selection, flawed downstream analyses, and uninformed decisions. To aid modellers in critically assessing modelling conditions when choosing and interpreting model evaluation metrics, we analysed the responses of the True Skill Statistic (TSS) under a variety of presence-background modelling conditions using purely theoretical scenarios. We then compared these responses with those of two evaluation metrics commonly applied in the field of meteorology which have potential for use in SDM: the Odds Ratio Skill Score (ORSS) and the Symmetric Extremal Dependence Index (SEDI). We demonstrate that (1) large cell number totals in the confusion matrix, which is strongly biased towards ‘true’ absences in presence-background SDM, and (2) low prevalence both compromise model evaluation with TSS. This is because (1) TSS fails to differentiate useful from random models at extreme prevalence levels if the confusion matrix cell number total exceeds ~30,000 cells, and (2) TSS converges to hit rate (sensitivity) when prevalence is lower than ~2.5%. We conclude that SEDI is optimal for most presence-background SDM initiatives. Further, ORSS may provide a better alternative if absence data are available or if equal error weighting is strictly required.
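All three metrics are computable from the 2×2 confusion matrix (hits *a*, false alarms *b*, misses *c*, correct negatives *d*). The sketch below uses the standard forecast-verification definitions of TSS, ORSS (Yule's Q), and SEDI — not code from the paper — and the example numbers are hypothetical, chosen to illustrate the low-prevalence behaviour the abstract describes: at ~1% prevalence, TSS is nearly indistinguishable from the hit rate.

```python
import math

def tss(a, b, c, d):
    """True Skill Statistic: hit rate minus false alarm rate."""
    H = a / (a + c)   # hit rate (sensitivity)
    F = b / (b + d)   # false alarm rate (1 - specificity)
    return H - F

def orss(a, b, c, d):
    """Odds Ratio Skill Score (Yule's Q)."""
    return (a * d - b * c) / (a * d + b * c)

def sedi(a, b, c, d):
    """Symmetric Extremal Dependence Index (undefined if H or F is 0 or 1)."""
    H = a / (a + c)
    F = b / (b + d)
    num = math.log(F) - math.log(H) - math.log(1 - F) + math.log(1 - H)
    den = math.log(F) + math.log(H) + math.log(1 - F) + math.log(1 - H)
    return num / den

# Hypothetical low-prevalence case: 100 presences in N = 100,000 cells (prevalence = 0.1%... scaled: 1%)
a, b, c, d = 80, 2000, 20, 97900   # hits, false alarms, misses, correct negatives
H = a / (a + c)                    # hit rate = 0.8
print(round(tss(a, b, c, d), 3))   # ≈ 0.780 — TSS has converged towards H = 0.8
print(round(orss(a, b, c, d), 3))
print(round(sedi(a, b, c, d), 3))
```

Because *d* dominates the matrix at low prevalence, *F* is tiny and TSS ≈ *H*, which is the convergence the abstract reports below ~2.5% prevalence; ORSS and SEDI, being built on the odds ratio and log-transformed rates respectively, do not collapse in the same way.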