Can you give more details about the probabilistic species identification algorithm you use, and are your methods public?
Our identification pipeline uses a published probabilistic algorithm. The algorithm accounts for gaps in the reference sequence databases by comparing their contents with the expected taxonomic diversity. This allows it to calculate the probability that a query sequence comes from a species that is not represented in the database, which reduces overconfident species-level assignments caused by database gaps. Because the taxonomy is a key input to the algorithm, the probability of assignment can be estimated at each taxonomic level, and an acceptance threshold can then be applied so that only high-confidence assignments are retained.
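To illustrate the idea, here is a toy Python sketch. It is not the published algorithm. The uniform prior over the species expected from a checklist, the per-genus likelihood given to unreferenced species, the two-level (genus/species) taxonomy, and every name and number (`posterior`, `assign`, `GenusA`, the 0.9 threshold) are hypothetical assumptions chosen for illustration. It shows how reserving probability for expected-but-unreferenced species, and then applying an acceptance threshold level by level, can turn an overconfident species call into a confident genus call.

```python
from collections import defaultdict


def posterior(likelihoods, expected, referenced, unref_likelihood):
    """Toy posterior over (genus, species) pairs (hypothetical model).

    likelihoods:      {species: likelihood of the query under that reference}
    expected:         {genus: expected number of species, e.g. from a checklist}
    referenced:       {species: genus} for species present in the database
    unref_likelihood: {genus: assumed likelihood of the query if its true species
                       is an unreferenced member of that genus}
    Species is None for the bucket of expected-but-unreferenced species.
    """
    prior = 1.0 / sum(expected.values())  # uniform prior over all expected species
    n_ref = defaultdict(int)
    for genus in referenced.values():
        n_ref[genus] += 1
    weights = {}
    for species, genus in referenced.items():
        weights[(genus, species)] = prior * likelihoods.get(species, 0.0)
    for genus, n_expected in expected.items():
        n_missing = max(n_expected - n_ref[genus], 0)  # database gap for this genus
        weights[(genus, None)] = n_missing * prior * unref_likelihood.get(genus, 0.0)
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def assign(probs, threshold):
    """Return the most specific (level, name, probability) meeting the threshold."""
    # Only referenced species can be named at species level.
    species_probs = {sp: p for (_genus, sp), p in probs.items() if sp is not None}
    best_species = max(species_probs, key=species_probs.get, default=None)
    if best_species is not None and species_probs[best_species] >= threshold:
        return ("species", best_species, species_probs[best_species])
    # Otherwise sum probabilities (including unreferenced buckets) per genus.
    genus_probs = defaultdict(float)
    for (genus, _sp), p in probs.items():
        genus_probs[genus] += p
    best_genus = max(genus_probs, key=genus_probs.get)
    if genus_probs[best_genus] >= threshold:
        return ("genus", best_genus, genus_probs[best_genus])
    return ("unassigned", None, None)


# Illustrative inputs only.
REFERENCED = {"GenusA sp1": "GenusA", "GenusA sp2": "GenusA",
              "GenusB sp1": "GenusB", "GenusB sp2": "GenusB"}
LIKELIHOODS = {"GenusA sp1": 0.9, "GenusA sp2": 0.05,
               "GenusB sp1": 0.01, "GenusB sp2": 0.01}
UNREF_LIKELIHOOD = {"GenusA": 0.3, "GenusB": 0.01}

# GenusA is expected to have 4 species but only 2 are in the database.
gappy = posterior(LIKELIHOODS, {"GenusA": 4, "GenusB": 2}, REFERENCED, UNREF_LIKELIHOOD)
# Same database, but the checklist says it is complete.
complete = posterior(LIKELIHOODS, {"GenusA": 2, "GenusB": 2}, REFERENCED, UNREF_LIKELIHOOD)

print(assign(gappy, 0.9))
print(assign(complete, 0.9))
```

With these made-up numbers, the query is only accepted at genus level (GenusA, about 0.99) when two of GenusA's four expected species are missing from the database, because "GenusA sp1" then receives only about 0.57. If the checklist instead listed just the two referenced species, the same likelihoods would yield a species-level call to "GenusA sp1" (about 0.93).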