All published articles of this journal are available on ScienceDirect.
Tool FindCrispr: An Accurate Identification of Crisprs
Abstract
Introduction
The accurate identification of repeats and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) has a profound impact on studying and understanding prokaryotic immune systems.
Methods
A model with feature extraction and scoring is trained, solved, and implemented as a tool. Welch’s t-test is conducted.
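As an illustration of the statistical test named in the Methods, the following is a minimal sketch of Welch’s unequal-variance t-test in Python, using only the standard library. The function name `welch_t` and the idea of passing per-genome CRISPR counts as two lists are assumptions made for this example; the study's actual data and analysis code are not given here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's t-test on two independent samples.

    a, b: sequences of numbers (e.g. per-genome CRISPR counts from two tools;
    that pairing is an assumption of this sketch, not taken from the source).
    """
    na, nb = len(a), len(b)
    # Squared standard errors of each sample mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    # Test statistic: difference of means over the combined standard error.
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

A positive t means the first sample has the larger mean. To obtain a p-value, the statistic would be compared against a t distribution with `df` degrees of freedom, for example through a statistics library.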
Results
The length of the repeat, the copy number of the repeat, the starting position sequence of the repeat, and the repeat sequence are used as the features. A scoring formula is defined on these features. Among the exactly repeated sequences with overlapping starting points, the one with the highest score is selected as the CRISPR, and this selection rule is implemented as a tool to find CRISPRs. Among 302 archaea, findCrispr obtained the same results as pilerCR for 199 and more CRISPRs than pilerCR for 86. Welch’s t-test shows that the count of CRISPRs recognized by findCrispr is significantly different from that recognized by pilerCR, with t-stat > 0.
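The selection step described in the Results can be sketched as follows. This is a hedged illustration, not the findCrispr implementation: the `Candidate` class, the `select_best` function, and the reading of "overlapping starting points" as "sharing at least one start position" are assumptions, and `placeholder_score` is a hypothetical stand-in because the scoring formula itself is not given in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate repeat, carrying the four features named in the Results."""
    sequence: str   # the repeat sequence
    starts: tuple   # starting positions of each copy in the genome

    @property
    def length(self):
        return len(self.sequence)   # length of the repeat

    @property
    def copies(self):
        return len(self.starts)     # copy number of the repeat


def placeholder_score(c):
    # HYPOTHETICAL score: the actual formula is not stated in the abstract.
    return c.length * c.copies


def select_best(candidates, score=placeholder_score):
    """Group candidates that share a start position; keep the top scorer per group."""
    groups = []  # list of (set of start positions, list of member candidates)
    for c in candidates:
        merged_starts, members = set(c.starts), [c]
        untouched = []
        for starts, group_members in groups:
            if starts & merged_starts:
                # Overlapping start positions: fold this group into the new one.
                merged_starts |= starts
                members += group_members
            else:
                untouched.append((starts, group_members))
        groups = untouched + [(merged_starts, members)]
    return [max(group_members, key=score) for _, group_members in groups]
```

With this sketch, two candidates that begin at the same positions but differ in length fall into one group, and only the higher-scoring one is reported. Any other scoring function can be passed through the `score` parameter.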
Discussion
The feature extraction is effective. The model performs well, and the tool findCrispr is inclined to find more repeats. The algorithm is specialized: it is sensitive to CRISPRs with a small number of repeat copies and has low tolerance for long, scattered repeats.
Conclusion
Features are extracted, a scoring system is established, and the tool findCrispr is realized; it outperforms pilerCR in identifying CRISPRs with multiple calibration repeats. The tool findCrispr is of great significance for studying the biological function and mechanism of CRISPR.