Data structures based on k-mers for querying large collections of sequencing data sets

Genome Res Marchet C, Boucher C, Puglisi SJ, Medvedev P, Salson M, Chikhi R

High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
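
The core idea, representing each data set as a set of k-mers and querying a sequence against those sets, can be illustrated with a minimal sketch. It is not one of the surveyed data structures: it uses plain Python sets, ignores reverse complements, and the function names and the 0.8 threshold are assumptions made for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(datasets, k):
    """Map each data set name to the set of k-mers found in its reads.

    `datasets` is a hypothetical dict: name -> list of read strings.
    """
    return {
        name: set().union(*(kmers(read, k) for read in reads))
        for name, reads in datasets.items()
    }


def query(index, seq, k, threshold=0.8):
    """Return names of data sets containing at least `threshold`
    of the query's distinct k-mers (threshold is an assumed default)."""
    q = kmers(seq, k)
    if not q:
        return []  # query shorter than k: nothing to look up
    return sorted(
        name for name, ks in index.items()
        if len(q & ks) / len(q) >= threshold
    )


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy = {
        "sample_A": ["ACGTACGTGA", "TTGACCA"],
        "sample_B": ["GGGTTTCCC"],
    }
    idx = build_index(toy, k=4)
    print(query(idx, "ACGTACG", k=4))
```

Exact sets like these do not scale to petabytes of data; the sketch is only meant to make the set-of-k-mers representation concrete.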

More info at: http://www.genome.org/cgi/doi/10.1101/gr.260604.119

Differential stress responsiveness determines intraspecies virulence heterogeneity and host adaptation in Listeria monocytogenes

Nature Microbiology Lukas Hafner, Enzo Gadin, Lei Huang, Arthur Frouin, Fabien Laporte, Charlotte Gaultier, Afonso Vieira, Claire Maudet,...

Assessing the effect of model specification and prior sensitivity on Bayesian tests of temporal signal

PLOS COMPUTATIONAL BIOLOGY John H. Tay, Arthur Kocher, Sebastian Duchene* Abstract Our understanding of the evolution of many microbes...

Expanding the diversity of origin of transfer-containing sequences in mobilizable plasmids

Nature Microbiology Manuel Ares-Arroyo, Amandine Nucci & Eduardo P. C. Rocha Abstract Conjugative plasmids are important drivers of...
