ResearchGate: How might this project prove useful to researchers in the future?
David Paez Espino: With this paper, we are opening a wide door to the study of viruses for the first time. Until now we were looking at fewer than 5,000 viruses, and there were no large-scale, multihabitat studies of viral discovery. With this work, we have increased the number of viral sequences 50-fold, and through that we identified 99 percent more viral diversity than was known before. This provides an enormous amount of new data that will be studied in more detail in the years to come. We have more than doubled the number of microbial phyla known to serve as hosts to viruses, and we have created the first global viral distribution map. The amount of analysis and the number of discoveries that we anticipate will follow from this dataset cannot be overstated.
RG: What were your goals when you started the project?
Paez Espino: This work had several aims. First, we wanted to overcome the limitations of the current narrow and biased collection of isolate viruses. We also sought to shed light on environmental viral taxonomic diversity and to fill in some of the existing gaps in our understanding of host-virus interactions and host range specificity. Finally, we wanted to explore patterns of biogeographic distribution of the predicted metagenomic viruses.
RG: Why hasn’t this been done before?
Paez Espino: This is the first time that anyone has looked systematically across all habitats and across such a large compendium of data: more than 3,000 different samples. Many of these viruses were missed before because this is the first time that a systematic and very sensitive approach has been applied.
RG: How did you obtain samples to analyze?
Paez Espino: All the samples were obtained from the public IMG/M system, a computational platform developed by some of the co-authors that supports comparative analysis of microbial community aggregate genomes, or metagenomes, in the context of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments.
RG: Did anything you learned surprise you?
Paez Espino: We were surprised by several findings: for example, the detection of the largest phage reported to date (~600 kb), the remarkable habitat specificity of the vast majority of the viral sequences, the presence of the same viruses shared across many different individuals, and the number of viral sequences able to infect microbial phyla not previously known to be infected by viruses…