Departmental Colloquium

Yiming Bao
National Center for Biotechnology Information
Abstract: The Viral Genomes Project at the National Center for Biotechnology Information (NCBI) is a collection of complete viral genome sequences that aims to provide molecular standards for viral genomic research. The project has produced over 4,000 records for more than 2,780 different species. For each virus species, one complete genome is selected as the reference sequence and the rest are marked as its neighbors. The reference sequences are manually curated to correct or update the annotations of the original sequence records. Analytical tools allow researchers to analyze and compare viral genomes and proteomes at different scales quickly and conveniently. These tools include Global Alignment of Genome Neighbors, Pairwise Sequence Comparison (PASC), gMAP and Protein Clusters for viral genomes. The Viral Genomes Project is a collaborative effort between NCBI staff and many dedicated scientists worldwide. In addition, resources for important viruses such as influenza viruses have been created. The NCBI Influenza Virus Resource provides a curated database that contains nucleotide, protein and coding region sequences of influenza viruses extracted from GenBank. It also has sequence analysis tools that are integrated with the database, such as multiple sequence alignment, clustering of protein sequences, and influenza genome annotation.
Pairwise sequence comparison (PASC): a web tool for virus classification
Pairwise sequence comparison is a sequence-based virus classification method. It calculates the pairwise identities of virus sequences within a virus family and displays their distribution, which can help determine demarcations at different taxonomic levels such as strain, species, genus and subfamily. Although in some cases this method cannot be used as the sole criterion for virus classification, it is quantitative and has many advantages over conventional virus classification methods. It has been applied to polioviruses, coronaviruses, potyviruses, geminiviruses, flexiviruses, papillomaviruses and poxviruses, and there is increasing interest in using it for other virus families and groups. The PAirwise Sequence Comparison (PASC) tool was created at the National Center for Biotechnology Information (NCBI). The tool's database was established with identity distributions for complete genomes or segments of about 50 virus families and groups. Data in the system are updated periodically to reflect changes in virus taxonomy and additions of new virus sequences to the public database. The tool's web interface makes it easy to navigate and perform analyses. Up to 25 new viral genome sequences can be tested simultaneously within a few minutes to suggest the taxonomic position of the virus isolates in a specific family. The system eliminates potential discrepancies in results caused by researchers using different algorithms and/or different data. NCBI's PASC analysis result for the family Polyomaviridae has been adopted by the ICTV study group as one of the demarcation criteria for new species in the family.
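The idea of computing pairwise identities within a family and looking at their distribution can be illustrated with a minimal Python sketch. This is not the PASC implementation: it assumes the sequences are already aligned to the same length, uses a simple "matching columns over compared columns" definition of identity, and the function names, the toy sequences, and the 10% bin width are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap ('-') are ignored;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_distribution(seqs: dict, bin_width: int = 10) -> Counter:
    """Count sequence pairs per identity bin, keyed by the bin's lower edge."""
    hist = Counter()
    for (_, s1), (_, s2) in combinations(seqs.items(), 2):
        pct = pairwise_identity(s1, s2)
        # Put exactly 100% identity into the top bin instead of a bin of its own.
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        hist[lower] += 1
    return hist


# Toy, made-up aligned sequences standing in for genomes from one family.
toy = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACGAACGTAT",
    "isolate_D": "TTGAACCTGG",
}

hist = identity_distribution(toy)
for lower in sorted(hist):
    print(f"{lower:3d}-{lower + 10:3d}%: {'#' * hist[lower]}")
```

In a histogram like this, pairs tend to cluster into separate peaks, and the valleys between peaks are natural candidates for demarcation thresholds between taxonomic levels. In this toy data the pairs among isolates A, B and C fall at 80% identity and above, while every pair involving isolate D falls at 50% or below.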
Friday February 3, 2012 at 3:00 PM in SEO 636