8/31 1:00 - 4:00, Arrillaga Alumni Center, McCaw Hall

ProtozoaDB: Towards a Knowledgebase Approach for a Systems Biology Database

Alberto M. R. Dávila1, Rodrigo Jardim1, Milene P. Guimarães1,2, Daniel Loureiro1,2, Sérgio M. S. Cruz2, Pablo N. Mendes3, Glauber Wagner4,5, Diogo A. Tschoeke1, Rafael R. Cuadrat1, Kary A. Ocaña1, M. Ruiz-Olazar1,2, Floriano P. Silva Jr1, Christian M. Probst1, Edmundo C. Grisard4, Maria C. Cavalcanti6, Maria L. Campos2 and Marta Mattoso2

1. FIOCRUZ. 2. UFRJ. 3. Wright State University. 4. UFSC. 5. UNOESC. 6. IME.

ProtozoaDB is being developed to host both genomics and post-genomics data from protozoan species (Kinetoplastida and Apicomplexa). The five protozoa are represented with added value obtained from similarity- and phylogeny-based analyses. ProtozoaDB also contains a collection of ESTs from different life cycle stages of the distinct species. ProtozoaDB uses the Genomics Unified Schema on top of the PostgreSQL open-source relational database system. It complements related databases by providing further analyses with emphasis on (a) distant similarities (HMM-based), (b) phylogeny-based annotations, including orthology analysis, and (c) druggability. ProtozoaDB is being progressively linked to several important databases, such as PDB and KEGG, with a focus on the dynamic multi-source combination of 3D-structure, metabolic pathway, evolutionary and literature information through interoperable Web technologies such as Web services.

ProtozoaDB has a modern Web-based interface for user-friendly data visualization and exploration, and it supports adding or linking databases for other pathogenic species. The database can be queried for genes related to metacyclogenesis and used to compare genes expressed in different parasite species or stages. For example, it offers a variety of query-based search tools to explore genes among the five protozoan genomes by: (i) keyword, (ii) gene ID, (iii) product, (iv) protein motif and (v) sequence type (coding sequences, mRNA, rRNA, tRNA, snRNA, snoRNA, primary transcript, precursor RNA and untranslated sequences). Protozoan sequences can also be visualized separately or compared to each other, and individual chromosomes can be explored using GBrowse. In addition to queries, Web services help third-party software retrieve and use data from ProtozoaDB in automated pipelines (workflows) or other interoperable Web technologies, promoting better information reuse and integration. Functionality for manipulating multiple ontologies is currently being added to support annotation based on some of the OBO ontologies related to the project's domain. ProtozoaDB has been successfully accessed by two production workflows, OrthoMCL and OrthoSearch, which were specified using a scientific workflow management system (SWfMS). This SWfMS is gradually being interconnected with the ProtozoaDB environment, allowing provenance metadata to be captured and stored in the database. ProtozoaDB is also expected to catalyze the development of local and regional bioinformatics capabilities (research and training), and thereby promote scientific advancement in developing countries.
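As an illustration only, the five search criteria listed above (keyword, gene ID, product, protein motif, sequence type) can be combined as optional filters over gene records. The sketch below is a minimal in-memory version in Python. The `GeneRecord` layout, the `search` function and the sample records are all hypothetical and do not reflect the actual ProtozoaDB schema, interface or Web services.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical record layout (an assumption, not the ProtozoaDB/GUS schema)."""
    gene_id: str
    species: str
    product: str
    sequence_type: str               # e.g. "CDS", "mRNA", "rRNA", "tRNA"
    motifs: Tuple[str, ...] = ()     # identifiers of protein motifs found in the gene


def search(
    records: Sequence[GeneRecord],
    keyword: Optional[str] = None,
    gene_id: Optional[str] = None,
    product: Optional[str] = None,
    motif: Optional[str] = None,
    sequence_type: Optional[str] = None,
) -> List[GeneRecord]:
    """Return the records that satisfy every criterion that was supplied."""

    def matches(record: GeneRecord) -> bool:
        if keyword is not None:
            # Keyword search: case-insensitive substring over the text fields.
            text = " ".join((record.gene_id, record.species, record.product)).lower()
            if keyword.lower() not in text:
                return False
        if gene_id is not None and record.gene_id != gene_id:
            return False
        if product is not None and product.lower() not in record.product.lower():
            return False
        if motif is not None and motif not in record.motifs:
            return False
        if sequence_type is not None and record.sequence_type != sequence_type:
            return False
        return True

    return [record for record in records if matches(record)]


# Made-up sample data, used only to exercise the filters.
RECORDS = [
    GeneRecord("G1", "species A", "hypothetical kinase", "CDS", ("MOTIF_X",)),
    GeneRecord("G2", "species B", "ribosomal RNA", "rRNA"),
    GeneRecord("G3", "species B", "putative kinase", "CDS", ("MOTIF_X",)),
]

kinase_cds = search(RECORDS, product="kinase", sequence_type="CDS")
print([record.gene_id for record in kinase_cds])
```

Treating each criterion as optional and combining the supplied ones with a logical AND is one simple design choice. A production system would more likely translate the same criteria into database queries than filter records in memory.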

Financial support: FAPERJ, CNPq, FINEP, FIOCRUZ, IAEA.