Andrew J. Schaumberg

Project: DOE JGI MGS

In 2013 I designed and implemented the Joint Genome Institute's (JGI) internal "Microbial GenBank Submissions" (MGS) system. MGS is a web-based tool for cross-referencing genomes across JGI databases, submitting genomic records to GenBank, retracting records, tracking progress, and collecting statistics. At the database level, MGS is closely tied to my backend pipeline: all prokaryotic genome submissions were, and continue to be today, prepared through my quality control and annotation pipeline, so I am a co-author (example here) on many thousands of GenBank genomic records (GenBank's tools show only the most recent 100-200k records). MGS also integrates with several other JGI tools, such as IMG and GOLD. Below I describe the state of my MGS project as of 2014.


MGS's home page graphically depicts JGI's prokaryotic genome submission pipeline: from an extramural Principal Investigator's (PI) research proposal to sequence one or more genomes (red arrow at left), through JGI processing (green arrows at center), to final submission to NCBI GenBank as genomic records (at right).


For ease of use, MGS is succinctly self-descriptive, with a one-line explanation (at top) to direct JGI researchers and staff to click tabs or pink targets to interact with MGS. Clicking a target brings up the associated web page (at insets).


MGS automates a number of workflows related to, and including, the submission of JGI-sequenced genomes to NCBI GenBank. The submission tab lists submissions, cross-references each submission to identifiers across a number of JGI databases, includes search functions for various IDs, and links to further submission-related actions such as reprepare, resubmit, and retract.


One submission-related workflow that MGS automates is retraction, in which a genomic record is withdrawn from NCBI GenBank. The retraction page lists various identifiers and contact information for each retraction.


Measuring production continuously is important for a user facility such as DOE JGI. MGS computes a matrix counting how many genomes are at each combination of curation stage (x axis, columns, e.g. Unknown, Standard Draft, ..., Finished) and submission stage (y axis, rows, e.g. Deleted, Unprepared, ..., Retracted).
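
To make the shape of that matrix concrete, here is a minimal sketch in Python of how such a table of counts could be computed. It is not MGS's actual code: the function name `stage_matrix`, the record fields `curation` and `submission`, and the sample genomes are all hypothetical. Only the stage names given above are used, and the stages elided by "..." are left out.

```python
from collections import Counter


def stage_matrix(records, curation_stages, submission_stages):
    """Count genomes for each (submission stage, curation stage) pair.

    Rows follow submission_stages (y axis); columns follow
    curation_stages (x axis). Pairs with no genomes count as 0.
    """
    counts = Counter((r["submission"], r["curation"]) for r in records)
    return [[counts[(s, c)] for c in curation_stages] for s in submission_stages]


# Hypothetical stage lists: only the stage names mentioned in the text.
curation_stages = ["Unknown", "Standard Draft", "Finished"]
submission_stages = ["Deleted", "Unprepared", "Retracted"]

# Hypothetical genome records, invented for illustration.
records = [
    {"genome": "g1", "curation": "Finished", "submission": "Retracted"},
    {"genome": "g2", "curation": "Standard Draft", "submission": "Unprepared"},
    {"genome": "g3", "curation": "Standard Draft", "submission": "Unprepared"},
    {"genome": "g4", "curation": "Unknown", "submission": "Deleted"},
]

matrix = stage_matrix(records, curation_stages, submission_stages)

# Print one row per submission stage, one column per curation stage.
for stage, row in zip(submission_stages, matrix):
    print(f"{stage:<12}", row)
```

In this sketch the two Standard Draft genomes that are still Unprepared land in the same cell, so each cell reads as "how many genomes are at this curation stage and this submission stage".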

Copyright (C) Andrew Schaumberg 2022-2024. All rights reserved.