MicrobesOnline Operon Predictions

MicrobesOnline includes operon predictions for every bacterial and archaeal genome. In each genome, every pair of adjacent genes is predicted to be either in the same operon or not, based on:

  • The distance between them in nucleotides
  • Whether the genes are conserved near each other in other genomes, based on MicrobesOnline Ortholog Groups
  • The correlation of their expression patterns, if gene expression data is available
  • Whether they both belong to the same narrow GO category
  • Whether they share a COG functional category
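The simplest of these per-pair features can be computed directly from gene coordinates and annotations. The following minimal sketch assumes that each gene is a record with 1-based inclusive coordinates, a strand, and a set of COG functional category letters, and that all genes lie on one linear contig. The `Gene` class and the `adjacent_pairs` function are hypothetical and are not part of MicrobesOnline. Distance is taken here as the number of nucleotides between the two genes, and it is negative when they overlap. Conservation, expression correlation, and GO features need external data and are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: 1-based, inclusive coordinates on one contig."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"
    cog_classes: frozenset = frozenset()  # COG functional category letters


def adjacent_pairs(genes):
    """Yield (left, right, same_strand, distance, shares_cog) for neighbours.

    Genes are ordered by start coordinate; distance counts the nucleotides
    between the two genes and is negative when they overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    for a, b in zip(ordered, ordered[1:]):
        same_strand = a.strand == b.strand
        distance = b.start - a.end - 1
        shares_cog = bool(a.cog_classes & b.cog_classes)
        yield a, b, same_strand, distance, shares_cog


genes = [
    Gene("A", 1, 900, "+", frozenset("E")),
    Gene("B", 911, 1500, "+", frozenset("EH")),
    Gene("C", 1497, 2000, "+", frozenset("J")),
    Gene("D", 2100, 2800, "-"),
]
for a, b, same, dist, cog in adjacent_pairs(genes):
    print(a.name, b.name, same, dist, cog)
# A B True 10 True
# B C True -4 False
# C D False 99 False
```

Only the same-strand pairs (A-B and B-C here) are candidates for being in one operon. The opposite-strand pair C-D is still useful, because such pairs are never co-transcribed.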

For each genome, we first train a model to distinguish same-strand pairs from opposite-strand pairs using the comparative genomics features and the expression data. We then use these preliminary results for same-strand pairs to train a genome-specific model of which intergenic distances are likely to indicate operons. The final predictions are based on all features and on the proportion of adjacent gene pairs that are expected to be in the same operon. This proportion is estimated from the surplus of same-strand adjacent pairs over opposite-strand pairs or, for draft genomes without large scaffolds, is set to 50%. Exception: adjacent CRISPRs or CRISPR spacers are assumed to be co-transcribed but are excluded from the statistical parts of the method.
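The statistical steps above can be sketched as follows. This is an illustration under stated assumptions and not MicrobesOnline's implementation. It assumes that adjacent pairs that are not in the same operon are equally likely to be on the same strand or on opposite strands. It also reads the 50% draft-genome default as a fraction of same-strand pairs, and it uses a weighted, binned histogram as a stand-in for the genome-specific distance model. The names `same_operon_prior`, `operon_prob_given_same_strand`, and `distance_log_odds`, as well as the bin width and pseudocount, are hypothetical.

```python
import math
from collections import defaultdict


def same_operon_prior(strands, draft_without_large_scaffolds=False):
    """Estimate the fraction of same-strand adjacent pairs that share an operon.

    Assumption: non-operon neighbours are equally likely to be on the same or
    opposite strands, so the surplus of same-strand pairs over opposite-strand
    pairs approximates the number of operon pairs.
    """
    if draft_without_large_scaffolds:
        return 0.5  # fixed default when strand counts are unreliable
    pairs = list(zip(strands, strands[1:]))
    n_same = sum(1 for a, b in pairs if a == b)
    n_opposite = len(pairs) - n_same
    if n_same == 0:
        return 0.0
    return max(0.0, (n_same - n_opposite) / n_same)


def operon_prob_given_same_strand(p_same_strand):
    """Turn a same-strand-vs-opposite-strand classifier score into P(operon).

    With features x and p = P(same strand | x): operon pairs are always
    same-strand and non-operon pairs split evenly, so opposite-strand pairs
    make up 1 - p and same-strand non-operon pairs also make up 1 - p. The
    same-strand operon share is then p - (1 - p) = 2p - 1, giving
    P(operon | same strand, x) = (2p - 1) / p.
    """
    if p_same_strand <= 0.5:
        return 0.0
    return (2.0 * p_same_strand - 1.0) / p_same_strand


def distance_log_odds(distances, p_operon, bin_width=20, pseudocount=1.0):
    """Hypothetical genome-specific distance model for same-strand pairs.

    Each pair adds weight p to the operon histogram and 1 - p to the
    non-operon histogram of binned distances. Returns, per bin,
    log P(bin | operon) - log P(bin | not operon), with pseudocount smoothing.
    """
    operon = defaultdict(float)
    non_operon = defaultdict(float)
    for d, p in zip(distances, p_operon):
        b = d // bin_width  # overlaps (negative distances) get negative bins
        operon[b] += p
        non_operon[b] += 1.0 - p
    bins = set(operon) | set(non_operon)
    op_total = sum(operon.values()) + pseudocount * len(bins)
    non_total = sum(non_operon.values()) + pseudocount * len(bins)
    return {
        b: math.log((operon[b] + pseudocount) / op_total)
        - math.log((non_operon[b] + pseudocount) / non_total)
        for b in bins
    }


print(same_operon_prior(list("++++-")))
print(operon_prob_given_same_strand(0.9))
print(distance_log_odds([10, 15, 300, 310], [0.9, 0.8, 0.1, 0.2]))
```

Under these assumptions, no experimentally known operons are needed for training. Opposite-strand pairs act as stand-ins for non-operon pairs, and the preliminary probabilities then weight each same-strand pair when the distance model is fitted.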

Accessing the predictions:

Publications:

Note: Although the features we use, and how we compute them, have changed, the statistical method is virtually unchanged from our original publication. Further information on the operon prediction paper:
By Morgan N. Price, Katherine H. Huang, Eric J. Alm, and Adam P. Arkin