Gene Ontology (GO): selected portions of the GO ontologies

GO comes from the Gene Ontology Consortium and contains the directed acyclic graphs (DAGs) that represent the is_a relationships among nodes for the three GO ontologies:

  • Biological process,
  • Cellular component, and
  • Molecular function.

The data were downloaded in October 2008.

Schema in CLSD

CLSD keeps two tables for each GO ontology, all derived from the GO text distribution file, gene_ontology_edit.obo. The first table of each pair holds the ontology subsumption DAG (built from the is_a links in the distribution file). Its table structure is:

Field name    Type
CHILD_ID      VARCHAR
CHILD_NAME    VARCHAR
PARENT_ID     VARCHAR
PARENT_NAME   VARCHAR

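The second table of each pair is a closure on the DAG: a list of pairs in which the first entry is a descendant of the second through one or more is_a links. As the example output further down shows, a term is also paired with itself. The entries are presented as IDs rather than as names.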
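
As a rough illustration of what such a closure contains, the following Python sketch derives (descendant, ancestor) pairs from a handful of is_a edges. It is not the procedure used to build the CLSD tables, which this page does not describe. The function name is hypothetical, the first two edges come from the nucleus example later in this section, and the third edge is an assumption added only so that the walk has more than one level.

```python
from collections import defaultdict


def reflexive_transitive_closure(edges):
    """Return the set of (descendant_id, ancestor_id) pairs implied by is_a edges.

    `edges` is an iterable of (child_id, parent_id) pairs, as in a *_DAG table.
    Every node is also paired with itself, mirroring the self-pair
    (GO:0005634, GO:0005634) in the closure example output.
    """
    parents = defaultdict(set)
    nodes = set()
    for child, parent in edges:
        parents[child].add(parent)
        nodes.update((child, parent))

    closure = set()
    for node in nodes:
        # Walk upward from `node`, collecting the node itself and every ancestor.
        seen = {node}
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        closure.update((node, ancestor) for ancestor in seen)
    return closure


edges = [
    ("GO:0043073", "GO:0005634"),  # germ cell nucleus is_a nucleus (from the DAG example)
    ("GO:0045120", "GO:0005634"),  # pronucleus is_a nucleus (from the DAG example)
    ("GO:0001673", "GO:0043073"),  # ASSUMED edge, for illustration only
]
closure = reflexive_transitive_closure(edges)
for child_id, parent_id in sorted(closure):
    print(child_id, parent_id)
```

With these three edges, the closure pairs GO:0001673 with GO:0005634 even though no single edge links them directly, which is the property the closure tables provide.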

These "closure" tables can be used to determine whether an entry is a relative (descendant or ancestor) of any other entry within its DAG, which enables or simplifies some kinds of queries. However, the closures currently follow only the is_a relationship, not the part_of relationship.

The six tables are:

  • BIOLOGICAL_PROCESS_DAG
  • BIOLOGICAL_PROCESS_CLOSURE

  • CELLULAR_COMPONENT_DAG
  • CELLULAR_COMPONENT_CLOSURE

  • MOLECULAR_FUNCTION_DAG
  • MOLECULAR_FUNCTION_CLOSURE
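
The relative test described above can be sketched as a single lookup against a closure table. The example below is a minimal sketch, assuming an in-memory SQLite database as a stand-in for CLSD (so the table is created without the GO. schema prefix used in the queries on this page). The helper name is_relative is hypothetical, and the three sample rows are copied from the closure output shown later in this section.

```python
import sqlite3

# The three closure tables named in this section. The table name is checked
# against this set because SQL parameters cannot stand in for identifiers.
CLOSURE_TABLES = {
    "BIOLOGICAL_PROCESS_CLOSURE",
    "CELLULAR_COMPONENT_CLOSURE",
    "MOLECULAR_FUNCTION_CLOSURE",
}


def is_relative(conn, closure_table, id_a, id_b):
    """Return True if id_a is an is_a ancestor or descendant of id_b."""
    if closure_table not in CLOSURE_TABLES:
        raise ValueError(f"unknown closure table: {closure_table}")
    sql = (
        f"SELECT 1 FROM {closure_table} "
        "WHERE (CHILD_ID = ? AND PARENT_ID = ?) "
        "OR (CHILD_ID = ? AND PARENT_ID = ?) "
        "LIMIT 1"
    )
    row = conn.execute(sql, (id_a, id_b, id_b, id_a)).fetchone()
    return row is not None


# Stand-in database holding a few closure rows from the nucleus example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CELLULAR_COMPONENT_CLOSURE (CHILD_ID VARCHAR, PARENT_ID VARCHAR)"
)
conn.executemany(
    "INSERT INTO CELLULAR_COMPONENT_CLOSURE VALUES (?, ?)",
    [
        ("GO:0005634", "GO:0005634"),
        ("GO:0031039", "GO:0005634"),
        ("GO:0043073", "GO:0005634"),
    ],
)

print(is_relative(conn, "CELLULAR_COMPONENT_CLOSURE", "GO:0031039", "GO:0005634"))
print(is_relative(conn, "CELLULAR_COMPONENT_CLOSURE", "GO:0031039", "GO:0043073"))
```

On this sample data the first call reports a relationship in either argument order, while the second does not, because macronucleus and germ cell nucleus are siblings rather than relatives. Because a term is paired with itself in the closure, the helper also returns True when both IDs are the same.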

Suppose you wish to find all the "is_a children" of the nucleus (GO:0005634). You could use a query like:

select * from GO.CELLULAR_COMPONENT_DAG where parent_id like '%0005634'

to get a response like:

CHILD_ID (VARCHAR)  CHILD_NAME (VARCHAR)       PARENT_ID (VARCHAR)  PARENT_NAME (VARCHAR)
GO:0031039          macronucleus               GO:0005634           nucleus
GO:0031040          micronucleus               GO:0005634           nucleus
GO:0043073          germ cell nucleus          GO:0005634           nucleus
GO:0043076          megasporocyte nucleus      GO:0005634           nucleus
GO:0045120          pronucleus                 GO:0005634           nucleus
GO:0048353          primary endosperm nucleus  GO:0005634           nucleus
GO:0048555          generative cell nucleus    GO:0005634           nucleus
GO:0048556          microsporocyte nucleus     GO:0005634           nucleus

However, if you wish to find all the cellular locations that are "is_a" descendants of the nucleus (GO:0005634), you could use a query like:

select * from GO.CELLULAR_COMPONENT_CLOSURE where parent_id like '%0005634'

to get a table like:

CHILD_ID (VARCHAR)  PARENT_ID (VARCHAR)
GO:0001673          GO:0005634
GO:0001674          GO:0005634
GO:0001939          GO:0005634
GO:0001940          GO:0005634
GO:0005634          GO:0005634
GO:0031039          GO:0005634
GO:0031040          GO:0005634
GO:0042585          GO:0005634
GO:0043073          GO:0005634
GO:0043076          GO:0005634
GO:0043078          GO:0005634
GO:0043079          GO:0005634
GO:0043082          GO:0005634
GO:0045120          GO:0005634
GO:0048353          GO:0005634
GO:0048555          GO:0005634
GO:0048556          GO:0005634

Note that this list would contain many more descendants if "part_of" transitivity were also included.
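
Because the closure tables hold only IDs, one way to recover names for the descendants is to join the closure table back to the matching DAG table. The following is a minimal sketch, again assuming an in-memory SQLite stand-in for CLSD (no GO. schema prefix) and a hypothetical helper named named_descendants. The sample rows come from the output above, except the GO:0001673 DAG row, whose name and parent are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CELLULAR_COMPONENT_DAG (
        CHILD_ID VARCHAR, CHILD_NAME VARCHAR,
        PARENT_ID VARCHAR, PARENT_NAME VARCHAR);
    CREATE TABLE CELLULAR_COMPONENT_CLOSURE (
        CHILD_ID VARCHAR, PARENT_ID VARCHAR);
    """
)
conn.executemany(
    "INSERT INTO CELLULAR_COMPONENT_DAG VALUES (?, ?, ?, ?)",
    [
        ("GO:0031039", "macronucleus", "GO:0005634", "nucleus"),
        ("GO:0043073", "germ cell nucleus", "GO:0005634", "nucleus"),
        # ASSUMED row (name and parent are not given in the output above).
        ("GO:0001673", "male germ cell nucleus", "GO:0043073", "germ cell nucleus"),
    ],
)
conn.executemany(
    "INSERT INTO CELLULAR_COMPONENT_CLOSURE VALUES (?, ?)",
    [
        ("GO:0001673", "GO:0005634"),
        ("GO:0005634", "GO:0005634"),
        ("GO:0031039", "GO:0005634"),
        ("GO:0043073", "GO:0005634"),
    ],
)


def named_descendants(conn, ancestor_id):
    """Return (id, name) for every is_a descendant of ancestor_id, excluding itself."""
    sql = """
        SELECT DISTINCT c.CHILD_ID, d.CHILD_NAME
        FROM CELLULAR_COMPONENT_CLOSURE AS c
        JOIN CELLULAR_COMPONENT_DAG AS d ON d.CHILD_ID = c.CHILD_ID
        WHERE c.PARENT_ID = ? AND c.CHILD_ID <> ?
        ORDER BY c.CHILD_ID
    """
    return conn.execute(sql, (ancestor_id, ancestor_id)).fetchall()


for go_id, name in named_descendants(conn, "GO:0005634"):
    print(go_id, name)
```

DISTINCT is used because a term with more than one is_a parent would appear in several DAG rows, and the self-pair is filtered out so that the ancestor is not listed among its own descendants.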