Help on how to use VIRGO
This web page documents how a biologist can use VIRGO. The
biologist uploads a gene expression data set of interest to her on
the VIRGO upload page. VIRGO invokes the GAIN system (see the
publication describing GAIN and the GAIN software package) to
predict novel functions for genes assayed in the gene expression
data set. When GAIN completes, VIRGO adds the predictions to its
database and allows the biologist to query and browse them. In
more detail, the steps are the following:
- Store your gene expression data in a text file. Please ensure
that the gene expression data set is tab-delimited and the first
column contains an identifier for a gene. All other columns
should contain expression values. The first row should contain
the names of the samples.
For S. cerevisiae, please ensure that a gene identifier is an
ORF name (e.g., YFL039C). For H. sapiens, please ensure that a
gene identifier is an Entrez-Gene identifier.
If you simply want to try out VIRGO or want to see the format of
such a data set, please take a look at one of the gene expression data sets we have collected.
- Visit the VIRGO
upload page. All fields on this page marked with a red
star are required.
- Select the correct species. If you use the wrong species,
VIRGO will not be able to make any predictions since it will
use incorrect information for molecular interactions and GO
functional annotations.
- Select your gene expression file by browsing your file
system.
- Enter a short description of the experiment you performed.
This information is useful when VIRGO presents the results of
queries you make on the predictions.
- Enter your email address. VIRGO needs a valid email
address to communicate with you.
- If this data has been published, please enter the PubMed
ID (PMID) of the paper that published the data. In the query
results, VIRGO will hyperlink the description of the
experiment to the PubMed entry for the paper.
- Select whether you want to keep the predictions
based on your dataset private or public. VIRGO's
default policy is to keep the predictions computed from your
dataset private.
- Finally, press the "Analyze" button to upload your dataset.
After you upload the dataset, VIRGO will give you a
randomly-generated key. You
need this key to query VIRGO for the predictions computed from
your data set.
If you wish, periodically visit the status web page
and enter your key to check the status of your dataset.
After VIRGO finishes
processing your data, it will insert predictions into its
database. VIRGO will send you an email message to inform you when
it has completed making predictions. The email message will
contain the key. You can query your predictions as soon as this
step completes.
At this stage, if you enter your key on the status page, you have the option of
performing leave-one-out cross validation on your dataset. If you
choose this option, VIRGO will email you when it completes cross
validation. However, you can query your predictions while VIRGO is
performing cross validation. Precision and recall results will
automatically appear in the query results when VIRGO completes
cross validation.
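Precision is the fraction of predictions for a function that turn out to be correct, and recall is the fraction of genes known to have the function that the method recovers. Exactly how GAIN scores predictions during leave-one-out cross validation is not described on this page, so the following is only a minimal sketch of the two measures for a single GO function; the function name `precision_recall` and the gene names are hypothetical.

```python
def precision_recall(predicted: set[str], annotated: set[str]) -> tuple[float, float]:
    """Precision and recall of one GO function's predictions.

    predicted: genes assigned to the function while their own annotation
               was hidden (as in leave-one-out cross validation).
    annotated: genes actually annotated with the function.
    """
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical gene names, for illustration only.
predicted = {"g1", "g2", "g3"}
annotated = {"g1", "g2", "g4", "g5"}
p, r = precision_recall(predicted, annotated)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Two of the three predictions are correct (precision 2/3), and two of the four annotated genes are recovered (recall 1/2).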
To query your predictions, visit the page for
searching predictions. On this web page, select "Predictions"
under search type. Enter your key. You need not select a species
if you provide the key. However, if you do select a species,
please make sure you select the right species! In addition, you
can restrict your search by gene name, function name, or
function ID (in the Gene Ontology). You can also search for
predictions with estimated confidence greater than a threshold.
You can also specify lower bounds on precision and/or recall;
VIRGO will return predictions only for those GO functions for
which GAIN achieves cross validation performance satisfying your
constraints.
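The search constraints above act together: a prediction is returned only if it passes every bound you supply. A minimal sketch of that filtering, assuming each prediction is a record with a gene, a GO ID and a confidence, and that cross-validation precision and recall are stored per GO function; `Prediction`, `filter_predictions` and the function IDs in the example are hypothetical, and whether VIRGO treats the precision and recall bounds as inclusive is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    gene: str          # ORF name or Entrez-Gene identifier
    go_id: str         # GO function ID
    confidence: float  # confidence estimate for this prediction


def filter_predictions(
    predictions: list[Prediction],
    cv_metrics: dict[str, tuple[float, float]],  # go_id -> (precision, recall)
    min_confidence: Optional[float] = None,
    min_precision: Optional[float] = None,
    min_recall: Optional[float] = None,
) -> list[Prediction]:
    """Keep predictions that satisfy every constraint that was supplied."""
    kept = []
    for pred in predictions:
        # Confidence must be strictly greater than the threshold.
        if min_confidence is not None and pred.confidence <= min_confidence:
            continue
        if min_precision is not None or min_recall is not None:
            # A function without cross-validation results cannot meet a bound.
            if pred.go_id not in cv_metrics:
                continue
            precision, recall = cv_metrics[pred.go_id]
            if min_precision is not None and precision < min_precision:
                continue
            if min_recall is not None and recall < min_recall:
                continue
        kept.append(pred)
    # Most confident predictions first.
    return sorted(kept, key=lambda p: p.confidence, reverse=True)


preds = [
    Prediction("g1", "F1", 0.90),
    Prediction("g2", "F1", 0.40),
    Prediction("g3", "F2", 0.95),
]
cv = {"F1": (0.8, 0.5), "F2": (0.3, 0.9)}
result = filter_predictions(preds, cv, min_confidence=0.5, min_precision=0.6)
print([p.gene for p in result])
```

Here g2 fails the confidence threshold and g3 belongs to a function whose cross-validation precision is below the bound, so only g1 survives.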
Browse the search results. We hope the predictions are useful
and suggest further experiments you can perform and analyse
using VIRGO. If you find VIRGO useful, please let us know.
Potentially Asked Questions
- What is VIRGO's privacy policy?
Do no evil. Sorry, that is another organisation's policy. While
VIRGO does not have any evil intentions, our default policy is
that predictions stored in VIRGO based on a biologist's dataset
are available only to that biologist (upon entering the right
VIRGO key). On the data upload page, the biologist has the
option of declaring that all computed predictions based on the
uploaded gene expression dataset are public.
- Another user may discover my VIRGO key! Why don't you
set up a login-based authentication system?
Since VIRGO keys are long and we generate them at random, the
probability that one user can accidentally access private
predictions generated for another user is small.
Our goal is to ensure that a user can use VIRGO without the
hassle of having to register and log in. We believe the current
design allows users to keep their data private. We are open to
changing this design based on feedback we receive from users.
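To see why a long random key is hard to stumble upon, consider the chance that a single random guess matches any issued key. VIRGO's actual key length and alphabet are not documented here, so this sketch assumes, purely for illustration, keys of 32 hexadecimal characters drawn from a cryptographically secure source; `KEY_HEX_CHARS`, `make_key` and `guess_probability` are hypothetical names.

```python
import secrets

# Assumption: 32 hex characters (128 random bits); VIRGO's real format may differ.
KEY_HEX_CHARS = 32


def make_key() -> str:
    """Generate a random key from the OS's secure random source."""
    return secrets.token_hex(KEY_HEX_CHARS // 2)  # 2 hex chars per byte


def guess_probability(num_keys: int, hex_chars: int = KEY_HEX_CHARS) -> float:
    """Chance that one uniformly random guess matches any of num_keys keys."""
    return num_keys / 16 ** hex_chars


print(make_key())
print(f"{guess_probability(1_000_000):.1e}")  # 2.9e-33
```

Even with a million keys in the database, a random guess succeeds with probability of about 3 in 10^33 under these assumptions.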
- I have changed my mind. How do I delete the
predictions based on one of my datasets or change the privacy
status of the predictions?
Just send us an email with the VIRGO key and the email address you
entered when you uploaded the dataset. We need the original
email address to verify that you are indeed the person who
uploaded the dataset (just in case you are sending email from
another address). If you ask us to delete the predictions, we
are unable to guarantee that we will delete the predictions from
any backups that may be stored on our servers. If the
authorities ask us for copies of these backups ... sorry, we are
again sounding like someone else.
- Where can I read about the GAIN
algorithm?
Our approach is described in detail in "Whole Genome Annotation
using Evidence Integration in Functional Linkage networks," Ulas
Karaoz, T. M. Murali, Stan Letovsky, Yu Zheng, Chunming Ding,
Charles R. Cantor, and Simon Kasif, Proceedings of the National
Academy of Sciences, vol. 101, pp. 2888-2893, 2004. We will add
pointers to papers describing some recent improvements to GAIN as
soon as the papers are published.
- Where can I obtain the GAIN software?
The software
package implementing the GAIN algorithm, which is the
function prediction engine underlying VIRGO, is available under
the GNU General Public Licence. The current version is 1.6.
- What format should my gene expression dataset be
in?
VIRGO can analyse tab-delimited gene expression data sets.
Please ensure that the first column contains an identifier for a
gene. Your file may contain columns entitled "NAME" and "GWEIGHT";
VIRGO will ignore these columns. All other columns should contain
expression values. The first row should contain the names of the
samples. For S. cerevisiae, please ensure that a gene identifier
is an ORF name (e.g., YFL039C). For H. sapiens, please ensure
that a gene identifier is an Entrez-Gene identifier. We have
collected several publicly available gene expression data sets in this format.
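Before uploading, you may want to check your file against these rules locally. The following is a minimal sketch, not VIRGO's own validator: the ORF-name pattern is an approximation of S. cerevisiae systematic names, treating blank cells as missing values is an assumption, and `check_expression_file` is a hypothetical name.

```python
import csv
import io
import re

# Approximate pattern for S. cerevisiae systematic ORF names such as YFL039C
# (Y, chromosome letter A-P, arm L/R, three digits, strand W/C, optional suffix).
ORF_NAME = re.compile(r"^Y[A-P][LR]\d{3}[WC](-[A-Z])?$")
IGNORED_COLUMNS = {"NAME", "GWEIGHT"}  # VIRGO ignores these columns


def check_expression_file(text: str, species: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header = rows[0]  # first row: sample names
    # Every column after the first holds expression values, except NAME/GWEIGHT.
    value_columns = [i for i in range(1, len(header))
                     if header[i].strip().upper() not in IGNORED_COLUMNS]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        gene = row[0].strip()
        if species == "S. cerevisiae" and not ORF_NAME.match(gene):
            problems.append(f"line {line_no}: {gene!r} is not an ORF name")
        elif species == "H. sapiens" and not gene.isdigit():
            problems.append(f"line {line_no}: {gene!r} is not an Entrez-Gene ID")
        for i in value_columns:
            cell = row[i].strip() if i < len(row) else ""
            if not cell:
                continue  # assumption: blank cells are missing values
            try:
                float(cell)
            except ValueError:
                problems.append(
                    f"line {line_no}: {header[i]!r} value {cell!r} is not numeric")
    return problems


good = "ORF\tNAME\tGWEIGHT\tt0\tt1\nYFL039C\tACT1\t1\t0.5\t-1.2\n"
print(check_expression_file(good, "S. cerevisiae"))  # []
```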
- Do you have any test datasets I can try out VIRGO on?
We have collected a number of publicly-available gene
expression data sets. All these files are in the
tab-delimited format described above.
- Will VIRGO support other formats such as MIAME, NCBI
GEO's SOFT format, or
Excel spreadsheets?
We will add support for MIAME-compliant data and data in SOFT
format in the future. We are unlikely to support Excel
spreadsheets since the Excel format is proprietary.
- How long will VIRGO/GAIN take to analyse my
dataset?
The answer depends on the species and on the size of the functional
annotation and molecular interaction datasets. Currently, our
yeast interaction network contains 4711 genes and 13453
interactions. The yeast functional annotations contain 71813
gene-function pairs and annotations for 1710 GO biological processes.
The human interaction network contains 6274 genes and 34087
interactions. The human annotation dataset contains 131832
gene-function pairs and 2645 GO biological processes.
GAIN processes each function in the Gene Ontology
independently. It makes predictions at the rate of
approximately 1.5 functions per second for yeast and roughly
0.6 functions per second for human. The prediction
stage takes about 20-30 minutes for yeast and about an hour
for human. In addition, we automatically lay out propagation
diagrams using the Graphviz package. This step
increases the running time considerably, by as much as a
factor of 10. It is difficult to predict how
long all the layout steps will take since the time depends on
the number of nodes and edges in each propagation diagram.
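The prediction-stage estimates follow from dividing the number of GO biological processes by the processing rate. This sketch reproduces that arithmetic with the figures quoted above; reading "by as much as a factor of 10" as an upper bound on the total running time is an interpretation, and the names are hypothetical.

```python
# Figures quoted on this page; they change as the underlying datasets change.
GO_PROCESSES = {"S. cerevisiae": 1710, "H. sapiens": 2645}
FUNCTIONS_PER_SECOND = {"S. cerevisiae": 1.5, "H. sapiens": 0.6}
LAYOUT_FACTOR_MAX = 10  # diagram layout can multiply running time by up to ~10


def prediction_minutes(species: str) -> float:
    """Rough time for GAIN's prediction stage, excluding diagram layout."""
    return GO_PROCESSES[species] / FUNCTIONS_PER_SECOND[species] / 60


for species in GO_PROCESSES:
    low = prediction_minutes(species)
    high = low * LAYOUT_FACTOR_MAX
    print(f"{species}: ~{low:.0f} min predicting, up to ~{high:.0f} min with layout")
```

This gives about 19 minutes for yeast and about 73 minutes for human, consistent with the "20-30 minutes" and "about an hour" quoted above.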
- I am only interested in a subset of the functions in
GO.
We are working on adding a feature that will allow the user to
select specific functions of interest. This feature has the
potential to considerably speed up VIRGO.
- Can I define my own function?
Not currently. In the future, we will allow the user to upload a
text file containing functions and functional annotations
defined by her in a simple text format.
- When will you support functional predictions for
my favourite species?
We anticipate a major revision of VIRGO in June 2006, which
will support a number of other species including A. thaliana,
C. elegans, D. melanogaster, and mouse.
- How can I input my own functional linkage
network?
Once again, we are working on this feature.
- How do I interpret the confidence value VIRGO computes
for each prediction?
We suggest that you treat the confidence value as a relative
measure of how sure VIRGO is that a prediction is correct. In
other words, by VIRGO's estimation, a prediction with a
confidence value of 0.9 is more plausible than a prediction
with a confidence value of 0.5. Evaluating the prediction by
also considering the associated propagation diagram may help you
understand the rationale behind a prediction.
- How do I interpret a propagation diagram?
The propagation diagram above supports the prediction that gene
YNL016W (PUB1) is annotated with the biological process "RNA
binding" (GO:0000023). Red rectangles denote genes annotated with
this function. Blue diamonds represent genes annotated with a
different function. Octagons represent genes that either have no
known function or are annotated with a function that is an ancestor
of "RNA binding". Of these, the red octagon is the gene of
interest (YNL016W). Other octagons represent genes that are also
predicted to have this function. Red edges are incident on annotated
nodes and help to visualise the flow of information in this network.
In addition, VIRGO's propagation diagrams display edge weights,
which are computed as described in the supplementary material.
Large edge weights indicate greater belief that the genes
connected by the edge share the same function.
- Propagation diagrams are missing for some of my
predictions?!
We do not lay out a propagation diagram if it contains more than
50 nodes, since the Graphviz software tends to take a long time
and use a lot of memory to lay out large graphs.
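A propagation diagram can be described in Graphviz's DOT language with node shapes following the legend above and the 50-node cutoff applied before layout. This is a minimal sketch, not the code VIRGO uses: the role names, exact colours and `propagation_dot` are hypothetical, and the gene names other than YNL016W are placeholders.

```python
from typing import Optional

MAX_LAYOUT_NODES = 50  # larger diagrams are not laid out

# Node styles follow the legend described on this page; role names are hypothetical.
NODE_STYLE = {
    "annotated": "shape=box, color=red",      # annotated with the function
    "other": "shape=diamond, color=blue",     # annotated with a different function
    "candidate": "shape=octagon",             # no known function, or an ancestor
    "query": "shape=octagon, color=red",      # the gene of interest
}


def propagation_dot(nodes: dict[str, str],
                    edges: list[tuple[str, str, float]]) -> Optional[str]:
    """Return an undirected DOT graph, or None if it is too large to lay out."""
    if len(nodes) > MAX_LAYOUT_NODES:
        return None
    lines = ["graph propagation {"]
    for gene, role in nodes.items():
        lines.append(f'  "{gene}" [{NODE_STYLE[role]}];')
    for u, v, weight in edges:
        # Edges incident on annotated nodes are drawn in red.
        red = "annotated" in (nodes[u], nodes[v])
        extra = ", color=red" if red else ""
        lines.append(f'  "{u}" -- "{v}" [label="{weight:.2f}"{extra}];')
    lines.append("}")
    return "\n".join(lines)


nodes = {"YNL016W": "query", "g1": "annotated", "g2": "other", "g3": "candidate"}
edges = [("YNL016W", "g1", 0.82), ("YNL016W", "g2", 0.35), ("g3", "g1", 0.51)]
print(propagation_dot(nodes, edges))
```

Saving the output as propagation.dot and running `neato -Tpng propagation.dot -o propagation.png` renders it; the edge-weight labels are where neato's label placement can obscure text.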
- Why are edge weights obscured on the propagation
diagrams?
Unfortunately, we have to live with this situation. VIRGO uses
the excellent Graphviz
software package to lay out propagation diagrams. However,
laying out edge labels correctly is known to be a very hard
problem. See the discussion in the FAQ for Graphviz
under the question "Edge label placement in neato is bad." Any
improvements implemented in Graphviz will automatically make
their way into VIRGO's propagation diagrams.
If there are any questions that you do not see answered, please
contact T. M. Murali (murali AT cs DOT vt DOT edu).
T. M. Murali
Last modified: Sat Mar 25 12:16:41 EST 2006