Using the snoscan Web Server
The snoscan server is accessed via the Lowe Lab Webserver Interface
at http://lowelab.ucsc.edu/snoscan/.
User Interface
The snoscan interface consists of four major components:
* Search mode selection
* Query sequence selection
* Target sequence selection
* Configuration of search-mode and options for displaying results
The search mode determines which probabilistic model to use in searches.
Each model is based on snoRNA training data from selected species or
phylogenetic groups (e.g., mammals, yeasts, archaea). If no explicit
model for the species of interest is available in the user interface,
specifying either a general model or a model from a related species
generally yields good results. Different search modes can offer varying
speed and sensitivity.
Query sequence selection is used to specify the sequences to be
searched for snoRNAs. Raw or formatted sequence data can be pasted
directly into the query sequence box or can be uploaded from a local
file. Links to sample query
sequence data are also available for demonstration purposes.
snoscan also expects “target sequences”, i.e. sequences that may
base-pair with the query snoRNA and be methylated at the site it guides.
Preloaded target sequences may be chosen, including rRNA from human, yeast, and
other model organisms. Alternatively, the user can specify a
custom target RNA sequence. As with the query sequence, a custom target
sequence can be pasted into a box in either raw or formatted form, or
can be uploaded from a file. When using a custom target sequence, by
default, every nucleotide in the sequence is treated as a potential
target. Alternatively, the user can specify a subset of the
target sequence nucleotides by uploading a custom “methylation
file” that indicates which nucleotides to use as target sites.
Sample human and yeast methylation files are
included on the server. When methylation positions are known,
restricting the search space to these known target sites has the
advantage of decreasing search time and the number of false positive
“hits”.
The server also has a set of program-specific search and output-display
options such as limits on the distances between some of the sequence
motifs (e.g. C and D boxes). In addition, the server has an adjustable
cutoff score enabling tradeoffs between scan sensitivity and
specificity. In most cases, the default parameter choices will be
satisfactory and should be selected – especially by new users. However,
more experienced users are able to exert some control over the
program’s results by manipulating these parameters.
Output format
The snoscan output consists of a summary information line for each
predicted C/D box snoRNA sequence, followed by the candidate in FASTA
format. The summary listing for each hit includes:
* Query sequence name and snoRNA start and end positions within the
query sequence
* snoscan overall bit score
* Target sequence name and target methylation position
* Total number of base pairings and mismatches in the guide region
* Whether the guide region is adjacent to the D' box or D box
* The length of the candidate subsequence
* Whether or not a terminal stem was detected
Also included in the display are graphical representations of the
base-pairing in the target-guide region and the secondary structure of
the terminal stem motif. Snoscan scores for known snoRNA sequences for
various species are available on the website for comparison.
Sample (abbreviated) snoscan Output
>> snR24 26.40 (1-87) Cmpl: ySc-25S-Am1447 (U24) 12/0 bp Gs-DpBox: 18 (18) Len: 87 TS
Meth site found: 1447 (U24)   Guide Seq Sc: 11.88 (21.36 -1.12 -7.36 -1.00)
                     *
Db seq:  5'-      AGUAGCAAAUAU -3'  ySc-25S (1444-1456)
                  ||||||||||||
Qry seq: 3'- AGACUUCAUCGUUUAUA -5'  snR24 (29-18)
Terminal stem: +-[C Box] -N- ACU - 5'    Stem Sc: 0.84 (3 bp)
               |             |||
               +---[D Box] - UGAA - 3'   Stem Transit Sc: -1.11
>Summary        [ C Box ] --       -- [ Cmpl/ Mism ] X [D'Bx] --       -- [D Bx] Length
>Meth    Am1447 [AUGAUGU] --  6 bp -- [   12 / 0   ] 1 [CAGA] -- 47 bp -- [CUGA] 87 bp
>Sc       26.40 [  7.48 ] -- -1.59 -- [ 21.36 bits ]   [3.94]    -2.44    [8.05]
Candidate sequence:
>snR24 26.40 (1-87) Cmpl: ySc-25S-Am1447 Len: 87
TCAAATGATGTAATAACATATTTGCTACTTCAGATGGAACTTTGAGTTCCGAATGAGACA
TACCAATTATCACCAAGATCTCTGATG
Each predicted C/D box snoRNA begins with a summary header line
(starting with ">>"), followed by other information, including
graphical representations of base-pairing in the target-guide region
and the terminal stem motif. A sample header line is labelled below:
   (1)      (2)   (3)           (4)     (5)    (6)   (7)     (8)      (9) (10) (11)  (12)
>> My-query 26.40 (11-97) Cmpl: ySc-25S-Am1447 (U24) 12/0 bp Gs-DpBox: 28 (18) Len: 87 TS
Key:
(1) Query sequence name
(2) Snoscan overall score (in bits)
(3) Start and end coordinates of the predicted snoRNA within the
query sequence
(4) Target sequence name (ySc-25S in this example)
(5) Target methylation nucleotide and position in the target sequence
(A at position 1447 here)
(6) If the methylation site matches a position in the methylation file,
the annotation provided for that methylation site (in this case, the
site is annotated as known to be guided by U24). If no methylation file
was provided, or the site has no match in the methylation file, a dash
"-" appears instead.
(7) Total number of base pairings / mismatches in the guide region
(G-U pairs count as a base pair)
(8) Whether the guide region is adjacent to the D' box ("Gs-DpBox")
or D box ("Gs-D box")
(9) Position of the start of the guide region in the snoRNA
candidate (relative to the entire query)
(10) Position of the start of the guide region in the snoRNA
candidate (relative to the beginning of the snoRNA hit)
(11) The length of the candidate snoRNA
(12) Whether or not a terminal stem was detected (TS=terminal stem
present, blank=not present)
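The twelve header fields described above can be pulled apart with a regular expression. The following is a minimal sketch in Python, not part of snoscan itself: the names `SnoscanHit`, `HEADER_RE`, and `parse_header` are hypothetical, and the pattern assumes that the header sits on a single line laid out exactly like the sample above. Real output may differ in spacing or field spelling.

```python
import re
from typing import NamedTuple, Optional


class SnoscanHit(NamedTuple):
    """Hypothetical container; numbers refer to the header key."""
    query_name: str             # (1)
    score: float                # (2) overall bit score
    start: int                  # (3) snoRNA start within the query
    end: int                    # (3) snoRNA end within the query
    target_name: str            # (4)
    meth_nucleotide: str        # (5)
    meth_position: int          # (5)
    annotation: Optional[str]   # (6) None when the field is a dash
    pairs: int                  # (7)
    mismatches: int             # (7)
    guide_box: str              # (8) e.g. "Gs-DpBox"
    guide_start_query: int      # (9)
    guide_start_hit: int        # (10)
    length: int                 # (11)
    terminal_stem: bool         # (12)


# Assumed layout, inferred only from the labelled sample header line.
HEADER_RE = re.compile(
    r"^>>\s+(?P<query>\S+)\s+(?P<score>-?\d+(?:\.\d+)?)\s+"
    r"\((?P<start>\d+)-(?P<end>\d+)\)\s+"
    r"Cmpl:\s+(?P<target>\S+)-(?P<nt>[ACGTU])m(?P<pos>\d+)\s+"
    r"(?:\((?P<annot>[^)]*)\)|-)\s+"
    r"(?P<pairs>\d+)/(?P<mism>\d+)\s+bp\s+"
    r"(?P<box>Gs-[^:]+):\s+(?P<gq>\d+)\s+\((?P<gh>\d+)\)\s+"
    r"Len:\s+(?P<len>\d+)(?P<ts>\s+TS)?\s*$"
)


def parse_header(line: str) -> SnoscanHit:
    """Parse one '>>' header line into a SnoscanHit."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a recognised snoscan header: {line!r}")
    return SnoscanHit(
        query_name=m["query"],
        score=float(m["score"]),
        start=int(m["start"]),
        end=int(m["end"]),
        target_name=m["target"],
        meth_nucleotide=m["nt"],
        meth_position=int(m["pos"]),
        annotation=m["annot"],          # None if the field was "-"
        pairs=int(m["pairs"]),
        mismatches=int(m["mism"]),
        guide_box=m["box"],
        guide_start_query=int(m["gq"]),
        guide_start_hit=int(m["gh"]),
        length=int(m["len"]),
        terminal_stem=m["ts"] is not None,
    )


SAMPLE = (">> My-query 26.40 (11-97) Cmpl: ySc-25S-Am1447 (U24) "
          "12/0 bp Gs-DpBox: 28 (18) Len: 87 TS")
hit = parse_header(SAMPLE)
```

For the sample line, the parsed fields are internally consistent: `hit.end - hit.start + 1` equals `hit.length`, and `hit.guide_start_query - hit.start + 1` equals `hit.guide_start_hit`.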
An example of the middle part of the output for each hit follows
below. Abbreviations: "Db seq" = target (database) sequence, "Qry seq" =
query sequence, "Sc" = bit score for that feature of the model. "Meth
site found" means that this position matches a position found in the
methylation file, and "(U24)" is the annotation provided for this site
within the methylation file (same as described above). Also note that
an asterisk (*) appears above the nucleotide predicted to be
methylated by this snoRNA candidate:
Meth site found: 1447 (U24)   Guide Seq Sc: 11.88 (21.36 -1.12 -7.36 -1.00)
                     *
Db seq:  5'-      AGUAGCAAAUAU -3'  ySc-25S (1444-1456)
                  ||||||||||||
Qry seq: 3'- AGACUUCAUCGUUUAUA -5'  My-query (39-28)
Terminal stem: +-[C Box] -N- ACU - 5'    Stem Sc: 0.84 (3 bp)
               |             |||
               +---[D Box] - UGAA - 3'   Stem Transit Sc: -1.11
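The "12/0 bp" count reported for this hit can be checked by hand from the two displayed sequences. The sketch below is illustrative only and is not how snoscan scores a duplex. The function name `count_pairs` is hypothetical, and it assumes that the 12 paired nucleotides are the rightmost 12 of the displayed "Qry seq" (the end nearest -5'), which is an inference from the example rather than something the output states.

```python
from typing import Tuple

# Watson-Crick pairs plus G-U wobble pairs, which the header key
# says are counted as base pairs.
PAIRING = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def count_pairs(target_5to3: str, guide_3to5: str) -> Tuple[int, int]:
    """Return (pairs, mismatches) for two equal-length RNA strings.

    The target is read 5'->3' and the guide 3'->5', as displayed in
    the "Db seq" / "Qry seq" lines, so position i faces position i.
    """
    if len(target_5to3) != len(guide_3to5):
        raise ValueError("sequences must have the same length")
    paired = sum(
        (t, g) in PAIRING for t, g in zip(target_5to3, guide_3to5)
    )
    return paired, len(target_5to3) - paired


db_seq = "AGUAGCAAAUAU"            # target, 5'->3'
qry_seq = "AGACUUCAUCGUUUAUA"      # query as displayed, 3'->5'

# Assumption: the duplex covers the last 12 displayed query nucleotides.
guide = qry_seq[-len(db_seq):]
pairs, mismatches = count_pairs(db_seq, guide)
```

With these inputs the function returns 12 pairs and 0 mismatches, in agreement with the "12/0 bp" field of the header line.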
The next part of the output is a graphical summary of the same
information above, where
- the first line is the label for each feature
- the second line gives the target
methylation site, the sequences for each box feature, the length of
spaces between features, and the overall length of the snoRNA
candidate
- the third line gives the overall
score, the scores for each box feature, and scores for the distances
between features
>Summary        [ C Box ] --       -- [ Cmpl/ Mism ] X [D'Bx] --       -- [D Bx] Length
>Meth    Am1447 [AUGAUGU] --  6 bp -- [   12 / 0   ] 1 [CAGA] -- 47 bp -- [CUGA] 87 bp
>Sc       26.40 [  7.48 ] -- -1.59 -- [ 21.36 bits ]   [3.94]    -2.44    [8.05]
And finally, the candidate sequence is given in FASTA format:
Candidate sequence:
>snR24 26.40 (1-87) Cmpl: ySc-25S-Am1447 Len: 87
TCAAATGATGTAATAACATATTTGCTACTTCAGATGGAACTTTGAGTTCCGAATGAGACA
TACCAATTATCACCAAGATCTCTGATG
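The FASTA record can be cross-checked against the summary lines of the same hit. The sketch below reads the record and walks along the sequence using the box sequences and spacings from the ">Meth" line (C box, 6 bp, 12-nt guide, 1, D' box, 47 bp, D box). The helper `read_fasta` and all variable names are hypothetical, and reading the "X" column as the number of nucleotides between the guide region and the D' box is an assumption made for this example.

```python
from typing import Tuple


def read_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) from a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA header")
    return lines[0][1:], "".join(lines[1:])


record = """\
>snR24 26.40 (1-87) Cmpl: ySc-25S-Am1447 Len: 87
TCAAATGATGTAATAACATATTTGCTACTTCAGATGGAACTTTGAGTTCCGAATGAGACA
TACCAATTATCACCAAGATCTCTGATG
"""

header, dna = read_fasta(record)
rna = dna.upper().replace("T", "U")   # boxes are reported as RNA

# Values copied from the ">Meth" summary line of this hit.
C_BOX, GAP_C_TO_GUIDE, GUIDE_LEN = "AUGAUGU", 6, 12
SPACER_X, D_PRIME_LEN, GAP_DP_TO_D, D_LEN = 1, 4, 47, 4

c_box_start = rna.find(C_BOX)                        # 0-based index
guide_start = c_box_start + len(C_BOX) + GAP_C_TO_GUIDE
guide = rna[guide_start:guide_start + GUIDE_LEN]

d_prime_start = guide_start + GUIDE_LEN + SPACER_X   # assumed layout
d_prime_box = rna[d_prime_start:d_prime_start + D_PRIME_LEN]

d_box_start = d_prime_start + D_PRIME_LEN + GAP_DP_TO_D
d_box = rna[d_box_start:d_box_start + D_LEN]
```

For this record the sequence is 87 nt long, the guide region starts at position 18 (1-based), matching the header line, and the slices recover the D' box (CAGA) and D box (CUGA) listed in the summary.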
Further information
The snoscan algorithm is described in:
Lowe, T.M. & Eddy, S.R. (1999) "A computational screen for
methylation guide snoRNAs in yeast", Science 283:1168-71
Additional information can be found in the documentation to the
stand-alone version of the program available at:
http://lowelab.ucsc.edu/software/snoscan.tar.gz