Seqanswers Leaderboard Ad

**mendl7** · 09-10-2010, 02:05 PM

bumpbumpbump

**nilshomer** · 09-10-2010, 04:47 PM

Originally posted by mendl7 View Post

bumpbumpbump

Please do not bump, your question will be answered in time. Keep searching the forums.

**malachig** · 09-13-2010, 04:20 PM

You should have both a SAM file and a wig file in addition to the junctions.bed you mention ... From the TopHat manual:

TopHat Output
1. accepted_hits.sam. A list of read alignments in SAM format. SAM is a compact short read alignment format that is increasingly being adopted. The formal specification is here.

2. coverage.wig. A UCSC BedGraph wigglegram track, showing the depth of coverage at each position, including the spliced read alignments.

3. junctions.bed. A UCSC BED track of junctions reported by TopHat. Each junction consists of two connected BED blocks, where each block is as long as the maximal overhang of any read spanning the junction. The score is the number of alignments spanning the junction.

Have you tried converting the SAM file to BAM format, indexing it, and then viewing in IGV?

TopHat manual
http://tophat.cbcb.umd.edu/manual.html

IGV
http://www.broadinstitute.org/igv/AlignmentData

**mendl7** · 09-15-2010, 08:23 AM

The issue is that the accepted_hits.sam file do not contain reads that are spliced. I believe these are the reads that are simply mapped by Bowtie in the beginning of the TopHat run to determine regions of expression that TopHat tries to link later in the algorithm by spliced reads. I say this because there are no results in accepted_hits.sam that contain the XO: flag which gives the number of gap openings.

The coverage.wig file does contain results from the spliceds read but only as the depth of coverage or number of reads that mapped to the position.

If I've got something confused I would greatly appreciate the help.

What I am seeking is simply a list of reads that TopHat has called spliced and the reference positions that the read positions were mapped to.

**Uwe Appelt** · 09-15-2010, 09:35 AM

@mendl7:

accepted_hits.sam indeed does contain split-read-alignments. At least for me, but i might miss your point. Does the following deliver want you're looking for?

Code:

fgrep --invert-match "50M" accepted_hits.sam > spliced_hits.sam

(you probably have to adjust the search string from 50M to whatever your read length was - you can check it by "head -n 100 accepted_hits.sam" [see column 6])

For my data this delivers for example:

Code:

HWUSI-EAS618_0001:3:63:1663:315#0       147     1       14788   255     42M140N8M       =       14783   0       CTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGC      =1791>/79<==5'>>9<@/8>6AA=6=A<>@AA8AA?B@B?AB@BBA==        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:11:1021:354#0       147     1       14791   255     39M140N11M      =       14720   0       CGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTC      =>13;?=?>>::AA=A=A?;@?A>BBA@ABAB@AA@BBBABA@A@AB<@B        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:34:352:749#0        147     1       14791   255     39M140N11M      =       14786   0       CGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTC      A;>A4BAABB?B@ABB@BA@@A@AB@BAA?A@AB@BBBB@9BBBB@BBAB        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:113:114:1346#0      163     1       14794   0       36M140N14M      =       14818   0       GTCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATG      B=A@>BB@>B>5@B@A@?A?BAABBAABBA@ABBA===BAA@B@=B?A>6        NM:i:1  XS:A:-  NH:i:6  CC:Z:15 CP:i:102516182
HWUSI-EAS618_0001:3:91:1318:920#0       73      1       14796   0       34M140N16M      *       0       0       CCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGAC      B@@BA@A=AAAAABB@B=BABBAA=A<BABABA<<=BA@=A7>A??BA:7        NM:i:0  XS:A:-  NH:i:6  CC:Z:15 CP:i:102516180
HWUSI-EAS618_0001:3:51:933:611#0        83      1       14809   255     21M140N29M      =       14743   0       CAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAG      CB@9@BBAA@BAC?@BBBBBBBBABBC=CABCCBBABBC@BCBBACBBCB        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:117:1392:1226#0     147     1       14814   255     16M140N34M      =       14764   0       CCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATC      7=<>?==49A9@AA?AAA@AAB?BBB?ABB;B@B;B?A?AB>BA@A=BAA        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:113:114:1346#0      83      1       14818   0       12M140N38M      =       14794   0       TCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCCGAC      ;:)?B<A=>A6;-37A==>?A=;A@A<=??=/65A>A=BCAC<B>A>?BB        NM:i:0  XS:A:-  NH:i:6  CC:Z:15 CP:i:102516158
HWUSI-EAS618_0001:3:100:997:670#0       137     1       15000   0       39M757N11M      *       0       0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      73A?A;=A@?AB=<9=BA<;;@AA?7@AB@BB9B@AA@?BBB@B@9B@@@        NM:i:0  XS:A:-  NH:i:8  CC:Z:12 CP:i:89817
HWUSI-EAS618_0001:3:101:1284:137#0      83      1       15000   255     39M757N11M      =       15101   0       GATCCGACATCAAGTCCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      ###########==55;=>744A107:=;?>AAA@B?>B@B8@BBABCAAB        NM:i:1  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:110:1622:914#0      83      1       15000   255     39M757N11M      =       14920   0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      #:::==>?A;BAA=>=?==5?=?AA?BB@?BB=BA@@A?ACBBAA?B@BB        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:29:668:1845#0       137     1       15000   0       39M757N11M      *       0       0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      709:93>A?@?>=>>>A??;AA@?=?8?@ABAABB>AA>B?@@BBAABAA        NM:i:0  XS:A:-  NH:i:8  CC:Z:12 CP:i:89817
HWUSI-EAS618_0001:3:52:794:1776#0       83      1       15000   1       39M757N11M      =       14743   0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      =;B<BBAABB@BBABBABA;BBBBBBBBBBBBBBBB@>;<ACCC>C;BCB        NM:i:0  XS:A:-  NH:i:4  CC:Z:12 CP:i:89817
HWUSI-EAS618_0001:3:65:849:538#0        73      1       15000   0       39M757N11M      *       0       0       GATCCGACATCAAGTCCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      ?@?7?@@AA+A@BBBB<AB,BB<B<?@A9BA<?BCCBBBC@BBCCBC6@B        NM:i:1  XS:A:-  NH:i:8  CC:Z:12 CP:i:89817
HWUSI-EAS618_0001:3:69:1365:376#0       83      1       15000   255     39M757N11M      =       15085   0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      ###=:4:=;;=;44=56=;49=7;=;?;?=AA?AAA?A=AAB@BABBABB        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:78:894:857#0        73      1       15000   0       39M757N11M      *       0       0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCAGA      BCBBCBBB>BBBA?<BAABBABBB;@:><>=<?<<<B#############        NM:i:2  XS:A:-  NH:i:8  CC:Z:12 CP:i:89817
HWUSI-EAS618_0001:3:79:517:552#0        147     1       15000   255     39M757N11M      =       15096   0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      57@6A<:>??BA=84;A=A6BBBBA=BBB@B@BBBCBB=CBC?BBBBBAB        NM:i:0  XS:A:-  NH:i:1
HWUSI-EAS618_0001:3:97:1036:226#0       73      1       15000   0       39M757N11M      *       0       0       GATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGC      A9@BBBBB@@BBBBBBBB;4;BB4B@B5B@BB<BCCBCACCCBC@CCCBB        NM:i:0  XS:A:-  NH:i:8  CC:Z:12 CP:i:89817

Cheers
Uwe

**mendl7** · 09-18-2010, 10:22 AM

yep, that's exactly what I needed. I had FALSELY assumed that XO flag would be used. Thank you very much for your assistance Uwe.

**mendl7** · 09-20-2010, 10:53 AM

Just a quick update in case this is useful to anyone. I trim my reads to get rid of obviously lower quality 3' bases. Thus, fgrep --invert-match is not a useful filter since my read lengths are variable.
Instead I now use "awk '$6 ~/N/{print $_;}' accepted_hits.sam >spliced-hits.sam" which obviously looks for the presence of "N", inserted bases, in the CIGAR string column of the tophat SAM output.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 39 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 41 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 35 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 55 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Obtain Spliced Read Names/Mapping with TopHat??

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News