Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How Tophat tell junctions from deletion?

    HI,

    I wrote to Cole for how Tophat tell the junctions (with CIGAR line xxxMxxxNxxxM) from deletions (xxxMxxxDxxxM). I guess they might be busy to answer all emails, so I dig the code myself a bit further and found a related part (I guess):

    -------------- long_spanning_reads.cpp --------------

    /*
    * FIXME, currently just differentiating between a deletion and a
    * reference skip based on length. However, would probably be better
    * to denote the difference explicitly, this would allow the user
    * to supply their own (very large) deletions
    */
    if ((lb->right - lb->left - 1) <= max_deletion_length)
    {
    new_cigar.push_back(CigarOp(DEL, lb->right - lb->left - 1));
    antisense_closure = prev_hit->is_spliced() ? prev_hit->antisense_splice() : curr_hit->antisense_splice();
    }
    else
    {
    new_cigar.push_back(CigarOp(REF_SKIP, lb->right - lb->left - 1));
    antisense_closure = lb->antisense;
    }

    However, when I change the value of max_deletion_length option, the output does not change.

    To reproduce what I said, I've made a small example file here:
    -- test.fa (http://zlab.umassmed.edu/~dongx/temp/test.fa) contais 4 reference sequences for index (which are identical in sequences with different ID, in order to get multiple hit)
    -- test.reads.fa (http://zlab.umassmed.edu/~dongx/temp/test.reads.fa) contains 3 reads (as below), where reads "read00" is directly extracted from the reference sequence (so should be perfectly matched). As you see, read1 and read2 are part of read00 (I marked the common part as upper case and the part deleted from read1/2 are marked as lower case). So, ideally, read1 should have CIGAR line as xxxM9DxxxxM, read2 has xxxM8DxxxM and read00 has xxxM.

    $ cat test.reads.fa
    >read1
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGgAATACTAGAGGCCACTGCAT
    >read2
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGagAATACTAGAGGCCACTGCAT
    >read00
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGtagagaggagAATACTAGAGGCCACTGCAT

    I made bowtie2 index from test.fa, called 'test', then run Tophat (v2.0.4):

    tophat -o test --max-deletion-length 10 --no-convert-bam -p 8 test test.reads.fa; grep read test/accepted_hits.sam

    read2 256 116 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1168 CP:i:1 HI:i:0
    read00 256 116 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1168 CP:i:1 HI:i:0
    read2 0 1168 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1176 CP:i:1 HI:i:1
    read00 256 1168 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1176 CP:i:1 HI:i:1
    read2 256 1176 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1916 CP:i:1 HI:i:2
    read00 256 1176 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1916 CP:i:1 HI:i:2
    read2 256 1916 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 HI:i:3
    read00 0 1916 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 HI:i:3

    As you see, read1 is not in the output. I also change to use different value of --max-deletion-length, but never can get read1 included. I also tried to use "-i 5" for example, but still no read1 there. I've also tested different deletion size in read2, and found only when the size < or = 8, it can be output; otherwise, it will be filtered. Why??

    Hopefully I can get your help on this.

    I also try to compile the source code myself. Everything works fine except an error in the make step. I made a note here:
    A blog about Tips and Tricks for Unix, Perl, R, HTML, Javascript, Google API and mostly Bioinformatics

    Appreciated if you can help by the way.

    Best,
    -sterding

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 03-27-2024, 06:37 PM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-27-2024, 06:07 PM
0 responses
12 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
53 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
69 views
0 likes
Last Post seqadmin  
Working...
X