Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How Tophat tell junctions from deletion?

    HI,

    I wrote to Cole for how Tophat tell the junctions (with CIGAR line xxxMxxxNxxxM) from deletions (xxxMxxxDxxxM). I guess they might be busy to answer all emails, so I dig the code myself a bit further and found a related part (I guess):

    -------------- long_spanning_reads.cpp --------------

    /*
    * FIXME, currently just differentiating between a deletion and a
    * reference skip based on length. However, would probably be better
    * to denote the difference explicitly, this would allow the user
    * to supply their own (very large) deletions
    */
    if ((lb->right - lb->left - 1) <= max_deletion_length)
    {
    new_cigar.push_back(CigarOp(DEL, lb->right - lb->left - 1));
    antisense_closure = prev_hit->is_spliced() ? prev_hit->antisense_splice() : curr_hit->antisense_splice();
    }
    else
    {
    new_cigar.push_back(CigarOp(REF_SKIP, lb->right - lb->left - 1));
    antisense_closure = lb->antisense;
    }

    However, when I change the value of max_deletion_length option, the output does not change.

    To reproduce what I said, I've made a small example file here:
    -- test.fa (http://zlab.umassmed.edu/~dongx/temp/test.fa) contais 4 reference sequences for index (which are identical in sequences with different ID, in order to get multiple hit)
    -- test.reads.fa (http://zlab.umassmed.edu/~dongx/temp/test.reads.fa) contains 3 reads (as below), where reads "read00" is directly extracted from the reference sequence (so should be perfectly matched). As you see, read1 and read2 are part of read00 (I marked the common part as upper case and the part deleted from read1/2 are marked as lower case). So, ideally, read1 should have CIGAR line as xxxM9DxxxxM, read2 has xxxM8DxxxM and read00 has xxxM.

    $ cat test.reads.fa
    >read1
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGgAATACTAGAGGCCACTGCAT
    >read2
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGagAATACTAGAGGCCACTGCAT
    >read00
    CCATCTGTAGAAGTGGAAGGGCTGAAGGAGtagagaggagAATACTAGAGGCCACTGCAT

    I made bowtie2 index from test.fa, called 'test', then run Tophat (v2.0.4):

    tophat -o test --max-deletion-length 10 --no-convert-bam -p 8 test test.reads.fa; grep read test/accepted_hits.sam

    read2 256 116 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1168 CP:i:1 HI:i:0
    read00 256 116 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1168 CP:i:1 HI:i:0
    read2 0 1168 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1176 CP:i:1 HI:i:1
    read00 256 1168 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1176 CP:i:1 HI:i:1
    read2 256 1176 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 CC:Z:1916 CP:i:1 HI:i:2
    read00 256 1176 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 CC:Z:1916 CP:i:1 HI:i:2
    read2 256 1916 1 1 29M8D23M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-29 XN:i:0 XM:i:0 XO:i:1 XG:i:8 NM:i:8 MD:Z:29^GTAGAGAG23 YT:Z:UU NH:i:4 HI:i:3
    read00 0 1916 1 1 60M * 0 0 CCATCTGTAGAAGTGGAAGGGCTGAAGGAGTAGAGAGGAGAATACTAGAGGCCACTGCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:60 YT:Z:UU NH:i:4 HI:i:3

    As you see, read1 is not in the output. I also change to use different value of --max-deletion-length, but never can get read1 included. I also tried to use "-i 5" for example, but still no read1 there. I've also tested different deletion size in read2, and found only when the size < or = 8, it can be output; otherwise, it will be filtered. Why??

    Hopefully I can get your help on this.

    I also try to compile the source code myself. Everything works fine except an error in the make step. I made a note here:
    A blog about Tips and Tricks for Unix, Perl, R, HTML, Javascript, Google API and mostly Bioinformatics

    Appreciated if you can help by the way.

    Best,
    -sterding

Latest Articles

Collapse

  • seqadmin
    Non-Coding RNA Research and Technologies
    by seqadmin




    Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

    Nobel Prize for MicroRNA Discovery
    This week,...
    10-07-2024, 08:07 AM
  • seqadmin
    Recent Developments in Metagenomics
    by seqadmin





    Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
    09-23-2024, 06:35 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 06:35 AM
0 responses
7 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 02:44 PM
0 responses
7 views
0 likes
Last Post seqadmin  
Started by seqadmin, 10-11-2024, 06:55 AM
0 responses
15 views
0 likes
Last Post seqadmin  
Started by seqadmin, 10-02-2024, 04:51 AM
0 responses
111 views
0 likes
Last Post seqadmin  
Working...
X