
  • Analyze Tophat Fusion Output

    Does anyone have experience analyzing the candidate fusions output by TopHat-Fusion?

    How do you go about finding the true positives and potential de novo fusions?

    Best

  • #2
    I assume you have already run tophat-fusion-post to remove the potential false positives. Beyond that, PCR experiments are the only way to validate the potential fusions.

    "How do you go about finding the true positives and potential de novo fusions?"

    That's the question I have been trying to answer myself. TopHat-Fusion has no way to tell whether a fusion is due to a de novo rearrangement, the result of trans-splicing, or a cDNA library artifact.



    • #3
      I have been looking into this a bit lately. My results are dominated by "fusions" of two genes on the same chromosome, generally about 200 kbp apart. After reading the methods of the tophat-fusion paper, I'm fairly convinced these are read-through transcripts. I didn't know what those were, so I read a few papers on the topic. This one is particularly interesting: http://genome.cshlp.org/content/16/1/30.full

      Once those are eliminated, it's less clear. The tophat-fusion authors used different filters for different datasets, determined empirically so that they found the known fusions in their datasets. For example they varied the minimum supporting pairs and spanning reads needed to call a fusion. It makes some sense to me that these requirements should get more restrictive as your number of reads goes up, but picking the right filters for my data is guesswork at this point, as I have no known fusions as a positive control.

      As vinay points out, PCR in the wet lab is probably the only way to make a convincing argument that you've got a fusion. Hopefully you have DNA from your samples in addition to the RNA.
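The two filtering ideas in this post (dropping same-chromosome candidates that look like read-throughs, then requiring minimum read and pair support) can be sketched in Python. This is only an illustration, not the tophat-fusion-post logic: the `Candidate` fields, the function names, and all threshold values are assumptions that would need tuning against your own data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One fusion candidate; field names are hypothetical."""
    chrom_left: str
    pos_left: int
    chrom_right: str
    pos_right: int
    spanning_reads: int
    spanning_pairs: int


def is_possible_read_through(c, max_distance=500_000):
    """Flag candidates whose two breakpoints sit close together on one chromosome.

    max_distance is an arbitrary illustrative cutoff, not a published value.
    """
    same_chrom = c.chrom_left == c.chrom_right
    return same_chrom and abs(c.pos_left - c.pos_right) <= max_distance


def filter_candidates(cands, min_reads=3, min_pairs=2, max_distance=500_000):
    """Keep candidates that are not read-through-like and meet both support minimums.

    min_reads and min_pairs are placeholders; the post notes they were chosen
    empirically per dataset and probably need to grow with sequencing depth.
    """
    kept = []
    for c in cands:
        if is_possible_read_through(c, max_distance):
            continue
        if c.spanning_reads >= min_reads and c.spanning_pairs >= min_pairs:
            kept.append(c)
    return kept
```

Without a known fusion as a positive control, a filter like this can only rank candidates for PCR follow-up; it cannot confirm any of them.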



      • #4
        Confusion in understanding the TopHat-Fusion results

        Can anyone explain the TopHat-Fusion output file to me?

        For each predicted fusion candidate, columns 9 and 10 of the first row are the number of bases on the left and right sides of the fusion, i.e. their sum should be 100, since I used 100 bp paired-end Illumina data.
        I couldn't figure out the 11th column of the first row for each candidate fusion.
        I am also unable to understand the meaning of the second row.

        Please help me clear up this confusion.



        • #5
          bharati,

          Although I can't help with column 11 or the second row, I may be able to help with columns 9 and 10.

          These are "the number of bases on the left and right sides of a fusion, respectively, covered by spanning reads". So in the case where your left read only spans the fusion by 1 base, you should have 100-1=99 bases covered on the left side. Likewise for the right side. I believe that with good coverage these should each be about equal to the size of your read length.



          • #6
            Hi NKAkers,
            This is not always the case: some of the candidates have 133 and 54, or 6 and 66, or 17 and 259, or even 36 and 710 in columns 9 and 10 respectively (as shown below). I am unable to make sense of this.
            Can you please help?
            chr7-chr9 20414 141122659 fr 1 2 0 17 17 259 33.000000 @ 2 2 2 2 2 @ AGATCAGTGATAGGGCATGGTGTGGATATTATTACATTAGTATTGGAAGC GATGGTGTGGATTAGATCAGTGATAGGGCATGGTGTGGATATTATTACAT @ AGATCAGTGATAGGGCATGGTGTGGATATTATTACATTAGTATTGGAAGT GATGGTGTGTATTAGATCAGTGATAGGGCATGGTGTGGATATTATTACAT @ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 @ -55:-288 -127:-893

            chr16-chr1 65607 15922 ff 1 3 0 1 36 710 14.000000 @ 2 3 6 6 6 @ CAGCTTGGCGGATGGACTCTAGCAGAGTGGCCCAGCCACCGGAGGGGTCG ACCACTTCTCTGGGAGCTCCCTGGACTGGAGCCGGGAGGTGGGGAACAGG @ CCAGCTTGGCGGATGGACTCTAGCAGAGTGGCCAGCCACCGGAGGGGTCA ACCACTTCCCTGGGAGCTCCCTGGACTGGAGCCGGGAGGTGGGGAACAGG @ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 @ -249:253 -249:310 -1470:1446
            Last edited by bharati; 11-08-2012, 03:28 AM.



            • #7
              Hi bharati,

              The manual has a whole section on this output. http://tophat.cbcb.umd.edu/fusion_manual.html

              In your specific cases, you're getting weird numbers because you're looking at false positives. The strings of 1s after the sequences are the number of reads that support the fusion at each position. A read depth of 1 isn't enough to call anything. In essence, your chr7-chr9 fusion is called based on two reads in different spots. I would strongly recommend running tophat-fusion-post before trying to analyze any of the 'results' from TopHat-Fusion. The raw output is unfiltered and riddled with false positives.
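To make the column discussion concrete, here is a rough Python sketch that splits one line of the raw output shown in post #6 into named fields. Columns 9 and 10 are named as described in this thread. The names given to columns 5 to 8 are my reading of the manual and should be treated as assumptions, and column 11 is left unnamed because nobody in the thread could identify it. The section positions after each `@` are inferred from the two example lines only.

```python
def parse_fusion_line(line):
    """Split one raw TopHat-Fusion output line into a dict (illustrative only)."""
    sections = [s.strip() for s in line.split("@")]
    head = sections[0].split()

    # Assumes the first chromosome name contains no hyphen.
    chrom_left, chrom_right = head[0].split("-", 1)

    record = {
        "chrom_left": chrom_left,
        "chrom_right": chrom_right,
        "pos_left": int(head[1]),
        "pos_right": int(head[2]),
        "orientation": head[3],
        # Names for columns 5-8 are assumptions; check them against the manual.
        "spanning_reads": int(head[4]),
        "spanning_pairs": int(head[5]),
        "spanning_pairs_one_end": int(head[6]),
        "contradicting_reads": int(head[7]),
        # Columns 9 and 10 as described in this thread.
        "left_bases": int(head[8]),
        "right_bases": int(head[9]),
        # Column 11: meaning not established in this thread.
        "column_11": float(head[10]),
    }

    # Per-position read support (the strings of 1s and 0s), if present.
    if len(sections) > 5:
        record["left_depth"] = [int(x) for x in sections[4].split()]
        record["right_depth"] = [int(x) for x in sections[5].split()]
    return record
```

Checking `max(record["left_depth"])` for a candidate is a quick way to see the "depth of 1" problem described above before spending time on it.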



              • #8
                Explanation for result.html

                Hi NKAkers,
                Can you please explain the result.html file generated by tophat-fusion-post? I am unable to correlate it with the potential_fusions.txt file.

                Thanx
                Last edited by bharati; 12-04-2012, 10:30 PM.



                • #9
                  table description in result.html

                  Hi
                  In the table description of result.html generated by tophat-fusion-post, I am unable to understand the last line, i.e. 'If you follow the 9th column, it shows coordinates "number1:number2" where one end is located at a distance of "number1" bases from the left genomic coordinate of a fusion and "number2" is similarly defined'.
                  The table has the following fields:

                  chr12-chr8 fr
                  RB005 PCBP2 chr12 53858636 FLJ39080 chr8 75515898 139 1509 175

                  So which field is the line quoted above (in red) about?
                  Please explain if anybody can.

                  Thanx



                  • #10
                    Hi bharati,

                    If you open the result.html file and click on the hyperlink in the 9th column (i.e. the '1509'), it should move your view down the HTML page to a heading that says "1509 pairs", followed by 1509 lines that look something like:

                    1509 pairs
                    29:10
                    1311:-75
                    1328:-78
                    -2434:-68
                    -3770:187
                    -3745:252
                    ...

                    Each line gives the coordinates, relative to the proposed fusion site, of the two mates of one read pair.

                    I'm not certain what the purpose of potential_fusions.txt is, apart from giving some of the info from result.html in text form.
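If you copy that list of "number1:number2" lines out of result.html into plain text, a small Python sketch like the following turns it into integer pairs you can sort or plot. The function name is hypothetical, and it assumes one pair per line as in the excerpt above; headings such as "1509 pairs" and anything else that is not two integers separated by a colon are skipped.

```python
def parse_pair_offsets(lines):
    """Convert 'number1:number2' lines into (left_offset, right_offset) tuples."""
    offsets = []
    for line in lines:
        text = line.strip()
        if ":" not in text:
            continue  # e.g. the "1509 pairs" heading or a blank line
        left, right = text.split(":", 1)
        try:
            offsets.append((int(left), int(right)))
        except ValueError:
            continue  # not a pair of integers, ignore
    return offsets
```

Pairs with very large offsets on either side sit far from the proposed junction, so looking at the spread of these values may help judge how tightly the supporting pairs cluster around the fusion site.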



                    • #11
                      Difference between Spanning Reads and Spanning Mate Pairs

                      Hi,
                      Can you please explain the difference between spanning reads and spanning mate pairs? As far as I understand, the number of spanning mate pairs should be lower than the number of spanning reads, but this is not the case in my results.

                      thanx



                      • #12
                        Confusion between Spanning reads and spanning mate pairs

                        Hi NKAkers,

                        My understanding is that spanning reads are reads that do not harbor the fusion point, whereas split reads do harbor it, and that spanning mate pairs are spanning reads that are supported by their mates.

                        Are spanning reads single reads that do not have a mate?
