  • YOLO69SWAG
    Junior Member
    • Mar 2013
    • 9

    Indels in Sequence Reads

    Hi everyone,

    I am using paired-end 100 bp reads from an Illumina HiSeq to find chromosomal rearrangements. I can find them just fine with the paired-end data since the pairs are flagged, but I also need to find the junctions within the sequence reads themselves. I've tried BWA, Bowtie2, TopHat2, SNAP and ELAND, and none of them report junctions except TopHat2, which isn't suitable because I'm sequencing DNA. I can see the junctions if I don't soft clip my reads, but they aren't displayed as junctions the way they are when I run TopHat2 on RNA. Does anyone have suggestions on how to get one of these programs to report split reads, or to filter them into a separate file so I can find and analyze them efficiently?

    The junctions also span more than just a few base pairs. They range from a few kb to hundreds of kb, and a few are interchromosomal rearrangements.
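One way to do the "filter them into a separate file" step, sketched in Python under stated assumptions: it reads uncompressed SAM text (for example the output of `samtools view -h`), treats any mapped read with a soft clip of at least `min_clip` bases as a split-read candidate, and keeps the header lines so the output is still valid SAM. The function names and the 20-base default are hypothetical choices for illustration, not options of any of the aligners mentioned.

```python
import re
from typing import Iterable, List

# A CIGAR string is a run of <length><operation> pairs, e.g. "70M30S".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def max_soft_clip(cigar: str) -> int:
    """Return the longest soft clip (S operation) in a CIGAR string."""
    if cigar == "*":  # no alignment information
        return 0
    clips = [int(n) for n, op in CIGAR_RE.findall(cigar) if op == "S"]
    return max(clips, default=0)


def split_read_candidates(sam_lines: Iterable[str], min_clip: int = 20) -> List[str]:
    """Keep SAM header lines plus mapped reads soft-clipped by >= min_clip bases."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # header line: keep so the output stays valid SAM
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4:  # FLAG bit 0x4 = read unmapped, nothing to inspect
            continue
        if max_soft_clip(cigar) >= min_clip:
            kept.append(line)
    return kept
```

For example, `with open("aligned.sam") as fin, open("split_candidates.sam", "w") as fout: fout.writelines(split_read_candidates(fin))` would write the candidates to their own file (both filenames are hypothetical). Secondary and supplementary alignments are not treated specially in this sketch.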
  • shi
    Wei Shi
    • Feb 2010
    • 236

    #2
    Dear @YOLO69SWAG,

    The Subread aligner may help you do this. If you run Subread with the '-J' option, it will mark the bases that cannot be aligned together with the other bases from the same read as soft-clipped ('S' in the CIGAR string). You can then use this information to identify the junction/fusion positions within reads.

    Subread can be downloaded from http://subread.sourceforge.net

    Best wishes,

    Wei
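Following Wei's suggestion, the junction coordinate can be read off each record's POS and CIGAR fields. A minimal sketch, assuming standard SAM semantics (POS is the 1-based first aligned reference base; M, D, N, = and X consume reference) and that hard clips can simply be ignored. For a left clip the breakpoint is reported at POS; for a right clip it is the last aligned reference base. The function name is hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def soft_clip_breakpoints(pos: int, cigar: str) -> List[Tuple[str, int, int]]:
    """Return (side, ref_pos, clip_len) for each soft-clipped end of a read.

    ref_pos is the aligned reference base adjacent to the clip: POS for a
    left clip, and the aligned end coordinate for a right clip.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Drop hard clips so any soft clips become the first/last operations.
    core = [(n, op) for n, op in ops if op != "H"]
    ref_span = sum(n for n, op in core if op in REF_CONSUMING)
    result = []
    if core and core[0][1] == "S":
        result.append(("left", pos, core[0][0]))
    if len(core) > 1 and core[-1][1] == "S":
        result.append(("right", pos + ref_span - 1, core[-1][0]))
    return result
```

For example, a read at POS 1000 with CIGAR `70M30S` aligns over 70 reference bases, so its right-hand breakpoint falls at 1069.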


    • HESmith
      Senior Member
      • Oct 2009
      • 512

      #3
      There are a number of programs designed specifically for detecting indels and structural variants. Check out the software wiki.


      • YOLO69SWAG
        Junior Member
        • Mar 2013
        • 9

        #4
        Thanks Wei,

        I'm almost done aligning with Subread. I'll post again this weekend with the results.


        • YOLO69SWAG
          Junior Member
          • Mar 2013
          • 9

          #5
          Subread

          Hey Wei,

          I tried the program and was able to get it to align my reads. I can see the soft-clipped bases, and these are almost always my junction reads. This helps a lot, since I can see the same junctions I found with my paired-end information, but there's still a problem: the junctions are in many different locations, and BLASTing each of these sites manually will take forever.

          Woe is graduate school?

          Any more suggestions would be much appreciated.
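Rather than BLASTing each site by hand, one option is to pull out the soft-clipped portion of every read in one pass and align them all at once, for example with `bwa mem` against the same reference; where a clip maps suggests the partner side of the junction. The sketch below assumes SAM input with SEQ present (soft-clipped bases are kept in SEQ, hard-clipped ones are not, and SEQ is stored in reference-forward orientation). The FASTA header layout and the `min_clip` cutoff are hypothetical; very short clips may map to several places.

```python
import re
from typing import Iterable, Iterator

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_fasta(sam_lines: Iterable[str], min_clip: int = 20) -> Iterator[str]:
    """Yield one FASTA record per soft-clipped read end of >= min_clip bases.

    The header encodes read name, anchor chromosome:POS and which end was clipped.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        if flag & 0x4 or cigar == "*" or seq == "*":  # unmapped or no sequence
            continue
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
        if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
            yield f">{qname}|{rname}:{pos}|left\n{seq[:ops[0][0]]}\n"
        if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
            yield f">{qname}|{rname}:{pos}|right\n{seq[-ops[-1][0]:]}\n"
```

Writing these records to a single FASTA file turns many manual searches into one batch alignment.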


          • shi
            Wei Shi
            • Feb 2010
            • 236

            #6
            Dear @YOLO69SWAG,

            Thanks for letting me know it worked and I'm glad it helps.

            Could you elaborate a bit more on what you want to do with these fusion reads? That will let more people comment on your project. Do you want the exact fusion positions, or do you want to know which genes were fused together?

            Best wishes,

            Wei


            • YOLO69SWAG
              Junior Member
              • Mar 2013
              • 9

              #7
              I'm trying to record and count these junction reads and the coordinates of each junction. They occur spontaneously in certain parts of the genome. I already detect them using paired-end information, but getting the junctions within reads as well will let me quantify these junctions accurately. If I could align my DNA-seq reads with a gapped aligner and have the junctions reported the way TopHat/Bowtie reports them for RNA-seq data, that would be perfect.
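Once each split read has been reduced to a (chromosome, position, side) breakpoint, quantifying the junctions is a counting problem. A sketch assuming that input shape: reads are tallied per exact coordinate, and positions on the same chromosome and side that lie within `tolerance` bp of the previous one can optionally be merged, since short sequence shared by both partners (microhomology) can make the exact breakpoint ambiguous by a few bases. The names and the default tolerance of 0 are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (chromosome, 1-based reference position, "left" or "right" clipped end)
Breakpoint = Tuple[str, int, str]


def _summarise(cluster: List[Breakpoint], exact: Dict[Breakpoint, int]) -> Tuple[Breakpoint, int]:
    """Report a cluster at its most-supported position with its total read count."""
    rep = max(cluster, key=lambda b: exact[b])
    return rep, sum(exact[b] for b in cluster)


def count_junctions(bps: Iterable[Breakpoint], tolerance: int = 0) -> List[Tuple[Breakpoint, int]]:
    """Count supporting reads per breakpoint, merging neighbours within `tolerance` bp."""
    exact = Counter(bps)
    # Sort so positions on the same chromosome and side are adjacent.
    ordered = sorted(exact, key=lambda b: (b[0], b[2], b[1]))
    results: List[Tuple[Breakpoint, int]] = []
    cluster: List[Breakpoint] = []
    for bp in ordered:
        prev = cluster[-1] if cluster else None
        if prev and prev[0] == bp[0] and prev[2] == bp[2] and bp[1] - prev[1] <= tolerance:
            cluster.append(bp)  # close enough to the previous position: same junction
        else:
            if cluster:
                results.append(_summarise(cluster, exact))
            cluster = [bp]
    if cluster:
        results.append(_summarise(cluster, exact))
    return results
```

With `tolerance=0` every distinct coordinate is counted separately; raising it trades positional precision for robustness to small breakpoint shifts.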
