  • Indels in Sequence Reads

    Hi everyone,

    I am using paired end 100 bp reads on an illumina hiseq platform to find chromosomal rearrangements. I can find them just fine with the paired end data since the pairs are flagged, but I need to find the junctions within the sequence reads themselves. I've tried BWA, Bowtie2, Tophat2, SNAP, and ELAND and none of these report junctions except for Tophat2, but Tophat2 is not suitable since I'm sequencing DNA. I can see the junctions if I do not soft clip my reads, but it doesn't display the reads as junctions like it does when I use Tophat2 on RNA. Does anyone have any suggestions on how to get some of these programs to display split reads or filter them out into a separate file so I can find and analyze them effectively?

    The junctions are also more than just a few base pairs. They can vary from a few KB to hundreds of KB and a few are inter chromosomal rearrangements

  • #2
    Dear @YOLO69SWAG,

    The Subread aligner may help you do this. If you run Subread with the '-J' option, it marks the bases that cannot be aligned together with the other bases from the same read using 'S' in the CIGAR string (soft clipping). You can then identify the junction/fusion positions within reads using this information.

    Subread can be downloaded from http://subread.sourceforge.net
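
As a rough illustration of reading junction positions off the soft clips, here is a minimal Python sketch. It assumes standard SAM conventions (1-based POS, 'S' for soft-clipped bases in the CIGAR string); the function name `clip_breakpoints` is made up for this example and is not part of Subread.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def clip_breakpoints(pos, cigar):
    """Return (side, ref_pos, clip_len) for each soft clip in an alignment.

    pos   -- 1-based leftmost mapped reference position (SAM POS field)
    cigar -- CIGAR string (SAM CIGAR field)

    A left clip puts the candidate junction at the first aligned base;
    a right clip puts it at the last aligned base.
    """
    # Hard clips carry no sequence, so ignore them when looking at read ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return []
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    found = []
    if ops[0][1] == "S":
        found.append(("left", pos, ops[0][0]))
    if len(ops) > 1 and ops[-1][1] == "S":
        found.append(("right", pos + ref_len - 1, ops[-1][0]))
    return found


if __name__ == "__main__":
    print(clip_breakpoints(1000, "60M40S"))
    print(clip_breakpoints(500, "35S65M"))
```

The clipped bases themselves can then be realigned to find the other side of the junction.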

    Best wishes,

    Wei



    • #3
      There are a number of programs designed specifically for detecting indels and structural variants. Check out the software wiki.



      • #4
        Thanks Wei,

        I'm almost done aligning with Subread now. I'll make another post this weekend with the result.



        • #5
          Subread

          Hey Wei,

          I tried the program and was able to get it to align my reads. I can see the soft-clipped bases, and these are almost always my junction reads. This helps a lot, since I can see the junctions that I saw with my paired-end information, but there's still a problem: the junctions are in many different locations, and BLASTing each of these sites manually will take forever.

          Woe is graduate school?

          Any more suggestions would be much appreciated.



          • #6
            Dear @YOLO69SWAG,

            Thanks for letting me know it worked. I'm glad it helps.

            Could you elaborate a bit more on what you want to do with these fusion reads? That will let more people comment on your project. Do you want the exact fusion positions, or do you want to know which genes were fused together?

            Best wishes,

            Wei



            • #7
              I'm trying to record and count these junction reads and the coordinates of each junction. They occur spontaneously in certain parts of the genome. I already detect them using paired-end information, but getting the junctions within reads will let me quantify their presence accurately. If I could get my DNA-seq reads aligned with a gapped aligner and have the junctions displayed the way TopHat/Bowtie does with RNA-seq data, that would be perfect.
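
One way to record and count junction reads by coordinate, sketched in Python under a few assumptions: the input is plain SAM text, a junction read is any mapped read with a soft clip at either end, and clips shorter than `min_clip` bases are ignored as noise. The function name `count_junction_sites` and the 20-base default are hypothetical choices for this example, not something taken from Subread or any other aligner.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_junction_sites(sam_lines, min_clip=20):
    """Tally soft-clip breakpoints per (chrom, position, side).

    sam_lines -- iterable of SAM text lines (header lines are skipped)
    min_clip  -- minimum soft-clip length to count (assumed threshold)
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read
        # Hard clips carry no sequence, so drop them before checking read ends.
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
        if not ops:
            continue
        # Reference bases covered by the aligned part of the read.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        if ops[0][1] == "S" and ops[0][0] >= min_clip:
            counts[(chrom, pos, "left")] += 1
        if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
            counts[(chrom, pos + ref_len - 1, "right")] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Example: python count_junctions.py < aligned.sam
    for (chrom, pos, side), n in count_junction_sites(sys.stdin).most_common():
        print(chrom, pos, side, n, sep="\t")
```

Sorting by count puts the best-supported sites first, so only those need a closer look instead of checking every clipped read by hand.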
