  • bosTau2
    Member
    • Nov 2008
    • 12

    Split read mapping

    MOSAIK does split-read mapping for structural variation, but does anyone know of another program that does split-read mapping?
    Thank you.
    From Antwerp
    hi1
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    Split read mapping? Please be more specific.


    • bosTau2
      Member
      • Nov 2008
      • 12

      #3
      Split read mapping: a read is mapped to two separate locations because of possible structural variation.
      -------- A ----------- break --------------- B -----------------
      |==============||=====================|

      This mapping makes sense for reads longer than 50-76 bp, or for 454 reads, with sufficient coverage.
      I believe split reads should be flagged with 256 in SAM, so any split read should have a SAM flag of at least 256. So far I have not seen any split reads.

      MOSAIK does this, and BC specializes in this area, but I think the released version does not. I thought SSAHA did this, but other people told me it does not.
      hi1


      • snownebula
        Junior Member
        • Oct 2009
        • 9

        #4
        Hi there,

        We have been using the split read methodology quite a bit with MOSAIK.

        We have a new version out that makes this available to the masses. In addition to MOSAIK, we used some external code for our split-read alignments.

        Briefly, our process is as follows:

        1. Align the reads against a reference sequence, but remember to store the unaligned reads (-rur parameter).

        e.g. "-rur ChrX_unaligned.fq" will store the unaligned reads in the specified fastq file.

        2. Build a new read archive using the unaligned reads from step 1.

        3. Align the reads as normal, but instead of requiring the entire read to align, specify that at least X bp of a read must align (-min X).

        Normally, MOSAIK will count the unaligned portions of the read as mismatches. In this case, this is not what we want - so we deactivate that using the -mmal parameter.

        e.g. If I wanted to align at least 32 bp of a read, I would add "-min 32 -mmal" to my MosaikAligner command line.

        These reads didn't align to the reference for a reason. One of those reasons is that they align to a non-contiguous span. A good example is a read that aligns from the end of one exon to the beginning of another.

        4. Using some in-house programs, we take those alignments, trim off the parts that aligned, create a new read archive, and align the reads yet again.

        You can easily do something similar with the MosaikTools C++ or Perl API. Or you could export the reads into some other format and work from there.

        The reads that aligned to two significant regions are prime candidates for split-read structural variations.
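
The trimming step described above can be sketched roughly as follows. This is not the in-house code mentioned in the post; the function name, the 0-based end-exclusive coordinates on the read, and the `min_len` cutoff are all assumptions made for illustration. It takes a read and the interval that aligned in the partial-alignment pass, and returns the leftover flank(s) that are long enough to be worth aligning again.

```python
def unaligned_flanks(read_seq, aligned_start, aligned_end, min_len=20):
    """Return the parts of read_seq outside [aligned_start, aligned_end).

    Coordinates are 0-based, end-exclusive offsets on the read (an assumed
    convention; adapt it to whatever your aligner reports). Flanks shorter
    than min_len are dropped because they are unlikely to align uniquely.
    """
    if not 0 <= aligned_start <= aligned_end <= len(read_seq):
        raise ValueError("aligned interval lies outside the read")
    flanks = []
    left = read_seq[:aligned_start]
    right = read_seq[aligned_end:]
    if len(left) >= min_len:
        flanks.append(("left", left))
    if len(right) >= min_len:
        flanks.append(("right", right))
    return flanks


if __name__ == "__main__":
    # A 70 bp read whose first 40 bp aligned; the last 30 bp are kept.
    read = "A" * 40 + "C" * 30
    print(unaligned_flanks(read, 0, 40))
```

The surviving flanks would then be written out as a new read set and aligned again; a read whose flank lands somewhere else in the reference is a split-read candidate.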

        Cheers,

        // Michael


        • lh3
          Senior Member
          • Feb 2008
          • 686

          #5
          You may also try "bwa bwasw" with the default settings. You will see two or more alignments for a chimeric read. However, by default it probably works better for >150-200bp reads. It will miss some hits for shorter reads.

          PS: SAM flag 256 is not for split reads. Actually, SAM does not specify how split reads should be represented. In addition, bwasw identifies chimeric reads, not really split reads. It simply does local alignment. Two non-overlapping pieces on a read can be aligned on different strands or to different chromosomes.
          Last edited by lh3; 10-22-2009, 06:39 PM.
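
To illustrate the point about flag 256: the SAM FLAG field is a set of bits, so a record has to be tested bit by bit rather than by comparing the number against 256. The sketch below decodes a few of the bits. Bit 0x100 marks an alignment that is not primary (secondary). Bit 0x800 (supplementary) was added to the SAM specification after this thread and is what current aligners typically set on the extra records of a chimeric read. The function name is made up for the example.

```python
SECONDARY = 0x100      # 256: "not primary alignment"
SUPPLEMENTARY = 0x800  # 2048: supplementary record (added to SAM later)
REVERSE = 0x10         # 16: read aligned to the reverse strand
UNMAPPED = 0x4         # 4: read is unmapped


def describe_flag(flag):
    """Decode selected bits of a SAM FLAG value into booleans."""
    return {
        "secondary": bool(flag & SECONDARY),
        "supplementary": bool(flag & SUPPLEMENTARY),
        "reverse_strand": bool(flag & REVERSE),
        "unmapped": bool(flag & UNMAPPED),
    }


if __name__ == "__main__":
    print(describe_flag(272))   # 256 + 16: secondary, reverse strand
    print(describe_flag(1024))  # larger than 256, but bit 0x100 is not set
```

A flag of 1024 (the duplicate bit) is greater than 256 yet does not mark a secondary alignment, which is why a numeric comparison gives the wrong answer.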


          • jnfass
            Member
            • Aug 2008
            • 88

            #6
            So, does bwa bwasw (formerly misnamed as bwtsw?) not produce more than one alignment for each chunk of read?
            And, is there a way to force bwasw to apply the mismatch and indel cutoffs to the entire read -- in other words, not identify chimeric reads?


            • lh3
              Senior Member
              • Feb 2008
              • 686

              #7
              BWT-SW is a different program, published last year by a Hong Kong group. The BWA-SW algorithm was previously named dBWT-SW, but people complained that it was hard to pronounce.

              Reporting local hits is the right thing for reads longer than 200bp. Long reads are vulnerable to SVs and to misassemblies in the reference. We do not always know whether the unaligned part is due to an SV/misassembly or to low-quality bases. If it is due to an SV, forcing the entire read to align will produce spurious variants; if it is due to low-quality bases, discarding them does little harm. You may reduce the mismatch/gap penalties to get longer aligned segments, based on the error profile of your reads, but forcing the entire read to align is not an option.


              • jnfass
                Member
                • Aug 2008
                • 88

                #8
                Hi Heng,

                That helps - very good point that assemblies may have chimeric sequence in them, so even if you expect no SV in your reads, local alignments are appropriate for long reads.

                But what about the number of alignments? Does bwasw look for the best local alignment for each chunk of a read, and report only one alignment per chunk? I.e., is each base of a read involved in only one alignment (and then clipped out of all other alignments)? Or can one stretch of a read be matched to different locations in the reference, and thus appear on different lines of the bwasw output SAM file?

                ~Joe


                • bosTau2
                  Member
                  • Nov 2008
                  • 12

                  #9
                  Thanks, M and H.
                  I think MOSAIK and BWA split reads will be useful for SV detection as well as for RNA-seq, in which a read can map to separate locations.
                  Similar to Joe's questions: in MOSAIK and BWA, how will these split reads be represented in SAM? And how are their mapping qualities assigned?

                  Another question:
                  >PS: SAM flag 256 is not for split reads.
                  (from SAMtools) 256 : the alignment is not primary (a read having split hits may have multiple primary alignment records)
                  How do we interpret this if it is not for split-read mapping?

                  MOSAIK and BWA have very nice features, but the manuals do not even mention split-read mapping. It would be good to have these features described in the manuals, since it is not obvious how to use them. Slightly different: Pindel does split-read mapping, but purely for SV detection.

                  hi1
                  not from Antwerp.


                  • lh3
                    Senior Member
                    • Feb 2008
                    • 686

                    #10
                    BWA-SW works as follows:

                    In BWA-SW, we say two alignments are distinct if the length of the
                    overlapping region on the query is less than half of the length of the
                    shorter query segment. We aim to find a set of distinct alignments which
                    maximizes the sum of scores of each alignment in the set. This problem
                    can be solved by dynamic programming, but as in our case a read is
                    usually aligned entirely, a greedy approximation would work well. In the
                    practical implementation, we sort the local alignments based on their
                    alignment scores, scan the sorted list from the best one and keep an
                    alignment if it is distinct from all the kept alignments with larger
                    scores; if alignment a_2 is rejected because it is not distinctive
                    from a_1, we regard a_2 to be a suboptimal alignment to a_1 and
                    use this information to approximate the mapping quality.
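
The greedy selection in the passage above can be sketched like this. It is a simplified illustration, not bwasw's actual code: the `Hit` record and the function names are hypothetical, hits are compared only by their intervals on the query, and the mapping-quality approximation itself is left out (the sketch only records which rejected hits are suboptimal to which kept hit).

```python
from collections import namedtuple

# Hypothetical record: query interval [qstart, qend) on the read, plus a score.
Hit = namedtuple("Hit", "qstart qend score")


def query_overlap(a, b):
    """Number of query bases shared by two hits."""
    return max(0, min(a.qend, b.qend) - max(a.qstart, b.qstart))


def is_distinct(a, b):
    """Distinct if the overlap is less than half of the shorter query segment."""
    shorter = min(a.qend - a.qstart, b.qend - b.qstart)
    return query_overlap(a, b) < shorter / 2


def select_distinct(hits):
    """Greedily keep the best-scoring hits that are mutually distinct.

    Returns (kept, suboptimal), where suboptimal maps an index into kept
    to the rejected hits that collided with that kept hit.
    """
    kept = []
    suboptimal = {}
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        for i, better in enumerate(kept):
            if not is_distinct(hit, better):
                suboptimal.setdefault(i, []).append(hit)
                break
        else:
            kept.append(hit)
    return kept, suboptimal


if __name__ == "__main__":
    hits = [Hit(0, 100, 90), Hit(90, 200, 80), Hit(10, 95, 70)]
    kept, suboptimal = select_distinct(hits)
    print(kept)        # the two pieces of a chimeric read
    print(suboptimal)  # the third hit is suboptimal to the first
```

In this toy input the first two hits overlap by only 10 bp on the read, so both are kept and would be reported as separate records; the third hit covers almost the same query bases as the first and is recorded as its suboptimal alternative.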

                    A chimeric read will occupy two or more lines in SAM. Effectively identifying chimeras and conveniently reporting them are important features of bwasw. They are documented in the bwa manual page as well as in the FAQ on its home page. In practical applications, you just need to use the default options. (Actually, bwasw is designed so that its internal parameters are adjusted automatically based on the input length and the error rate, and therefore the defaults work for most inputs with different characteristics.)

                    Nonetheless, Pindel still has its advantages. An aligner specifically designed for split reads (not chimeric reads in general) is able to identify shorter matches and should achieve higher sensitivity.


                    • hada
                      Junior Member
                      • Jun 2010
                      • 1

                      #11
                      What are the disadvantages of the SR (split-read) method in sequencing, and how can they be avoided?

                      SR is popular now, but I don't know its disadvantages or how to avoid them. I would really appreciate it if you could help me with this. Thank you!





                      • delphi_ote
                        Junior Member
                        • Oct 2010
                        • 9

                        #12
                        Since MosaikText doesn't properly deal with clipping when converting to SAM/BAM format, I wouldn't recommend it for this application. Without soft clipping, you're losing the necessary information to get the portion of the read not included in the alignment. Furthermore, without hard clipping information, you're losing the information to even know that a portion of the read didn't align in the first place. You're going to have to realign every single read to its own reference sequence alignment just to get back the unaligned portion of the read, which seems completely absurd.

                        In this day, with literally hundreds of alignment programs available and a mature, widely used standard alignment format, I can't see learning an API for an aging alignment program myself. But that's what you're in for if you want to use Mosaik for this task. Just wanted to qualify snownebula's enthusiastic post. SAM/BAM is not really an option with Mosaik for this task, and it took me days to figure this out.
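
To make the clipping point concrete, here is a small sketch of how the unaligned ends of a read can be recovered from a SAM record when the aligner does write clipping into the CIGAR string. Soft-clipped bases (S) are still present in SEQ, so they can be sliced out directly; hard-clipped bases (H) are absent from SEQ, so only their count is known. The function name is made up for the example, and it assumes a well-formed CIGAR.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_parts(cigar, seq):
    """Return (left_soft_clip, right_soft_clip, hard_clipped_base_count)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    hard = sum(n for n, op in ops if op == "H")
    # Soft clips sit at the ends of the alignment, inside any hard clips.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return seq[:left], seq[len(seq) - right:], hard


if __name__ == "__main__":
    seq = "AAAAA" + "C" * 10 + "GGG"
    print(clipped_parts("5S10M3S", seq))    # both unaligned ends recovered
    print(clipped_parts("10M8H", "C" * 10))  # 8 bases missing from SEQ
```

Without any S or H operations in the CIGAR, neither the clipped sequence nor even the fact that clipping happened can be recovered from the record alone.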


                        • wanpinglee
                          Junior Member
                          • Oct 2009
                          • 1

                          #13
                          Hi there,

                          MOSAIK v2.0 supports soft clipping. The source code can be downloaded here, https://github.com/wanpinglee/MOSAIK.


                          Wan-Ping


                          • delphi_ote
                            Junior Member
                            • Oct 2010
                            • 9

                            #14
                            Great news!

