  • Jeremy37
    Member
    • Feb 2011
    • 17

    Dealing with overlapping read pairs

    Some of our whole-genome libraries end up with short insert sizes (e.g. ~150 bp) for 2x100 bp sequencing with Illumina HiSeq. I'm concerned about the effect this will have on variant calling.

    Do you know how samtools and/or GATK deal with paired-end reads that overlap? I believe that samtools assumes the reads are independent. Therefore, if there is a PCR error in the middle of your insert, it may be counted as two independent observations (the overlapping ends of a read pair) even though both come from the same fragment. With low-coverage sequencing data this could lead to a significant number of false variants.

    Is there a good way to deal with this?
    Many thanks for your suggestions.
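For a sense of scale, the overlap implied by the numbers in the question can be worked out directly. This is a minimal sketch, assuming "insert size" means the full fragment length from one outer read end to the other; the function name is made up for illustration.

```python
def mate_overlap(read_length: int, insert_size: int) -> int:
    """Number of fragment bases covered by both mates.

    Assumes insert_size is the full fragment length (outer end to outer end)
    and that both mates have the same read length.
    """
    # A mate cannot cover more bases than the fragment contains.
    covered = min(read_length, insert_size)
    return max(0, 2 * covered - insert_size)


print(mate_overlap(100, 150))  # 2x100 bp reads on a ~150 bp fragment
print(mate_overlap(100, 300))  # a longer fragment for comparison
```

With 2x100 bp reads and a ~150 bp fragment, roughly the middle 50 bp are read twice, and that is the region where a single PCR error would be reported by both mates.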
  • Jeremy37
    Member
    • Feb 2011
    • 17

    #2
    I'm sure this happens to a lot of people doing sequencing. Does everyone just assume that it's not a problem?
    Samtools WILL call variants with just 2 reads. Also, with low-coverage data we don't necessarily want to filter out variants seen in 2 reads if other quality indicators are fine. What to do...

    • MeganS
      Member
      • Sep 2010
      • 14

      #3
      Merge the overlapping reads. There are a number of tools that do this (e.g. FLASH).

      • dariober
        Senior Member
        • May 2010
        • 311

        #4
        Originally posted by Jeremy37:
        I'm sure this happens to a lot of people doing sequencing. Does everyone just assume that it's not a problem?
        Samtools WILL call variants with just 2 reads. Also, with low-coverage data we don't necessarily want to filter out variants seen in 2 reads if other quality indicators are fine. What to do...

        I've been using clipOverlap on the aligned BAM files. Just make sure the names of the two reads in each pair are identical (i.e. without the /1 or /2 suffix that some aligners add to the read names).

        This page is also quite useful: http://thegenomefactory.blogspot.co....aired-end.html

        Best
        Dario
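The read-name point can be illustrated with a small sketch. It assumes the mate suffix is exactly a trailing /1 or /2; the helper name and the read names are hypothetical, and this is not part of clipOverlap itself.

```python
import re

# Matches a trailing "/1" or "/2" mate suffix at the very end of a read name.
_MATE_SUFFIX = re.compile(r"/[12]$")


def strip_mate_suffix(read_name: str) -> str:
    """Return the read name without a trailing /1 or /2, if present."""
    return _MATE_SUFFIX.sub("", read_name)


# Hypothetical read names for the two mates of one pair.
names = ["frag0001/1", "frag0001/2"]
print({strip_mate_suffix(n) for n in names})  # one shared name for both mates
```

Once both mates share the same name, a tool that pairs records by read name can recognise them as coming from one fragment.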

        • Jeremy37
          Member
          • Feb 2011
          • 17

          #5
          Wow, clipOverlap looks great. Exactly what I am looking for!

          All the other tools I have seen (e.g. FLASH) try to remove overlap or combine reads straight from the fastq files. In the case where you have a good reference genome, e.g. human, this is likely to be less accurate, because it doesn't use the rest of each read (e.g. via alignment) to determine with confidence whether the reads overlap.
          I also need something that works on BAM files, since I will be getting them already aligned.
          Many thanks Dario.
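To make the alignment-based idea concrete: once both mates are placed on the reference, the overlap is just the intersection of their aligned intervals. The sketch below assumes both mates map to the same chromosome and that each alignment is summarised as a 0-based, half-open interval; the names and coordinates are illustrative and not taken from any particular tool.

```python
from typing import Optional, Tuple

# 0-based, half-open reference interval [start, end) covered by one mate.
Interval = Tuple[int, int]


def reference_overlap(mate1: Interval, mate2: Interval) -> Optional[Interval]:
    """Return the reference interval covered by both mates, or None."""
    start = max(mate1[0], mate2[0])
    end = min(mate1[1], mate2[1])
    return (start, end) if start < end else None


# Hypothetical pair: forward mate at 1000-1100, reverse mate at 1050-1150.
print(reference_overlap((1000, 1100), (1050, 1150)))
print(reference_overlap((1000, 1100), (1200, 1300)))
```

Because the intervals come from aligning the whole of each read, the overlap does not depend on matching the read ends against each other.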

          • krobison
            Senior Member
            • Nov 2007
            • 734

            #6
            It would seem clipOverlap is potentially throwing away information; the ideal tool would update the qualities when the two reads agree with each other in the overlapping region, as you now have greater confidence that the base was read correctly from the fragment.
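One simple way such a quality update could work is sketched below. It is a common heuristic, not the behaviour of clipOverlap or any other specific tool: it assumes Phred-scaled qualities, treats sequencing errors in the two mates as independent, and uses an arbitrary cap of 60.

```python
from typing import Tuple


def merge_overlap_base(base1: str, qual1: int,
                       base2: str, qual2: int,
                       max_qual: int = 60) -> Tuple[str, int]:
    """Combine two observations of the same fragment position.

    Agreement: keep the base and add the Phred qualities (capped).
    Disagreement: keep the higher-quality base and subtract the other quality.
    """
    if base1 == base2:
        return base1, min(qual1 + qual2, max_qual)
    if qual1 >= qual2:
        return base1, qual1 - qual2
    return base2, qual2 - qual1


print(merge_overlap_base("A", 30, "A", 20))  # mates agree
print(merge_overlap_base("A", 30, "C", 10))  # mates disagree
```

Note that the boosted quality only reflects confidence that the base was read correctly from the fragment. A PCR error present in the fragment itself would still be seen by both mates, so the merged position should count as one observation for variant calling.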

            • JackieBadger
              Senior Member
              • Mar 2009
              • 385

              #7
              Originally posted by krobison:
              It would seem clipOverlap is potentially throwing away information; the ideal tool would update the qualities when the two reads agree with each other in the overlapping region, as you now have greater confidence that the base was read correctly from the fragment.

              FLASH does this

              • krobison
                Senior Member
                • Nov 2007
                • 734

                #8
                Originally posted by JackieBadger:
                FLASH does this

                Yes, but as pointed out above, there is a risk with FLASH and similar tools (& I use FLASH routinely) that they mis-register the two reads on short imperfect repeats and artificially create an indel. With the genome sequence in hand, more information is available to correctly merge the reads.
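The registration problem can be seen with a toy example. The sketch below is not FLASH's scoring method; it is a naive exact-match overlap search on made-up sequences, and it uses a perfect dinucleotide repeat rather than an imperfect one to keep the example short.

```python
from typing import List


def exact_overlap_lengths(read1: str, read2_rc: str,
                          min_overlap: int = 4) -> List[int]:
    """Overlap lengths k where the last k bases of read1 equal the
    first k bases of read2_rc (mate 2, already reverse-complemented)."""
    longest = min(len(read1), len(read2_rc))
    return [k for k in range(min_overlap, longest + 1)
            if read1[-k:] == read2_rc[:k]]


# Hypothetical mates that meet inside an (AC)n repeat.
read1 = "GGGT" + "AC" * 6
read2_rc = "AC" * 6 + "TTTG"
print(exact_overlap_lengths(read1, read2_rc))  # [4, 6, 8, 10, 12]
```

Every one of these overlap lengths is a perfect match, and each implies a different merged fragment length, so choosing the wrong one adds or removes repeat units and looks like an indel. Aligned coordinates on the reference would resolve the ambiguity.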
