  • Boel
    Member
    • Oct 2009
    • 62

    BFAST + GATK -> strand bias?

    Hi All,

    I have HiSeq exome data, 75 bp paired-end. I've used BFAST to align, which, if I am not mistaken, converts one end to its reverse complement so that both members of a pair are attributed to the same strand, the positive one, in the resulting BAM file.

    If this is true, it leads to a problem in GATK when filtering calls: one cannot use the strand bias tests (they all come out highly significant, since all reads are on the same strand). It would also create a problem in the read position rank tests, as the 'bad' end of every other read would be annotated as its 'good' end.

    So the question is: Am I missing something, or are the above conclusions correct?
    Also, is there a way to circumvent this?

    Thanks a bunch,
    Boel
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    With the newest version of BFAST, you can have the input reads be on opposite strands to properly represent paired-end reads. BFAST will not reverse complement one end, but the conversion to FASTQ may do so (a new "-k" option avoids this).

    What is your input data, FASTQ files?

    You may get more traction at [email protected].


    • Boel
      Member
      • Oct 2009
      • 62

      #3
      Thanks.

      I started with qseq files, converted them with ill2fastq.pl and then aligned with bfast (bfast+bwa-0.6.5). I did not use the -k option.

      I am not sure that this does create a bias; that's what I am trying to figure out. Do you know?
      According to the SAM format, all reads are shown on the same strand as the reference, which might make this a non-issue. But I must say that I am confused right now.
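One way to check this directly is to tally the strand recorded in the SAM FLAG field. SAM stores every aligned sequence in reference orientation, but bit 0x10 of the FLAG still records whether the read aligned to the reverse strand, and bits 0x40/0x80 mark the first and second mate. The sketch below is only an illustration: `strand_tally` and the toy records are made up for this example, and the input is assumed to be plain SAM text (for instance the output of `samtools view -h`).

```python
from collections import Counter

def strand_tally(sam_lines):
    """Count mapped reads by (mate, strand) using only the SAM FLAG field."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read is unmapped
            continue
        if flag & 0x40:
            mate = "read1"
        elif flag & 0x80:
            mate = "read2"
        else:
            mate = "unpaired"
        strand = "-" if flag & 0x10 else "+"
        counts[(mate, strand)] += 1
    return counts

# Toy records (hypothetical): pair r1 maps forward/reverse as expected,
# pair r2 has both mates on the forward strand.
example = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t60\t75M\t=\t163\t138\t*\t*",
    "r1\t147\tchr1\t163\t60\t75M\t=\t100\t-138\t*\t*",
    "r2\t67\tchr1\t500\t60\t75M\t=\t563\t138\t*\t*",
    "r2\t131\tchr1\t563\t60\t75M\t=\t500\t-138\t*\t*",
]

for key, n in sorted(strand_tally(example).items()):
    print(key, n)
```

If the second mates turn out to be almost entirely on the same strand as the first mates, that would be consistent with one end having been flipped before alignment.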


      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        The ill2fastq.pl script will reverse complement one of the ends, so both ends are mapped onto the same strand. You can try the newest release with the "-k" option, as well as supplying the proper pairing information (there is a new "postprocess" pairing option).
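To make "reverse complement one of the ends" concrete, here is a minimal sketch of the operation itself. It is not the code of ill2fastq.pl; the function names are made up for illustration, and reversing the quality string alongside the bases is an assumption about what a converter needs to do to keep qualities aligned with their bases.

```python
# Translation table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def flip_read(seq, qual):
    """Reverse complement the bases and reverse the per-base qualities."""
    return reverse_complement(seq), qual[::-1]

seq, qual = flip_read("ACCGT", "IIHG#")
print(seq, qual)
```

After such a flip, the low-quality tail of the original read sits at the start of the stored sequence, and an aligner sees both mates as coming from the same strand.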


        • Boel
          Member
          • Oct 2009
          • 62

          #5
          Am I correct in assuming that using bfast the way I have does create a bias?

          Further, when looking at my reads in IGV, most (70%) have an insert size that is positive, while my correct mean insert size should be around -12 (overlapping). Does this suggest that something has gone wrong in the mapping?
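Part of the confusion here may be definitional. The sketch below assumes that the "-12" refers to the gap between the two mates (the inner distance), whereas the SAM TLEN field reports the signed outer span of the template: positive for the leftmost mate and negative for the rightmost. Whether the value shown in IGV corresponds to TLEN in this case is also an assumption. `inner_distance` is a hypothetical helper, not part of any tool.

```python
def inner_distance(tlen, read_len1, read_len2):
    """Gap between mates implied by a SAM TLEN value.

    A negative result means the two mates overlap by that many bases.
    """
    return abs(tlen) - read_len1 - read_len2

# Two 75 bp mates overlapping by 12 bases span 75 + 75 - 12 = 138 bases.
print(inner_distance(138, 75, 75))    # leftmost mate, TLEN = +138
print(inner_distance(-138, 75, 75))   # rightmost mate, TLEN = -138
```

Under these assumptions, a pair with an inner distance of -12 still carries a TLEN of +138 on one mate and -138 on the other, so positive values alone would not indicate a mapping problem. A split far from half positive and half negative among properly paired reads would be more worth investigating.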


          • nilshomer
            Nils Homer
            • Nov 2008
            • 1283

            #6
            I am not sure, as I don't have enough information. I have mapped pairs with an expected mean insert size of zero without problems. Please try the newest version of BFAST and post your results, along with enough information for us to debug.
