  • nucacidhunter
    Jafar Jabbari
    • Jan 2013
    • 1250

    #16
    Would you be able to post all of the FastQC output plots for comparison with other runs? For now, I would mention that exome capture does not sample the genome randomly, so it is not unusual to see what you are reporting.

    • Khillo81
      Junior Member
      • May 2014
      • 4

      #17
      Thanks for your response. I should first mention that I don't have a very strong background in bioinformatics and am using the CLC Genomics Workbench (ver. 7.5), which has a GUI and runs on Windows. I used the Workbench's 'Merge Overlapping Pairs' function to generate the histogram below (I'm guessing it's similar to the BBMerge mentioned by Brian). I also haven't used FastQC, but rather the native QC check in the Workbench. I'm attaching the output here. As you can see, there is no severe drop in quality along the reads, and apart from the peaks in GC content observed at the end of the read (as I understand it, typical for Illumina data), the GC content along the read length is around 45%. The samples are human.

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #18
        Unfortunately, it looks like that tool does not merge reads with insert size shorter than read length, which was the point of the exercise. But from the graph I can infer that maybe 30% of the reads are indeed in that category, so there are a few possibilities:

        1) The twin peaks are indeed from exon-capture bias, though I kind of doubt that, as it does not explain why trimming the reads would reduce it; and I would have expected such a bias to shift the peak center rather than creating a bimodal distribution, but of course it depends on the bait design.
        2) There is an exonic and intronic peak, or gene and non-gene peak. The GC content of a gene changes markedly once you get just outside of its bounds. For example, just upstream of the gene, it becomes very AT-rich, IIRC. But, I don't really like that explanation either.
        3) The adapter-trimming is unsuccessful or incomplete. From your GC content by base position, it looks fairly flat across the read, aside from the first 20 bp... so that doesn't make much sense either. Still, it wouldn't hurt to confirm. What was the total percentage of reads and bases trimmed during adapter-trimming? I would expect something like 30% of the reads and maybe 5-10% of the bases. If you are using Nextera adapters, be sure you use those sequences for trimming.
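If your trimming tool does not report those two percentages directly, they can be worked out from read lengths before and after trimming. The following is only a minimal sketch: the function name `trimming_summary` and the input lists are hypothetical, and it assumes you have one length per read in the same order in both lists, with a fully discarded read recorded as length 0.

```python
def trimming_summary(lengths_before, lengths_after):
    """Return (percent of reads trimmed, percent of bases removed).

    Both arguments are lists of read lengths in the same read order.
    This is an illustrative helper, not part of any existing tool.
    """
    if len(lengths_before) != len(lengths_after):
        raise ValueError("both lists must describe the same reads")
    total_reads = len(lengths_before)
    total_bases = sum(lengths_before)
    if total_reads == 0 or total_bases == 0:
        raise ValueError("no reads or no bases to summarize")

    # A read counts as trimmed if it got shorter at all.
    reads_trimmed = sum(
        1 for before, after in zip(lengths_before, lengths_after)
        if after < before
    )
    # Bases removed is the total length lost across all reads.
    bases_removed = sum(
        before - after
        for before, after in zip(lengths_before, lengths_after)
    )
    pct_reads = 100.0 * reads_trimmed / total_reads
    pct_bases = 100.0 * bases_removed / total_bases
    return pct_reads, pct_bases


# Toy example: ten 100 bp reads, three of which lose 20, 30 and 40 bp.
before = [100] * 10
after = [100] * 7 + [80, 70, 60]
pct_reads, pct_bases = trimming_summary(before, after)
print(f"{pct_reads:.1f}% of reads trimmed, {pct_bases:.1f}% of bases removed")
```

Numbers far from the roughly 30% of reads and 5-10% of bases mentioned above would be a hint that the adapter sequences used for trimming do not match the library prep.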


        I suggest that you bin some of your reads by GC - just split them into pairs with GC<50% and GC>50%. Map both to human and look at the mapping rates (ideally, forcing unclipped global alignments). If they are equivalent, then the issue is not caused by contamination or adapter sequence, and it's probably safe to ignore.

        You can split the reads by GC content with my reformat tool:

        reformat.sh in1=read1.fq in2=read2.fq out1=low1.fq out2=low2.fq maxgc=0.5

        reformat.sh in1=read1.fq in2=read2.fq out1=high1.fq out2=high2.fq mingc=0.5
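To make the idea of the split concrete, here is a rough Python sketch of binning read pairs by GC fraction. It is not how reformat.sh is implemented; the function names are made up for illustration, and two choices in it are assumptions of mine: GC is computed over both mates pooled together, and N bases are ignored. Pairs at exactly the threshold go to the high bin here.

```python
def gc_fraction(seq):
    """Fraction of G/C among the A, C, G, T bases of seq (N and others ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    # Sequences with no called bases are reported as 0.0 (an arbitrary choice).
    return gc / called if called else 0.0


def split_pairs_by_gc(pairs, threshold=0.5):
    """Split (read1, read2) sequence pairs into low-GC and high-GC lists.

    GC is computed on both mates pooled together, so a pair always
    stays together in one bin (assumption made for this sketch).
    """
    low, high = [], []
    for read1, read2 in pairs:
        if gc_fraction(read1 + read2) < threshold:
            low.append((read1, read2))
        else:
            high.append((read1, read2))
    return low, high


# Toy example with three short, made-up read pairs.
pairs = [
    ("ATATAT", "ATGCAT"),  # AT-rich pair
    ("GCGCGC", "GGCCAT"),  # GC-rich pair
    ("ATGC", "GCAT"),      # exactly 50% GC
]
low, high = split_pairs_by_gc(pairs)
print(len(low), "low-GC pairs;", len(high), "high-GC pairs")
```

Keeping mates in the same bin matters for the mapping comparison, since both output files in each bin then stay properly paired.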

        • Dr khani
          Junior Member
          • Jan 2017
          • 3

          #19
          My FASTQ GC content report has two peaks. Can anyone help me with how to assemble this type of data?

          • luc
            Senior Member
            • Dec 2010
            • 469

            #20
            As mentioned above, the two peaks could very well be a sign of a mixed sample (contamination).
            You could remove all the high-GC-content reads and see if this improves the assembly.
            BBTools (BBDuk?) has a GC content filter.

            • Dr khani
              Junior Member
              • Jan 2017
              • 3

              #21
              Thank you. I cannot run the BBMap tools on Windows; I get an error.
