
  • #16
    Would you be able to post all of the FastQC output plots for comparison with other runs? For now, I would mention that exome capture does not sample the genome randomly, so it is not unusual to see what you are reporting.



    • #17
      Thanks for your response. I should first mention that I don't have a very strong background in bioinformatics and am using the CLC Genomics Workbench (ver. 7.5), which has a GUI and runs on Windows. I used the Workbench's 'Merge Overlapping Pairs' function to generate the histogram below (I'm guessing it's similar to the BBMerge tool mentioned by Brian). I also haven't used FastQC, only the native QC check in the Workbench; I'm attaching its output here. As you can see, there is no severe drop in quality along the reads, and apart from the peaks in GC content observed at the end of the read (as I understand it, typical for Illumina data), the GC content along the read length is around 45%. The samples are human.
      Attached Files



      • #18
        Unfortunately, it looks like that tool does not merge reads with insert size shorter than read length, which was the point of the exercise. But from the graph I can infer that maybe 30% of the reads are indeed in that category, so there are a few possibilities:

        1) The twin peaks are indeed from exon-capture bias, though I kind of doubt that, as it does not explain why trimming the reads would reduce the effect; I would also have expected such a bias to shift the peak center rather than create a bimodal distribution, but of course it depends on the bait design.
        2) There is an exonic and intronic peak, or gene and non-gene peak. The GC content of a gene changes markedly once you get just outside of its bounds. For example, just upstream of the gene, it becomes very AT-rich, IIRC. But, I don't really like that explanation either.
        3) The adapter-trimming is unsuccessful or incomplete. From your plot of GC content by base position, it looks fairly flat across the read, aside from the first 20 bp... so that doesn't make much sense either. Still, it wouldn't hurt to confirm. What was the total percentage of reads and of bases trimmed during adapter-trimming? I would expect something like 30% of the reads and maybe 5-10% of the bases. If you are using Nextera adapters, be sure you use those sequences for trimming.
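
        If the trimming tool does not report those two percentages, they can be estimated by comparing the FASTQ files from before and after trimming. The following is only a minimal Python sketch, not part of any tool mentioned in this thread: `read_fastq` and `trim_summary` are hypothetical helper names. It assumes plain 4-line FASTQ records, read names that are unchanged by trimming, and enough memory to hold one length per raw read. Reads that the trimmer discarded entirely are counted as trimmed.

```python
import gzip
from typing import Dict, Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().strip()
            if not header:
                break
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().strip()
            yield header, seq, qual


def trim_summary(raw_path: str, trimmed_path: str) -> Dict[str, float]:
    """Estimate the percent of reads and bases affected by trimming."""
    # Length of every raw read, keyed by the first word of the header.
    raw_len = {}
    for header, seq, _ in read_fastq(raw_path):
        raw_len[header.split()[0]] = len(seq)

    raw_bases = sum(raw_len.values())
    kept_bases = 0
    untouched = 0  # reads whose length is identical before and after
    for header, seq, _ in read_fastq(trimmed_path):
        kept_bases += len(seq)
        if raw_len.get(header.split()[0]) == len(seq):
            untouched += 1

    n_reads = len(raw_len)
    if n_reads == 0 or raw_bases == 0:
        raise ValueError("raw FASTQ file contains no reads")
    return {
        "reads_total": n_reads,
        "reads_trimmed_pct": 100.0 * (n_reads - untouched) / n_reads,
        "bases_trimmed_pct": 100.0 * (raw_bases - kept_bases) / raw_bases,
    }
```

        Calling `trim_summary("raw_1.fq", "trimmed_1.fq")` (file names are placeholders) returns a small dictionary whose two percentages can be compared against the rough 30% of reads and 5-10% of bases mentioned above.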


        I suggest that you bin some of your reads by GC - just split them into pairs with GC<50% and GC>50%. Map both to human and look at the mapping rates (ideally, forcing unclipped global alignments). If they are equivalent, then the issue is not caused by contamination or adapter sequence, and it's probably safe to ignore.

        You can split the reads by GC content with my reformat tool:

        reformat.sh in1=read1.fq in2=read2.fq out1=low1.fq out2=low2.fq maxgc=0.5

        reformat.sh in1=read1.fq in2=read2.fq out1=high1.fq out2=high2.fq mingc=0.5
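
        Where reformat.sh is not available, a similar split can be approximated in plain Python. This is a rough sketch under stated assumptions, not a reimplementation of reformat.sh: `gc_fraction` and `split_pairs_by_gc` are hypothetical names, the two input files are assumed to list mates in the same order, and GC is computed over both mates of a pair combined, which may differ from how reformat.sh treats pairs. A pair whose GC is exactly at the threshold goes to the low bin.

```python
import gzip
from typing import Iterator, Tuple

Record = Tuple[str, str, str]


def read_fastq(path: str) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().strip()
            if not header:
                break
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().strip()
            yield header, seq, qual


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among called bases (N and other symbols ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def write_record(handle, record: Record) -> None:
    header, seq, qual = record
    handle.write(f"{header}\n{seq}\n+\n{qual}\n")


def split_pairs_by_gc(in1: str, in2: str,
                      low1: str, low2: str,
                      high1: str, high2: str,
                      threshold: float = 0.5) -> Tuple[int, int]:
    """Write pairs with combined GC <= threshold to low*, the rest to high*.

    Returns (number of low-GC pairs, number of high-GC pairs).
    """
    n_low = n_high = 0
    with open(low1, "w") as l1, open(low2, "w") as l2, \
            open(high1, "w") as h1, open(high2, "w") as h2:
        # Assumption: mates appear in the same order in both input files.
        for rec1, rec2 in zip(read_fastq(in1), read_fastq(in2)):
            if gc_fraction(rec1[1] + rec2[1]) <= threshold:
                write_record(l1, rec1)
                write_record(l2, rec2)
                n_low += 1
            else:
                write_record(h1, rec1)
                write_record(h2, rec2)
                n_high += 1
    return n_low, n_high
```

        Keeping both mates of a pair in the same bin means the low and high files stay properly paired, so each bin can be mapped to human as paired-end data and the two mapping rates compared.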



        • #19
          My FastQC GC content report has two peaks. Can anyone help me with how to assemble this type of data?
          Attached Files



          • #20
            As mentioned above, the two peaks could very well be a sign of a mixed sample (contamination).
            You could remove all the high-GC-content reads and see if this improves the assembly.
            BBTools (BBDuk?) has a GC content filter.



            • #21
              Thank you. I cannot run the BBMap tools on Windows; I get an error.
