  • mgg
    Member
    • Nov 2011
    • 12

    FastQC, kmer content, per base sequence content: is this good enough?

    Hi,

    I'd appreciate some advice on processing some Illumina libraries.

    Initial FastQC runs showed that the data were not great. I've used cutadapt to trim off adapters, and FastQC shows improvements in all libraries.

    One library remains a concern because it still shows kmer and other issues (I've attached files for kmer content and per base sequence content for both the original and the processed data).

    My question is simple: is this good enough? My next step is assembly with Velvet. Does this data need some further processing before Velvet? If so, with what? I've considered trimming off the first 10 nt to remove the anomalous per_base_sequence_content trace, but that would do little for the persistent kmers.

    If this were your data, what would you do before Velvet assembly?

    thanks
    mgg

    for the record my cutadapt commands are below

    PHP Code:
    # trim reads/2
    cutadapt -b AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --minimum-length=10 --overlap=… --quality-base=64 --quality-cutoff=… --match-read-wildcards infile_2.fq -o processed/outfile_2.fq --wildcard-file=processed/outfile_2.fq.wildcard

    # trim reads/1
    cutadapt -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC --minimum-length=10 --overlap=… --quality-base=64 --quality-cutoff=… --match-read-wildcards infile_1.fq -o processed/outfile_1.fq --wildcard-file=processed/outfile_1.fq.wildcard
  • minoru_harvest
    Junior Member
    • Aug 2012
    • 5

    #2
    Yeah, I have the same question.
    My graph looks very similar to your processed per base sequence content plot.


    • Wallysb01
      Senior Member
      • Feb 2011
      • 286

      #3
      Looks like you have some base pair bias issues going on from bases 1-10 in your reads. You should trim those off.
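
For anyone who wants to try the suggested trim outside a dedicated tool, here is a minimal Python sketch of hard-clipping the first 10 bases from each read. It assumes plain 4-line FASTQ records; the function names (`read_fastq`, `trim_5prime`, `write_fastq`) and the demo reads are made up for illustration, and the `min_length=10` default simply mirrors the `--minimum-length=10` used in the cutadapt commands above.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality

def trim_5prime(records, n_bases=10, min_length=10):
    """Drop the first n_bases of each read; skip reads left shorter than min_length."""
    for header, sequence, quality in records:
        sequence, quality = sequence[n_bases:], quality[n_bases:]
        if len(sequence) >= min_length:
            yield header, sequence, quality

def write_fastq(records, handle):
    """Write records back out in 4-line FASTQ layout."""
    for header, sequence, quality in records:
        handle.write(f"{header}\n{sequence}\n+\n{quality}\n")

# Made-up example: r1 keeps 14 bases; r2 would keep only 2 and is dropped.
demo = io.StringIO(
    "@r1\nACGTACGTACGTACGTACGTACGT\n+\n" + "I" * 24 + "\n"
    "@r2\nACGTACGTACGT\n+\n" + "I" * 12 + "\n"
)
trimmed = io.StringIO()
write_fastq(trim_5prime(read_fastq(demo)), trimmed)
print(trimmed.getvalue(), end="")
```

One caveat for paired data such as the reads/1 and reads/2 files in the first post: dropping a short read from only one file puts the two files out of sync, so the length filter would need to be applied to both mates together.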


      • Jane M
        Senior Member
        • Aug 2011
        • 239

        #4
        Hello everybody,

        I'm coming back to this topic because it fits my question well: I would like your opinion on my RNA-Seq data (paired-end, 100 bp) generated on an Illumina HiSeq 2000.
        I attached the "Per Base Sequence Quality" and "Kmer Content" plots for 3 examples. In the first, the library was prepared using the polyA method. The other 2 were prepared by ribodepletion. I would like to know whether my data are "good enough" despite these last 2 profiles, and whether there is an explanation for the increase in A/T sequence along the read.

        I have the feeling from these examples and some others that the "Kmer Content" profile depends on the library preparation (ribodepletion vs polyA), the run (samples from the same run show a similar profile) and the sample itself (I observed similar profiles for the same sample sequenced on 2 different runs). Is this true?

        Thank you,
        Jane


        • fahmida
          Member
          • Aug 2010
          • 54

          #5
          Are these reads from mate pair libraries? You may also want to check the read duplication levels in that case.
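
As a rough way to look at duplication outside FastQC, the sketch below counts exact duplicate sequences. It is only a crude exact-match tally, not FastQC's own duplication estimate, and the function name `duplication_summary` and the example reads are made up for illustration.

```python
from collections import Counter

def duplication_summary(sequences):
    """Summarise exact-match duplication for an iterable of read sequences."""
    counts = Counter(sequences)
    total = sum(counts.values())      # all reads seen
    distinct = len(counts)            # reads with a unique sequence string
    duplicated = total - distinct     # extra copies beyond the first of each
    percent = 100.0 * duplicated / total if total else 0.0
    return {"total": total, "distinct": distinct, "percent_duplicate": percent}

# Made-up reads: one sequence appears twice, so 1 of 4 reads is an extra copy.
example_reads = ["ACGTAC", "ACGTAC", "TTGACA", "GGCATA"]
print(duplication_summary(example_reads))
```

Holding every distinct sequence in memory is fine for a sample of reads but may be too heavy for a whole lane, so it is probably best run on a subset.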


          • Jane M
            Senior Member
            • Aug 2011
            • 239

            #6
            I'm coming back to my previous question because I still have doubts about the quality of my data. Any feedback would be appreciated.


            • Wallysb01
              Senior Member
              • Feb 2011
              • 286

              #7
              I didn't see the attachment. But from what you describe, it sounds ok.


              • Jane M
                Senior Member
                • Aug 2011
                • 239

                #8
                Originally posted by Wallysb01
                I didn't see the attachment. But from what you describe, it sounds ok.
                Oops, I forgot to attach the file!

                • Jane M
                  Senior Member
                  • Aug 2011
                  • 239

                  #9
                  Originally posted by Jane M
                  Oops, I forgot to attach the file!
                  Any comments on the attachment?


                  • Wallysb01
                    Senior Member
                    • Feb 2011
                    • 286

                    #10
                    Looks good enough for mapping. You might want to check whether you have some adapter contamination in the first one. I've often found that weird, sudden spikes of particular kmers are the adapters.
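
One quick way to test the adapter idea is to count how many reads contain the start of a known adapter. The sketch below is a hedged illustration only: the two adapter sequences are the ones from the cutadapt commands in the first post (they may not be the right ones for other libraries), while the function name `adapter_hit_fraction`, the 12-base seed length and the example reads are assumptions made up here. It looks for exact matches only, so reads with sequencing errors in the adapter are missed.

```python
# Adapter sequences copied from the cutadapt commands in the first post.
ADAPTERS = {
    "reads/1": "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "reads/2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
}

def adapter_hit_fraction(sequences, adapter, seed_length=12):
    """Return the fraction of reads containing the first seed_length adapter bases."""
    seed = adapter[:seed_length]
    total = 0
    hits = 0
    for sequence in sequences:
        total += 1
        if seed in sequence:  # exact substring match only
            hits += 1
    return hits / total if total else 0.0

# Made-up reads: the first carries the start of the reads/1 adapter, the second does not.
example_reads = [
    "ACGT" + "GATCGGAAGAGC" + "TTTT",
    "ACGTACGTACGTACGTACGT",
]
print(adapter_hit_fraction(example_reads, ADAPTERS["reads/1"]))
```

A hit fraction well above what a 12-mer would reach by chance suggests the kmer spikes are worth tracing back to the adapter.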


                    • Jane M
                      Senior Member
                      • Aug 2011
                      • 239

                      #11
                      Thank you Wallysb01.

                      Isn't it surprising to see an increase in AAAAA and TTTTT all along the read? It should be constant, as in the first case, right?
                      Why is there such a difference between polyA and ribodepletion?

                      Do the "normal/good" profiles of these 2 methods always differ?
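
To check whether a kmer such as AAAAA really becomes more frequent along the read, its frequency can be tabulated per start position. The sketch below is a simple illustration under stated assumptions: it reports the plain fraction of reads in which the kmer starts at each position, which is not the enrichment statistic FastQC plots, and the function name `kmer_position_profile` and the example reads are made up.

```python
def kmer_position_profile(sequences, kmer):
    """Return, per start position, the fraction of reads in which kmer starts there."""
    k = len(kmer)
    starts = []   # starts[i]: reads with kmer beginning at position i
    windows = []  # windows[i]: reads long enough to hold a k-mer at position i
    for sequence in sequences:
        n_windows = len(sequence) - k + 1
        if n_windows <= 0:
            continue  # read shorter than the kmer
        if n_windows > len(starts):
            # Grow both lists so they cover the longest read seen so far.
            starts.extend([0] * (n_windows - len(starts)))
            windows.extend([0] * (n_windows - len(windows)))
        for i in range(n_windows):
            windows[i] += 1
            if sequence.startswith(kmer, i):
                starts[i] += 1
    return [s / w for s, w in zip(starts, windows)]

# Made-up reads: AAAAA starts at position 0 in one read and position 1 in the other.
example_reads = ["AAAAAC", "CAAAAA"]
print(kmer_position_profile(example_reads, "AAAAA"))
```

A roughly flat profile corresponds to the constant case described above, while values that climb with position correspond to an increase along the read.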
