K-mer content failed on 5' end - advice needed


  • K-mer content failed on 5' end - advice needed

    Hi folks,

    I am trying to trim adapters and low-quality bases from reads of a fungal genome (library prepared with the Illumina DNA nano kit and sequenced on a HiSeq 2000, 100 bp paired-end). I ran BBDuk to trim adapters and low-quality bases as follows:

    >./bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1_q25.fastq.gz out2=R2_q25.fastq.gz ktrim=r k=21 mink=11 hdist=2 tpe tbo ref=resources/adapters.fa qtrim=rl trimq=25

    FastQC still showed a K-mer content warning for both R1 and R2 reads [ https://goo.gl/photos/Lsyt7YJeQnjB8HQq5 ]. How would you suggest I handle this data? Should I just remove the first 20 bases to be on the safe side, or is this normal behavior for a library prepared with the nano kit?

    Thanks in advance and have a great day!
    Last edited by Vinn; 04-21-2017, 06:47 AM.

  • #2
    What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.


    • #3
      Originally posted by GenoMax
      What kind of analysis are you trying to do? In general I have never worried about k-mer warnings from FastQC.
      Hi GenoMax, thanks for your reply. I would like to do de novo assembly.


      • #4
        Take a look at @Brian's suggestions in this thread. I have linked to a specific post, but it is worth reading the whole thread. He should be along with more later.


        • #5
          Thank you, I will read the thread through.


          • #6
            K-mer content spikiness at the beginning of the read is normal for many fragmentation methodologies, and those bases should not be trimmed. I'm not sure what's going on at the end, though...


            • #7
              Thanks for your reply, Brian. Just to be on the safe side, do you think it is better to trim the end off?


              • #8
                Excessive trimming reduces accuracy, and will degrade the results of any experiment. If you want to be confident that bases are genomic rather than artificial, I suggest you follow this methodology:

                1) Map the reads to the reference (if you don't have a reference, you can make a quick assembly with Tadpole) with BBMap like this:

                Code:
                bbmap.sh in=reads.fq ref=ref.fa mhist=mhist.txt qhist=qhist.txt
                2) Plot mhist with R or Excel with a log-scale Y-axis to look at the positional error rates.

                If there is not an increased error rate in a region of the read, there is no reason to trim it. And conversely, it is prudent to trim if there is a high error rate at one end or the other.
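For anyone who prefers a script to R or Excel, the check in step 2 can be roughed out in Python. This is only a sketch under assumptions: it assumes mhist.txt is tab-separated with a header line starting with `#` and containing `BaseNum` and `Match1` columns, and that `Match1` holds the per-position match rate as a fraction. Check the header of your own file before relying on it. The `fold=3.0` threshold and the function names are made up for illustration; they are not part of BBMap.

```python
import csv
import io
import statistics


def read_mhist(handle):
    """Parse a tab-separated histogram whose header row starts with '#'."""
    header = None
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#"):
            # Header row: drop the leading '#' from the first column name.
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if header is None:
            raise ValueError("no '#'-prefixed header line found")
        rows.append(dict(zip(header, fields)))
    return rows


def error_rates(rows, pos_col="BaseNum", match_col="Match1"):
    """Return (position, 1 - match rate) pairs; column names are assumed."""
    return [(int(r[pos_col]), 1.0 - float(r[match_col])) for r in rows]


def flag_positions(rates, fold=3.0):
    """List positions whose error rate exceeds `fold` times the median."""
    median = statistics.median(rate for _, rate in rates)
    return [pos for pos, rate in rates if rate > fold * median]


if __name__ == "__main__":
    # Tiny synthetic example, not real data.
    sample = (
        "#BaseNum\tMatch1\tSub1\n"
        "1\t0.90\t0.10\n"
        "2\t0.99\t0.01\n"
        "3\t0.99\t0.01\n"
        "4\t0.99\t0.01\n"
    )
    rates = error_rates(read_mhist(io.StringIO(sample)))
    print("positions with elevated error rate:", flag_positions(rates))
```

To run it on real output, replace the `io.StringIO(sample)` handle with `open("mhist.txt")`. Positions flagged only at one end of the read are the candidates for trimming; if nothing is flagged, the advice above is to leave the reads alone.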


                • #9
                  Thanks so much Brian for your advice. I will try as you suggested.
