  • Jossef
    Junior Member
    • Feb 2015
    • 4

    FASTQC Interpretation

    Hi all (first post here)!

    I am performing my first RNA-seq analysis (prokaryote, 50 bp SE) from an Illumina TruSeq library that was sequenced on a HiSeq 2000. FastQC is showing great read quality, but I have a few concerns that I am having difficulty interpreting.

    Should I be concerned about these kmer charts? There are about 30 sharply overrepresented sequences for any given sample. I have also attached one sample's duplication levels graphic; it is representative of the others.

    Thank you in advance for any thoughts you can offer; I will report back if I have a moment of clarity.


    Attached images: Kmer D1.png, Kmer WT2.png, Overrepresented.png, Duplication levels.png
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    On the positive side, the overrepresented sequences are not adapters (no hits).

    Have you tried a trimming program (BBDuk from BBMap, or Trimmomatic) to see if the majority of reads survive?

    You should go forward with the analysis and see how the alignments look.
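
For the "do most reads survive" check, here is a minimal sketch that compares read counts before and after trimming. It assumes standard 4-line FASTQ records; the function names and file paths are hypothetical and not part of BBDuk or Trimmomatic.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count records, assuming exactly four lines per FASTQ record."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def survival_fraction(raw_path, trimmed_path):
    """Fraction of raw reads still present in the trimmed file."""
    raw = count_reads(raw_path)
    kept = count_reads(trimmed_path)
    return kept / raw if raw else 0.0
```

Something like `survival_fraction("sample.fastq.gz", "sample.trimmed.fastq.gz")` would then give a quick number to look at; what counts as acceptable depends on the library.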


    • Jossef
      Junior Member
      • Feb 2015
      • 4

      #3
      I am just about to perform a trimmomatic run on the fastq files, so we'll see.

      Looking at one of the overrepresented sequences, I found it to be from ssrA (tmRNA, also called 10Sa RNA), so I'm assuming the issue has something to do with bias in the steps leading up to and during library prep.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Had you done anything to enrich mRNA/remove non-coding RNA?


        • Jossef
          Junior Member
          • Feb 2015
          • 4

          #5
          Yes. The Ribo-Zero kit was used, and our electropherogram afterwards indicated about 3% rRNA in any given sample.

          Separately, and I'm a little embarrassed to ask, but am I supposed to trim Illumina multiplexing barcodes prior to mapping my reads? I'm almost positive the answer is yes, but the distinction between Illumina adaptor and multiplex barcode seems muddled in the threads I have read.
          Last edited by Jossef; 02-13-2015, 06:44 PM.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Illumina barcodes are read independently, in a separate index read, and are never part of the sequence. You will see the tag read sequence in each FASTQ read ID; it was used for demultiplexing.

            Here is a video from Illumina that illustrates this: https://www.youtube.com/watch?v=womKfikWlxM

            The only thing you need to worry about is possible adapter contamination (especially if your inserts are shorter than you thought they were).
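
To make the point about the read ID concrete, here is a small sketch that pulls the index (barcode) sequence out of a FASTQ header. It assumes the CASAVA 1.8+ header layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`); older pipelines wrote a different format, which this function rejects with a `ValueError`. The function name and the example header are made up for illustration.

```python
def parse_illumina_header(header):
    """Split a CASAVA 1.8+ style FASTQ header line into named fields."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # The read name and the comment are separated by a single space.
    name, _, comment = header[1:].strip().partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = comment.split(":")
    return {
        "instrument": instrument,
        "flowcell": flowcell,
        "lane": int(lane),
        "read": int(read),
        "is_filtered": is_filtered == "Y",
        "index": index,  # barcode used for demultiplexing, not part of the read
    }


# Hypothetical header, for illustration only.
example = "@HWI-ST1234:100:C0ABCACXX:1:1101:1234:2000 1:N:0:ATCACG"
fields = parse_illumina_header(example)
print(fields["index"])
```

The sequence line that follows such a header contains only the insert (plus any adapter read-through), which is why there is no barcode to trim.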
            Last edited by GenoMax; 02-13-2015, 06:56 PM.


            • Jossef
              Junior Member
              • Feb 2015
              • 4

              #7
              Ah, I see where I was getting confused: I had been reading the FASTQ file incorrectly. A silly oversight on my part, but thanks.


              • Julia_S
                Junior Member
                • Jan 2015
                • 3

                #8
                Jossef - I am having the same problem. Could you please explain what exactly was going wrong (you said you had been reading the FASTQ file incorrectly), and what the solution was?
                Many thanks!


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  @Julia_S: I think Jossef was only referring to not correctly interpreting the fastq headers. Not a problem with reading the fastq file itself.

                  Have you done any trimming/adapter scans on your data? Can you post images of what the problem looks like in your case?
                  Last edited by GenoMax; 03-24-2015, 08:54 AM.


                  • Julia_S
                    Junior Member
                    • Jan 2015
                    • 3

                    #10
                    FastQC shows no adapter content and no overrepresented sequences; per base sequence content is also OK (except for the first few bases).
                    I have 24 samples of human paired-end RNA-seq, and for all of them the kmer pictures look similar to the ones attached.
                    I am a newbie and completely at a loss, so any help would be really appreciated!

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      I am going to suggest that you go ahead with trimming of data and further downstream analysis. You can re-check data post-trimming with FastQC to see if the k-mer over-representation goes away. Remember to use a paired-end aware trimming program (bbduk from BBMap suite, trimmomatic, cutadapt).
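
On the "paired-end aware" point: the practical concern is that R1 and R2 must still list the same reads in the same order after trimming. Here is a rough sketch of a sanity check, assuming 4-line FASTQ records and read names that match between mates (optionally with a trailing `/1` or `/2`). The function names are hypothetical, and this is not how any of the trimmers above work internally.

```python
from itertools import zip_longest


def read_ids(lines):
    """Yield the read name of each record, without '@', comment, or /1 /2 suffix."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each 4-line record
            name = line[1:].split()[0]
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name


def pairs_in_sync(r1_lines, r2_lines):
    """Return True if both mates contain the same read names in the same order."""
    for id1, id2 in zip_longest(read_ids(r1_lines), read_ids(r2_lines)):
        if id1 != id2:  # also catches one file being longer than the other
            return False
    return True
```

Both arguments can be any iterables of lines, such as two open file handles. A trimmer that drops a read from one file but not the other would make this return False.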

                      If you are worried about the data, take a few sequences and spot-check them with BLAST at NCBI to make sure they align well to the human genome.
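
For the spot check, one way to pull a handful of random reads out of a FASTQ file as FASTA text (which can be pasted into the NCBI BLAST web form) is sketched below. It assumes well-formed 4-line records; the function name and defaults are made up for illustration.

```python
import random


def sample_reads_as_fasta(fastq_lines, n=5, seed=0):
    """Reservoir-sample n reads from FASTQ lines and return them as FASTA text."""
    rng = random.Random(seed)
    sample = []
    record_count = 0
    lines = iter(fastq_lines)
    for header in lines:
        seq = next(lines).strip()
        next(lines)  # '+' separator line
        next(lines)  # quality line, not needed for BLAST
        record_count += 1
        entry = (header[1:].split()[0], seq)
        if len(sample) < n:
            sample.append(entry)
        else:
            # Replace an existing entry with probability n / record_count.
            j = rng.randrange(record_count)
            if j < n:
                sample[j] = entry
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)
```

Reservoir sampling is used so the whole file never has to be held in memory; for a quick look, simply taking reads from the middle of the file would also do.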


                      • Julia_S
                        Junior Member
                        • Jan 2015
                        • 3

                        #12
                        Originally posted by GenoMax:
                        I am going to suggest that you go ahead with trimming of data and further downstream analysis. You can re-check data post-trimming with FastQC to see if the k-mer over-representation goes away. Remember to use a paired-end aware trimming program (bbduk from BBMap suite, trimmomatic, cutadapt).

                        If you are worried about the data, take a few sequences and spot-check them with BLAST at NCBI to make sure they align well to the human genome.
                        @GenoMax: Thank you! The k-mer overrepresentation is not generally at the start or end of the reads, so I would guess trimming is unlikely to affect it.
                        Speaking of trimming (and again sorry if this is a stupid question, this is the first time I am analysing RNAseq data) - I have no adapter contamination and the quality of all the bases is in the green area. In that case, I would have thought no (additional) trimming is necessary?


                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          If you don't have adapter contamination, then a pass through the trimming program will leave the data intact; if you do have some, you want that part removed anyway.

                          You are perhaps right that trimming may not change the k-mer result, but the main thing you want to know is how well your data maps. One could have perfect data (great Q scores, no k-mer enrichment), but if it does not map well then it is not useful.

                          BTW: the k-mer module in FastQC only analyzes 2% of the total data.
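
To give a feel for what a positional k-mer check on a 2% subsample looks like, here is a toy sketch. It is not FastQC's implementation (FastQC's own test is more involved); the every-50th-read sampling, k=7, and the simple max/mean ratio are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def positional_kmer_counts(seqs, k=7, every=50):
    """Count k-mers by start position, looking at only every `every`-th read."""
    counts = defaultdict(Counter)  # kmer -> Counter mapping position -> count
    for i, seq in enumerate(seqs):
        if i % every:
            continue  # skip reads outside the ~2% subsample
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            if "N" not in kmer:
                counts[kmer][pos] += 1
    return counts


def max_positional_enrichment(pos_counts, n_positions):
    """Ratio of the highest per-position count to the mean per-position count."""
    mean = sum(pos_counts.values()) / n_positions
    return max(pos_counts.values()) / mean
```

A k-mer spread evenly along the read gives a ratio near 1, while one that piles up at a single position gives a large ratio. Because only a small fraction of reads is examined, the resulting plots can look noisy, which is one more reason to judge the data by how well it maps.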
