
**GenoMax** · 02-13-2015, 05:20 PM

On the positive side, the overrepresented sequences are not adapters (no hits).
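For readers following along, here is a minimal sketch of what "overrepresented sequences" means in practice: count identical reads and flag any that make up more than 0.1% of the total, the threshold FastQC uses. This is a simplified exact-match version; the function name `overrepresented` is hypothetical, and FastQC's real module has additional details (such as which reads it tracks) that are not reproduced here.

```python
from collections import Counter

def overrepresented(seqs, min_fraction=0.001):
    """Return (sequence, count, fraction) for reads above min_fraction of all reads.

    Simplified sketch: exact-match counting only, no truncation or subsampling.
    """
    seqs = list(seqs)
    total = len(seqs)
    if total == 0:
        return []
    counts = Counter(seqs)
    hits = [(s, c, c / total) for s, c in counts.items() if c / total > min_fraction]
    # Most frequent sequences first, as in a FastQC report
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

A flagged sequence can then be checked against adapter lists or BLAST, which is how the ssrA hit below was identified.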

Have you tried a trimming program (BBDuk from BBMap, or Trimmomatic) to see whether the majority of reads survive?
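One quick way to answer "do the majority of reads survive?" is to count FASTQ records before and after trimming. The sketch below assumes the standard 4-line FASTQ layout; the helper names are hypothetical. For paired-end data, run it on the paired output files (R1 and R2 should give the same count).

```python
import gzip

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

def count_records(lines):
    """Count FASTQ records, assuming exactly 4 lines per record."""
    return sum(1 for _ in lines) // 4

def survival_rate(raw_path, trimmed_path):
    """Fraction of reads in raw_path that are still present in trimmed_path."""
    with open_fastq(raw_path) as raw, open_fastq(trimmed_path) as trimmed:
        n_raw = count_records(raw)
        n_kept = count_records(trimmed)
    return n_kept / n_raw if n_raw else 0.0
```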

You should go forward with the analysis and see how the alignments look.

**Jossef** · 02-13-2015, 05:30 PM

I am just about to perform a trimmomatic run on the fastq files, so we'll see.

Looking at one of the overrepresented sequences, I found it to be from ssrA (a 10S RNA), so I'm assuming the issue has something to do with bias in the steps leading up to and during library prep.

**GenoMax** · 02-13-2015, 06:20 PM

Had you done anything to enrich mRNA/remove non-coding RNA?

**Jossef** · 02-13-2015, 06:34 PM

Yes. The Ribo-Zero kit was used, and our electropherogram afterwards indicated about 3% rRNA in any given sample.

Separately, and I'm a little embarrassed to ask, but am I supposed to trim Illumina multiplexing barcodes prior to mapping my reads? I'm almost positive the answer is yes, but the distinction between Illumina adaptor and multiplex barcode seems muddled in the threads I have read.

**GenoMax** · 02-13-2015, 06:50 PM

Illumina barcodes (indexes) are read independently and are never part of the read sequence. You will see the index read sequence in each FASTQ read header; it was used for demultiplexing.
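To see where the barcode shows up, here is a small sketch that splits a Casava 1.8+ style header (`@instrument:run:flowcell:lane:tile:x:y read:is_filtered:control:index`) into fields. It assumes that header layout; older pipelines used a different format, and the function name is hypothetical.

```python
def parse_illumina_header(header):
    """Split a Casava 1.8+ style FASTQ header into named fields.

    Assumes '@<name fields> <read>:<is_filtered>:<control>:<index>'.
    Extra name fields (e.g. a UMI) beyond the first seven are ignored.
    """
    name, _, comment = header.strip().lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")[:7]
    read, is_filtered, control, index = comment.split(":")[:4]
    return {
        "instrument": instrument, "run": run, "flowcell": flowcell,
        "lane": lane, "tile": tile, "x": x, "y": y,
        "read": read, "is_filtered": is_filtered,
        "control": control, "index": index,  # the barcode used for demultiplexing
    }
```

The barcode lives only in the header, so nothing needs to be trimmed from the sequence line for it.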

Here is a video from Illumina that illustrates this: https://www.youtube.com/watch?v=womKfikWlxM

The only thing you need to worry about is possible adapter contamination (especially if your inserts are shorter than you thought they were).
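Why short inserts matter: when the insert is shorter than the read length, the sequencer reads through the insert into the adapter, so the 3' end of the read carries adapter sequence. A minimal exact-match sketch of removing that read-through is below. It assumes the common TruSeq adapter prefix `AGATCGGAAGAGC`; real trimmers (BBDuk, Trimmomatic, cutadapt) also tolerate mismatches, and a short `min_overlap` will occasionally trim a few genuine bases.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed: common prefix of Illumina TruSeq adapters

def trim_adapter(seq, adapter=ADAPTER, min_overlap=3):
    """Trim 3' adapter read-through using exact matching only.

    First looks for the full adapter anywhere in the read; otherwise looks
    for a partial adapter prefix (>= min_overlap bases) at the very 3' end.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Insert only slightly shorter than the read: just the adapter's start is present
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq
```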

**Jossef** · 02-13-2015, 07:06 PM

Ah, I see where I was getting confused: I had been reading the FASTQ file incorrectly. A silly oversight on my part, but thanks.

**Julia_S** · 03-24-2015, 08:28 AM

Jossef - I am having the same problem. Could you please explain what exactly was going wrong (you said you had been reading the FASTQ file incorrectly), and what the solution was?
Many thanks!

**GenoMax** · 03-24-2015, 08:50 AM

@Julia_S: I think Jossef was only referring to not correctly interpreting the FASTQ headers, not to a problem with reading the FASTQ file itself.

Have you done any trimming/adapter scans on your data? Can you post images of what the problem looks like in your case?

**Julia_S** · 03-24-2015, 08:57 AM

FastQC shows no adapter content and no overrepresented sequences; per-base sequence content is also OK (except for the first few bases).
I have 24 samples of human paired-end RNA-seq, and for all of them the k-mer plots look similar to the ones attached.
I am a newbie and completely at a loss, so any help would be really appreciated!
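The "per base sequence content" plot is just the A/C/G/T percentage at each read position. A minimal sketch of that calculation is below (the function name is hypothetical). Uneven composition in the first ~10 bases is commonly attributed to random-hexamer priming in RNA-seq library prep, which would fit the pattern described here.

```python
from collections import Counter

def per_base_content(seqs):
    """Percentage of A/C/G/T at each read position.

    Reads shorter than a given position simply do not contribute to it;
    N and other symbols are excluded from the percentage denominator.
    """
    counts = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    result = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        result.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return result
```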

**GenoMax** · 03-24-2015, 09:22 AM

I am going to suggest that you go ahead with trimming the data and further downstream analysis. You can re-check the data post-trimming with FastQC to see if the k-mer over-representation goes away. Remember to use a paired-end aware trimming program (bbduk from the BBMap suite, trimmomatic, cutadapt).

If you are worried about the data, take a few sequences and spot-check them with BLAST at NCBI to make sure that the data aligns well to the human genome.
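To pick "a few sequences" without bias toward the start of the file (where tile-edge effects can cluster), one option is reservoir sampling over the whole FASTQ, writing the result as FASTA that can be pasted into NCBI BLAST. This is a sketch; the function names and the fixed seed are illustrative assumptions.

```python
import random

def fastq_records(lines):
    """Yield (read_name, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header.strip().lstrip("@").split()[0], seq.strip()

def sample_reads_as_fasta(lines, n=5, seed=1):
    """Reservoir-sample n reads (Algorithm R) and return them as FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(fastq_records(lines)):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = rec
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)
```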

**Julia_S** · 03-25-2015, 02:55 AM

@GenoMax: Thank you! The k-mer overrepresentation is not generally at the start or end of the reads, so I would guess trimming is unlikely to affect it.
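One way to check where an enriched k-mer actually sits is to bin its start positions along the read. Enrichment concentrated in the first or last bin suggests something trimming could remove; a spread-out or mid-read pattern does not. A sketch follows; the function name and the 10-bin choice are assumptions.

```python
def kmer_position_profile(seqs, kmer, n_bins=10):
    """Histogram of kmer start positions, binned by relative position in the read."""
    bins = [0] * n_bins
    k = len(kmer)
    for seq in seqs:
        n_starts = max(len(seq) - k + 1, 1)  # number of possible start positions
        start = seq.find(kmer)
        while start != -1:
            frac = start / n_starts  # in [0, 1)
            bins[min(int(frac * n_bins), n_bins - 1)] += 1
            start = seq.find(kmer, start + 1)  # allow overlapping hits
    return bins
```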
Speaking of trimming (and again sorry if this is a stupid question, this is the first time I am analysing RNAseq data) - I have no adapter contamination and the quality of all the bases is in the green area. In that case, I would have thought no (additional) trimming is necessary?

**GenoMax** · 03-25-2015, 04:20 AM

If you don't have adapter contamination, then a pass through the trimming program would leave the data intact, but if you do have some, then you want that part removed anyway.
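To illustrate why a trimming pass is harmless on clean data: quality trimming only removes bases that fail the threshold, so a read whose qualities are all high comes back unchanged. The sketch below mimics the idea of Trimmomatic's TRAILING step (drop 3' bases below a quality cutoff); it assumes Phred+33 encoding, and the function name and default cutoff of 20 are illustrative choices.

```python
def trim_trailing_low_quality(seq, qual, min_q=20, offset=33):
    """Remove 3' bases while their Phred score is below min_q.

    Assumes Phred+33 quality encoding. High-quality reads are returned unchanged.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```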

You are perhaps right that trimming may not change the k-mer result, but the main thing you want to know is how well your data maps. One could have perfect data (great Q scores, no k-mer enrichment), but if it does not map well, it is not useful.
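"How well your data maps" is usually summarised as the fraction of reads with a primary alignment. A minimal sketch computing that from SAM text using the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary) is below. It counts each mate of a pair separately; the function name is hypothetical, and in practice a tool such as `samtools flagstat` reports similar numbers.

```python
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800

def mapping_rate(sam_lines):
    """Fraction of primary SAM records that are mapped (each mate counted once)."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t", 2)[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of an already-counted read
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0
```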

BTW: the k-mer module in FastQC only analyses 2% of the total data.
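A rough illustration of what a 2% subsample means for k-mer counts: only about one read in fifty contributes, so weak patterns can be noisy. The sketch below takes every 50th read deterministically and counts all 7-mers in it. Both the every-50th sampling and k=7 are assumptions for illustration, not FastQC's exact procedure, and FastQC additionally tests position-specific enrichment, which this does not.

```python
from collections import Counter

def sampled_kmer_counts(seqs, k=7, keep_every=50):
    """Count k-mers in every keep_every-th read (about 2% when keep_every=50)."""
    counts = Counter()
    for i, seq in enumerate(seqs):
        if i % keep_every:
            continue  # read not in the subsample
        for j in range(len(seq) - k + 1):
            counts[seq[j:j + k]] += 1
    return counts
```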

FASTQC Interpretation