**kmcarr** · 03-09-2020, 10:38 AM

Rule #1: Do not get hung up on the big red X's in FastQC.

The thresholds which delineate Pass|Warn|Fail for the various metrics in FastQC were set using beautiful, single-species, perfectly random and uniform genomic DNA libraries. Libraries that deviate from this in terms of sampling method, library content or library construction produce false failures. Given that you are performing a metagenomic experiment with widely variable samples, it is likely that your data are perfectly good for your organism(s).

You stated that you made these libraries using a Nextera kit. The tagmentation in Nextera library kits is not perfectly random; there is a sequence composition bias at the tagmentation site. Your original (untrimmed) Per Base Sequence Content plot is perfectly normal for Nextera libraries; the skew at the 5' end simply reflects the bias of the tagmentation enzyme. There is no need to trim the 5' end, but go ahead if you want to.
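To make the 5' bias concrete, here is a minimal sketch of the kind of per-position tally that sits behind a Per Base Sequence Content plot. It is not FastQC's implementation; the function name `per_base_composition` and the toy reads are made up for illustration.

```python
from collections import Counter

def per_base_composition(reads, n_positions=15):
    """For each of the first n_positions cycles, return the fraction of A/C/G/T."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions].upper()):
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())  # includes N calls, if any
        table.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return table

# Hypothetical toy reads; real input would come from your FASTQ files.
toy_reads = ["GTCTA", "GTCAA", "GACTT", "CTCTG"]
for pos, fractions in enumerate(per_base_composition(toy_reads, n_positions=3), start=1):
    print(pos, fractions)
```

With a random library each base would sit near its genome-wide frequency at every position. With tagmented libraries the first several positions drift away from that, which is what FastQC flags.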

I have seen the highly skewed 3' end in the Per Base Sequence Content plot before with trimmed reads. I'm not sure whether it is an artifact of trimming or of the grouping algorithm in FastQC when it doesn't have enough bases left to fill its default group size of 5bp. (This is purely speculation.)

Regarding the GC content plots, you are sampling a large diversity of bacteria from a variety of very distinct environments. It is totally expected that the bacterial populations in your different environments would have widely variable GC content distributions. This has nothing to do with adapters. Again, the failure is due to FastQC's expectations not matching the reality of the experiment you are performing.
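As a rough illustration of why a mixed community fails this module, the sketch below bins per-read GC percentages. The helper names and the toy reads are hypothetical, and this is not how FastQC itself fits its expected curve; it only shows that pooling a low-GC and a high-GC population gives a distribution with more than one peak.

```python
def gc_percent(read):
    """GC percentage of a read, ignoring N and other non-ACGT calls."""
    read = read.upper()
    called = sum(read.count(b) for b in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (read.count("G") + read.count("C")) / called

def gc_histogram(reads, bin_width=10):
    """Count reads per GC bin; keys are the lower edge of each bin."""
    hist = {}
    for read in reads:
        lower = int(gc_percent(read) // bin_width) * bin_width
        lower = min(lower, 100 - bin_width)  # put 100% GC in the top bin
        hist[lower] = hist.get(lower, 0) + 1
    return dict(sorted(hist.items()))

# Hypothetical toy community: one low-GC and one high-GC population.
low_gc = ["ATATATATGC"] * 3
high_gc = ["GCGCGCGCAT"] * 2
print(gc_histogram(low_gc + high_gc))
```

A single-genome library would pile up around one bin; here the reads split into two separate bins, which FastQC reports as a deviation from its single-peak expectation.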

The Adapter content plot is the only one which really shows something you need to address. It is normal (especially for libraries prepared using Nextera kits) to have some fragments shorter than your read length (150bp in your case). Your particular libraries vary from ~20% to 35% in the percentage of fragments < 150bp. Performing 3' adapter trimming is required to remove adapter sequences from these reads.
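For anyone wondering what 3' adapter trimming actually does, here is a minimal sketch. It assumes the Nextera transposase sequence `CTGTCTCTTATACACATCT` as the adapter and uses exact matching only; real trimmers such as cutadapt or Trim Galore also tolerate mismatches and use quality scores, so treat this as an illustration and not a replacement for them. The function name and `min_overlap` default are my own choices.

```python
# Assumption: Nextera transposase adapter as it appears at the 3' end of a read.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"

def trim_3prime_adapter(read, adapter=NEXTERA_ADAPTER, min_overlap=5):
    """Remove the adapter and everything after it from the 3' end of a read."""
    # Case 1: the insert is short enough that the full adapter was sequenced.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway into the adapter, so only an adapter
    # prefix is present as the read's suffix. Try the longest overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adapter found; fragment was at least as long as the read

insert = "ACGTACGTAC"
print(trim_3prime_adapter(insert + NEXTERA_ADAPTER + "GGG"))  # full adapter read-through
print(trim_3prime_adapter(insert + "CTGTCTC"))                # partial adapter at the end
print(trim_3prime_adapter("ACGTACGTACGGATTC"))                # no adapter, left unchanged
```

Reads from fragments of 150bp or longer never reach the adapter and pass through untouched, which is why only the shorter fragments are affected.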

**yy273826987** · 03-10-2020, 11:17 AM

Dear kmcarr,

Thanks a lot for the reply and explaining the details. Appreciate that!

After reading your response, I understand that the adapter contamination is the only thing that I need to worry about. I have used TrimGalore! to remove the adapters from the 3'-end of the raw reads. However, you also suggested that "Performing 5' adapter trimming is required to remove adapter sequences from these reads." I am a bit confused. Based on my current understanding (maybe I am wrong), in my case, I only have adapters at the 3'-end of the reads. Do we have adapters at both ends (3'- and 5'-)?

Thanks again!

**kmcarr** · 03-10-2020, 11:20 AM

Sorry, that was an error. I meant to type "Performing 3' adapter trimming...."

I have edited my original post to fix this.

**yy273826987** · 03-10-2020, 11:32 AM

Dear kmcarr,

Thanks for the quick response and the clarification.

May I ask a few more questions? For my specific case, should I perform assembly before downstream analysis?

Also, after quality control, which software or pipeline would you suggest I begin with (for assembly, annotation, taxonomic analysis, and finding functional genes)? I found that there are numerous tools and pipelines. As a real newbie, I am having a hard time deciding which pipeline to start with.

Thanks!

**kmcarr** · 03-11-2020, 04:55 AM

yy2,

The downstream analysis part is a bit outside my area so I'll have to leave that to others to help you.

Cheers.

Shotgun Meta of Environ Sam: Per Base Seq Cont Per Seq GC Cont failed aft trimming