Strange peak at beginning of M-bias plots
Hi,
I have generated RRBS read data, which I have filtered and trimmed using Trim Galore and then aligned to a reference genome using Bismark. Three independent sequencing libraries were created with samples randomly mixed among the three sequencing libraries.
All the individuals from one of the libraries have a characteristic peak in their M-bias plots (see attached Bad_M-bias image): methylation spikes over the first 5 nucleotide positions before settling at around 60% CpG methylation. The M-bias plots of samples from the other libraries show relatively stable CpG methylation levels at 60%, with no spike at the beginning (see attached Good_M-bias image).
This unusual M-bias profile is also accompanied by a drop in quality score (q-value) at the same nucleotide positions for all the samples from this one specific library.
I had previously ignored this issue because the drop in quality was not severe, but it has led to some issues with downstream analyses (for example SNP calling from the RRBS reads), which has led me to revisit my read QC. All of these issues concern samples from this particular library.
To this end, does anyone know what could be causing these issues with samples from this specific library and how to go about solving this problem? Would simply trimming the 5' end of the reads in Trim Galore! for all samples regardless of library remedy this problem or would the issues be deeper than this?
Thanks in advance!
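One way to put a number on the spike described in this post is to compare the first few read positions against the plateau of the M-bias curve. The sketch below is only an illustration: the function name, the `plateau_start` and `tolerance` defaults, and the example profile are assumptions, not values taken from the attached plots or from any Bismark output file.

```python
from statistics import median


def flag_biased_positions(meth_pct, plateau_start=10, tolerance=5.0):
    """Flag 5' read positions whose methylation deviates from the plateau.

    meth_pct      -- per-position CpG methylation in percent, position 1 first
    plateau_start -- 0-based index from which the curve is assumed stable (assumption)
    tolerance     -- allowed deviation in percentage points (assumption)
    Returns (plateau level, list of 1-based flagged positions).
    """
    plateau = median(meth_pct[plateau_start:])
    flagged = [
        pos
        for pos, value in enumerate(meth_pct[:plateau_start], start=1)
        if abs(value - plateau) > tolerance
    ]
    return plateau, flagged


# Hypothetical profile: a spike over the first 5 positions, then ~60% methylation.
profile = [85.0, 82.0, 78.0, 72.0, 67.0] + [60.0] * 45
plateau, flagged = flag_biased_positions(profile)
print(plateau, flagged)
```

The number of flagged positions could then guide how many bases to remove, for example with the `--clip_R1` option of Trim Galore or the `--ignore` option of the Bismark methylation extractor. Whether that also resolves the downstream problems would still need to be checked.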
-
Trim Galore tries to identify read-through adapter contamination, which is the kind of contaminant that will prevent your sequences from aligning at all, or might even cause mis-alignments. For the standard Illumina adapter, this sequence always starts with AGATCGGAAGAGC...
It does not attempt to find or remove any other kind of contaminant or over-represented sequence, because they are normally not as harmful. E.g. an over-represented sequence such as ATCGGAAGAGCACACGTCTGAACTCCAGTCACATGAGGCCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAA might be present in the library, but it will simply not align to a genome, thereby filtering itself out at the alignment stage.
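To illustrate the difference between read-through adapter contamination and the over-represented sequence quoted in this post, here is a deliberately simplified sketch. It uses exact string matching on the adapter start `AGATCGGAAGAGC`, whereas real trimmers such as Cutadapt tolerate mismatches and partial matches at the 3' end; the function name and the example insert are made up for illustration.

```python
ADAPTER_START = "AGATCGGAAGAGC"  # start of the standard Illumina adapter, as quoted in the post


def trim_read_through(seq, adapter=ADAPTER_START):
    """Cut the read at the first exact occurrence of the adapter start.

    Simplification: exact matching only, no error tolerance.
    """
    idx = seq.find(adapter)
    return seq if idx == -1 else seq[:idx]


# Hypothetical read: a 20 bp insert followed by read-through into the adapter.
insert = "TGGTTTAGGAGTTTGATTTT"
read = insert + "AGATCGGAAGAGCACACGTCTG"

# Over-represented sequence from the thread; it lacks the leading "AG" of the adapter.
overrep = "ATCGGAAGAGCACACGTCTGAACTCCAGTCACATGAGGCCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAA"

print(trim_read_through(read))                # the insert only
print(trim_read_through(overrep) == overrep)  # unchanged by the exact search
```

In this simplified model the read-through case is trimmed back to the insert, while the over-represented sequence is left untouched and, as the post says, would be expected to drop out at the alignment stage.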
-
Yes. Thank you.
Also, after trimming my FASTQ files with Trim Galore using default parameters, I continue to see TruSeq adapters (under the overrepresented sequences tab), e.g. ATCGGAAGAGCACACGTCTGAACTCCAGTCACATGAGGCCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAA
Does Trim Galore not recognize all adapters? Or should I explicitly provide these adapters to Trim Galore?
-
The quality trimming is performed by a running-sum approach across the read, like the one that is used by BWA. Copied below is the text from the Cutadapt --help:
-q 3'CUTOFF Trim low-quality bases from 3' ends of reads before adapter removal. …The algorithm is the same as the one used by BWA (see documentation).
In some cases this may mean that if the quality briefly drops below the quality threshold but then comes back up again, the trimming algorithm decides that it’s not too bad after all.
I hope this clears things up?
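The behaviour described in this post can be reproduced with a few lines of Python. This is a sketch of the BWA-style 3' quality trimming as the Cutadapt documentation describes it (accumulate cutoff minus quality from the 3' end and cut where that sum peaks), not the actual Cutadapt source; the function name and the example quality values are assumptions.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    Walks from the 3' end accumulating (cutoff - quality). The cut position is
    where this running sum is largest; the walk stops once the sum turns
    negative, i.e. once good bases outweigh the bad ones seen so far.
    """
    running = 0
    best = 0
    stop = len(qualities)
    for i in reversed(range(len(qualities))):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            stop = i
    return stop


# Hypothetical Phred scores, cutoff 28 as in the -q 28 example from this thread.
decaying_tail = [35, 35, 35, 20, 10, 5]  # quality drops towards the 3' end
brief_dip = [35, 35, 12, 35, 35, 35]     # one poor base surrounded by good ones

print(quality_trim_index(decaying_tail, 28))  # cuts before the low-quality tail
print(quality_trim_index(brief_dip, 28))      # nothing is cut
```

In the second example the base with quality 12 survives trimming even though the cutoff is 28. That is consistent with FastQC whiskers reaching below the chosen threshold after trimming.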
-
Hi all,
I am trimming Illumina 1.9 encoded data with Trim Galore, and in the FastQC report afterwards the box-plot whiskers under "Per base sequence quality" go all the way down to a Phred score of 13 or 14.
Here is what I used:
trim_galore --rrbs --paired --length 20 -q 28 --illumina
Why am I getting such a result?
Thanks
-
Choosing minimum RRBS read length in Trim Galore!
I am new to the bioinformatic analysis of RRBS data. I am using Trim Galore! to QC and adapter trim my RRBS read data. I have generated single-end 75bp reads on an Illumina NextSeq.
The default minimum read length parameter in Trim Galore! is 20 bp but I was wondering if there were any practical considerations for alignment/mapping of reads to take into account when choosing a minimum read length and if anyone had any tips on optimizing this parameter?
-
As long as you merged the R1 and R2 files in the same order (e.g. R1_rep1 R1_rep2, R2_rep1 R2_rep2), it shouldn't matter whether you run Trim Galore on the merged files directly or run it first and merge afterwards. All the best!
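The condition in this reply (R1 and R2 merged in the same order) can be checked by comparing read IDs between the two merged files. The following is a minimal sketch on in-memory FASTQ text; the helper names, the `/1` and `/2` mate suffixes and the toy records are assumptions for illustration, and real files would be streamed rather than loaded as strings.

```python
def read_ids(fastq_text):
    """Return read names from 4-line FASTQ records, without '@' or mate suffix."""
    lines = fastq_text.strip().splitlines()
    ids = []
    for i in range(0, len(lines), 4):
        name = lines[i][1:].split()[0]    # drop '@' and any comment after a space
        if name.endswith(("/1", "/2")):   # drop an old-style mate suffix
            name = name[:-2]
        ids.append(name)
    return ids


def pairs_in_sync(r1_text, r2_text):
    """True if both files list the same reads in the same order."""
    return read_ids(r1_text) == read_ids(r2_text)


def record(name, mate, seq):
    """Build one toy FASTQ record (hypothetical data)."""
    return f"@{name}/{mate}\n{seq}\n+\n{'I' * len(seq)}\n"


r1_rep1, r2_rep1 = record("rep1_read1", 1, "ACGT"), record("rep1_read1", 2, "TTGA")
r1_rep2, r2_rep2 = record("rep2_read1", 1, "GGCA"), record("rep2_read1", 2, "CATG")

print(pairs_in_sync(r1_rep1 + r1_rep2, r2_rep1 + r2_rep2))  # same merge order
print(pairs_in_sync(r1_rep1 + r1_rep2, r2_rep2 + r2_rep1))  # R2 merged in a different order
```

If the check fails, paired-end trimming and alignment would be pairing reads that do not belong together, regardless of whether trimming happens before or after the merge.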
-
Run Trim Galore! before or after merging technical replicates
I'm quite new to NGS. We just did 4 lanes (2 lanes twice) of Illumina HiSeq Rapid Run 2x51 RNA sequencing of 24 samples. The bcl to fastq conversion was run for us, so every sample has 4 R1 forward fastq files and 4 R2 reverse files. I merged the technical replicates (merged the 4 R1 files, then merged the 4 R2 files) doing a basic command line cat and append. I also ran FastQC on the individual technical replicates, as well as on the merged files. I now plan to upload my files to the Galaxy pipeline for the remainder of the QA/QC and analysis, and was going to start with Trim Galore. But now I'm wondering if Trim Galore needs to work on the original unmerged technical replicates rather than the merged files. E.g., the quality at the beginning of all our reads was spiky, possibly indicating sequencing of the same sequence, and may need to be trimmed; but can trimming the first n bases of each of the 4 files still be done after the files have been merged? So do I upload the unmerged fastq files and run Trim Galore, and then merge them, or upload the merged files and run Trim Galore? Thank you.
-
Hi Guorong,
Great that it is working. My thoughts on your other problem are, as I have already outlined above, that you should absolutely not be doing what you are suggesting here. The sequence you are after is the sequence from the start of the read until you hit the small RNA adapter, which starts with TGGAATTCT... Everything after that is either adapter that binds to the flowcell or something else you don't want to keep. In any case, the sequence on the 3' end should not align to a genome anyway.
Code:
-g ADAPTER, --front=ADAPTER  Sequence of an adapter that was ligated to the 5' end.
Code:
trim_galore --trim-n file
-
Hi Felix,
Thank you so much for your new release!
The new features definitely can remove all Ns from the reads! Awesome!
For question 1, I want to try running Cutadapt three times to keep the longer reads.
1: cutadapt -a adapter -q 10 -m 17 --trim-n -o $inputFile".trim.3.fastq" $inputFile".fastq"
2: cutadapt -g adapter -q 10 -m 17 --trim-n -o $inputFile".trim.5.fastq" $inputFile".fastq"
3: cat $inputFile".trim.3.fastq" $inputFile".trim.5.fastq" > $inputFile".trim.fastq"
4: cutadapt -b adapter -q 10 -m 17 --trim-n -o $inputFile".trim.final.fastq" $inputFile".trim.fastq"
5: then keep only one read and delete the other read with the same FASTQ ID.
The reason I need to run it three times is that the first Cutadapt run will trim the 3' adapter string and the second run will trim the 5' adapter string. After these two runs, some reads in $inputFile".trim.3.fastq" may still have the 5' adapter string and some reads in $inputFile".trim.5.fastq" may still have the 3' adapter string. After merging these two resulting files, I run Cutadapt a third time to cut either the 3' or the 5' adapter string. Since the merged FASTQ file will contain some identical reads, I then scan $inputFile".trim.final.fastq" to keep only one read and delete the other one with the same FASTQ ID.
Do you have any suggestions about this solution?
Thanks!
Guorong
-
Hi Guorong,
I have now added the option --trim-n, which should do just what you need. The new version also adds a few other features:
- Added option '--max_n COUNT' to remove all reads (or read pairs) exceeding this limit of tolerated Ns. In a paired-end setting it is sufficient if one read exceeds this limit. Reads (or read pairs) are removed altogether and are not further trimmed or written to the unpaired output.
- Enabled option '--trim-n' to remove Ns from both ends of the reads. Currently does not work in RRBS mode.
- Added new option '--max_length <INT>', which discards reads that are longer than <INT> bp after trimming. This is only advised for smallRNA sequencing to remove non-small RNA sequences.
- Replaced 'zcat' with 'gunzip -c' so that older versions of Mac OSX do not append a .Z to the end of the file and subsequently fail because the file is not present. Dah...
- Fixed a typo in adapter auto-detection warning message.
I have moved Trim Galore to Github where you can clone the latest development version: https://github.com/FelixKrueger/TrimGalore.
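The two N-related options in this changelog can be pictured with a small Python sketch. It only mirrors the behaviour as worded in the post (Ns removed from both ends; a pair is dropped if either read exceeds the tolerated number of Ns). The function names are hypothetical, this is not Trim Galore's code, and at which point in the pipeline Ns are counted is an assumption.

```python
def trim_n(seq, qual):
    """Remove Ns from both ends of a read, trimming the quality string to match."""
    start = len(seq) - len(seq.lstrip("N"))  # number of leading Ns
    end = len(seq.rstrip("N"))               # index just past the last non-N base
    if end < start:                          # read consists of Ns only
        return "", ""
    return seq[start:end], qual[start:end]


def keep_pair(seq1, seq2, max_n):
    """Keep a read pair only if neither read has more than max_n Ns."""
    return seq1.count("N") <= max_n and seq2.count("N") <= max_n


# Hypothetical reads.
print(trim_n("NNACGTNAN", "IIIIIIIII"))  # internal N is kept, terminal Ns are removed
print(keep_pair("ACGNT", "ACGT", 1))     # one N in read 1, within the limit
print(keep_pair("ANNGT", "ACGT", 1))     # read 1 exceeds the limit, so the pair is dropped
```

Note that end trimming leaves internal Ns in place, which is why a separate count-based filter like `--max_n` is useful alongside `--trim-n`.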
-
To 1) The way the sequencing normally works is that you sequence the first base after the 5' adapter, then you sequence the fragment of interest, and then you sequence into the adapter on the 3' end. You don't just get to keep the sequence that appears longer and juicier; you need to keep the sequence of the fragment you wanted to sequence, here the 7 bp. Maybe this sequence is just not a very representative example of your entire run, because 7 bp is also not a typical length for a miRNA. I would suggest you run Trim Galore on the file once and then look at the sequence length distribution to see whether the majority of the sequences are between 20 and 24 bp long.
To 2) I can add it to my list, but I'm not quite sure when I can address it (we've got a Brexit to stomach right now...)
Cheers, Felix
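The length check suggested under 1) could look roughly like the sketch below, which tallies read lengths from trimmed FASTQ text and reports the fraction in the 20 to 24 bp range. The function names and the toy reads are assumptions; for a real file the lines would be read from disk (or from a gzip stream) instead of a string.

```python
from collections import Counter


def length_distribution(fastq_text):
    """Count read lengths from 4-line FASTQ records (sequence is the 2nd line)."""
    lines = fastq_text.strip().splitlines()
    return Counter(len(lines[i]) for i in range(1, len(lines), 4))


def fraction_in_range(dist, lo=20, hi=24):
    """Fraction of reads whose length lies between lo and hi, inclusive."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    return sum(n for length, n in dist.items() if lo <= length <= hi) / total


# Hypothetical trimmed reads of length 22, 21, 7 and 23.
toy_fastq = "".join(
    f"@read{i}\n{'A' * n}\n+\n{'I' * n}\n" for i, n in enumerate([22, 21, 7, 23])
)

dist = length_distribution(toy_fastq)
print(sorted(dist.items()))
print(fraction_in_range(dist))
```

If most reads fall into the expected window, a single 7 bp result is more likely an unrepresentative read than a sign that the trimming direction is wrong.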
-
Hi Felix,
Thank you so much for your response!
For question 1:
After trimming, the length of the left sequence is only 7 nt but the length of the right sequence is 21 nt. Obviously I want to keep the 21 nt sequence and ignore the 7 nt sequence because it is too short. I am not sure whether I can directly run Cutadapt with the -g option to keep the 21 nt sequence instead of the 7 nt sequence.
For question 2:
Sure, a single N cannot make a difference for mapping. But for miRNA-seq alignment, it is better to remove the unknown nucleotides before alignment because of the sensitivity.