Depending on the type of libraries that were made, high duplication levels can be normal. In total RNA preps, rRNAs are so abundant that they will show lots of duplicates. Highly abundant mRNA transcripts will cause the same thing in other types of preps. Often, if you take the overrepresented sequences from FastQC and BLAST them, you will find rRNAs or sometimes other transcripts. Remember that FastQC only checks the first 50 bp of 200K reads for its duplication check, and it only takes into account one side of the pair. Picard tools MarkDuplicates will give you the kind of duplication checks you really need.
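To make the point about the sampling concrete, here is a rough sketch of a "first 50 bp of the first 200K reads" duplication estimate on a single FASTQ file. This is not FastQC's actual algorithm, just an illustration of why such a check can differ from a pair-aware, alignment-based one. The function names and defaults are made up for this example.

```python
from collections import Counter
from itertools import islice


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) from FASTQ text lines."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def duplication_fraction(fastq_lines, max_reads=200_000, prefix_len=50):
    """Fraction of sampled reads whose prefix was already seen.

    Only the first `max_reads` reads are sampled and only the first
    `prefix_len` bases are compared (hypothetical defaults taken from
    the figures quoted in the post above).
    """
    sampled = islice(read_sequences(fastq_lines), max_reads)
    counts = Counter(seq[:prefix_len] for seq in sampled)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Distinct prefixes are "originals"; everything else is a duplicate.
    return 1.0 - len(counts) / total


# Tiny in-memory example: 4 reads, two of them identical.
demo = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "IIII",
    "@r3", "TTTT", "+", "IIII",
    "@r4", "GGGG", "+", "IIII",
]
print(duplication_fraction(demo))
```

On a real file you would pass an open file handle instead of the list. Because only R1 (or only R2) is looked at, two fragments that start at the same position but end at different ones still count as duplicates here, which is one reason a pair-aware tool working on aligned reads gives a more trustworthy number.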
-
Originally posted by GenoMax: Just to confirm: These samples have already been demultiplexed, correct? The only time I remember seeing that sort of pattern at the end of the read is for the "tag".
I'll read Simon's article about duplication level. Thank you!
-
Originally posted by zzhao2: What I got from the core facility is separate fastq files, an R1 and an R2 file per sample. So I assume that the samples have been demultiplexed. In case the tags are left in the sequences, is there any way to check for them?
-
Originally posted by mastal: Try running FastQC using the --nogroup parameter. It will let you see how many bases at the end of the read are affected. It could be only the last base, which often has a much lower quality than the rest of the read.
As for the trimmomatic trimming, how long are the adapter sequences you are using for palindrome trimming?
With a threshold score of 30 for palindrome trimming, if each matching base adds 0.6 to the score (see the trimmomatic web page), unless the sequences in your adapter.fasta file are quite long, trimmomatic will not recognise and trim the adapter sequences.
You can use grep to see how many adapter sequences are in your reads before and after trimmomatic trimming.
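The grep check suggested above can also be written as a few lines of Python, which avoids accidentally matching header or quality lines. This is only a sketch: the function name is made up, and the adapter string in the demo is an arbitrary placeholder, so substitute the sequences from your own adapter.fasta.

```python
def count_adapter_reads(fastq_lines, adapter):
    """Return (reads containing `adapter`, total reads) for FASTQ text lines.

    Only the sequence line of each 4-line record is searched, so matches
    in read names or quality strings are not counted.
    """
    hits = 0
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line of the record
            total += 1
            if adapter in line:
                hits += 1
    return hits, total


# In-memory demo; "GATTACA" is a placeholder, not a real adapter.
demo_reads = [
    "@r1", "ACGTGATTACAAA", "+", "IIIIIIIIIIIII",
    "@r2", "ACGTACGTACGTA", "+", "IIIIIIIIIIIII",
    "@r3", "TTGATTACATTTT", "+", "IIIIIIIIIIIII",
]
print(count_adapter_reads(demo_reads, "GATTACA"))
```

Run it once on the raw file and once on the trimmed file (passing an open file handle) and compare the counts. Note that an exact substring search only finds full, error-free copies of the adapter, so partial adapters at the very end of a read will be missed.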
-
Originally posted by Wallysb01: Picard tools MarkDuplicates will give you the kind of duplication checks you really need.
-
Here are the plots with the --nogroup option that I forgot to attach.
-
Just FYI, I tried Trimmomatic with different palindrome clip thresholds (10, 15, 20, and 30), and all gave me very similar numbers of dropped sequences. I think this is consistent with the following passage in Trimmomatic's manual:
"For palindromic matches, a longer alignment is possible, as described above. Therefore this threshold can be higher, in the range of 30. Even though this threshold is very high (requiring a match of almost 50 bases) Trimmomatic is still able to identify very, very short adapter fragments."
So it sounds like 30 should be OK, and based on my tests different thresholds didn't affect the dropped sequences that much, so I would assume that they also output similar trimming results.
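For reference, the "almost 50 bases" figure follows directly from the per-base score mentioned earlier in the thread. The quick calculation below assumes each matching base adds log10(4), about 0.602, which is my reading of the "0.6" quoted above, and it ignores mismatch penalties entirely. The function name is just for illustration.

```python
import math

# Assumed score per matching base (~0.602), ignoring mismatches.
MATCH_SCORE = math.log10(4)


def bases_needed(threshold, match_score=MATCH_SCORE):
    """Smallest number of perfectly matching bases that reaches `threshold`."""
    return math.ceil(threshold / match_score)


for threshold in (10, 15, 20, 30):
    print(threshold, bases_needed(threshold))
```

Under these assumptions a threshold of 30 needs 50 perfectly matching bases, and a threshold of 10 needs 17. In palindrome mode the alignment is between the two reads of a pair plus the adapter prefixes, so even a threshold of 30 can be reached when only a short piece of adapter is present, which fits with the dropped-read counts barely changing across thresholds.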
-
Hi,
Just another line of thought on whether to trim or not: http://journal.frontiersin.org/Journ...014.00017/full