
**GenoMax** · 08-05-2014, 06:14 AM

FastQC is appropriate for QC of PE reads.

It would be better if you post screenshots/images of the FastQC results instead of just descriptions. Having something marked as "fail" does not automatically fail the entire sample. It is possible that the analysis done by your provider may not have removed all adapter dimers etc.

**Isa0984** · 08-06-2014, 01:23 AM

Ok, here are the images of FastQC...

Attached Files

**dpryan** · 08-06-2014, 01:34 AM

Read 2 often has decreased quality at its 3' end. A bit of trimming can easily get rid of that.

BTW, they likely sent you untrimmed sequences but aligned trimmed ones, which is why FastQC is telling you that the raw sequences still have adapter contamination.

Also, a fail on duplication level is pretty much expected for RNAseq data (that test is really only meant for whole-genome sequencing).

**Isa0984** · 08-06-2014, 03:15 AM

Thanks a lot. So it looks to me like the quality check doesn't really make sense then; it's more or less only good for the per-base quality...?

**dpryan** · 08-06-2014, 03:28 AM

Yeah, just do a bit of quality/adapter trimming (e.g., with trimmomatic or trim_galore) and you should be good to go.

**Isa0984** · 08-06-2014, 03:36 AM

But can I be sure that the company used trimmed data for mapping? Maybe they didn't. How can I check this?

**dpryan** · 08-06-2014, 03:43 AM

Just look at the read lengths in the BAM file:

Code:

samtools view some_file.bam | cut -f 10 | awk '{print length($1)}' | uniq | sort | uniq

If they trimmed the reads prior to alignment, you should get more than one value.
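For a tally that also shows how many reads have each length, the same check can be sketched in Python. This is only a minimal sketch, not part of the command above: it assumes the input is SAM text such as the output of `samtools view some_file.bam`, and the function name `read_length_counts` and the demo records are made up for illustration.

```python
from collections import Counter


def read_length_counts(sam_lines):
    """Count how many reads have each SEQ length (SAM column 10)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq = fields[9]
        if seq == "*":
            continue  # sequence not stored in this record
        counts[len(seq)] += 1
    return counts


# Hypothetical records: two untrimmed 51 bp reads and one read trimmed to 48 bp.
demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t51M\t*\t0\t0\t" + "A" * 51 + "\t" + "I" * 51,
    "r2\t0\tchr1\t200\t60\t51M\t*\t0\t0\t" + "C" * 51 + "\t" + "I" * 51,
    "r3\t0\tchr1\t300\t60\t48M\t*\t0\t0\t" + "G" * 48 + "\t" + "I" * 48,
]
for length, n in sorted(read_length_counts(demo).items()):
    print(length, n)
```

On real data, pass `sys.stdin` instead of `demo` and pipe the `samtools view` output into the script. A single length in the tally suggests untrimmed reads; several lengths suggest trimming before alignment.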

**Isa0984** · 08-06-2014, 04:08 AM

I will, but unfortunately I can't do this from my private computer, so I have to wait until I am back at the institute... but many thanks already at this point.

**Isa0984** · 03-18-2015, 10:14 AM

Hello, it's been a long time, but this is still/again an issue for me... It was not possible for me to check the data again at the institute with samtools, but shouldn't I see the same thing (different read sizes) if I look at my data in IGV? That in fact shows the same size of 51 bases for all reads, which means the company didn't trim the data before mapping... am I right? Thanks for your help! Isabelle

**dpryan** · 03-18-2015, 11:07 AM

Yes, it sounds like they didn't trim them then. Scroll through IGV and see if there are any soft-clipped alignments (alignments that appear shorter but where the original sequence is 51). Using an aligner that does soft-clipping alleviates some of the issues surrounding adapter contamination and quality. If, however, they did end-to-end alignment (i.e., there are no soft-clipped alignments) on untrimmed data then I'd say they did a half-ass job.
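Soft-clipping can also be spotted without IGV by looking at the CIGAR string (SAM column 6), where `S` operations mark bases that are present in the read sequence but not part of the alignment. The following is a minimal sketch; the helper names `soft_clipped_bases` and `aligned_read_bases` are hypothetical, and the CIGAR strings are invented examples.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Number of soft-clipped read bases (S operations) in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def aligned_read_bases(cigar):
    """Read bases that take part in the alignment (M, I, = and X operations)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MI=X")


for cigar in ["51M", "5S46M", "3S40M1I5M2S"]:
    print(cigar, soft_clipped_bases(cigar), aligned_read_bases(cigar))
```

A read whose sequence is 51 bases long but whose CIGAR is, say, `5S46M` appears shorter in IGV even though nothing was trimmed from the stored sequence.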

**Isa0984** · 03-19-2015, 02:34 AM

Hey, thanks for the fast reply. I found some shorter ones... they did it with the -q option of BWA.
When I asked them for the mapping parameters I got the following answer:

n NUM max #diff (int) or missing prob under 0.02 err rate
t:4 (number of threads)
M:3 (mismatch penalty)
q: (quality threshold for read trimming down to 35bp 0)

I am not sure I understand this 35bp thing, because I can find reads with a length of less than 35bp (the 0 is maybe a typing error)?
Another question: how can I get alignments like that (see figure)? If you have n=0.02, shouldn't there be at most 2 mismatches per 50 bp? Isabelle

Attached Files

igv_snapshot.png (16.2 KB, 120 views)

**dpryan** · 03-19-2015, 04:16 AM

Can't say I'm overly familiar with bwa aln, since most people use bwa mem these days.

The -n option has to have one of the more confusing descriptions I've seen. If it's an integer then the explanation is simple: it is the maximal edit distance. I assume that it uses a Poisson distribution with fractional -n, so a value of 0.02 with 50bp reads would correspond to a maximal edit distance of 3 (in R: qpois(0.98, 50*0.02)).
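The R expression can be reproduced in plain Python. This is a minimal sketch under the same assumption as in the post (that a fractional -n is a tail probability of a Poisson distribution whose mean is the read length times a 2% per-base error rate); `poisson_quantile` and `max_edit_distance` are illustrative names, not bwa functions.

```python
import math


def poisson_quantile(p, lam):
    """Smallest k with P(X <= k) >= p for X ~ Poisson(lam), like R's qpois."""
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    while cdf < p:
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return k


def max_edit_distance(read_length, missing_prob=0.02, error_rate=0.02):
    """Assumed interpretation of a fractional -n value (see the text above)."""
    return poisson_quantile(1.0 - missing_prob, read_length * error_rate)


print(max_edit_distance(50))  # same as qpois(0.98, 50*0.02) in R
```

The threshold depends on read length, so reads of different lengths can be allowed different numbers of edits under this interpretation.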

The -q option in bwa aln doesn't really specify a minimum read length. It specifies a value used when determining the trim location. According to the bwa manual, a read is trimmed down to:

argmax_x { sum_{i=x+1}^{l} (INT - q_i) } if q_l < INT, where l is the original read length

The -q value is INT and the quality at position i is q_i. So, this basically sums the penalties (INT - q_i) starting from the 3' end and finds the maximum value. The position with the maximum value is where trimming will occur (essentially; obviously, if the summed penalty is <0 then no trimming should occur).
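To make the trimming rule concrete, here is a minimal Python sketch of the idea: walk in from the 3' end, keep a running sum of (INT - q_i), and cut where that sum peaks. It illustrates the rule as described in this post and is not bwa's actual code, which may differ in details such as a minimum retained read length; the function name `trim_length` and the quality values are made up.

```python
def trim_length(quals, threshold):
    """Return how many bases to keep from the 5' end after 3' quality trimming.

    quals     -- per-base Phred qualities (q_i), ordered 5' to 3'
    threshold -- the INT value given to -q
    """
    if not quals or threshold <= 0 or quals[-1] >= threshold:
        return len(quals)  # last base is good enough: nothing to trim
    best_sum, keep = 0, len(quals)
    running = 0
    for x in range(len(quals) - 1, -1, -1):
        running += threshold - quals[x]  # summed penalty if we cut before base x
        if running > best_sum:
            best_sum, keep = running, x
    return keep


quals = [30, 30, 30, 30, 10, 5, 2]  # hypothetical read with a poor 3' end
print(trim_length(quals, 20))  # keeps the first 4 bases
```

Note that the cut point depends on the running sum rather than on any single base, so one good base inside a run of bad ones does not stop the trimming.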

**Isa0984** · 03-19-2015, 05:40 AM

Ok, I think I got the -q option; it's just the information from the company that is strange. Maybe they mean a quality threshold of 35...
But the -n value is absolutely confusing... I have read a lot of threads about this topic, but still. If in my case the maximal edit distance is 3, what does that mean? Is there any relation to the allowed number of mismatches?

**dpryan** · 03-19-2015, 05:56 AM

They are related, yes. "Edit distance" is a generalization of mismatches. If a read aligns with 3 mismatches then its edit distance is 3. However mismatches can't describe things like insertions or deletions. So if your read aligns with an insert of 2 bases then it has an edit distance of 2. If it has a single base mismatch and later a deletion of 3 bases then the edit distance is 4. The wikipedia article on edit distance is quite good. In short, "edit distance" is the minimum number of single character changes (insertion, deletion, or substitution) needed to convert one sequence to another.
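The three cases above can be checked with a small Python sketch of the classic dynamic-programming (Levenshtein) edit distance. The sequences are invented for illustration, and this is a textbook implementation, not the scoring code of any aligner.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


print(edit_distance("ACGTACGT", "TCGTTCGA"))  # three mismatches
print(edit_distance("ACGT", "ACTTGT"))        # insertion of 2 bases
print(edit_distance("AAAAAAAA", "CAAAA"))     # one mismatch plus a 3-base deletion
```

These print 3, 2 and 4, matching the examples in the post.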
