Hello,
I am using Newbler 2.6 to do a de novo cDNA assembly with 454 reads. The program is giving me a warning about possible primer contamination (the reported sequence is TGTTTTTTTTTCT). I checked the assembly and found about 250 contigs (out of 20,000 in total) that contain the reported primer sequence. We used the MINT cDNA synthesis kit, and the reported sequence seems to be part of the MINT kit primers. I could use the -vt flag with runAssembly and provide a FASTA file with the primer sequences to trim the reads, but would this be correct?
Reason for not trimming:
> the primer is part of the mRNA and therefore should not be removed - I will lose some information
Reasons for trimming:
> the primer sequence could lead to incorrect assemblies
> the primer sequence might be part of the mRNA but not the protein - this region could cause false positives in BLAST searches
I know that RNA-seq data have a similar characteristic bias (e.g. from random hexamer primers), but I think nobody trims the reads because of it. Alternatively, I could assemble the reads without trimming and then remove the contigs in which the primer sequence is not at an end.
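To make that last idea concrete, here is a rough sketch of how I imagine the filtering step. It is only an illustration under my own assumptions: the function names are made up, "at the end" is arbitrarily defined as within 50 bp of either contig end (`end_window`), only exact matches to the primer or its reverse complement are counted, and the contigs are read from a plain FASTA file.

```python
PRIMER = "TGTTTTTTTTTCT"  # sequence reported in the Newbler warning


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def primer_positions(seq, primer):
    """Return sorted start positions of exact primer hits on either strand."""
    seq = seq.upper()
    primer = primer.upper()
    hits = []
    for motif in {primer, reverse_complement(primer)}:
        start = seq.find(motif)
        while start != -1:
            hits.append(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def classify_contig(seq, primer, end_window=50):
    """Classify a contig as 'none', 'end' or 'internal'.

    'internal' means at least one hit lies entirely outside the first
    and last end_window bases (an assumed, arbitrary cutoff).
    """
    hits = primer_positions(seq, primer)
    if not hits:
        return "none"
    for start in hits:
        stop = start + len(primer)
        if start >= end_window and stop <= len(seq) - end_window:
            return "internal"
    return "end"
```

Usage would be something like `for name, seq in parse_fasta(open("454AllContigs.fna")): print(name, classify_contig(seq, PRIMER))`, where the file name is whatever contig FASTA the assembly produced. Contigs reported as "internal" would be the candidates for removal or closer inspection.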
Is anybody willing to share their thoughts or experience on this? I would appreciate your help. Thanks!