Hi all,
I have just started working with NGS data and have a lot of questions, so I decided to ask for some advice.
I want to analyze SNPs and indels in an organism without a reference genome. My idea is to assemble the transcriptome de novo with Velvet/Oases, map the reads back to that assembly with Bowtie/TopHat, and call the SNPs with SAMtools. Do you think this is a valid protocol for my purpose?
My main doubt is about data preprocessing. I work with Illumina paired-end (PE) reads. After quality filtering and trimming I have two files of 24M 73 bp reads each, with a 75% duplication level. I thought the best option was to remove the duplicated reads from each file, but I'm having problems with this. I tried tools such as fastx-collapser and other scripts available on the web, but they are not designed to handle PE data. Any suggestions for removing duplicated reads from PE data, or is it not necessary?
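To make clear what I mean by removing duplicates at the pair level, here is a minimal Python sketch of the idea, not something I have validated on my real data. It assumes the two FASTQ files list the mates in the same order, that each record is exactly four lines, and that a pair counts as a duplicate only when both mate sequences are identical to an earlier pair. The function names `read_fastq` and `dedup_pairs` and the command-line arguments are my own placeholders.

```python
import sys


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ handle."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # empty header line means end of file
            return
        yield tuple(record)


def dedup_pairs(r1_handle, r2_handle, out1_handle, out2_handle):
    """Write the first occurrence of each (mate 1 sequence, mate 2 sequence) pair.

    Assumes both inputs hold the mates in the same order.
    Returns (total_pairs, kept_pairs).
    """
    seen = set()
    total = kept = 0
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        total += 1
        key = (rec1[1], rec2[1])  # both sequences must match to count as a duplicate
        if key in seen:
            continue
        seen.add(key)
        out1_handle.write("\n".join(rec1) + "\n")
        out2_handle.write("\n".join(rec2) + "\n")
        kept += 1
    return total, kept


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical usage: python dedup_pairs.py in_1.fq in_2.fq out_1.fq out_2.fq
    with open(sys.argv[1]) as r1, open(sys.argv[2]) as r2, \
            open(sys.argv[3], "w") as o1, open(sys.argv[4], "w") as o2:
        total, kept = dedup_pairs(r1, r2, o1, o2)
    print(f"kept {kept} of {total} pairs")
```

This keeps every distinct sequence pair in memory, so with 24M pairs it may need several GB of RAM; storing a hash of each pair instead of the sequences themselves would probably reduce that. Collapsing each file separately, as I tried before, would break the pairing, which is why the key is built from both mates.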
Thanks a lot for your help