I am new to RNA-Seq.
I am currently working on a bacterium.
I have paired-end RNA-Seq data sequenced on an Illumina platform.
I mapped the reads to the genome with bowtie and filtered out the unmapped and ambiguously mapped reads.
The remaining reads are all unambiguously mapped, and I generated a pileup from the BAM files.
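To make clear what I am working with, here is a minimal sketch of how the per-base coverage could be read from the pileup. It assumes the tab-separated text layout that samtools pileup output uses (sequence name, 1-based position, reference base, depth, ...). The function name `coverage_from_pileup` and the example lines are hypothetical, made up only for illustration.

```python
def coverage_from_pileup(lines):
    """Return {(seq_name, position): depth} from pileup-style text lines.

    Assumes tab-separated columns: name, 1-based position, reference base,
    depth, then further columns that are ignored here.
    """
    coverage = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        seq_name, position, depth = fields[0], int(fields[1]), int(fields[3])
        coverage[(seq_name, position)] = depth
    return coverage


# Made-up example lines, not real data.
example_lines = [
    "chr\t100\tA\t5\t.....\tIIIII",
    "chr\t101\tC\t3\t...\tIII",
]
example_coverage = coverage_from_pileup(example_lines)
```

Positions missing from the dictionary can be treated as coverage 0, for example with `example_coverage.get(("chr", 102), 0)`.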
The first thing I want to do is determine the transcription start sites.
So I looked at the region upstream of each predicted gene.
At first I thought that if the predicted start site of a gene is wrong, I would find continuous read coverage across that point. All I would need to do is extend the start site upstream to the first position where the coverage is 0.
In practice, doing it that way gives a very long sequence.
Maybe there is some noise, and I should set the cutoff higher than 0, e.g. 1, 2, 3...
However, I don't know which value is suitable.
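The procedure I describe can be sketched roughly as follows. This is only an illustration: it assumes the coverage of one replicon is held in a dictionary mapping position to depth, and the names `extend_start` and `min_cov` are hypothetical. It does not tell me which cutoff is right; it only shows how the result changes with the cutoff.

```python
def extend_start(coverage, start, min_cov=1, strand="+"):
    """Extend a predicted start upstream while coverage stays >= min_cov.

    coverage: dict mapping position -> read depth (missing positions count as 0).
    start:    predicted start position of the gene.
    min_cov:  cutoff; extension stops at the first position below it.
    strand:   "+" walks towards lower coordinates, "-" towards higher ones.
    """
    if min_cov < 1:
        raise ValueError("min_cov must be at least 1, or the walk never stops")
    step = -1 if strand == "+" else 1
    position = start
    # Keep moving upstream as long as the next base is still covered well enough.
    while coverage.get(position + step, 0) >= min_cov:
        position += step
    return position


# Made-up coverage values upstream of a gene predicted to start at 100.
toy_coverage = {95: 1, 96: 1, 97: 2, 98: 8, 99: 9, 100: 10}
start_with_cutoff_1 = extend_start(toy_coverage, 100, min_cov=1)
start_with_cutoff_3 = extend_start(toy_coverage, 100, min_cov=3)
```

With these toy values, a cutoff of 1 extends the start to position 95, while a cutoff of 3 stops at position 98, which is exactly the sensitivity to the cutoff that I am unsure how to handle.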
In my opinion, since the reads are 90 bp long, a read that maps to the genome shouldn't be noise. And if more than one read maps to a region, is that region really transcribed? I am very confused.
Please help me.