I don't have much of a background in bioinformatics so any help that you can give me in figuring this out would be much appreciated.
I have paired end strand specific RNA seq data (sequenced with Illumina) from a non-model organism (Tasmanian Devil if you want to know), and I have a reference genome against which I can map the reads to. What I want to be able to do with this data is look for novel transcripts and non-coding RNAs, and visualise the data using IGV.
This is what I've been doing thus far:
I indexed the ref genome and used trimmomatic to trim the RNA reads, then I used tophat as follows:
tophat2 -r 60 -p 12 -G /path/to/GFF_file/Fgenesh_EST_aligned.gff -o /path/to/output/ --library-type fr-firststrand /path/to/ref_genome/421K17 /path/to/Trimmed/OP1_1_L4.fastq /path/to/Trimmed/OP1_2_L4.fastq
Then I used samtools to sort the accepted hits file and index the sorted file:
samtools sort accepted_hits.bam accepted_hits.sort
samtools index accepted_hits.sort.bam
However when I look at the data on IGV this is what I see (http://imgur.com/mnzm5hn and http://imgur.com/FymviY7)
(settings: view as pairs; color alignments by first of pair strand; sort by mapping quality)
I have also tried using flags with samtools to try and get the strand specific transcripts as follows
samtools view -b -f 99 accepted_hits.bam > acc_hits1.bam
samtools view -b -f 147 accepted_hits.bam > acc_hits2.bam
samtools view -b -f 83 accepted_hits.bam > acc_hits3.bam
samtools view -b -f 163 accepted_hits.bam > acc_hits4.bam
samtools merge file_1.bam acc_hits1.bam acc_hits2.bam
samtools merge file_2.bam acc_hits3.bam acc_hits4.bam
I converted these to the sam format, indexed it and viewed it with IGV and this is what i got (http://imgur.com/6bCWFGq). Essentially file_1.sam and file_2.sam look almost the same, just that one has more reads than the other, but they appear to me mapping to essentially the same place.
Is there something obvious that I have missed, either in the way I have prepared the data, or perhaps the way in which I'm viewing it on IGV?? Any help that you can give me in this would be much appreciated.
Thanks!!
I have paired end strand specific RNA seq data (sequenced with Illumina) from a non-model organism (Tasmanian Devil if you want to know), and I have a reference genome against which I can map the reads to. What I want to be able to do with this data is look for novel transcripts and non-coding RNAs, and visualise the data using IGV.
This is what I've been doing thus far:
I indexed the ref genome and used trimmomatic to trim the RNA reads, then I used tophat as follows:
tophat2 -r 60 -p 12 -G /path/to/GFF_file/Fgenesh_EST_aligned.gff -o /path/to/output/ --library-type fr-firststrand /path/to/ref_genome/421K17 /path/to/Trimmed/OP1_1_L4.fastq /path/to/Trimmed/OP1_2_L4.fastq
Then I used samtools to sort the accepted hits file and index the sorted file:
samtools sort accepted_hits.bam accepted_hits.sort
samtools index accepted_hits.sort.bam
However when I look at the data on IGV this is what I see (http://imgur.com/mnzm5hn and http://imgur.com/FymviY7)
(settings: view as pairs; color alignments by first of pair strand; sort by mapping quality)
I have also tried using flags with samtools to try and get the strand specific transcripts as follows
samtools view -b -f 99 accepted_hits.bam > acc_hits1.bam
samtools view -b -f 147 accepted_hits.bam > acc_hits2.bam
samtools view -b -f 83 accepted_hits.bam > acc_hits3.bam
samtools view -b -f 163 accepted_hits.bam > acc_hits4.bam
samtools merge file_1.bam acc_hits1.bam acc_hits2.bam
samtools merge file_2.bam acc_hits3.bam acc_hits4.bam
I converted these to the sam format, indexed it and viewed it with IGV and this is what i got (http://imgur.com/6bCWFGq). Essentially file_1.sam and file_2.sam look almost the same, just that one has more reads than the other, but they appear to me mapping to essentially the same place.
Is there something obvious that I have missed, either in the way I have prepared the data, or perhaps the way in which I'm viewing it on IGV?? Any help that you can give me in this would be much appreciated.
Thanks!!
Comment