Hi,
This is my very first experience analysing RNAseq data. My goal is to do differential analysis between two strains of a bacteria. So far, i managed to align and produce SAM and BAM files. I'm having problems to annotate and count my reads. Here are the commands that I used.
My reads are from SOLID and hence in colourspace
now I only get 0 counts for all genes. Does this mean that there is something wrong with my alignment files or something wrong with the counting method . And it seems like my .gff (version 3) file was unable to be read by htseq-count and also the java script . I downloaded the gff file from GeneDB and it seems like in many tutorials .gtf files are used instead. So I'm stuck at counting the read part and I really need some help . Help please .
This is my very first experience analysing RNAseq data. My goal is to do differential analysis between two strains of a bacteria. So far, i managed to align and produce SAM and BAM files. I'm having problems to annotate and count my reads. Here are the commands that I used.
My reads are from SOLID and hence in colourspace
Code:
$ nohup solid2fastq.pl 291_01_01 291_01_01-bwa #Convert .csfasta and .qual to .fastq $ nohup bwa index -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta $ nohup bwa aln -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta 291_01_01-bwa.singleF3.fastq 291_01_01-bwa.sai $ perl -ne 'if($_ !~ m/^\S+?\t4\t/){print $_}' 291_01_01-bwa.sam > 291_01_01-bwa.sam.filtered #Convert to SAM file $ samtools sort 291_01_01-bwa.bam 291_01_01-bwa.bam.sorted $ samtools index 291_01_01-bwa.bam.sorted.bam #to produce .rpkm file $ java -jar ~/bin/bam2rpkm-0.06/bam2rpkm-0.06.jar -i 291_01_01-bwa.bam.sorted.bam -f Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.RPKM2.out # i get an error here $ERROR: Problem encountered whilst reading gtf file. Could not interpret line 'GeneDB|Tb427_01_v4 EuPathDB supercontig 1 so i tried different method to count $ htseq-count -i ID 291_01_01-bwa.sam Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.sam_htseq-count #still error $Error occured when processing GFF file (line 37060 of file Tbrucei427_TriTrypDB-4.0.gff): need more than 1 value to unpack and different method $ python make_bed_from_fasta.py ~/Downloads/reference/TbruceiTreu927AnnotatedCDS_TriTrypDB-4.0.fasta > 927_reference.bed #this python script converts .fasta into .bed file since the .gff file cannot be processed $multiBamCov -q 30 -p -bams 291_01_01-bwa.bam.sorted.bam -bed 927_reference.bed > test_counts.txt