Hi, I am a newcomer working on RNA-seq, and I want to further analyse data produced in a previous study. But I am confused by the data in NCBI.
The file size varies a lot within the same species: some files are only about 500 MB, while others are up to 10 GB. Why is there such a difference? The difference even occurs between runs of the same sample. For example, in http://www.ncbi.nlm.nih.gov/sra/SRX000571?report=full the two runs are very different in size:
#   Run         # of Spots    # of Bases
1.  SRR002321   54,856,271    2G
2.  SRR002323   14,761,931    531.4M
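As a rough check on whether the two runs differ in read length or only in read count, the table can be turned into bases per spot. This is only a sketch: the base counts shown by SRA are rounded ("2G", "531.4M"), and the helper name `parse_bases` is made up for illustration.

```python
# Rough check of bases per spot for the two runs in the SRA table.
# The "# of Bases" values are rounded by SRA, so the results are approximate.
SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9}


def parse_bases(text):
    """Convert a rounded size string such as '531.4M' into a number of bases."""
    text = text.strip()
    if text[-1].upper() in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1].upper()]
    return float(text)


# (number of spots, rounded number of bases) copied from the table.
runs = {
    "SRR002321": (54_856_271, "2G"),
    "SRR002323": (14_761_931, "531.4M"),
}

bases_per_spot = {
    run: parse_bases(bases) / spots for run, (spots, bases) in runs.items()
}

for run, value in bases_per_spot.items():
    print(f"{run}: about {value:.1f} bases per spot")
```

Both runs come out at roughly 36 bases per spot, which would suggest that the size difference comes from the number of reads in each run rather than from the read length.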
I chose the bigger one, SRR002321, and fed it into TopHat with this command:
$ tophat -p 8 ./bowtie/bowtie-0.12.5/indexes/genome SRR002321.fastq
(the ../indexes/genome index is from NCBI build 37.2)
but it failed. This is the error message:
Error: segment-based junction search failed with err =-9
Does anyone know what "err =-9" means?
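One possible reading of the -9, assuming TopHat's wrapper script simply reports the return code it gets back from Python's subprocess module (I have not verified this in the TopHat source): on POSIX systems a negative return code -N means the child process was terminated by signal N. The function name `describe_returncode` below is just for illustration.

```python
import signal


def describe_returncode(code):
    """Explain a subprocess return code using Python's POSIX convention."""
    if code == 0:
        return "exited normally"
    if code > 0:
        return f"exited with status {code}"
    # A negative code -N means the child was terminated by signal N.
    try:
        name = signal.Signals(-code).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {-code} ({name})"


print(describe_returncode(-9))
```

On Linux, signal 9 is SIGKILL. The operating system may send it when a process runs out of memory, although that is only one of several possible causes, so it may be worth watching memory usage while the bigger file is running.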
Then I tried another file, SRR002320, from http://www.ncbi.nlm.nih.gov/sra/SRX000605 (this file and SRR002321 are from the same study) with the same command:
$ tophat -p 8 ./bowtie/bowtie-0.12.5/indexes/genome SRR002320.fastq
This time everything ran fine.
I checked the log and found that less than 50% of the sequences had been aligned to the reference genome. Here is the text in bowtie.left_kept_reads.fixmap.log:
# reads processed: 38255195
# reads with at least one reported alignment: 17219294 (45.01%)
# reads that failed to align: 20387248 (53.29%)
# reads with alignments suppressed due to -m: 648653 (1.70%)
Reported 35237526 alignments to 1 output stream(s)
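The counts in this log can be sanity-checked with a few lines of Python. This is a minimal sketch that assumes the log has exactly the layout pasted above; the function name `parse_bowtie_log` is hypothetical.

```python
import re

# Log text copied from bowtie.left_kept_reads.fixmap.log as pasted in this post.
LOG_TEXT = """\
# reads processed: 38255195
# reads with at least one reported alignment: 17219294 (45.01%)
# reads that failed to align: 20387248 (53.29%)
# reads with alignments suppressed due to -m: 648653 (1.70%)
Reported 35237526 alignments to 1 output stream(s)
"""


def parse_bowtie_log(text):
    """Return {label: count} for every '# label: count' line in the log."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_bowtie_log(LOG_TEXT)
total = counts["reads processed"]

# Recompute each category as a percentage of the processed reads.
categories = {k: v for k, v in counts.items() if k != "reads processed"}
for label, n in categories.items():
    print(f"{label}: {n} ({100 * n / total:.2f}%)")

# Check that the three categories account for every processed read.
accounted = sum(categories.values()) == total
print("all reads accounted for:", accounted)
```

The three categories add up to the number of processed reads, so the 45.01% figure is relative to every read that Bowtie was given in this step.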
Does anyone know how to improve the results?