Hi, everyone. I am new to NGS technology and need some help with TopHat and Cufflinks.
Currently, I want to get assembled alignment results for Homo sapiens whole-genome sequencing data. I downloaded the SRA file of E. coli whole-genome sequencing data from the NCBI SRA website (http://www.ncbi.nlm.nih.gov/sra) and converted it to FASTQ with the SRA Toolkit. I installed Bowtie2, TopHat and Cufflinks on an EC2 machine. I tried two pre-built reference genomes, one from NCBI and one from UCSC (http://support.illumina.com/sequenci...e/igenome.html). I ran TopHat to do the alignment and got the result file, "accepted_hits.bam", but it is very small, only 1024K, while "unmapped.bam" is huge, 44G. Then I ran Cufflinks on both the mapped and the unmapped file, and both "transcripts.gtf" outputs are almost empty. Could you help me with this? Thanks a lot!
P.S. Here are the command lines I used:
SRA convert: fastq-dump SRR1171946.sra
tophat: tophat -p 8 genome SRR1171946.fastq
cufflinks: time cufflinks -p 8 accepted_hits.bam (and the same for the unmapped one)
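In case it helps with diagnosing this, here is a small sketch for counting the reads in the converted FASTQ file, so the total can be compared against what ended up mapped and unmapped. The function name count_fastq_reads is made up for illustration, and it assumes the file uses plain four-line FASTQ records with no wrapped sequence lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SRA Toolkit, TopHat or Cufflinks).
# Assumption: every read occupies exactly four lines in the FASTQ file
# (header, sequence, "+" separator, qualities).
count_fastq_reads() {
    lines=$(wc -l < "$1")      # total number of lines in the file
    echo $(( lines / 4 ))      # four lines per read
}

# Example usage (file name taken from the commands above):
#   count_fastq_reads SRR1171946.fastq
```

The number printed is only as reliable as the four-line assumption, so it is a rough sanity check on the conversion step rather than a validation of the file.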