I ran an alignment on a 1.1 GB file of RNA-seq reads in FASTQ (Illumina) format. The reference was in FASTA, so I decided to convert the large file to FASTA too...
Then I built the Bowtie index files and started the run. When TopHat finished, I looked in the output folder and saw that the aligned sequences were in a 28 MB *.bam file, whereas the *rejected* sequences were in a *.bam file of almost 350 MB... I assume that file size is roughly proportional to the number of sequences.
I'm a rookie in bioinformatics, so my question is: is it normal to get file sizes like these for the two files, or am I doing something wrong?
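Since file size is only a rough proxy for the number of sequences, I suppose I could count the reads directly instead. Below is a minimal Python sketch of what I have in mind, not a tested pipeline. It assumes a standard FASTQ with exactly four lines per record; the function names `count_fastq_reads` and `aligned_fraction` are my own, hypothetical helpers. The record counts for the two BAM files would have to come from elsewhere (as far as I know, `samtools view -c file.bam` prints the number of records), and the aligned file may hold more than one alignment per read, so the fraction is only an estimate.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly 4 lines per record."""
    # Open gzip-compressed files transparently (assumption: ".gz" suffix).
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def aligned_fraction(n_aligned, n_rejected):
    """Fraction of records that aligned, given the two record counts."""
    total = n_aligned + n_rejected
    return n_aligned / total if total else 0.0
```

If the fraction computed from real counts is as low as the file sizes suggest, that would tell me the problem is in my setup rather than in how BAM files are compressed.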