Hi, I'm currently working on Illumina sequencing data in FASTQ format. I downloaded it from a publicly available database (TCGA), and it was zipped. After unzipping and trimming, the file is about 16G. Here is the interesting part: after I copied this file to another partition, the new copy is only 7.6G. The number of lines, the number of reads, and the read-length distribution are the same in both files, so I guess the two files have the same content and the new copy is not truncated.
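For anyone who wants to repeat the comparison described above, here is a minimal Python sketch. It assumes standard 4-line FASTQ records, and the function names `fastq_stats` and `same_bytes` are hypothetical helpers, not part of any tool mentioned in the post. It computes the line count, read count, and read-length distribution, and separately checks whether the two files are byte-for-byte identical with the standard library's `filecmp.cmp`. Matching statistics alone do not guarantee identical bytes, so the second check is worth running too.

```python
import filecmp
from collections import Counter


def fastq_stats(path):
    """Return (line_count, read_count, Counter of read lengths).

    Assumes plain 4-line FASTQ records (header, sequence, '+', quality).
    """
    line_count = 0
    lengths = Counter()
    with open(path, "rb") as handle:
        for i, line in enumerate(handle):
            line_count += 1
            if i % 4 == 1:  # the 2nd line of each record is the sequence
                # Strip either LF or CRLF line endings before measuring.
                lengths[len(line.rstrip(b"\r\n"))] += 1
    read_count = sum(lengths.values())
    return line_count, read_count, lengths


def same_bytes(path_a, path_b):
    """Return True only if the two files have identical contents."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Usage would be something like `fastq_stats("copy_16G.fastq") == fastq_stats("copy_7.6G.fastq")` followed by `same_bytes(...)` on the same two (hypothetical) file names; if the first is True and the second is False, the files differ in bytes that these read-level statistics do not capture.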
Moreover, when I run Tophat2/Cufflinks on the 16G copy, it takes much longer to finish and the results look strange, whereas everything looks normal with the 7.6G copy. This might not be a bioinformatics question, but it's quite interesting. What happened to the file? What could account for the extra size?
Thanks a lot.