I aligned the three FASTQ files I downloaded from the 1000 Genomes FTP site against hg19 using gsnap. From the single-end read file, I obtained 1.bam:
gsnap -A sam -B 4 -t 6 --gunzip -D /tank/gsnap/hg19 -d hg19 ../../NA12878/SRR098401.filt.fastq.gz | samtools view -bS - > 1.bam
Then I got 2.bam from the paired-end files:
gsnap -A sam -B 4 -t 6 --gunzip -D /tank/gsnap/hg19 -d hg19 ../../NA12878/SRR098401_1.filt.fastq.gz ../../NA12878/SRR098401_2.filt.fastq.gz | samtools view -bS - > 2.bam
I merged the two into 3.bam:
samtools merge 3.bam 1.bam 2.bam
Finally, I sorted 3.bam to produce SRR098401_gsnap.bam:
samtools sort 3.bam SRR098401_gsnap
But when I looked at the file sizes, I noticed that the sorted BAM is about 7.2% smaller than the unsorted 3.bam. How come?
-rwxrwxrwx 1 root root 61107717 Jul 17 07:34 1.bam
-rwxrwxrwx 1 root root 19908769734 Jul 17 00:27 2.bam
-rwxrwxrwx 1 root root 19969841372 Jul 17 08:40 3.bam
-rwxrwxrwx 1 root root 18525917076 Jul 17 09:54 SRR098401_gsnap.bam
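For reference, the size difference can be worked out directly from the listing. This is only a minimal sketch: the byte counts are copied from the ls output above, and the variable names (unsorted, sorted, pct) are my own choice for illustration.

```shell
#!/bin/sh
# Byte counts copied from the ls -l listing above.
unsorted=19969841372   # 3.bam (merged, unsorted)
sorted=18525917076     # SRR098401_gsnap.bam (sorted)

# Plain sh only has integer arithmetic, so awk does the floating-point division.
pct=$(awk -v a="$unsorted" -v b="$sorted" 'BEGIN { printf "%.2f", (a - b) / a * 100 }')

echo "sorted BAM is ${pct}% smaller than 3.bam"
```

The reduction is measured relative to the unsorted file; dividing by the sorted size instead would give a slightly larger figure.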