Dear all,
In order to assemble genomes (20x, Illumina paired-end, Bos taurus), I followed the two steps indicated in the Stampy manual (see commands below): mapping with BWA + samtools, then remapping with Stampy. My question concerns the size of the files generated by each step. For example, for a given genome, the first step (BWA + samtools) generates a file of only 68 GB, which shrinks to 45 GB once sorted, while the remapping with Stampy generates a file of 215 GB (more than 4 times bigger than the sorted BAM). Is it expected that the Stampy remapping step produces a file 4 times bigger? Or does this indicate an error on my side (parameters, filtering?) or possibly poor sequencing quality?
Details of my commands:
Step 1 ("5. faster mapping with BWA"): I generated a BAM file with bwa and samtools:
bwa mem -t 4 -M $REFERENCE $FASTQ_PATH/$FILE\_R1.fastq.gz $FASTQ_PATH/$FILE\_R2.fastq.gz | samtools view -b -S - > $IO/$FILE.bam
Step 2: I remapped the BAM file using Stampy, keeping only the well-mapped reads:
stampy.py -g Bt -h Bt -t 8 -o $FILE\_StampyProcessed.bam --bamkeepgoodreads -M $IO/$FILE\_sorted_NODUP.bam
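To narrow down where the size difference comes from, a few checks could be run on the two files. The following is only a minimal sketch: the helper names (`is_compressed`, `size_ratio`, `count_records`) are hypothetical, and it assumes a POSIX shell with `od` and `awk`, plus samtools on the PATH for the record count. It tests whether the Stampy output is actually compressed (every BAM file starts with the gzip magic bytes `1f 8b`, while plain-text SAM does not), reports the size ratio, and counts alignment records.

```shell
# Hypothetical helpers for comparing the two mapping outputs.

# Succeed if the file starts with the gzip/BGZF magic bytes (1f 8b).
# A real BAM file always does; a plain-text SAM file does not.
is_compressed() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Print the size of file $2 divided by the size of file $1.
size_ratio() {
    a=$(wc -c < "$1")
    b=$(wc -c < "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.2f\n", b / a }'
}

# Count alignment records in a SAM/BAM file (needs samtools).
count_records() {
    samtools view -c "$1"
}
```

For example, `is_compressed "${FILE}_StampyProcessed.bam" || echo "not compressed, probably SAM text"` would show whether the output is really BAM despite its extension, and comparing `count_records` on the input and output files would show whether the number of records changed.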
Thanks for the help!