Hi,
I am using the Picard SamToFastq tool to extract paired FASTQ files from a BAM file. My command, as logged by Picard, looks like this:
[Thu Jul 19 00:13:45 PDT 2012] net.sf.picard.sam.SamToFastq INPUT=input.bam FASTQ=out.1.fastq SECOND_END_FASTQ=out.2.fastq VALIDATION_STRINGENCY=LENIENT OUTPUT_PER_RG=false RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
What I've noticed is that the extracted FASTQ files are not the same size and actually differ in the number of sequences in each file. I think this is because the input BAM file contains both single-end and paired-end reads.
The problem becomes apparent when I try to run gsnap on the FASTQ files: it complains that the read names do not match up.
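For anyone who wants to reproduce the check, here is a minimal Python sketch of how the mismatch can be located. It assumes plain, uncompressed FASTQ with exactly four lines per record and read names that may end in /1 or /2; the function names `read_names` and `first_mismatch` are made up for illustration and are not part of Picard or gsnap.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each record in a four-line-per-record FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only the first line of each record holds the name
            fields = line[1:].split()  # drop the leading '@' and any description
            name = fields[0] if fields else ""
            # Assumption: mates are distinguished only by a trailing /1 or /2.
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name


def first_mismatch(path1, path2):
    """Return (record_index, name1, name2) for the first record pair whose
    names differ, or None if both files pair up completely. A name of None
    means that file ran out of records first."""
    pairs = zip_longest(read_names(path1), read_names(path2))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

Calling `first_mismatch("out.1.fastq", "out.2.fastq")` on the two output files returns the zero-based record index where the files stop lining up, which should show whether the extra records are the single-end reads.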
Is there a way to force Picard SamToFastq to extract only paired-end reads?
Thanks