I have some paired-end Illumina RNAseq data and have run tophat2 on it against the human genome. I would like now like to run tophat2 again to align the unmapped bams on some alternative genomes to check for contamination/infection. To do this I need to convert the unmapped.bam into fastq files.
To do this I do the following:
1) Remove any reads without a matching pair
2) Sort the reads according to name
3) Run tophat's bam2fastx to get fastq
Unfortunately this reports an error:
The problem is that the unmapped.bam file does not seem to have any information in the RNEXT column about the read name of the matched pair. Anyway three steps just to convert the data back to fastqs seems over the top.
Does anyone have any idea how to fix this problem, or provide a better way to do it?
Thanks
To do this I do the following:
1) Remove any reads without a matching pair
Code:
samtools view -f1 -b unmapped.bam > unmapped_paired.bam
Code:
samtools sort -n unmapped_paired.bam unmapped_paired_sort.bam
Code:
bam2fastx -q -Q -A -P -o test unmapped_paired_sort.bam
Code:
Error: couldn't retrieve both reads for pair HISEQ2500-01:110:H7AGVADXX:1:1101:1336:2967. Perhaps the input file is not sorted by name?
Does anyone have any idea how to fix this problem, or provide a better way to do it?
Thanks
Comment