Hi:
While working with the TCGA COAD BAM files, I find it very annoying to pick out the paired-end (PE) reads. These files are mashed up and inconsistent.
For example:
1. Read lengths are not consistent. Some reads are 34 bases long and others are 76.
2. Many reads are missing their mate.
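To put numbers on the two points above, something like the following rough sketch could be run on the text output of `samtools view -h`. It uses only the standard SAM columns and FLAG bits; the function name `summarize_sam` and the demo records are made up for illustration, and it assumes the whole file fits in memory.

```python
from collections import Counter, defaultdict

def summarize_sam(lines):
    """Tally read lengths and list read names whose mate is absent.

    `lines` is any iterable of SAM text lines (hypothetical helper,
    not part of samtools).
    """
    lengths = Counter()
    mates_seen = defaultdict(set)  # read name -> which mates (1, 2) are present
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if seq != "*":
            lengths[len(seq)] += 1
        if flag & 0x1:  # read was sequenced as part of a pair
            if flag & 0x40:
                mates_seen[name].add(1)
            if flag & 0x80:
                mates_seen[name].add(2)
    orphans = sorted(n for n, m in mates_seen.items() if m != {1, 2})
    return lengths, orphans

# Made-up demo records: r1 has both mates (34 bases), r2 has only mate 1 (76 bases).
demo = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t34M\t=\t200\t134\t" + "A" * 34 + "\t" + "I" * 34,
    "r1\t147\tchr1\t200\t60\t34M\t=\t100\t-134\t" + "C" * 34 + "\t" + "I" * 34,
    "r2\t73\tchr1\t300\t60\t76M\t=\t300\t0\t" + "G" * 76 + "\t" + "I" * 76,
]
lengths, orphans = summarize_sam(demo)
print(dict(lengths), orphans)
```

On a real file the lines could be piped in from `samtools view -h file.bam` via `sys.stdin` instead of the demo list.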
I want to identify novel splicing differences. However, the TCGA BAM files are mapped to known transcripts (known exon pairings from a GTF of known isoforms), which limits the discovery of novel isoforms.
So I decided to convert the BAM files to FASTQ and realign to the full genome.
While doing this, because so many reads in the BAM files have lost their mates, I converted them to single-end FASTQ.
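For illustration only, a single-end conversion of this kind could look roughly like the sketch below. It is not necessarily the command that was actually used; the function name `sam_to_single_end_fastq` and the demo records are hypothetical. It works on SAM text lines, restores reverse-strand reads to their original orientation (BAM stores them reverse-complemented), and tags each read with `/1` or `/2` so mates do not end up with identical names.

```python
# Complement table for reverse-complementing reads stored on the reverse strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def sam_to_single_end_fastq(lines):
    """Turn SAM text lines into one single-end FASTQ string (illustrative helper)."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        name, flag, seq, qual = f[0], int(f[1]), f[9], f[10]
        if flag & 0x900 or seq == "*":
            continue  # skip secondary/supplementary records and missing sequences
        if flag & 0x10:
            # Aligned to the reverse strand: undo the reverse complement
            # and reverse the qualities to match.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & 0x40:
            name += "/1"
        elif flag & 0x80:
            name += "/2"
        records.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return "".join(records)

# Made-up demo: mate 1 on the forward strand, mate 2 on the reverse strand.
demo_pair = [
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tABCD",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tAACG\tIIHG",
]
fastq_text = sam_to_single_end_fastq(demo_pair)
print(fastq_text, end="")
```

Keeping the `/1` and `/2` suffixes means the mate identity is still recoverable later, even though the aligner will treat every read as single-end.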
Any ideas on whether converting a paired-end BAM to single-end FASTQ poses any problems in principle?
Thanks