I came across this apparent bias while recovering paired-end FASTQ reads from BAM files for re-mapping (for example, public data obtained from SRA, or data that needs realigning because a new reference genome is available).
It seems that if I extract reads from an existing BAM file, the order in which the reads appear in the BAM (and therefore in the extracted FASTQ) will affect the later remapping results!
People with a partial answer told me that the bias affects how the insert size (the distance between paired reads) is computed during remapping, and that it will 'tend' to reproduce the results of the first alignment.
This is second-hand information (many thanks to Geraldine, who shared it at http://gatkforums.broadinstitute.org...o-fastq-format), but it is not very satisfying, and I would like to read a more conclusive discussion.
Thanks to anyone who knows the detailed answer and can make it clear for us all.
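
To make the question concrete, here is a minimal sketch of one way the effect could be tested: randomize the order of the extracted read pairs before remapping and compare the result with a run on the unshuffled files. This is only an illustration, not an established fix. The function names `parse_fastq` and `shuffle_pairs` are made up for this post, the code assumes plain 4-line FASTQ records with mate 1 and mate 2 in the same order in both files, and it holds everything in memory, so it is only suitable for a small test subset.

```python
import random
from typing import List, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
Record = Tuple[str, ...]


def parse_fastq(text: str) -> List[Record]:
    """Split FASTQ text into 4-line records (assumes no wrapped sequences)."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text does not contain a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def shuffle_pairs(
    mate1: List[Record], mate2: List[Record], seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Apply the same random permutation to both mate lists.

    Using one shared permutation keeps record i of mate1 paired with
    record i of mate2, while removing the original (for example
    coordinate-sorted) order inherited from the BAM file.
    """
    if len(mate1) != len(mate2):
        raise ValueError("mate files contain different numbers of records")
    order = list(range(len(mate1)))
    random.Random(seed).shuffle(order)  # fixed seed makes the test repeatable
    return [mate1[i] for i in order], [mate2[i] for i in order]
```

If the remapping results differ between the shuffled and unshuffled inputs, that would at least confirm that read order matters for the aligner being used, even if it does not explain why.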