TopHat v1.0.12 SAM output question
I am running TopHat v1.0.12 with 40mer paired-end reads, and am having difficulty interpreting the SAM output file (accepted_hits.sam). Here is an example of a pair of output lines that correspond to the mates of a single paired-end read:
PATHBIO-SOLEXA2:2:58:734:942#0 145 chr1 554416 255 40M = 3836 0 ATAATACACACCCTCACCACTACAATCTTCCTAGGAACAA _QS_\^H^`W^Z[Ua^``M`^Jb_UNVZ\a`]baaaaV`K NM:i:1
PATHBIO-SOLEXA2:2:58:734:942#0 97 chrM 3836 255 40M = 554416 0 CCATCATGAACCTTGGCCATAATATGATTTATCTCCACAC `bbbbbbaVX`]aa`Y\X`Y]S_a\a_^XZ_bb`bbaT`` NM:i:1
My question is regarding the MRNM field (the 7th field, set to '=' here). The SAM format specification states that this field holds the reference name of the mate, and is set to '=' only if the query sequence and its mate map to the same reference. In this case the mates map to different contigs (chr1 and chrM), so why is this field set to '='?
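A minimal Python sketch of the consistency check described above, based on the SAM specification's definition of MRNM and MPOS. The helper names (`parse_mate_fields`, `expected_mrnm`, `check_pair`) are hypothetical and not part of TopHat or SAMtools, and the SEQ, QUAL and tag columns are left out of the example records for brevity.

```python
from typing import List, NamedTuple


class MateInfo(NamedTuple):
    """The mate-related columns of one SAM alignment line."""
    qname: str
    flag: int
    rname: str
    pos: int
    mrnm: str
    mpos: int
    isize: int


def parse_mate_fields(line: str) -> MateInfo:
    """Split a SAM line on whitespace and keep columns 1-4 and 7-9."""
    f = line.split()
    return MateInfo(f[0], int(f[1]), f[2], int(f[3]), f[6], int(f[7]), int(f[8]))


def expected_mrnm(read: MateInfo, mate: MateInfo) -> str:
    """MRNM should be '=' only when both mates share a reference name."""
    return "=" if read.rname == mate.rname else mate.rname


def check_pair(a: MateInfo, b: MateInfo) -> List[str]:
    """Return a list of human-readable inconsistencies between two mates."""
    problems = []
    for read, mate in ((a, b), (b, a)):
        want = expected_mrnm(read, mate)
        if read.mrnm != want:
            problems.append(
                f"{read.rname}:{read.pos}: MRNM is {read.mrnm!r}, expected {want!r}"
            )
        if read.mpos != mate.pos:
            problems.append(
                f"{read.rname}:{read.pos}: MPOS is {read.mpos}, mate POS is {mate.pos}"
            )
    return problems


# First nine columns of the two records quoted in this post.
# Flag 145 = 0x1 + 0x10 + 0x80, flag 97 = 0x1 + 0x20 + 0x40.
rec1 = parse_mate_fields("PATHBIO-SOLEXA2:2:58:734:942#0 145 chr1 554416 255 40M = 3836 0")
rec2 = parse_mate_fields("PATHBIO-SOLEXA2:2:58:734:942#0 97 chrM 3836 255 40M = 554416 0")

problems = check_pair(rec1, rec2)
for p in problems:
    print(p)
```

For the pair above, MPOS agrees with the mate's POS on both lines, so only the MRNM column is flagged.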
When computing coverage (as in the coverage.wig file), do you eliminate reads whose MPOS field (the 8th field) does not correspond to the position of the mate?
Also, would it be possible for you to set the ISIZE field (the 9th field, 0 in the lines above) in the accepted_hits.sam file?
Thanks very much!