  • Read direction lost with BWA in SAM output?

    I've tried two different header styles in my input FASTQ files when running BWA:

    @SN7001163:162:C4A1UACXX:1:1101:1062:2076/1

    and

    @SN7001163:162:C4A1UACXX:1:1101:1062:2076 1:N:0:GTCCGCA

    My goal is to tell which mate of the pair each record came from, but the mate identifier seems to get stripped in the SAM output. From "bwa sampe" I get lines like this:

    Code:
    SN7001163:162:C4A1UACXX:1:1101:1062:2076	77	*	0	0	*	*	0	0	GTTTGCTTGGCTGTGAGCTTGTCCGACACGGGCCACCAGGAGAGTGAGATACACCGAGACGAGCATCCTGTCTTTCTCTCGGACGGTTCCACAACAAATAA	@?@DDD?;<;F>?<2A<E<FFC9:FE8):8@?FFF@FF=;=;D;).).7>77==EB'93;;3=@@:@(:3,+(4::@B>@5?-<@B<?<34>ABB1<8:43
    SN7001163:162:C4A1UACXX:1:1101:1062:2076	141	*	0	0	*	*	0	0	GCCATGTTGAGTGAGAATTTATTATTTGTTGTGGAACC	;<;;(42@9)@)84):46=69416)2@:@:<=1(66@?
    How can I tell which of these alignment lines refers to which input mate?

  • #2
    I realize that those two reads didn't actually align, so the SAM lines were pretty minimal. Here is a pair that did:

    Code:
    SN7001163:162:C4A1UACXX:1:1101:1174:2116        81      Locus_14841_Transcript_1__1_Confidence_0.750_Length_603 292     37      101M    =       294     -99     CTCGTCATTTCAATGCCCCCTCTCATATCAGAAGGAAAATCATGAGTGCTCCTTTGTCAAAAGAGCTGAGAGCAAAGTACAATGTGAGAAGTATGCCCATT   >BBDDDDDDDDDDDBDFFHHHHIIHJJJJJJJJJJJJJJIIJJJJJJJJJJJJJJIIIJJJJIJJJJJJJIJJJJJJJHJJJIJJJJJHHHHHFFFFFCCC   XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101
    SN7001163:162:C4A1UACXX:1:1101:1174:2116        161     Locus_14841_Transcript_1__1_Confidence_0.750_Length_603 294     37      101M    =       292     99      CGTCATTTCAATGCCCCCTCTCATATCAGAAGGAAAATCATGAGTGCTCCTTTGTCAAAAGAGCTGAGAGCAAAGTACAATGTGAGAAGTATGCCCATTAG   BCBFFFFFHHHH?HIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHIJJJJGHIIHHIJJJJJJJJJIJJJJJHHHHHHFFFFFFFEEEEEEEEDDDDDDDC   XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:99C1


    • #3
      [original post deleted because I misunderstood the question]

      gringer's answer below is correct. You can confirm this simply by swapping the R1 and R2 reads.
      Last edited by WhatsOEver; 05-11-2017, 12:36 AM.


      • #4
        There are flags in the SAM file for the first (and last) read of a template sequence. If a bitwise AND of the flag field with 0x40 returns non-zero, the read is the first read of its template. For the two examples you have, here is the full flag breakdown:

        Code:
        81 = 0101 0001
                     Paired (0x1)
                Reverse-complemented (0x10)
              First read in the template (0x40)
        
        161 = 1010 0001
                      Paired (0x1)
                Other read is reverse-complemented (0x20)
              Last read in the template (0x80)
        See https://samtools.github.io/hts-specs/SAMv1.pdf
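
If you want to check a flag without working through the binary by hand, the breakdown above can be sketched in a few lines of Python. This is only a rough illustration: the names `SAM_FLAGS`, `decode_flag` and `mate_label` are made up for this example and are not part of samtools or BWA; the bit masks are the ones listed in the SAM specification linked above.

```python
# Bit masks for the SAM FLAG field, as listed in the SAM specification.
# The short descriptions are paraphrased for readability.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "properly paired"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse-complemented"),
    (0x20, "mate reverse-complemented"),
    (0x40, "first read in the template"),
    (0x80, "last read in the template"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for mask, name in SAM_FLAGS if flag & mask]


def mate_label(flag):
    """Return "1" for a first read, "2" for a last read, otherwise None.

    None covers flags where neither bit or both bits are set, which can
    happen for unpaired reads or templates with more than two reads.
    """
    first = bool(flag & 0x40)
    last = bool(flag & 0x80)
    if first and not last:
        return "1"
    if last and not first:
        return "2"
    return None


# The four flag values that appear in this thread.
for flag in (77, 141, 81, 161):
    print(flag, mate_label(flag), decode_flag(flag))
```

The same test applies to the unmapped pair in the first post: 77 has the 0x40 bit set and 141 has the 0x80 bit set, so they are the first and last mate respectively.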

        These flags can be filtered using samtools view:

        Code:
        samtools view -b -f 0x40 in.bam > out_FirstRead.bam
        samtools view -b -F 0x40 in.bam > out_notFirst.bam
        The distinction between "last" and "second" is not important for most purposes, but there are some situations where more than two reads can be associated with the same template sequence.

        Whether a read is first or last is particularly important for strand-specific sequencing, because it allows you to distinguish between templates that are oriented in the same direction as the primary transcript, and those that are not (e.g. siRNA).
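
If the goal is to get the /1 and /2 suffixes back onto the read names, the same 0x40 / 0x80 test can be applied to plain SAM text. The following is a minimal sketch, assuming tab-separated SAM records with the read name in column 1 and the flag in column 2; `tag_read_names` is a hypothetical helper written for this post, and the two example records use placeholder names, sequences and qualities rather than real data.

```python
def tag_read_names(sam_lines):
    """Yield SAM lines with /1 or /2 appended to the read name.

    The suffix is chosen from FLAG bits 0x40 (first read) and 0x80
    (last read). Header lines and records with neither or both bits
    set are passed through unchanged.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            yield line  # header or empty line: nothing to tag
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x40 and not flag & 0x80:
            fields[0] += "/1"
        elif flag & 0x80 and not flag & 0x40:
            fields[0] += "/2"
        yield "\t".join(fields)


# Placeholder records laid out like the unmapped pair in the first post.
example = [
    "@HD\tVN:1.0",
    "read_a\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read_a\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
for record in tag_read_names(example):
    print(record)
```

Bear in mind that tools which pair mates by identical read name may no longer recognise the two records as a pair once the suffixes are added, so this is better suited to inspection than to producing input for downstream tools.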
        Last edited by gringer; 05-11-2017, 12:14 AM.
