  • Tomi
    Member
    • Jul 2011
    • 12

    Mate reads to fastq

    Hi,

    I am trying to extract mate reads from a SAM file using samtools with the following flags: samtools view -b -S -f 8 -F 4 output.sam > mate.bam

    and then with bam2fastq: bam2fastq -o mate#.fastq -f mate.bam

    Unfortunately, I get this error message:


    This looks like paired data from lane 0.
    Output will be in unmapped_1.fastq and unmapped_2.fastq
    1 sequences in the BAM file
    1 sequences exported
    WARNING: 1 reads could not be matched to a mate and were not exported


    Maybe someone can help me out?

    Thanks,
    Tomi
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Is there really just one read in the original SAM file, and therefore just 1 in the BAM file?


    • Tomi
      Member
      • Jul 2011
      • 12

      #3
      Hi,

      thank you for your reply.
      No, both reads are in the SAM file (forward and reverse).

      Greetings,
      Tomi


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        Could you share the (start of the) SAM file then? If you post it here, use the [ code ] tags - e.g. via the # icon on the advanced editor.


        • Tomi
          Member
          • Jul 2011
          • 12

          #5
          BWA Sampe Output

          Code:
          @SQ     SN:gi|157704448|ref|AC_000133.1|        LN:219475005
          @PG     ID:bwa  PN:bwa  VN:0.5.9-r16
          testSample_0_1  73      gi|157704448|ref|AC_000133.1|   1       37      75M     =       1       0       ATTGACAAGGGGAGGGAAAAGAGGAACAGAAATTCTTTTCTAT$
          testSample_0_2  133     gi|157704448|ref|AC_000133.1|   1       0       *       =       1       0       ATACCCAGGATTTTACCTGTAAAAGTACCCTCAGGTCGTGATT$
          testSample_1_1  77      *       0       0       *       *       0       0       ATCGTCAATAGGGTACTACTTCCATAATTTTGTAAATCCGCATGTTCCTCGAATAATAACGTGGAAC$
          testSample_1_2  141     *       0       0       *       *       0       0       AATGGAAGACCAGATGATTCTACTATATGACTCCAGACTAAGACAAATGCTAGGCTTTGAAAACGCA


          • swbarnes2
            Senior Member
            • May 2008
            • 910

            #6
            My guess is that the program is picky about the read names. You'd have to check the source itself to be sure.

            Either the name of read one and read 2 should be the same, or read 1 should end in '/1', and read 2 in '/2'. Try those. The Picard suite also has a program to turn bams to fastqs.
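A minimal sketch of the naming point above, in Python. It assumes the mate suffix is a trailing `_1`/`_2` or `/1`/`/2`, as in the read names posted in #5; the helper names `template_name` and `group_by_template` are made up for illustration and are not part of bam2fastq, samtools, or Picard.

```python
import re
from collections import defaultdict

# Assumed suffix convention: "_1"/"_2" or "/1"/"/2" at the very end of the name.
# Caution: only apply this to names known to carry such a suffix. A name like
# "testSample_1" (no mate suffix) would be wrongly shortened to "testSample".
_MATE_SUFFIX = re.compile(r"[/_][12]$")


def template_name(read_name):
    """Return the read name with a trailing mate suffix removed."""
    return _MATE_SUFFIX.sub("", read_name)


def group_by_template(read_names):
    """Group read names by their shared template name, keeping input order."""
    groups = defaultdict(list)
    for name in read_names:
        groups[template_name(name)].append(name)
    return dict(groups)


# Read names taken from column 1 of the SAM records posted in #5.
names = ["testSample_0_1", "testSample_0_2", "testSample_1_1", "testSample_1_2"]
groups = group_by_template(names)

print(len(set(names)), "distinct names before stripping the suffix")
for template, members in groups.items():
    print(template, "<-", members)
```

With the suffix stripped, the four distinct names collapse into two templates of two reads each, which is the layout a pairing tool can match up.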


            • maubp
              Peter (Biopython etc)
              • Jul 2009
              • 1544

              #7
              As swbarnes2 points out, for a SAM/BAM file both parts of a pair of reads should be recorded with the same template name in column 1 (the suffix /1 or /2 or whatever can optionally be recorded in the tags). The FLAG in column 2 specifies which is the first read and which is the second.

              Your filtered SAM file has four unique identifiers in column 1, therefore no complete pairs.

              Double-check the filter options you are using with samtools view...
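To make the flag and filter check concrete, here is a small Python sketch that decodes the FLAG values from the records in #5 and mimics the bit tests behind `samtools view -f 8 -F 4`. The bit values come from the SAM specification; the helper names `describe` and `passes` are illustrative only, and this is a model of the filter logic, not samtools itself.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_NAMES = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
]


def describe(flag):
    """List the names of the bits set in a FLAG value."""
    return [name for bit, name in FLAG_NAMES if flag & bit]


def passes(flag, require=0, exclude=0):
    """Model of '-f require -F exclude': all required bits set, no excluded bit set."""
    return (flag & require) == require and (flag & exclude) == 0


# (read name, FLAG) pairs copied from the SAM records posted in #5.
records = [
    ("testSample_0_1", 73),
    ("testSample_0_2", 133),
    ("testSample_1_1", 77),
    ("testSample_1_2", 141),
]

for name, flag in records:
    print(name, flag, describe(flag), passes(flag, require=8, exclude=4))

kept = [name for name, flag in records if passes(flag, require=8, exclude=4)]
print(kept)  # ['testSample_0_1']
```

Under this model only the mapped read whose mate is unmapped (flag 73) survives. Its unmapped mate (flag 133) has bit 4 set and bit 8 clear, so the same filter drops it, which would be consistent with the "1 sequences in the BAM file" message and the unmatched-mate warning.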


              • Tomi
                Member
                • Jul 2011
                • 12

                #8
                Hi, thank you very much for your replies.

                Yes, indeed the names were wrong, so I corrected them.
                Basically, I created a sample by taking the first 75 bp of a reference chromosome, skipping a specific insert size, taking the next 75 bp, and then reversing and complementing that second read.

                Fortunately, I get the correct flags for the mapped reads, but I am still not able to export the pairs where one read is mapped and the other is not. I mutated the second read with a high mutation rate, just to make sure that it cannot be mapped.

                Do you have an idea why? Is this not possible? I tried it with samtools like this:
                samtools view -b -S -f 8 -F 4 output.sam > mate.bam

                and then with bam2fastq: bam2fastq -o mate#.fastq -f mate.bam

                It reports the correct number of mate reads (in this case one), but I still get the warning I described above.


                Here again is the SAM output:
                Code:
                @SQ     SN:gi|157704448|ref|AC_000133.1|        LN:219475005
                @PG     ID:bwa  PN:bwa  VN:0.5.9-r16
                testSample_0    73      gi|157704448|ref|AC_000133.1|   1       37      75M     =       1       0       ATTGACAAGGGGAGGGAAAAGAGGAACAGAAATTCTTTTCTAT$
                testSample_0    133     gi|157704448|ref|AC_000133.1|   1       0       *       =       1       0       TGGCTCTAACAGGCCACGATGGAATAGTCAATAATCACCTCTT$
                testSample_1    99      gi|157704448|ref|AC_000133.1|   76      60      75M     =       226     225     AAATCCAGTTTGTGCCTACGGACATAATCTTTGAATTTGCTTT$
                testSample_1    147     gi|157704448|ref|AC_000133.1|   226     60      75M     =       76      -225    AATAGATTTTCAAATAAGAAAATGAGAGGACATGAGCTTGAGG$
                testSample_2    99      gi|157704448|ref|AC_000133.1|   301     60      75M     =       451     225     CTGACGACCTCCACGTGATTTCAACAATGATTTCAAATATTTC$
                testSample_2    147     gi|157704448|ref|AC_000133.1|   451     60      75M     =       301     -225    TATAATCTATTGGCCATTCACAGCATAGCGTATAAACCTAGCT$
                Thank you very much
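One possible way to reason about the goal in #8 (keeping both reads of every pair in which exactly one end is mapped) is sketched below in Python, using the names and FLAG values from the SAM output above. The function name `half_mapped_templates` is hypothetical. In samtools terms this selection would correspond to the union of reads matching `-f 8 -F 4` (mapped, mate unmapped) and reads matching `-f 4 -F 8` (unmapped, mate mapped); that mapping is stated here as an assumption and has not been tested against bam2fastq.

```python
from collections import defaultdict

UNMAPPED = 0x4  # SAM FLAG bit: this read is unmapped

# (read name, FLAG) pairs copied from the SAM records posted in #8.
records = [
    ("testSample_0", 73),
    ("testSample_0", 133),
    ("testSample_1", 99),
    ("testSample_1", 147),
    ("testSample_2", 99),
    ("testSample_2", 147),
]


def half_mapped_templates(records):
    """Return template names whose pair has exactly one unmapped read."""
    flags_by_name = defaultdict(list)
    for name, flag in records:
        flags_by_name[name].append(flag)
    selected = []
    for name, flags in flags_by_name.items():
        unmapped = sum(1 for flag in flags if flag & UNMAPPED)
        if len(flags) == 2 and unmapped == 1:
            selected.append(name)
    return selected


selected = half_mapped_templates(records)
# Keep both reads of each selected template so every exported read has its mate.
keep = [(name, flag) for name, flag in records if name in selected]

print(selected)  # ['testSample_0']
print(keep)      # [('testSample_0', 73), ('testSample_0', 133)]
```

Selecting by template rather than by a single flag test keeps the unmapped mate (flag 133) together with its mapped partner (flag 73), so a downstream pairing step sees two reads with the same name.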
