  • Mate reads to fastq

    Hi,

    I am trying to extract mate reads from a SAM file with the following command: samtools view -b -S -f 8 -F 4 output.sam > mate.bam

    and then with bam2fastq: bam2fastq -o mate#.fastq -f mate.bam

    Unfortunately, I get this message:


    This looks like paired data from lane 0.
    Output will be in unmapped_1.fastq and unmapped_2.fastq
    1 sequences in the BAM file
    1 sequences exported
    WARNING: 1 reads could not be matched to a mate and were not exported


    Maybe someone can help me out?

    Thanks,
    Tomi

  • #2
    Is there really just one read in the original SAM file, and therefore just 1 in the BAM file?


    • #3
      Hi,

      thank you for your reply.
      No, both reads are in the SAM file (reverse and forward).

      Greetings,
      Tomi


      • #4
        Could you share the (start of the) SAM file then? If you post it here, use the [ code ] tags - e.g. via the # icon on the advanced editor.


        • #5
          BWA Sampe Output

          Code:
          @SQ     SN:gi|157704448|ref|AC_000133.1|        LN:219475005
          @PG     ID:bwa  PN:bwa  VN:0.5.9-r16
          testSample_0_1  73      gi|157704448|ref|AC_000133.1|   1       37      75M     =       1       0       ATTGACAAGGGGAGGGAAAAGAGGAACAGAAATTCTTTTCTAT$
          testSample_0_2  133     gi|157704448|ref|AC_000133.1|   1       0       *       =       1       0       ATACCCAGGATTTTACCTGTAAAAGTACCCTCAGGTCGTGATT$
          testSample_1_1  77      *       0       0       *       *       0       0       ATCGTCAATAGGGTACTACTTCCATAATTTTGTAAATCCGCATGTTCCTCGAATAATAACGTGGAAC$
          testSample_1_2  141     *       0       0       *       *       0       0       AATGGAAGACCAGATGATTCTACTATATGACTCCAGACTAAGACAAATGCTAGGCTTTGAAAACGCA


          • #6
            My guess is that the program is picky about the read names. You'd have to check the source itself to be sure.

            Either read 1 and read 2 should have the same name, or the name of read 1 should end in '/1' and the name of read 2 in '/2'. Try those. The Picard suite also has a program to convert BAM files to FASTQ.
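
The naming rule above can be checked before running any converter. The following is a minimal sketch, not taken from bam2fastq or Picard: `template_name` and `incomplete_templates` are hypothetical helpers, and the `sep="_"` call assumes the `_1`/`_2` suffixes seen in the SAM output posted earlier in this thread.

```python
from collections import Counter


def template_name(read_name, sep="/"):
    """Return the read name without a trailing mate suffix (sep + '1' or sep + '2')."""
    for mate in ("1", "2"):
        suffix = sep + mate
        if read_name.endswith(suffix):
            return read_name[: -len(suffix)]
    return read_name


def incomplete_templates(read_names, sep="/"):
    """List template names that do not occur exactly twice after suffix stripping."""
    counts = Counter(template_name(name, sep) for name in read_names)
    return sorted(name for name, n in counts.items() if n != 2)


# Read names as they appear in column 1 of the SAM output posted above.
names = ["testSample_0_1", "testSample_0_2", "testSample_1_1", "testSample_1_2"]

# Taken literally, every name is unique, so no record has a partner.
print(len(set(names)))

# With the "_1"/"_2" suffix stripped (an assumption about this data set),
# the four records collapse into two complete pairs.
print(incomplete_templates(names, sep="_"))
```

The separator is an explicit argument because stripping `_1` blindly would also mangle a legitimate template name such as `testSample_1`.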


            • #7
              As swbarnes2 points out, for a SAM/BAM file both parts of a pair of reads should be recorded with the same template name in column 1 (the suffix /1 or /2 or whatever can optionally be recorded in the tags). The FLAG in column 2 specifies which is the first read and which is the second.

              Your filtered SAM file has four unique identifiers in column 1, therefore no complete pairs.

              Double check the filter options you are using with samtools view...
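
To see what the filter options do, it may help to decode the FLAG values by hand. The sketch below uses the bit meanings from the SAM specification; `decode_flag` and `passes_filter` are hypothetical helpers that imitate the `-f` (all bits required) and `-F` (no bits allowed) logic of samtools view, applied to the four FLAG values in the SAM output posted above.

```python
# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "read_unmapped",
    0x8: "mate_unmapped",
    0x10: "read_reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def passes_filter(flag, require=0, exclude=0):
    """Imitate `samtools view -f require -F exclude` for one FLAG value."""
    return (flag & require) == require and (flag & exclude) == 0


# FLAG values from the four records in the posted SAM output.
for flag in (73, 133, 77, 141):
    kept = passes_filter(flag, require=0x8, exclude=0x4)
    print(flag, decode_flag(flag), "kept" if kept else "dropped")
```

Under these assumptions only FLAG 73 (mapped read, unmapped mate) survives `-f 8 -F 4`. Its mate has FLAG 133, which has the read-unmapped bit set and is therefore dropped by `-F 4`, which would be consistent with a single sequence ending up in the BAM file.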


              • #8
                Hi, thank you very much for your replies.

                Yes, indeed the names were wrong, so I corrected them.
                Basically, I created a sample where I took the first 75 bp of a reference chromosome, then skipped a specific insert size and took the next 75 bp, and made the reverse and then the complementary strand out of it.

                Fortunately, I get the correct flags for the mapped reads, but I am still not able to export the pairs where one read is mapped and the other is not. I mutated the second strand with a high mutation rate, just to make sure that it cannot be mapped.

                Do you have an idea why? Is this not possible? I tried it with samtools like this:
                samtools view -b -S -f 8 -F 4 output.sam > mate.bam

                and then with bam2fastq: bam2fastq -o mate#.fastq -f mate.bam

                It reports the correct number of mate reads (in this case one), but I still get the warning I described above.


                Here again is the SAM output:
                Code:
                @SQ     SN:gi|157704448|ref|AC_000133.1|        LN:219475005
                @PG     ID:bwa  PN:bwa  VN:0.5.9-r16
                testSample_0    73      gi|157704448|ref|AC_000133.1|   1       37      75M     =       1       0       ATTGACAAGGGGAGGGAAAAGAGGAACAGAAATTCTTTTCTAT$
                testSample_0    133     gi|157704448|ref|AC_000133.1|   1       0       *       =       1       0       TGGCTCTAACAGGCCACGATGGAATAGTCAATAATCACCTCTT$
                testSample_1    99      gi|157704448|ref|AC_000133.1|   76      60      75M     =       226     225     AAATCCAGTTTGTGCCTACGGACATAATCTTTGAATTTGCTTT$
                testSample_1    147     gi|157704448|ref|AC_000133.1|   226     60      75M     =       76      -225    AATAGATTTTCAAATAAGAAAATGAGAGGACATGAGCTTGAGG$
                testSample_2    99      gi|157704448|ref|AC_000133.1|   301     60      75M     =       451     225     CTGACGACCTCCACGTGATTTCAACAATGATTTCAAATATTTC$
                testSample_2    147     gi|157704448|ref|AC_000133.1|   451     60      75M     =       301     -225    TATAATCTATTGGCCATTCACAGCATAGCGTATAAACCTAGCT$
                Thank you very much
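
One possible reading of the warning is that `-f 8 -F 4` keeps only the mapped read of each half-mapped pair, so the exported file never contains its unmapped mate. The sketch below is an assumption-laden illustration, not a confirmed fix from this thread: it keeps both records of a pair whenever exactly one of the read-unmapped (0x4) and mate-unmapped (0x8) bits is set. `is_half_mapped` and `half_mapped_records` are hypothetical helpers; the read names and FLAG values come from the SAM output above, while the reference name, sequences, and qualities are placeholders.

```python
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def is_half_mapped(flag):
    """True when exactly one of 'read unmapped' and 'mate unmapped' is set."""
    return bool(flag & READ_UNMAPPED) != bool(flag & MATE_UNMAPPED)


def half_mapped_records(sam_lines):
    """Keep alignment lines belonging to pairs with one mapped and one unmapped read."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if is_half_mapped(int(fields[1])):  # column 2 is the FLAG
            kept.append(line)
    return kept


# Names and FLAGs from the thread; "ref", "ACGT" and "IIII" are placeholders.
records = [
    "@PG\tID:bwa\tPN:bwa",
    "testSample_0\t73\tref\t1\t37\t4M\t=\t1\t0\tACGT\tIIII",
    "testSample_0\t133\tref\t1\t0\t*\t=\t1\t0\tACGT\tIIII",
    "testSample_1\t99\tref\t76\t60\t4M\t=\t226\t225\tACGT\tIIII",
    "testSample_1\t147\tref\t226\t60\t4M\t=\t76\t-225\tACGT\tIIII",
]

for line in half_mapped_records(records):
    name, flag = line.split("\t")[:2]
    print(name, flag)
```

With samtools alone, the unmapped mates would presumably need a second pass with `-f 4 -F 8`, combined with the output of the `-f 8 -F 4` pass. Whether bam2fastq then pairs the two records correctly is not something this thread confirms.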
