Seqanswers

  • Use SAM file to pull reads from FASTQ

    Hi Folks,

    I have a SAM file with unpaired reads (originally from a FASTQ) and I would like to use it to pull the read and its pair from the FASTQ file - does anyone know if there is a script out there to do this?

    I have used the Picard tools SamToFastq but to my knowledge there is not a script in Picard or SamTools to do exactly what I described here (or maybe there is and I just haven't found it!).

    Thank you!

  • #2
    If the reads are unpaired, how can you pull their mate?

    Just work out the read names you want and write a short script to fish those reads out.
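A minimal sketch of that "fish them out" script, using only bash and POSIX awk. It assumes that the FASTQ records are exactly four lines each (no wrapped sequences) and that the SAM read names match the FASTQ headers up to the first whitespace, ignoring a trailing /1 or /2. The function names `sam_names` and `fish_fastq` are hypothetical, not part of any existing tool.

```shell
#!/usr/bin/env bash
# Sketch: pull every read named in a SAM file (and its mate) out of the
# original paired FASTQ files. Assumptions, not checked by the script:
#   * FASTQ records are exactly 4 lines (sequences are not wrapped);
#   * SAM QNAMEs match the FASTQ header up to the first whitespace,
#     ignoring a trailing /1 or /2.
set -euo pipefail

# Print the unique read names (QNAME, column 1) in a SAM file,
# skipping @-header lines and stripping any /1 or /2 suffix.
sam_names() {
  awk '!/^@/ { name = $1; sub(/\/[12]$/, "", name); print name }' "$1" | sort -u
}

# Print the FASTQ records in file $2 whose read name appears in the name
# list in file $1. Only every 4th line (the header) is inspected, so a
# quality string that happens to start with "@" is never mistaken for one.
fish_fastq() {
  awk 'FILENAME == ARGV[1] { want[$1] = 1; next }
       FNR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); keep = (name in want) }
       keep' "$1" "$2"
}
```

For example, with hypothetical file names, you would run `sam_names singletons.sam > names.txt`, then `fish_fastq names.txt reads_1.fastq > pairs_1.fastq` and `fish_fastq names.txt reads_2.fastq > pairs_2.fastq`. Because each output keeps the order of its input file, the two outputs stay in sync as long as the original mate files were. A BAM can be fed in as `<(samtools view singletons.bam)`, and gzipped FASTQ as `<(zcat reads_1.fastq.gz)`.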



    • #3
      ^ Good point... I suspect the question is a bit misworded.

      I'd try to cut the file down with some form of filtering (maybe samtools or bamtools) and then use one of the SAM/BAM-to-FASTQ conversion scripts.
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */



      • #4
        Yes, sorry for the unclear wording. The SAM file is the result of mapping paired-end reads to a reference. I have one SAM file with mapped mated pairs, which I was able to convert to FASTQ, and that worked great. But I also have a SAM file with mapped unmated pairs, and it is this file that I would like to use to pull the reads that mapped (but whose mate did not), along with their pairs, from the original FASTQ files.

        Ideally the output would be these pairs in a FASTQ file.



        • #5
          You should be able to extract those alignments as long as the aligner you used set the flags correctly. The unmapped mates will have 0x4 set and the mapped mates should have 0x8 set. You might need to name-sort the BAM first, but then you could pull out only those reads with this:
          Code:
          samtools view -f 0xC -b alignments.bam > singletons.bam
          Then convert that bam file into fastq.



          • #6
            Actually that will also pull out all unaligned reads in addition to your singleton alignments. So more filtering will be necessary. Pairs make this tricky because the SAM annotation of pairs is messy.
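One way to do the extra filtering is to test the FLAG bits directly. Note that `samtools view -f` keeps only reads with *all* of the listed bits set, so `-f 0xC` keeps reads that are unmapped and whose mate is also unmapped. Per the SAM spec, the mapped singleton instead has 0x8 set with 0x4 clear, and its unmapped mate has 0x4 set with 0x8 clear. The sketch below keeps paired reads where exactly one of those two bits is set. It assumes the aligner set the flags as described in #5. The function name `singleton_pairs` is hypothetical. Dropping secondary and supplementary lines is a choice made here to avoid duplicate names.

```shell
#!/usr/bin/env bash
# Sketch: keep only "singleton" pairs from a SAM file, i.e. paired reads
# where exactly one mate aligned. FLAG bits used (SAM spec):
#   0x1 read paired, 0x4 read unmapped, 0x8 mate unmapped,
#   0x100 secondary, 0x800 supplementary.
set -euo pipefail

singleton_pairs() {
  # Header lines are passed through unchanged. FLAG (column 2) bits are
  # tested with integer arithmetic, so this runs on any POSIX awk, not only gawk.
  awk -F '\t' '
    /^@/ { print; next }
    {
      f = $2
      paired        = f % 2
      unmapped      = int(f / 4) % 2
      mate_unmapped = int(f / 8) % 2
      secondary     = int(f / 256) % 2
      supplementary = int(f / 2048) % 2
      if (paired && unmapped != mate_unmapped && !secondary && !supplementary) print
    }' "$1"
}
```

The same split can be made with two plain samtools calls, `samtools view -f 0x8 -F 0x4` for the aligned singletons and `samtools view -f 0x4 -F 0x8` for their unaligned mates. The read names that survive can then be used to fish both mates out of the original FASTQ files, as suggested in #2.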
