  • dcfargo
    Member
    • Aug 2008
    • 22

    Stand Alone Bam to FASTQ

    Does anyone have a suggested best practice utility for this?
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    What are you trying to do?
    Do you want to pull out the reads as FASTQ records?
    Do you care about the strand used for reads which mapped to the reverse strand?
    Do you care about how paired end reads are named?

    You could try seqret from EMBOSS 6.3.0.


    • dcfargo
      Member
      • Aug 2008
      • 22

      #3
      I do care about recovery of all of the information.

      I'd like to essentially recover all the initial text information that went into making the BAM file.


      • Martin R
        Junior Member
        • May 2010
        • 7

        #4
        The problem you can run into is that the quality values might change after alignment.


        • dcfargo
          Member
          • Aug 2008
          • 22

          #5
          Sorry for my ignorance - why might the quality values change?


          • Martin R
            Junior Member
            • May 2010
            • 7

            #6
            That is no problem. This point confused me too, and it still does.

            In my experience, the quality values (QV) in the SAM files from, e.g., bowtie differ from the ones in the original file. I think the values you get after alignment are QVs produced by the alignment, not the ones from the original file.
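The thread never settles why the values differed. One possible explanation, offered here as an assumption rather than something the posters confirm, is a change of encoding: SAM stores qualities as Phred+33, while some Illumina pipelines of that era wrote FASTQ as Phred+64, so an aligner told about the input encoding would re-encode the same scores with different characters. A minimal sketch of that re-encoding (the class and method names are hypothetical):

```java
final class QualityOffset {
    /** Re-encode a Phred+64 quality string as Phred+33, the encoding SAM uses. */
    static String phred64ToPhred33(String qual) {
        StringBuilder out = new StringBuilder(qual.length());
        for (int i = 0; i < qual.length(); i++) {
            int score = qual.charAt(i) - 64; // decode the Phred score
            if (score < 0) {
                throw new IllegalArgumentException("Not a Phred+64 character: " + qual.charAt(i));
            }
            out.append((char) (score + 33)); // encode with the SAM offset
        }
        return out.toString();
    }
}
```

If this is the cause, the scores themselves are unchanged and only the characters differ, so comparing decoded scores rather than raw strings would show whether anything was really lost.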


            • maubp
              Peter (Biopython etc)
              • Jul 2009
              • 1544

              #7
              Originally posted by dcfargo:
              I do care about recovery of all of the information.

              I'd like to essentially recover all the initial text information that went into making the BAM file.
              Assuming I have understood your aim, that is not entirely possible.

              For example, suppose you had some paired FASTQ reads like this:

              Code:
              @SRR001666.1/1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
              GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
              +
              IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
              @SRR001666.1/2 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
              AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
              +
              IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
              ...
              All that will be stored in SAM/BAM is the pair name without the suffix, here SRR001666.1, the sequence and quality. You lose any description from the FASTQ lines after the ID. Potentially the alignment tool may hard clip the reads so you don't even get the full sequence and quality.
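Whether a given record was hard clipped can be read from its CIGAR string, where the H operation counts bases that are absent from the stored sequence and quality. A minimal sketch, assuming you already have the CIGAR as text (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HardClipCheck {
    // One CIGAR operation: a length followed by an operation letter.
    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Number of bases removed by hard clipping, i.e. missing from SEQ and QUAL. */
    static int hardClippedBases(String cigar) {
        int total = 0;
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            if (m.group(2).equals("H")) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total; // 0 for "*" or for a CIGAR without H operations
    }
}
```

A non-zero result means the full original read cannot be rebuilt from that record alone.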

              If on converting SAM/BAM back to FASTQ you specify suffixes of /1 and /2, the best you can hope to recover is:

              Code:
              @SRR001666.1/1
              GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
              +
              IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
              @SRR001666.1/2
              AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
              +
              IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
              ...
              This may or may not suffice for your needs.
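The loss described above can be sketched as a round trip on the title line alone. This is an illustration under the post's assumptions (the aligner keeps only the ID and drops the /1 or /2 suffix); the class and method names are hypothetical, and real aligners may treat names differently.

```java
final class ReadNameRoundTrip {
    /** What survives in SAM/BAM: the ID without '@', description, or /1 and /2 suffix. */
    static String toSamReadName(String fastqTitleLine) {
        // Drop the leading '@' and everything after the first whitespace.
        String id = fastqTitleLine.substring(1).split("\\s+", 2)[0];
        if (id.endsWith("/1") || id.endsWith("/2")) {
            id = id.substring(0, id.length() - 2);
        }
        return id;
    }

    /** Best-case title line rebuilt from the SAM read name; mate is 1 or 2. */
    static String toFastqTitleLine(String samReadName, int mate) {
        return "@" + samReadName + "/" + mate;
    }

    public static void main(String[] args) {
        String original = "@SRR001666.1/1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36";
        String stored = toSamReadName(original);
        System.out.println(stored);
        System.out.println(toFastqTitleLine(stored, 1));
    }
}
```

The description (instrument run and length) never enters `toSamReadName`'s result, so no later step can restore it.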


              • dcfargo
                Member
                • Aug 2008
                • 22

                #8
                Thanks so much.

                Given that some information may be lost and we'll just have to accept that, would the best model for conversion be two steps, such as:

                1) SAMtools for BAM -> SAM

                2) followed by a home made script for SAM -> FASTQ
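For reference, step 2 in its simplest form only needs three of the eleven mandatory tab-separated SAM columns: read name (column 1), sequence (column 10), and quality (column 11). A minimal sketch with hypothetical class and method names; it deliberately ignores strand and mate flags, so reads come out exactly as stored in the SAM file:

```java
final class SamLineToFastq {
    /** Returns a four-line FASTQ record, or null for SAM header lines (which start with '@'). */
    static String convert(String samLine) {
        if (samLine.startsWith("@")) {
            return null; // header line, not an alignment record
        }
        String[] fields = samLine.split("\t");
        if (fields.length < 11) {
            throw new IllegalArgumentException("A SAM record needs 11 mandatory fields");
        }
        String name = fields[0];
        String seq = fields[9];
        String qual = fields[10];
        return "@" + name + "\n" + seq + "\n+\n" + qual + "\n";
    }
}
```

Each line of text SAM output (step 1) could be passed through `convert` and the non-null results written to a FASTQ file.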


                • Martin R
                  Junior Member
                  • May 2010
                  • 7

                  #9
                  Well, you don't have to make it that complicated. There are two libraries you can use, and then you have your converter. I, for example, prefer Java and use BioJava to read/write FASTQ (http://www.biojava.org/wiki/BioJava:Download_1.7.1) and the samtools Java library, Picard (http://sourceforge.net/projects/picard/files/), to read BAM/SAM files (the API is the same for both).
                  Then you only have to transform a SAM object into a FastqBuilder:

                  Code:
                  import net.sf.samtools.SAMRecord;
                  import org.biojava.bio.program.fastq.FastqBuilder;

                  // Copy read name, qualities and sequence from a SAM record into a FASTQ builder.
                  public FastqBuilder convert(SAMRecord element2) {
                      FastqBuilder builder = new FastqBuilder();
                      builder.withDescription(element2.getReadName());
                      builder.withQuality(element2.getBaseQualityString());
                      builder.withSequence(element2.getReadString());
                      return builder; // call builder.build() to obtain the Fastq record
                  }

                  That's the easiest way.

                  Good luck.


                  • maubp
                    Peter (Biopython etc)
                    • Jul 2009
                    • 1544

                    #10
                    Originally posted by dcfargo:
                    Thanks so much.

                    Given that some information may be lost and we'll just have to accept that, would the best model for conversion be two steps, such as:

                    1) SAMtools for BAM -> SAM

                    2) followed by a home made script for SAM -> FASTQ
                    Not necessarily.

                    As mentioned above, EMBOSS 6.3.x can do SAM/BAM direct to FASTQ, although it may not do exactly what you want it to do.

                    You could also write a script to go from BAM to FASTQ, for example using pysam to access the samtools C API from Python.

                    Personally, I've been doing SAM/BAM to FASTQ conversion in Biopython (to recover reads to redo a mapping), but this is with an experimental branch and is not ready for general use.


                    • maubp
                      Peter (Biopython etc)
                      • Jul 2009
                      • 1544

                      #11
                      Originally posted by Martin R:
                      Well, you don't have to make it that complicated. There are two libraries you can use, and then you have your converter. I, for example, prefer Java and use BioJava to read/write FASTQ (http://www.biojava.org/wiki/BioJava:Download_1.7.1) and the samtools Java library, Picard (http://sourceforge.net/projects/picard/files/), to read BAM/SAM files (the API is the same for both).
                      Then you only have to transform a SAM object into a FastqBuilder:

                      public FastqBuilder convert(SAMRecord element2) {
                          FastqBuilder builder = new FastqBuilder();
                          builder.withDescription(element2.getReadName());
                          builder.withQuality(element2.getBaseQualityString());
                          builder.withSequence(element2.getReadString());
                          return builder;
                      }

                      That's the easiest way.

                      Good luck.
                      Plus potentially add code to append /1 and /2 if dealing with paired end data.

                      Also I would reverse complement any reads mapped to the reverse strand to recover them in their original orientation pre-mapping.
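Both suggestions depend on the SAM FLAG field: bit 0x10 marks a read stored as its reverse complement, and bits 0x40 and 0x80 mark the first and second read of a pair. A minimal, library-free sketch of the idea follows; the class and method names are hypothetical, and it simplifies by turning any base other than upper-case A, C, G, T into N.

```java
final class OriginalReadRecovery {
    static final int REVERSE_STRAND = 0x10;
    static final int FIRST_IN_PAIR = 0x40;
    static final int SECOND_IN_PAIR = 0x80;

    static String reverseComplement(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            switch (seq.charAt(i)) {
                case 'A': out.append('T'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                case 'T': out.append('A'); break;
                default:  out.append('N'); break; // simplification, see lead-in
            }
        }
        return out.toString();
    }

    /** Builds a FASTQ record in the read's original orientation, with a mate suffix if paired. */
    static String toFastq(String name, int flag, String seq, String qual) {
        if ((flag & REVERSE_STRAND) != 0) {
            seq = reverseComplement(seq);
            // Qualities are per base, so they are reversed along with the sequence.
            qual = new StringBuilder(qual).reverse().toString();
        }
        String suffix = "";
        if ((flag & FIRST_IN_PAIR) != 0) {
            suffix = "/1";
        } else if ((flag & SECOND_IN_PAIR) != 0) {
            suffix = "/2";
        }
        return "@" + name + suffix + "\n" + seq + "\n+\n" + qual + "\n";
    }
}
```

For example, a record with flag 83 (paired, properly paired, reverse strand, first in pair) is reverse complemented and named with /1, while its mate with flag 163 is written unchanged with /2.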


                      • maubp
                        Peter (Biopython etc)
                        • Jul 2009
                        • 1544

                        #12
                        Originally posted by maubp:
                        As mentioned above, EMBOSS 6.3.x can do SAM/BAM direct to FASTQ, although it may not do exactly what you want it to do.
                        Well, EMBOSS 6.3.1 isn't doing what I want it to do.

                        This should be resolved in the next patch or point release, though.


                        Peter
                        Last edited by maubp; 08-03-2010, 01:38 AM. Reason: Adding link


                        • divon
                          Member
                          • Jul 2021
                          • 12

                          #13
                          For the sake of completeness, I will just mention that you can also achieve this with my Genozip program:

                          Code:
                          genozip file.bam                              # compresses the BAM file
                          genocat file.bam.genozip --output file.fq.gz  # converts it to FASTQ


                          See documentation here: https://genozip.com/sam2fq.html

                          Paper here: https://www.researchgate.net/publica...ata_Compressor


                          • divon
                            Member
                            • Jul 2021
                            • 12

                            #14
                            Hi Andrey, which file can't you open?


                            • divon
                              Member
                              • Jul 2021
                              • 12

                              #15
                              Here's an alternative link: https://genozip.readthedocs.io/sam2fq.html

                              Does this work?

