  • JenBarb
    Member
    • Oct 2010
    • 47

    extract forward and reverse reads?

    Hello,
    I have forward and reverse reads in a single fastq file from Ion Torrent PGM sequencing, and I would like to know whether there is a way to extract the forward and reverse reads into two separate files.

    And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

    Thank you,
    Jennifer
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    Hi Jennifer,

    You can do that with BBTools:

    reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

    and

    filterbyname.sh in=reads.fq out=filtered.fq names=names.txt


    -Brian
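For readers who want to see what splitting an interleaved file amounts to, here is a minimal Python sketch. It is not how reformat.sh is implemented; it assumes plain 4-line FASTQ records with no wrapped sequence lines, stored as read 1, read 2, read 1, read 2, and so on. The names `fastq_records` and `deinterleave` are made up for this illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator ('+'), qualities.
Record = Tuple[str, ...]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in chunk)


def deinterleave(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Send records 0, 2, 4, ... to read1 and records 1, 3, 5, ... to read2.

    This splits by position in the file only. It says nothing about which
    strand a read would align to.
    """
    read1: List[Record] = []
    read2: List[Record] = []
    for index, record in enumerate(fastq_records(lines)):
        (read1 if index % 2 == 0 else read2).append(record)
    if len(read1) != len(read2):
        raise ValueError("odd number of records; input is not interleaved pairs")
    return read1, read2
```

With a real file this could be called as `deinterleave(open("reads.fq"))`, where `reads.fq` is the input name used in the commands above.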

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      #3
      Originally posted by JenBarb:
      "And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?"

      Heng Li's seqtk has a "subseq" command for this: https://github.com/lh3/seqtk

      • JenBarb
        Member
        • Oct 2010
        • 47

        #4
        Thank you so much for a quick reply! I really appreciate it. I will try it now.
        Jennifer

        • JenBarb
          Member
          • Oct 2010
          • 47

          #5
          Hi Brian,
          I am looking for the installation instructions on the page you sent and I can't find them. I also was looking for info about the two scripts that you sent. Is there a documentation page that describes what the scripts do and any argument options they take?

          Thank you,
          Jennifer

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            There is no installation required for BBTools. You just uncompress the file. Then you can run the shell scripts (you may need to add execute permissions depending on what OS you are using). If you run the shell script by itself (e.g. $ reformat.sh) it will print information about all possible command line options.

            Here is a thread with information about reformat tool: http://seqanswers.com/forums/showthread.php?t=46174

            • JenBarb
              Member
              • Oct 2010
              • 47

              #7
              Thank you!!

              • JenBarb
                Member
                • Oct 2010
                • 47

                #8
                Hello again,
                So I tried the reformat.sh script on my fastq file. I then took each output file (forward and reverse reads) and blasted the reads against a database of interest, and I am still finding that some reads align in the forward direction and some in the reverse direction. My understanding is that the program should have put only forward reads into one file and only reverse reads into the other, so the alignments for each file would be forward only or reverse only. I am not finding this to be true. Thoughts?

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  In the original post you had talked about forward/reverse reads in a simple context (as if they are two reads from the two ends of a fragment).

                  reformat.sh will not separate reads that align in opposite orientations. It only separates paired reads that are interleaved in a single file (that is, the two reads that came from the same fragment).

                  You will need to parse the output from your alignment program (what program are you using?) to separate reads that align to +/- strands into two files. I am not sure if BBMap can write to separate alignment files based on the strand info.

                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    #10
                    Actually... there is a tool for that, "splitsam.sh", which is not part of the public distribution because I didn't think it would be of use to anyone. I've attached it to this post; just extract it and put it in the folder with the other shell scripts, then run it like this:

                    splitsam.sh mapped.sam forward.sam reverse.sam

                    You can also do that with samtools, by filtering on the 0x10 flag bit. In either case, the reads have to be mapped first, of course; you cannot determine which strand a read maps to from a fastq file alone.
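To make the 0x10 flag filtering concrete, here is a minimal Python sketch that splits SAM records by strand. It illustrates the idea only and is not the code behind splitsam.sh. It assumes uncompressed, tab-separated SAM text, and it drops unmapped reads (flag bit 0x4), which is a choice made for this example. The function name `split_by_strand` is hypothetical.

```python
from typing import Iterable, List, Tuple

REVERSE_FLAG = 0x10   # SAM FLAG bit: read is reverse complemented (minus strand)
UNMAPPED_FLAG = 0x4   # SAM FLAG bit: read is unmapped


def split_by_strand(sam_lines: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (header_lines, forward_records, reverse_records).

    Header lines start with '@'. Unmapped reads are skipped because they
    have no strand (an assumption of this sketch, not of any tool).
    """
    headers: List[str] = []
    forward: List[str] = []
    reverse: List[str] = []
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            headers.append(line)
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & UNMAPPED_FLAG:
            continue
        (reverse if flag & REVERSE_FLAG else forward).append(line)
    return headers, forward, reverse
```

With samtools, the comparable filters are `samtools view -f 0x10` (keep reads with the bit set) and `samtools view -F 0x10` (keep reads without it); note that `-F 0x10` on its own also keeps unmapped reads unless they are excluded separately.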
                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      Ask and ye shall receive

                      Roll that into BBMap download Brian!

                      • JenBarb
                        Member
                        • Oct 2010
                        • 47

                        #12
                        Thank you so much for your help!

                        • Brian Bushnell
                          Super Moderator
                          • Jan 2014
                          • 2709

                          #13
                          Originally posted by GenoMax:
                          "Roll that into BBMap download Brian!"

                          OK... I don't like to release incomplete things, so I made it faster, added some features, and then rolled it into the download.

                          Originally posted by JenBarb:
                          "Thank you so much for your help!"

                          You're welcome!

                          • JenBarb
                            Member
                            • Oct 2010
                            • 47

                            #14
                            Thank you again, Brian and GenoMax for all of your help.

                            I am now trying the filterbyname script, and it does not seem to be pulling out only the reads whose IDs are listed in my names.txt file. Is there something I am missing?

                            sh /data/barbj/bbmap/filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt

                            • Brian Bushnell
                              Super Moderator
                              • Jan 2014
                              • 2709

                              #15
                              By default, "filterbyname" discards reads with names in your name list, and keeps the rest. To include them and discard the others, do this:

                              filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt include=t

                              Sorry for the confusion. I guess that default is kind of odd.
                              Last edited by Brian Bushnell; 02-09-2015, 12:02 PM.
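The include/exclude behavior described above can be sketched in a few lines of Python. This shows the logic only, not the filterbyname.sh implementation. It assumes each record is a (header, sequence, separator, qualities) tuple and that a read's name is the first whitespace-delimited token of the header after the leading '@'; the real tool's name-matching rules may differ. The function `filter_by_name` and its `include` parameter are named after the command-line option for readability.

```python
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+', qualities


def read_name(header: str) -> str:
    """Return the first token of a FASTQ header without the leading '@'."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def filter_by_name(records: Iterable[Record],
                   names: Sequence[str],
                   include: bool = False) -> List[Record]:
    """Keep or discard records whose name appears in `names`.

    include=False (the default described in the post) discards listed reads
    and keeps the rest; include=True keeps only the listed reads.
    """
    listed = set(names)
    return [rec for rec in records if (read_name(rec[0]) in listed) == include]
```

With `include=False` a list of IDs acts as a blacklist, which is why the first attempt in this thread returned everything except the wanted reads.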
