  • extract forward and reverse reads?

    Hello,
    I have forward and reverse reads in a FASTQ file from Ion Torrent PGM sequencing data. Does anyone know a way to extract the forward and reverse reads into two separate files?

    Also, does anyone know of a way to extract specific reads from a FASTQ file, given a list of read IDs?

    Thank you,
    Jennifer

  • #2
    Hi Jennifer,

    You can do that with BBTools:

    reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

    and

    filterbyname.sh in=reads.fq out=filtered.fq names=names.txt
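For readers without BBTools, the split that the reformat.sh command above performs on an interleaved file can be sketched in plain Python. This is a minimal sketch, not BBTools code, and it assumes the input is interleaved (read 1 and read 2 of each fragment in consecutive records) and unwrapped (exactly four lines per record). The helper names `read_fastq`, `write_record` and `deinterleave` are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple

# One FASTQ record: header, sequence, '+' separator, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes unwrapped FASTQ (4 lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record: " + header.strip())
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def write_record(handle: TextIO, rec: Record) -> None:
    handle.write("\n".join(rec) + "\n")


def deinterleave(src: TextIO, out1: TextIO, out2: TextIO) -> int:
    """Write records 1, 3, 5, ... to out1 and 2, 4, 6, ... to out2.

    Assumes strictly alternating mates; returns the number of pairs written.
    """
    records = read_fastq(src)
    pairs = 0
    for mate1 in records:
        mate2 = next(records, None)
        if mate2 is None:
            raise ValueError("odd number of records; input is not interleaved")
        write_record(out1, mate1)
        write_record(out2, mate2)
        pairs += 1
    return pairs


if __name__ == "__main__":
    demo = "@frag1/1\nACGT\n+\nIIII\n@frag1/2\nTTGC\n+\nIIII\n"
    r1, r2 = io.StringIO(), io.StringIO()
    print(deinterleave(io.StringIO(demo), r1, r2))
    print(r1.getvalue(), end="")
    print(r2.getvalue(), end="")
```

With real files, pass handles from open() (or gzip.open(path, "rt") for gzipped input). Note that this only pairs up alternating records; it says nothing about the strand a read maps to.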


    -Brian

    • #3
      Originally posted by JenBarb
      And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

      Thank you,
      Jennifer
      You can use the "subseq" command in Heng Li's seqtk: https://github.com/lh3/seqtk

      • #4
        Thank you so much for the quick reply! I really appreciate it. I will try it now.
        Jennifer

        • #5
          Hi Brian,
          I am looking for the installation instructions on the page you sent and I can't find them. I also was looking for info about the two scripts that you sent. Is there a documentation page that describes what the scripts do and any argument options they take?

          Thank you,
          Jennifer

          • #6
            BBTools requires no installation: just uncompress the archive and run the shell scripts (depending on your OS, you may need to add execute permissions). Running a script with no arguments (e.g. $ reformat.sh) prints all of its command-line options.

            Here is a thread with information about the reformat tool: http://seqanswers.com/forums/showthread.php?t=46174

            • #7
              Thank you!!

              • #8
                Hello again,
                So I tried the reformat.sh script on my FASTQ file. I then BLASTed the reads in each output file (forward and reverse) against a database of interest, and I am still finding that some reads align in the forward direction and some in the reverse direction. My understanding was that this program would put only forward reads into one file and only reverse reads into the other, so the alignments from each file should be all forward or all reverse. That is not what I am seeing. Thoughts?

                • #9
                  In your original post you described forward/reverse reads in the paired-end sense (two reads from the two ends of a single fragment).

                  reformat.sh will not separate reads according to the orientation in which they align. It only separates paired reads that are interleaved in a single file (read 1 and read 2 of each fragment).

                  You will need to parse the output from your alignment program (what program are you using?) to separate reads that align to +/- strands into two files. I am not sure if BBMap can write to separate alignment files based on the strand info.
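Since the alignments in this thread came from BLAST, one way to do that parsing is sketched below. It assumes blastn tabular output with the default `-outfmt 6` columns (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore), where a hit on the subject's minus strand has sstart greater than send. It also assumes the first line listed for each read is the one to use. The function names are hypothetical, and reads with no hit are simply not classified.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def strand_by_query(blast_tab: TextIO) -> Dict[str, str]:
    """Map each query (read) ID to '+' or '-' based on its first hit line.

    Assumes the default -outfmt 6 columns; column 9 is sstart and column 10
    is send, and sstart > send marks a minus-strand hit.
    """
    strands: Dict[str, str] = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and '#' comment lines (as in -outfmt 7)
        qseqid = row[0]
        if qseqid in strands:
            continue  # keep only the first hit listed for each read
        sstart, send = int(row[8]), int(row[9])
        strands[qseqid] = "+" if sstart <= send else "-"
    return strands


def split_ids(strands: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (plus-strand IDs, minus-strand IDs), each sorted."""
    plus = sorted(q for q, s in strands.items() if s == "+")
    minus = sorted(q for q, s in strands.items() if s == "-")
    return plus, minus


if __name__ == "__main__":
    demo = ("read1\tref\t100.0\t50\t0\t0\t1\t50\t101\t150\t1e-20\t93.5\n"
            "read2\tref\t99.0\t50\t1\t0\t1\t50\t300\t251\t1e-18\t88.0\n")
    print(split_ids(strand_by_query(io.StringIO(demo))))
```

The two ID lists could then be written to name files and used to pull the reads out of the FASTQ, as discussed later in this thread.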

                  • #10
                    Actually... there is a tool for that, "splitsam.sh", which is not part of the public distribution because I didn't think anyone would need it. I've attached it to this post; just extract it, put it in the folder with the other shell scripts, and run it like this:

                    splitsam.sh mapped.sam forward.sam reverse.sam

                    You can also do that with samtools by filtering on the 0x10 flag bit. In either case, the reads have to be mapped first, of course; you cannot determine which strand a read maps to from a FASTQ file alone.
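The 0x10 test is simple enough to show directly. With samtools, `samtools view -h -F 16` keeps records without the bit and `samtools view -h -f 16` keeps records with it. The Python sketch below applies the same flag test to SAM text. It is not splitsam.sh's code; copying header lines to both outputs and dropping unmapped reads (flag 0x4, which have no strand) are this sketch's own assumptions, and secondary or supplementary alignments are not treated specially.

```python
from typing import Iterable, List, Tuple

REVERSE = 0x10   # SAM FLAG bit: SEQ is reverse complemented
UNMAPPED = 0x4   # SAM FLAG bit: segment unmapped


def split_sam_by_strand(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split SAM text into (forward, reverse) lists of lines.

    Header lines (starting with '@') are copied to both outputs; unmapped
    reads are dropped because they have no strand.
    """
    forward: List[str] = []
    reverse: List[str] = []
    for line in lines:
        if line.startswith("@"):
            forward.append(line)
            reverse.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # blank or malformed line: SAM records have 11 mandatory fields
        flag = int(fields[1])
        if flag & UNMAPPED:
            continue
        (reverse if flag & REVERSE else forward).append(line)
    return forward, reverse
```

To use it on a file, read the lines with open("mapped.sam") and write each returned list to its own output file.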
                    • #11
                      Ask and ye shall receive

                      Roll that into BBMap download Brian!

                      • #12
                        Thank you so much for your help!

                        • #13
                          Originally posted by GenoMax
                          Roll that into BBMap download Brian!
                          OK... I don't like to release incomplete things so I made it faster, added some features, and then rolled it into the download.

                          Originally posted by JenBarb
                          Thank you so much for your help!
                          You're welcome!

                          • #14
                            Thank you again, Brian and GenoMax, for all of your help.

                            I am now trying the filterbyname script, and it does not seem to pull out only the reads whose IDs are in my names.txt file. Is there something I am missing?

                            sh /data/barbj/bbmap/filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt

                            • #15
                              By default, "filterbyname" discards reads whose names are in your name list and keeps the rest. To keep those reads and discard the others, do this:

                              filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt include=t
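The include/exclude behaviour described here can be illustrated with a small Python sketch. It assumes one read ID per line in the names file and compares it against the first whitespace-separated token of each FASTQ header with the '@' removed. How filterbyname.sh (or seqtk subseq) actually matches names, for example whether it trims /1 suffixes, isn't covered in this thread, so the matching rule here is an assumption and the helper names are hypothetical.

```python
from typing import Iterable, Iterator, Set


def load_names(lines: Iterable[str]) -> Set[str]:
    """Read one ID per line; blank lines and a leading '@' are ignored."""
    names: Set[str] = set()
    for line in lines:
        name = line.strip()
        if name.startswith("@"):
            name = name[1:]
        if name:
            names.add(name)
    return names


def filter_fastq(lines: Iterable[str], names: Set[str],
                 include: bool = False) -> Iterator[str]:
    """Yield the lines of 4-line FASTQ records selected by read ID.

    include=False (mirroring the default described above) drops reads whose
    ID is in `names`; include=True keeps only those reads.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            record = [header, next(it), next(it), next(it)]
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header.strip()) from None
        read_id = header[1:].split()[0]  # first token after '@'
        if (read_id in names) == include:
            yield from record
```

For example, with the files from the post above open as `fq` and `nf`, `filter_fastq(fq, load_names(nf), include=True)` yields only the listed reads, analogous to include=t.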

                              Sorry for the confusion. I guess that default is kind of odd.
                              Last edited by Brian Bushnell; 02-09-2015, 12:02 PM.
