  • Xterra
    Member
    • Jun 2010
    • 27

    Using sfftools to Extract Read Subsets

    How can I extract a subset of reads (longer than 150 bases but shorter than 200 bases) from a sequencing run using sfftools? I think this can be accomplished with either sffinfo or fnafile (-t), but I just cannot get it to work.
    Any help will be very much appreciated.
  • sklages
    Senior Member
    • May 2008
    • 628

    #2
    This is not directly possible with sfftools. You'll probably need to work on FASTA files.

    cheers,
    Sven
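As a rough illustration of working on the FASTA file directly, here is a minimal Python sketch. It is not part of sfftools, and the function names are hypothetical. It assumes unaligned FASTA input and treats "longer than 150 but shorter than 200" as strict bounds on each sequence's length.

```python
# Minimal sketch: keep FASTA records with 150 < length < 200.
# Function names are hypothetical; this is not part of sfftools.
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 150,
                     max_len: int = 200) -> List[Record]:
    """Keep records whose length is strictly between min_len and max_len."""
    return [(h, s) for h, s in records if min_len < len(s) < max_len]


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records as FASTA text, wrapping sequences at `width` columns."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

Usage would be something like `with open("reads.fna") as fh: print(format_fasta(filter_by_length(read_fasta(fh))), end="")`, where `reads.fna` is a placeholder file name. Change `min_len`/`max_len` if inclusive bounds are wanted.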


    • kmcarr
      Senior Member
      • May 2008
      • 1181

      #3
      You can't do this with sfftools directly. You can use sfffile to select a subset of reads from an input SFF file by using the -i option to pass a file containing a list of accession numbers to include in the output. Of course, you will first have to use some other tool to create the list of accession numbers for the reads that match your criteria, whatever they may be.

      (ETA: Oh, Sven beat me to it.)
      Last edited by kmcarr; 06-17-2010, 05:37 AM.
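The "some other tool" step kmcarr describes could look roughly like the following minimal Python sketch. It builds the accession list that would then be handed to sfffile's -i option. Several things are assumptions: that you have a FASTA file of the same reads, that the first word of each header is the accession number, and that the bounds are strict. The helper names are hypothetical.

```python
# Minimal sketch: list accession numbers of reads with 150 < length < 200,
# one per line, intended as an include list for sfffile -i.
# Assumes a FASTA file of the same reads whose header's first word is the accession.
from typing import Dict, Iterable, List


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each accession (first word after '>') to its total sequence length."""
    lengths: Dict[str, int] = {}
    accno = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            words = line[1:].split()
            accno = words[0] if words else None
            if accno is not None:
                lengths[accno] = 0
        elif accno is not None:
            lengths[accno] += len(line)
    return lengths


def accessions_in_range(lengths: Dict[str, int], min_len: int = 150,
                        max_len: int = 200) -> List[str]:
    """Return accessions whose length is strictly between the two bounds."""
    return [acc for acc, n in lengths.items() if min_len < n < max_len]


def write_accno_list(accnos: Iterable[str], path: str) -> None:
    """Write one accession per line to path."""
    with open(path, "w") as fh:
        fh.writelines(acc + "\n" for acc in accnos)
```

Counting lengths per accession, rather than filtering the FASTA itself, keeps the SFF file as the source of truth. Only the list of names leaves Python.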


      • Xterra
        Member
        • Jun 2010
        • 27

        #4
        How about fnafile?

        Could someone please explain how to use the -t option with fnafile? I mean, if I use -i accno 1-200, wouldn't I be able to 'extract' the reads that are 150 bases long?


        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          No, the -t option is used for changing trim point information stored in the file, not for extracting specific reads.


          • Xterra
            Member
            • Jun 2010
            • 27

            #6
            fnafile -t syntax

            Hmmm! OK, I understand the limitations. Still, would you mind explaining the syntax for fnafile with the -t option (accno 1 200)? I just want to see what that tool is capable of and whether I can use it down the road.
            Thanks!
            Last edited by Xterra; 06-17-2010, 01:19 PM.


            • sklages
              Senior Member
              • May 2008
              • 628

              #7
              fnafile --help
              I have never used this program; it seems to be the FASTA counterpart to sffinfo.

              Both sffinfo and fnafile can filter by read name (ID) via '-e' or '-i'; they cannot filter by any other characteristic (e.g. length, GC content, etc.).

              -t / -tr *set* trim points according to a file given by the user. These are not filters.


              • Xterra
                Member
                • Jun 2010
                • 27

                #8
                sklages

                I just cannot get it to work. Would you mind being a little more specific? Let's assume I have the file A.fas, which is a FASTA file, and I am using 1 200 instead of 12 543 as described in the manual:
                The specified “trimfile” should contain one or more lines
                consisting of (1) a read accession number, (2) a starting trimpoint
                and (3) an ending trimpoint, separated by whitespace characters or
                where the trimpoints are separated by a dash (e.g., “accno 12 543” or
                “accno 12-543”)
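To make the trimfile format in the manual excerpt concrete, here is a minimal Python sketch that parses both accepted line forms ("accno 12 543" and "accno 12-543"). `parse_trimfile` is a hypothetical helper, not an sfftools command. It only checks the layout described in the excerpt and does not know what fnafile itself accepts beyond that.

```python
# Minimal sketch: parse trimfile lines of the form "accno start end"
# or "accno start-end", as described in the manual excerpt.
# parse_trimfile is a hypothetical helper, not an sfftools command.
import re
from typing import Dict, Iterable, Tuple

# accession, whitespace, start, then whitespace or a dash, then end
TRIM_LINE = re.compile(r"^(\S+)\s+(\d+)(?:\s+|\s*-\s*)(\d+)$")


def parse_trimfile(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {accno: (start, end)}; blank lines are skipped."""
    trims: Dict[str, Tuple[int, int]] = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        match = TRIM_LINE.match(line)
        if match is None:
            raise ValueError(f"line {number}: cannot parse {line!r}")
        trims[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return trims
```

A line such as `GF2FOAC04I8T0F 1 200` parses to `{"GF2FOAC04I8T0F": (1, 200)}`. A line with a missing trimpoint raises an error instead of being silently skipped.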


                • kmcarr
                  Senior Member
                  • May 2008
                  • 1181

                  #9
                  Originally posted by Xterra:
                  I just cannot get it to work. Would you mind being a little more specific? Let's assume I have the file A.fas, which is a FASTA file, and I am using 1 200 instead of 12 543 as described in the manual
                  Could you please post the exact command you were using, as well as small samples of the FASTA and trim files you tried to use.


                  • Xterra
                    Member
                    • Jun 2010
                    • 27

                    #10
                    I have tried so many different commands

                    but none is working. That's why I would like to get the right syntax.
                    I have uploaded an example of the FASTA file I would like to process using fnafile. As I said, I am only trying to find out what fnafile can do.
                    Thanks.

                    • kmcarr
                      Senior Member
                      • May 2008
                      • 1181

                      #11
                      Originally posted by Xterra:
                      but none is working. That's why I would like to get the right syntax.
                      O.K., how about an example of just one command that didn't work.

                      And your FASTA file looks like it contains gapped sequence.

                      >GF2FOAC04ISOQO
                      ACGAG-TG----GTGATGT-GCCAGC-TG-CCGTTGGTGT-TAATGAGCTGAA-TGTTCT
                      GCTGA-G-------GGC--ATGGC-T-GAACAC-GACGG-CAAATCACGT----TGTGAA
                      CGTG-CAA-CACGCG-CC--TCAA-CGGT-GGTGGT-G--CCCG-CGT--CCACCCCA-G
                      CGG-CCAG-C-AGAAGGA--TGA-CAAT-GACCCTT-C--G-CCCACGACT---------
                      >GF2FOAC04J0H2I
                      ACGAA-TGCG-TTTGATGT-GCCAGC-TG-CCGTTGGTGT-TAATGAGCTGAA-TGTTCT
                      GCTGA-G---G---GCCGAGTGGCGTAGAACAC-GCCGG-CAAT-CA-GT-TGGTGG-AA
                      CGTG-CAA-CA-GCG-CC-TCCAA-C----GGGGGTCG--CCCG-CGC--CCACCCCA-G
                      CGG-CCAG-C-AGAAGGA--TGA-CAAT-GA-CCTT-C--G-CCCA--------------
                      Where did this FASTA come from?
                      Last edited by kmcarr; 06-17-2010, 02:19 PM.
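kmcarr's observation matters for any length-based selection. In an aligned FASTA, gap characters pad every record to the alignment width, so counting characters measures the alignment rather than the read. A minimal Python sketch that strips the gaps first follows. It assumes '-' is the only gap character, as in the sample above. The helper names are hypothetical.

```python
# Minimal sketch: strip alignment gaps so sequence lengths reflect the reads.
# Assumes '-' is the only gap character in the aligned FASTA.
from typing import Iterable, List


def degap(seq: str, gap_chars: str = "-") -> str:
    """Remove every gap character from seq."""
    return seq.translate(str.maketrans("", "", gap_chars))


def degap_fasta(lines: Iterable[str], gap_chars: str = "-") -> List[str]:
    """Return FASTA lines (without newlines) with gaps removed from sequence lines."""
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        out.append(line if line.startswith(">") else degap(line, gap_chars))
    return out
```

Sequence lines that consisted only of gaps become empty strings. Most FASTA readers ignore those, but they can be dropped if a tool complains.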


                      • Xterra
                        Member
                        • Jun 2010
                        • 27

                        #12
                        Here you have one without gaps.

                        Does it matter if you have gaps in the fasta file? Would it affect the performance of fnafile? Anyways, I have uploaded a file with no gaps. That's a 454 run using the Titanium amplicon sequencing kit. The previous file was the aligned file and that's why you could see the indels.

                        • sklages
                          Senior Member
                          • May 2008
                          • 628

                          #13
                          Originally posted by Xterra:
                          Does it matter if you have gaps in the fasta file? Would it affect the performance of fnafile? Anyways, I have uploaded a file with no gaps. That's a 454 run using the Titanium amplicon sequencing kit. The previous file was the aligned file and that's why you could see the indels.
                          You still haven't provided the exact command you used.

                          Your FASTA file contains aligned sequences created by some Windows program? OK... you provided a DOS-formatted file (CRLF line endings); I have no idea whether fnafile is happy with that.
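If the DOS line endings turn out to be a problem, converting the file before running fnafile is simple. A tool such as dos2unix does this. Below is a minimal Python sketch of the same conversion. The function names and the input file name are placeholders; only the `UniqueHapsUnix.fas` name comes from the command below.

```python
# Minimal sketch: convert DOS (CRLF) line endings to Unix (LF).
# Function and file names are hypothetical placeholders.
from pathlib import Path


def to_unix_newlines(data: bytes) -> bytes:
    """Replace CRLF, then any stray CR, with LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Write a copy of src with Unix line endings to dst."""
    Path(dst).write_bytes(to_unix_newlines(Path(src).read_bytes()))
```

For example, `convert_file("UniqueHaps.fas", "UniqueHapsUnix.fas")` would write a Unix-formatted copy, assuming the uploaded file was named `UniqueHaps.fas`. Working on bytes avoids any platform-dependent newline translation during reading or writing.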

                          1) create a file with trim points, just like:
                          Code:
                          GF2FOAC04I8T0F 1 200
                          GF2FOAC04J305H 1 200
                          GF2FOAC04J3QXL 1 200
                          2) run fnafile
                          Code:
                          $ fnafile -o out.fa -tr tp.txt UniqueHapsUnix.fas
                          3) see the differences

                          a) original sequence:
                          Code:
                          >GF2FOAC04I8T0F
                          ACGAGTGCGTTTGATGTGCCAGCTGCCGTTGGTGTTAATGAGCTGAATGTTCTGCTGAGGGCCATGGCTG
                          AACACGCCGGCAATCACGTTGGTGGAACGTGCAACAGCGCCTCCAACGGTGGTGGTGCCCGCGTCCACCC
                          CAGCGGCCAGCAGAAGGATGACAATGACCTTCGCCCAC
                          b) "trimmed" sequence
                          Code:
                          >GF2FOAC04I8T0F trim=1-200
                          ACGAGTGCGTTTGATGTGCCAGCTGCCGTTGGTGTTAATGAGCTGAATGTTCTGCTGAGG
                          GCCATGGCTGAACACGCCGGCAATCACGTTGGTGGAACGTGCAACAGCGCCTCCAACGGT
                          GGTGGTGCCCGCGTCCACCCCAGCGGCCAGCAGAAGGATGACAATGACCTTCGCCCAC
                          The sequence itself remains unchanged, but a flag (trim=1-200) has been added to the header, directing the assembler (Newbler) to use only the sequence within this range. Again, this is not a filter, and the reads are not physically trimmed. To trim your sequences, you need an external tool that works on either the FASTA or the SFF files.
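As an illustration of the external physical-trimming step Sven mentions, here is a minimal Python sketch. It reads the `trim=start-end` flag that fnafile wrote into the header (as in the output above), cuts the sequence, and drops the flag. It assumes the trimpoints are 1-based and inclusive, which the thread does not state explicitly. When the end point exceeds the read length, the cut simply stops at the end of the read. That is the case here, since the example read is 178 bases long. The helper names are hypothetical.

```python
# Minimal sketch: physically apply a "trim=start-end" header flag.
# Assumes 1-based, inclusive trimpoints; helper names are hypothetical.
import re
from typing import Optional, Tuple

TRIM_TAG = re.compile(r"\btrim=(\d+)-(\d+)\b")


def trim_range(header: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) from a trim= flag, or None if there is none."""
    m = TRIM_TAG.search(header)
    return (int(m.group(1)), int(m.group(2))) if m else None


def apply_trim(seq: str, start: int, end: int) -> str:
    """Cut seq to positions start..end (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError(f"bad trim range {start}-{end}")
    return seq[start - 1:end]  # slicing past the end stops at len(seq)


def trim_record(header: str, seq: str) -> Tuple[str, str]:
    """Trim one record and remove the trim= flag from its header."""
    rng = trim_range(header)
    if rng is None:
        return header, seq
    new_header = " ".join(TRIM_TAG.sub("", header).split())
    return new_header, apply_trim(seq, *rng)
```

Removing the flag after cutting prevents an assembler from applying the same range a second time to the already-trimmed read.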


                          • Xterra
                            Member
                            • Jun 2010
                            • 27

                            #14
                            Thanks!

                            That's the answer I was looking for!


                            • TheLight
                              Junior Member
                              • Sep 2008
                              • 5

                              #15
                              SFF editor/converter

                              There is a free tool with a graphical interface that allows you to view, edit and convert SFF files here
