  • Xterra
    Member
    • Jun 2010
    • 27

    sffinfo -s inputfile

    Is there any way to change the 'format' of the output FASTA file generated by sffinfo -s?
    Using the following command:
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna
    This is what I get
    >GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA
    AGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    >GHXCZCC01APUO5 length=312 xy=0177_1303 region=1 run=R_2010_05_27_13_55_50_
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAG
    AGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCG
    TCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAA
    TAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCT
    TGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATC
    ATATCCACGCAC
    >GHXCZCC01AQSRP length=314 xy=0188_0403 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTG
    AAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    Would it be possible to change it to
    >GHXCZCC01AJ8CJ
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01APUO5
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01AQSRP
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTGAAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    Any help will be greatly appreciated!
  • sklages
    Senior Member
    • May 2008
    • 628

    #2
    Originally posted by Xterra
    Is there any way to change the 'format' of the output FASTA file generated by sffinfo -s?
    Using the following command:
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna

    Code:
    sffinfo -s Inputfile.sff  | perl -lpe 's/^(\>\S+).+/$1/'  > Outputfile.fna
    should work in most cases :-)
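
    For readers without Perl at hand, the same header trimming can be sketched in awk. This is only an illustration, not part of the answer above: the function name strip_fasta_headers is hypothetical, and the sketch assumes the read ID is the first whitespace-delimited token on each ">" line, as in the sffinfo output quoted in this thread.

```shell
# Hypothetical helper: keep only the first token of each FASTA header line.
# Assumption: the read ID contains no spaces and any extra fields
# (length=, xy=, region=, run=) follow it separated by whitespace.
strip_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Intended use, assuming sffinfo is on PATH:
#   sffinfo -s Inputfile.sff | strip_fasta_headers > Outputfile.fna
```

    Sequence lines pass through unchanged, so the 60-character line wrapping produced by sffinfo is still present after this step.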

    • Xterra
      Member
      • Jun 2010
      • 27

      #3
      Very nice!

      However, I still have a problem: the file now looks like this (60 characters per line):

      >GHXCZCC01AJ8CJ
      TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC
      CCCGGGGC
      >GHXCZCC01APUO5
      TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCC
      CGGGGCGA
      >GHXCZCC01AQSRP
      TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACG
      CCCCGGGG
      And I need something like this (the entire sequence in one line):
      >GHXCZCC01AJ8CJ
      TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGC
      >GHXCZCC01APUO5
      TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGA
      >GHXCZCC01AQSRP
      TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGG
      Thanks in advance!
      Last edited by Xterra; 06-18-2010, 10:56 AM.

      • sklages
        Senior Member
        • May 2008
        • 628

        #4
        Look at your previous post and see what you requested.

        Write a little Perl script to remove the newline at the end of the sequence lines.

        • Xterra
          Member
          • Jun 2010
          • 27

          #5
          sklages

          I am trying to combine sffinfo with code that gets rid of the extra information in the ID line and, at the same time, removes the newline at the end of each sequence line. Originally, I was hoping there was an option in sffinfo that could do exactly what I needed. Not being a scripter makes the task of finding the right code a little more challenging.
          Write a little perl script to remove the newline at the end of the sequence lines.
          Not Perl but AWK:
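          awk '/^>/ {
              # header line: flush the previous sequence, then print the header
              if (buff != "") print buff
              print
              buff = ""
              next
          }
          {
              # sequence line: append without a separator so no spaces are inserted
              buff = buff $0
          }
          END { if (buff != "") print buff }' infile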
          Last edited by Xterra; 06-18-2010, 02:16 PM.
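
          The two steps discussed in this thread (trimming the ID line and joining the wrapped sequence lines) can probably be done in a single awk pass. The following is a minimal sketch under stated assumptions, not a tested recipe for every SFF file: the function name fasta_single_line is hypothetical, and it assumes the read ID is the first whitespace-delimited token of each ">" line.

```shell
# Hypothetical helper: trim each header to its ID and join wrapped sequence
# lines, so every record becomes exactly two lines (header, sequence).
fasta_single_line() {
    awk '
        /^>/ {
            if (seq != "") print seq   # flush the previous record
            print $1                   # header: first token only
            seq = ""
            next
        }
        { seq = seq $0 }               # append with no separator
        END { if (seq != "") print seq }
    ' "$@"
}

# Intended use, assuming sffinfo is on PATH:
#   sffinfo -s Inputfile.sff | fasta_single_line > Outputfile.fna
```

          Because the function passes its arguments through to awk, it should also work on an existing file, for example fasta_single_line Outputfile.fna.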
