  • sffinfo -s inputfile

    Is there any way to change the 'format' of the output FASTA file generated by sffinfo -s?
    Using the following command:
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna
    This is what I get:
    >GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA
    AGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    >GHXCZCC01APUO5 length=312 xy=0177_1303 region=1 run=R_2010_05_27_13_55_50_
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAG
    AGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCG
    TCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAA
    TAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCT
    TGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATC
    ATATCCACGCAC
    >GHXCZCC01AQSRP length=314 xy=0188_0403 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTG
    AAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    Would it be possible to change it to this?
    >GHXCZCC01AJ8CJ
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01APUO5
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01AQSRP
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTGAAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    Any help will be greatly appreciated!

  • #2
    Originally posted by Xterra
    Is there any way to change the 'format' of the output FASTA file generated by sffinfo -s?
    Using the following command:
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna

    Code:
    sffinfo -s Inputfile.sff  | perl -lpe 's/^(\>\S+).+/$1/'  > Outputfile.fna
    should work in most cases :-)
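If Perl is not available, the same header trimming can be sketched in awk. This is a swapped-in alternative, not the one-liner above, and it assumes (as the Perl regex does) that the read ID is the first whitespace-delimited field of each `>` line. The function name `trim_headers` is hypothetical.

```shell
# trim_headers is a hypothetical helper name, not part of sffinfo.
# Reads FASTA on stdin. Header lines are cut down to ">ID" (the first
# whitespace-delimited field); sequence lines pass through unchanged.
trim_headers() {
  awk '/^>/ { print $1; next } { print }'
}
```

Usage would be along the lines of `sffinfo -s Inputfile.sff | trim_headers > Outputfile.fna`. Note that, like the Perl version, this leaves the sequence wrapped across several lines.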



    • #3
      Very nice!

      However, I still have a problem: the file now looks like this (60 characters per line):

      >GHXCZCC01AJ8CJ
      TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC
      CCCGGGGC
      >GHXCZCC01APUO5
      TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCC
      CGGGGCGA
      >GHXCZCC01AQSRP
      TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACG
      CCCCGGGG
      And I need something like this (the entire sequence in one line):
      >GHXCZCC01AJ8CJ
      TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGC
      >GHXCZCC01APUO5
      TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGA
      >GHXCZCC01AQSRP
      TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGG
      Thanks in advance!
      Last edited by Xterra; 06-18-2010, 10:56 AM.



      • #4
        Look at your previous post and see what you have requested.

        Write a little perl script to remove the newline at the end of the sequence lines.



        • #5
          sklages

          I am trying to combine sffinfo with code that gets rid of the extra information in the ID line and, at the same time, removes the newline at the end of each sequence line. Originally, I was hoping there was an option in sffinfo that could do exactly what I needed. Not being a scripter makes finding the right code a little more challenging.
          Write a little perl script to remove the newline at the end of the sequence lines.
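          Not Perl but AWK:
          Code:
          awk '/^>/ {
            # print the previous sequence (if any), then the header
            if (seq != "") print seq
            print $0
            seq = ""; next
          }
          {
            # append each sequence line directly, with no space in between
            seq = seq $0
          }
          END { if (seq != "") print seq }' infile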
          Last edited by Xterra; 06-18-2010, 02:16 PM.
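Both steps (trimming the header and joining the sequence lines) can also be sketched as a single awk pass, so no separate Perl step is needed. This assumes the read ID is the first whitespace-delimited field of each header; the function name `fasta_oneline` is hypothetical. Sequence chunks are concatenated with no separator, so no spaces end up inside the sequence.

```shell
# fasta_oneline is a hypothetical helper name, not part of sffinfo.
# Single awk pass over FASTA on stdin:
#   - header lines are cut down to ">ID" (first whitespace-delimited field)
#   - wrapped sequence lines are concatenated with no separator
fasta_oneline() {
  awk '
    /^>/ {
      if (seq != "") print seq   # flush the previous sequence
      print $1                   # keep only ">ID"
      seq = ""
      next
    }
    { seq = seq $0 }             # append this chunk, no spaces
    END { if (seq != "") print seq }
  '
}
```

Usage would be something like `sffinfo -s Inputfile.sff | fasta_oneline > Outputfile.fna`.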
