  • sffinfo -s inputfile

    Is there any way to change the 'format' of the output fasta file generated by sffinfo -s?
    I am using the following command:
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna
    This is what I get:
    >GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA
    AGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    >GHXCZCC01APUO5 length=312 xy=0177_1303 region=1 run=R_2010_05_27_13_55_50_
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAG
    AGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCG
    TCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAA
    TAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCT
    TGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATC
    ATATCCACGCAC
    >GHXCZCC01AQSRP length=314 xy=0188_0403 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTG
    AAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    Would it be possible to change it to this?
    >GHXCZCC01AJ8CJ
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01APUO5
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01AQSRP
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTGAAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    Any help will be greatly appreciated!

  • #2
    Originally posted by Xterra
    Is there any way to change the 'format' of the output fasta file generated by sffinfo -s?
    Using the following code
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna

    Code:
    sffinfo -s Inputfile.sff  | perl -lpe 's/^(\>\S+).+/$1/'  > Outputfile.fna
    should work in most cases :-)
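For readers without Perl, here is a minimal sketch of the same header trimming done with `sed`, wrapped in a shell function whose name (`strip_fasta_headers`) is hypothetical. Like the Perl one-liner, it keeps `>` plus the read ID (everything up to the first whitespace) and drops the `length=`, `xy=`, `region=` and `run=` fields. Sequence lines pass through unchanged, so they stay wrapped exactly as sffinfo wrote them.

```shell
#!/bin/sh
# Hypothetical helper: keep only ">ID" on FASTA header lines.
# [^[:space:]]* matches the read ID; .* swallows the length=, xy=,
# region= and run= fields that follow it. Non-header lines are untouched.
strip_fasta_headers() {
  sed 's/^\(>[^[:space:]]*\).*/\1/'
}

# Demo on a header taken from the sffinfo -s output above
printf '%s\n' \
  '>GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_' \
  'TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA' |
  strip_fasta_headers
# prints:
# >GHXCZCC01AJ8CJ
# TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA
```

In the pipeline from the question this would be `sffinfo -s Inputfile.sff | strip_fasta_headers > Outputfile.fna`.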



    • #3
      Very nice!

      However, I still have a problem: the file now looks like this (60 characters per line):

      >GHXCZCC01AJ8CJ
      TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC
      CCCGGGGC
      >GHXCZCC01APUO5
      TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCC
      CGGGGCGA
      >GHXCZCC01AQSRP
      TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACG
      CCCCGGGG
      And I need something like this (the entire sequence in one line):
      >GHXCZCC01AJ8CJ
      TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGC
      >GHXCZCC01APUO5
      TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGA
      >GHXCZCC01AQSRP
      TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGG
      Thanks in advance!
      Last edited by Xterra; 06-18-2010, 10:56 AM.



      • #4
        Look at your previous post and see what you requested.

        Write a little perl script to remove the newline at the end of the sequence lines.



        • #5
          sklages

          I am trying to combine sffinfo with code that can get rid of the extra information in the ID line and, at the same time, remove the newline at the end of each sequence line. Originally, I was hoping there was an option in sffinfo that could do exactly what I needed. Not being a scripter makes finding the right code a little more challenging.
          Write a little perl script to remove the newline at the end of the sequence lines.
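          Not Perl but AWK:
          Code:
          awk '/^>/ {
            # header line: print the sequence collected so far (if any), then the header
            print (buff ? buff RS : "") $0
            buff = ""; next
          }
          {
            # sequence line: append it directly, with no separator
            buff = buff $0
          }
          END { print buff }' infile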
          Last edited by Xterra; 06-18-2010, 02:16 PM.
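A minimal single-pass sketch that combines both steps asked for in this thread, assuming a POSIX `awk` and FASTA on standard input: it trims each header to `>ID` (as the Perl one-liner in #2 does) and joins the wrapped sequence lines of each read with no separator. The function name `fasta_oneline` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: FASTA on stdin -> one header line ">ID" plus one
# unwrapped sequence line per read, written to stdout.
fasta_oneline() {
  awk '
    /^>/ {
      if (buf != "") print buf   # flush the sequence of the previous read
      sub(/[ \t].*/, "")         # drop everything after the read ID
      print                      # print the trimmed header
      buf = ""
      next
    }
    { buf = buf $0 }             # append sequence lines with no separator
    END { if (buf != "") print buf }
  '
}

# Demo with two short, made-up reads wrapped across lines
printf '%s\n' '>r1 length=8 xy=0001_0001' 'ACGT' 'TTGA' '>r2 length=4' 'GGCC' |
  fasta_oneline
# prints:
# >r1
# ACGTTTGA
# >r2
# GGCC
```

For the data in this thread the call would be `sffinfo -s Inputfile.sff | fasta_oneline > Outputfile.fna`. Only one read's sequence is held in memory at a time, so the approach should scale to large SFF extracts.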
