  • Xterra
    replied
    sklages

    I am trying to combine sffinfo with code that strips the extra information from the ID line and, at the same time, removes the newline at the end of each sequence line. Originally, I was hoping sffinfo had an option that did exactly what I needed. Not being a scripter makes finding the right code a little more challenging.
    Write a little perl script to remove the newline at the end of the sequence lines.
    Not Perl but AWK:
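    awk '/^>/ {
    # Header line: flush the buffered sequence (if any), then print the header
    if (buff != "") print buff
    print
    buff = ""; next
    }
    {
    # Sequence line: append to the buffer with no separator
    buff = buff $0
    }
    END { if (buff != "") print buff }' infile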
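
For anyone who prefers a single script, here is a minimal Python sketch of the same buffering idea that also trims the ID line in the same pass. The function name `flatten_fasta` and the toy record are placeholders I made up for illustration, not part of sffinfo. It assumes plain FASTA text in which header lines start with `>`.

```python
def flatten_fasta(lines):
    """Yield FASTA lines with trimmed headers and one-line sequences.

    Hypothetical helper: each header is cut at the first whitespace and the
    wrapped sequence lines of a record are joined without any separator.
    """
    chunks = []  # sequence pieces of the record currently being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if chunks:  # flush the previous record's sequence
                yield "".join(chunks)
                chunks = []
            yield line.split(None, 1)[0]  # keep only ">ID"
        else:
            chunks.append(line)
    if chunks:  # flush the last record
        yield "".join(chunks)


# Toy record (made up, much shorter than a real read)
toy = [">read1 length=8 xy=0001_0002", "ACGT", "TTGA"]
for out in flatten_fasta(toy):
    print(out)
```

To use it in a pipe after `sffinfo -s`, the same function could be fed `sys.stdin` instead of the toy list, printing each yielded line.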
    Last edited by Xterra; 06-18-2010, 02:16 PM.

  • sklages
    replied
    Look at your previous post and see what you requested.

    Write a little perl script to remove the newline at the end of the sequence lines.

  • Xterra
    replied
    Very nice!

    However, I still have a problem: the file now looks like this (60 characters per line):

    >GHXCZCC01AJ8CJ
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC
    CCCGGGGC
    >GHXCZCC01APUO5
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCC
    CGGGGCGA
    >GHXCZCC01AQSRP
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACG
    CCCCGGGG
    And I need something like this (the entire sequence in one line):
    >GHXCZCC01AJ8CJ
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGC
    >GHXCZCC01APUO5
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGA
    >GHXCZCC01AQSRP
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGG
    Thanks in advance!
    Last edited by Xterra; 06-18-2010, 10:56 AM.

  • sklages
    replied
    Originally posted by Xterra
    Is there any way to change the 'format' of the output fasta file generated by sffinfo -s?
    Using the following code
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna

    Code:
    sffinfo -s Inputfile.sff  | perl -lpe 's/^(\>\S+).+/$1/'  > Outputfile.fna
    should work in most cases :-)
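
A rough Python sketch of the same header trim, in case Perl is not at hand. The name `trim_header` is hypothetical. One deliberate difference from the one-liner above: the pattern here requires whitespace after the ID, because as far as I can tell the original `.+` would clip the last character of a header that has nothing after the ID. sffinfo headers always carry extra fields, so this only matters for other input.

```python
import re

# ">" plus the non-space ID is captured; the first whitespace and
# everything after it is dropped. Assumption: one FASTA line per call.
HEADER_RE = re.compile(r"^(>\S+)\s.*")


def trim_header(line):
    """Return the line with header annotations removed (hypothetical helper)."""
    line = line.rstrip("\r\n")
    return HEADER_RE.sub(r"\1", line)


header = ">GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_"
print(trim_header(header))    # header cut down to the ID
print(trim_header("TTGATG"))  # sequence lines pass through unchanged
```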

  • Xterra
    started a topic sffinfo -s inputfile

    sffinfo -s inputfile

    Is there any way to change the 'format' of the output fasta file generated by sffinfo -s?
    Using the following code
    Code:
    sffinfo -s Inputfile.sff > Outputfile.fna
    This is what I get
    >GHXCZCC01AJ8CJ length=314 xy=0113_1201 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGA
    AGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    >GHXCZCC01APUO5 length=312 xy=0177_1303 region=1 run=R_2010_05_27_13_55_50_
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAG
    AGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCG
    TCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAA
    TAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCT
    TGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATC
    ATATCCACGCAC
    >GHXCZCC01AQSRP length=314 xy=0188_0403 region=1 run=R_2010_05_27_13_55_50_
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTG
    AAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCC
    CGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGG
    AATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAG
    CTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCA
    TCATATCCACGCAC
    Would it be possible to change it to
    >GHXCZCC01AJ8CJ
    TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGACCCCCGCCACCATATCTACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01APUO5
    TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCCCGGGGCGAAGAGGCTGGTCAGCGCACGGGTGTCCCTGCCCGCCGTCTGCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCACAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    >GHXCZCC01AQSRP
    TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGCCCCGGGGTGAAGAGGCTGGTCAGCGCACGGGTGTTCATACCCGCCGTCTCCCCACCGTACGGGTGCTCCCGTCAACGCCGGCAAAGAGTAGCATCGCAATCAAGACCTTAGCCCAGTTCCCCACCATGGAATAGTAGGCAAGGCCCGCCAGGACTCCCCAGTGGGCCCCCGCCACCATATCCACGACAGCTTGTGGGATCCGGAGTAACTGCGATACCACCAGGGCCGTTGTAGGTGACCAGTTCATCATCATATCCACGCAC
    Any help will be greatly appreciated!
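
Since the `length=` field is lost once the ID line is trimmed, it may be worth checking it against the sequence first. In the first record above, `length=314` agrees with five 60-character lines plus a 14-character remainder. Below is a minimal Python sketch of such a check, to be run on the untrimmed output. The function name `length_mismatches` and the toy records are made up for illustration, and it assumes the header format shown above.

```python
import re

LENGTH_RE = re.compile(r"\blength=(\d+)")


def length_mismatches(lines):
    """Return (id, expected, observed) for records whose joined sequence
    length differs from the header's length= field (hypothetical helper)."""
    records = []  # each entry: [read_id, expected_length_or_None, observed_length]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            match = LENGTH_RE.search(line)
            expected = int(match.group(1)) if match else None
            records.append([read_id, expected, 0])
        elif records:
            records[-1][2] += len(line)  # wrapped lines add up to one sequence
    # Records without a length= field are skipped rather than reported
    return [tuple(r) for r in records if r[1] is not None and r[1] != r[2]]


# Toy records (made up): the second one is deliberately inconsistent
toy = [">read1 length=8", "ACGT", "TTGA", ">read2 length=5", "GGCC"]
print(length_mismatches(toy))
```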
