Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • problem with adding numerical sequence at the end of line

    Hi,

    Anyone has any idea how to get this:

    >no_name
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

    become this:

    >no_name_1
    TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
    >no_name_2
    GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
    >no_name_3
    GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
    >no_name_4
    GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
    >no_name_5
    GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

  • #2
    Here's one way using Perl. Save the text in a file named numbers.pl (or whatever). Usage would be:

    perl numbers.pl --in file_to_change.fasta --out revised_file.fasta


    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Getopt::Long;
    
    my $inFile;
    my $outFile;
    
    GetOptions  ("in=s"      => \$inFile,
                 "out=s"      => \$outFile);
    
    if (!$inFile or !$outFile) {
        die "Must supply both infile and outfile as command line arguments.\n";
    }
    
    open(my $inFH, "<", $inFile) or die "couldn't open infile for reading.\n";
    if (-e $outFile) {
        die "Output file $outFile already exists--aborting so you don't overwrite.\n";
    }
    open(my $outFH, ">", $outFile) or die "couldn't open outfile for writing.\n";
        
    my $counter = 1;
    while (my $line = <$inFH>) {
        chomp $line;
        if ($line =~ /^(>.*)/) {
            print $outFH $1 . "_$counter\n";
            $counter++;
        } else {
            print $outFH "$line\n";
        }
    }
    Last edited by atcghelix; 09-26-2013, 09:57 PM. Reason: Edited to move $counter++ so that you didn't just get odd-numbered sequences

    Comment


    • #3
      Heres another way: R

      Code:
      library(seqinr)
      read.fasta("fastafile.fa")->fa
      write.fasta(fa,names=paste(getName(fa),1:5,sep="_"),file.out="fa_new_name.fa")
      where you swap '1:5' with '1:n', n being the number of sequences you have.

      Comment


      • #4
        Anyone know how to use AWK to do this task?

        Comment


        • #5
          Thanks. I am pretty weak in Perl. Do you have any idea using AWK to do this?

          Comment


          • #6
            What version of Awk are you running/what operating system?

            Comment


            • #7
              Running is UNIX

              Comment


              • #8
                This work? (It assumes all sequence strings are on a single line)

                Code:
                awk '{if($0 ~ /^>/){print $0"_"(NR+1)/2}else{print $0}}' input.fasta > changed.fasta
                Last edited by atcghelix; 09-26-2013, 11:33 PM. Reason: Less confusing regex

                Comment


                • #9
                  try this

                  Code:
                  paste - - < input.fa | awk ' { print $1"_"NR"\n"$2 } ' > output.fa
                  make sure to have spaces between the hyphens for 'paste'

                  Comment


                  • #10
                    Thank you everybody. I have done my task. =)

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      The Impact of AI in Genomic Medicine
                      by seqadmin



                      Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                      02-26-2024, 02:07 PM
                    • seqadmin
                      Multiomics Techniques Advancing Disease Research
                      by seqadmin


                      New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

                      A major leap in the field has
                      ...
                      02-08-2024, 06:33 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 02-28-2024, 06:12 AM
                    0 responses
                    22 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 02-23-2024, 04:11 PM
                    0 responses
                    70 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 02-21-2024, 08:52 AM
                    0 responses
                    78 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 02-20-2024, 08:57 AM
                    0 responses
                    67 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X