
  • Getting the number of repeats along with the repeated element

    Hi all,

    I want to compare each line with the next line of a file containing a single column. For example, a file contains:
    NM_1
    NM_1
    NM_1
    NM_2
    NM_2
    NM_3
    NM_4
    NM_5
    NM_5
    NM_5
    NM_5
    I want to get the output as
    1 NM_1
    2 NM_1
    3 NM_1
    1 NM_2
    2 NM_2
    1 NM_3
    1 NM_4
    1 NM_5
    2 NM_5
    3 NM_5
    4 NM_5
    separated by tabs.

    Can anybody help me write a Perl script that produces the above output from an input file? It would be much appreciated.

    I have written the following code, but I could not find where to correct my script:
    #!/usr/bin/perl

    $file=$ARGV[0];
    open(INFILE,"$file");
    $i = 1;

    while(<INFILE>){

    chomp;
    $currentline = <INFILE>;
    print "$i\t $currentline";
    $nextline = <INFILE>;

    if ($currentline == $nextline ){
    $i++;
    print "$i\t $nextline";

    }
    else{
    print "1\t $nextline";
    $i = 1;

    }

    }
    close(INFILE);


    With Regards,
    Aeolus

  • #2
    Originally posted by Aeolus Huios
    I want to compare each line with the next line of a file containing a single column. For example, a file contains:
    NM_1
    NM_1
    NM_1
    NM_2
    NM_2
    NM_3
    NM_4
    NM_5
    NM_5
    NM_5
    NM_5
    I want to get the output as
    1 NM_1
    2 NM_1
    3 NM_1
    1 NM_2
    2 NM_2
    1 NM_3
    1 NM_4
    1 NM_5
    2 NM_5
    3 NM_5
    4 NM_5
    separated by tabs.
    First question, do you really need every occurrence of each identical element written to the output? It seems to me that it would be just as informative, easier to read/parse and more compact if your output only contained one line for each unique element with the count for that element. For example:

    Code:
    3 NM_1
    2 NM_2
    1 NM_3
    1 NM_4
    4 NM_5
    etc....
    The way I would do this is with the Unix command uniq. Since uniq only collapses adjacent identical lines, its input needs to be sorted, so I always run sort first; it's never good to assume that your input is already sorted. By default uniq collapses all identical lines into a single line, and adding the -c option also outputs the number of occurrences of each element in the original file.
    Code:
    # sort <inputFile> | uniq -c > <outputFile>
    I should note that uniq prints leading spaces before the count and the separator between the count and the element is a space not a tab. The output for the above example would look like:

    Code:
       3 NM_1
       2 NM_2
       1 NM_3
       1 NM_4
       4 NM_5
    You could clean these up by adding sed and tr to the command pipeline:

    Code:
    # sort <inputFile> | uniq -c | sed -e 's/^ *//' | tr ' ' '\t' > <outputFile>
    
    This produces output that looks like:
    
    3	NM_1
    2	NM_2
    1	NM_3
    1	NM_4
    4	NM_5
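
One caveat with the `tr ' ' '\t'` step is that it turns every space into a tab, including any spaces inside the element itself. That is harmless for single-column IDs like the ones here, but for lines that may contain spaces, a sketch along the following lines replaces only the separator after the count. The function name `count_lines` is made up for illustration.

```shell
# Hypothetical helper: per-element counts as "count<TAB>element".
count_lines() {
    sort "$@" | uniq -c | awk '{
        count = $1                 # first field is the count from uniq -c
        sub(/^ *[0-9]+ /, "")      # drop padding, the count and one space
        print count "\t" $0        # the rest of the line is left untouched
    }'
}

# Small demonstration: the element "b c" keeps its internal space
printf 'b c\na\nb c\n' | count_lines
```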
    Last edited by kmcarr; 02-08-2012, 10:27 AM.



    • #3
      awk '{if ($0!=prev) k=1; else k++; print k"\t"$0; prev=$0}' file.txt
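
To spell out what the one-liner above does: it remembers the previous line, resets a counter to 1 whenever the current line differs from it, and otherwise increments the counter, printing the counter and the line separated by a tab. Here is the same logic expanded as a sketch, wrapped in a function whose name `number_runs` is purely illustrative. It assumes identical elements are already adjacent, as in the example input; if they are not, run the file through sort first.

```shell
# Hypothetical helper: running count within each run of identical lines.
number_runs() {
    awk '{
        if ($0 != prev) k = 1   # value changed: restart the counter
        else k++                # same as the previous line: keep counting
        print k "\t" $0
        prev = $0               # remember this line for the next comparison
    }' "$@"
}

# Small demonstration on inline data
printf 'NM_1\nNM_1\nNM_2\nNM_3\nNM_3\nNM_3\n' | number_runs
```

Comparing the whole line (`$0`) rather than only the first field keeps the comparison and the stored value consistent.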



      • #4
        sort | uniq -c

        this is the most elegant solution



        • #5
          Hi kmcarr, Gege,

          Thanks a lot for the replies, but I already know how to use the Linux uniq command to get the frequency of repeated data. I want the output in the format I described.
          :-)

          Hi Rechard,

          Let me try it; I will reply to you after a while.
          Thanks a lot. :-)

          With regards,
          Aeolus



          • #6
            Originally posted by Aeolus Huios
            Hi kmcarr, Gege,

            Thanks a lot for the replies, but I already know how to use the Linux uniq command to get the frequency of repeated data. I want the output in the format I described.
            :-)

            Hi Rechard,

            Let me try it; I will reply to you after a while.
            Thanks a lot. :-)

            With regards,
            Aeolus
            Certainly you can write a long script. But you could also run "sort | uniq -c", then use "cut" to grab the columns individually and "paste" to reassemble them however you want, with whatever delimiter you want. That makes it just a couple of Unix commands.
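
A rough sketch of the cut and paste approach, assuming the element should come first and the count second. Note that this gives the total count per element, not the running count asked for in the first post. The function name `swap_columns` and the temporary file names are made up for illustration. The leading spaces that uniq -c prints are stripped with sed first, since cut would otherwise treat them as empty fields.

```shell
# Hypothetical helper: print "element<TAB>count" using cut and paste.
swap_columns() {
    tmp=$(mktemp)
    # Counts with the leading padding removed: "3 NM_1"
    sort "$@" | uniq -c | sed -e 's/^ *//' > "$tmp"
    cut -d' ' -f2- "$tmp" > "$tmp.names"    # everything after the count
    cut -d' ' -f1  "$tmp" > "$tmp.counts"   # the count itself
    paste "$tmp.names" "$tmp.counts"        # paste joins with a tab by default
    rm -f "$tmp" "$tmp.names" "$tmp.counts"
}

# Small demonstration on inline data
printf 'NM_2\nNM_1\nNM_1\n' | swap_columns
```

A different delimiter can be chosen with paste's -d option.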



            • #7
              Hi Rechard,

              Once again, thanks a lot. It works very well. :-)
              Can you recommend a good guide book or online articles on AWK commands?
              I would be very grateful.

              With regards,
              Pawan
