  • gokhulkrishnakilaru
    Member
    • Jul 2011
    • 39

    #16
    Originally posted by rkk
    The command has to identify the min and max values in col1, and then bin them into 100 bp regions...
    I am afraid your bins would then look like this:

    Code:
    10175-10275 8
    10276-10375 1
    10376-10475 1
    10476-10575 2


    • rkk
      Junior Member
      • Feb 2011
      • 9

      #17
      Once the minimum value is identified, it should be rounded down to the nearest 100. For example, in this case the minimum is 10175, so the bins should start at 10100. Hope this helps.
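The rounding described here can be sketched in awk as follows. This is only an illustration, assuming positions are positive integers; `min` and `start` are names chosen for the example.

```shell
# Round a minimum position down to the nearest multiple of 100
min=10175
start=$(awk -v min="$min" 'BEGIN { print int(min / 100) * 100 }')
echo "$start"   # prints 10100
```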


      • gokhulkrishnakilaru
        Member
        • Jul 2011
        • 39

        #18
        Originally posted by rkk
        I should use that command in LINUX...

        Now, I have another issue

        I have a file like the following. I need to bin the first column into 100 bp regions and sum the second-column values for each bin
        10175 1
        10179 1
        10189 1
        10191 1
        10201 1
        10243 1
        10249 1
        10262 1
        10313 1
        10414 1
        10485 1
        10499 1

        The output should be something like this..

        10101-10200 4
        10201-10300 4
        10301-10400 1
        10401-10500 3

        Can someone help with this..

        Thanks in advance..
        @rkk,

        Your input can have two solutions

        Solution 1 (using the minimum and maximum values of column 1 as they are):

        Code:
        cat input
        10175	1
        10179	1
        10189	1
        10191	1
        10201	1
        10243	1
        10249	1
        10262	1
        10313	1
        10414	1
        10485	1
        10499	1


        Code:
        awk 'NR == 1 {max=$1 ; min=$1} $1 >= max {max = $1} $1 <= min {min = $1} END { print min"\t"max}' input | awk '{ print $1, i=$1+100;while(i++<$2) print i, i+=99}' > intermediate
        Code:
        cat intermediate
        
        10175 10275
        10276 10375
        10376 10475
        10476 10575
        Now, consider the above intermediate file and run the following code

        Code:
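        awk 'NR==FNR{
           # First file (intermediate): remember each bin as "start end", count 0
           C[NR]=$1 " " $2
           L[C[NR]]=0
           next
        }
        {
         # Second file (input): add column 2 to every bin whose range contains column 1
         for (t in C) {
            split(C[t],v," ")
            if($1>=v[1] && $1<=v[2])
               L[C[t]]+=$2
         }
        }
        END {
           # Print the bins in their original order
           for(i=1;i in C;i++)
               print C[i] " " L[C[i]]
        }' intermediate input > output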

        Code:
        cat output
        
        10175 10275 8
        10276 10375 1
        10376 10475 1
        10476 10575 2


        ###########################################


        Solution 2 (rounding the minimum of column 1 down to the nearest 100 and the maximum up to the next 100):

        Code:
        
        cat input
        10175	1
        10179	1
        10189	1
        10191	1
        10201	1
        10243	1
        10249	1
        10262	1
        10313	1
        10414	1
        10485	1
        10499	1
        Code:
        awk '{       
            min=$1<min||!min?$1:min
            max=$1>max||!max?$1:max
        }      
        END {
          s=int(min/100)*100
          e=int(max/100)*100+100
          print s " " s+100
          for(i=s+101;i<e;i+=100)
             print i " " i+99
        }' input > intermediate
        Code:
        cat intermediate
        10100 10200
        10201 10300
        10301 10400
        10401 10500

        Now, consider the above intermediate file and run the following code

        Code:
        awk 'NR==FNR{
           C[NR]=$1 " " $2
           L[C[NR]]=0
           next
        }
        {
         for (t in C) {
            split(C[t],v," ")
            if($1>=v[1] && $1<=v[2])
               L[C[t]]+=$2
         }
        }
        END {
           for(i=1;i in C;i++)
               print C[i] " " L[C[i]]
        }' intermediate input > output
        Code:
        cat output
        
        10100 10200 4
        10201 10300 4
        10301 10400 1
        10401 10500 3


        • francois.sabot
          Member
          • Dec 2009
          • 41

          #19
          Originally posted by rkk
          Hello,

          I have a file like the following

          chr1 1234
          chr1 2345
          chr2 94837
          chr2 73457

          how can I split this data into two files

          chr1.txt

          chr1 1234
          chr1 2345

          chr2.txt

          chr2 94837
          chr2 73457

          Thanks in advance.
          What about a simple grep ?

          grep 'chr1' FILE > chr1.txt
          grep 'chr2' FILE > chr2.txt
          Francois Sabot, PhD

          Be realistic. Demand the Impossible.
          www.wikiposon.org


          • gokhulkrishnakilaru
            Member
            • Jul 2011
            • 39

            #20
            Originally posted by francois.sabot
            What about a simple grep ?

            grep 'chr1' FILE > chr1.txt
            grep 'chr2' FILE > chr2.txt
            Francois,

            Grep is a handy tool. But you have to repeat that command for each chromosome in your first column. With awk, a single command, run once, does the whole task.

            After all, time is precious. No one wants to sit there typing out each chromosome, least of all me.


            • syfo
              Just a member
              • Nov 2012
              • 103

              #21
              Originally posted by gokhulkrishnakilaru
              Code:
              awk '{print > $1".txt"}' input
              This is the correct and best answer to the original question of the thread. The other awk command, posted at almost the same time, has a space after "$1" in the output file name. That should not change anything, but if you got an error, try the command as quoted here.

              As for the second problem: since you already know the resolution you want, you don't need to compute the min and max. Everything can be done in one step:

              Code:
              awk '{bin[int(($1-1)/100)]+=$2}END{for (i in bin)print i*100+1"-"(i+1)*100,bin[i]}' input
              This line should give exactly the output you want. Pipe it to sort -n if needed, and/or change the separator "-".
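A slightly expanded sketch of the same one-step idea, with the bin width passed in as a variable and the output sorted. The sample data, the temporary file, and the names `w`, `sample` and `binned` are illustrative assumptions, and counts are assumed to be integers.

```shell
# Hypothetical sample: position in column 1, count in column 2
sample=$(mktemp)
printf '%s\n' '10175 1' '10200 1' '10201 1' '10313 1' '10499 1' > "$sample"

# w is the bin width; ($1 - 1) keeps a boundary position such as 10200
# in the bin 10101-10200 rather than pushing it into 10201-10300
binned=$(awk -v w=100 '
    { bin[int(($1 - 1) / w)] += $2 }
    END {
        for (i in bin)
            printf "%d-%d %d\n", i * w + 1, (i + 1) * w, bin[i]
    }' "$sample" | sort -n)

echo "$binned"
rm -f "$sample"
```

The order of `for (i in bin)` is unspecified in awk, which is why the result goes through `sort -n`. Bins that receive no positions are simply absent from the output.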


              • alexdobin
                Senior Member
                • Feb 2009
                • 161

                #22
                Originally posted by syfo
                The other awk command that was posted almost at the same time has a space in the output file name after "$1", it should not change anything but if you got an error try it as quoted here.
                The space in $1 ".txt" is perfectly valid and cannot cause any problems. In awk, you concatenate strings by writing them next to each other, separated by spaces: http://www.gnu.org/software/gawk/man...atenation.html
                Leaving the space out does not cause a problem in this case, but it is better practice to put a space between concatenated strings. For example, if you concatenate several awk variables, you must put a space between them: v3=v1 v2. Of course, v3=v1v2 will not work, because awk reads v1v2 as a single variable name.
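A small sketch of both points, run in a scratch directory on made-up data. It also wraps the redirection target in parentheses, since an unparenthesized concatenation after `>` is not handled the same way by every awk implementation. The directory and the shell variable names are assumptions for the example.

```shell
# Work in a scratch directory so the per-chromosome files stay out of the way
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'chr1 1234' 'chr1 2345' 'chr2 94837' 'chr2 73457' > input

# Parentheses make the whole concatenation the redirection target
awk '{ print > ($1 ".txt") }' input

chr1_lines=$(cat chr1.txt)
chr2_lines=$(cat chr2.txt)

# v1 v2 concatenates two variables; v1v2 would be a different, unset variable
joined=$(awk 'BEGIN { v1 = "chr"; v2 = "1"; v3 = v1 v2; print v3 }')
echo "$joined"   # prints chr1
```

With a very large number of distinct names, some awk implementations may run out of open files; appending with `>>` and calling `close()` on the file name after each write is one possible workaround.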


                • syfo
                  Just a member
                  • Nov 2012
                  • 103

                  #23
                  OK good, thanks Alex for the clarification. Both commands should work then; I don't see any reason for an error either. Maybe try \awk instead of awk, in case awk is aliased to something else?

                  Rkk, let me know if there is any issue with my one-liner for your second task.


                  • syfo
                    Just a member
                    • Nov 2012
                    • 103

                    #24
                    Originally posted by francois.sabot
                    What about a simple grep ?

                    grep 'chr1' FILE > chr1.txt
                    grep 'chr2' FILE > chr2.txt
                    A more generic grep solution could be something like
                    Code:
                    for i in $(cut -d" " -f1 input | sort -u); do grep -w "$i" input > "$i.txt" ; done
                    But the awk alternative is better.
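For comparison, a sketch of a grep loop that anchors the match to the first column, since a plain substring pattern such as 'chr1' also matches chr10. It assumes columns are separated by a single space and that names contain no regular-expression metacharacters; the sample data and scratch directory are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'chr1 1234' 'chr10 555' 'chr2 94837' 'chr1 2345' > input

# One grep pass per distinct name; "^name " only matches the name in column 1
cut -d' ' -f1 input | sort -u | while read -r name; do
    grep "^${name} " input > "${name}.txt"
done

cat chr1.txt
```

This still reads the input once per distinct name, whereas the awk one-liner needs a single pass.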

