  • awk help

    Can anyone help me with awk?

    The attached file is an example. I have 10 functions and 6 comparisons. The first column is the function name and columns 2-7 are p-values.

    I need to write an awk command to keep the functions which have five out of six p-values less than 0.05.

    I want to select functions like this:

    function1 0.01 0.01 0.01 0.01 0.01 0.1

    Thanks
    Ben
    Attached Files
    Last edited by SDPA_Pet; 03-26-2014, 02:20 PM.

  • #2
    cat file | awk '{n=0;if($2<0.05)n++; if($3<0.05)n++; if($4<0.05)n++; if($5<0.05)n++; if($6<0.05)n++; if($7<0.05)n++; if (n==5) print $0}'

    Note this only selects rows where exactly 5 values pass; if you want 5 or more, change "==" to ">=".



    • #3
      If you need 5 out of 6 then the following should work:

      Code:
      $ awk '{count = 0; for (i=2; i<=NF; i++) if ($i < 0.05) count++; if (count == 5) print $0}' data.txt
      Note: Richard beat me to it, but I will leave this here since, as always, there is more than one way of doing things.
      Last edited by GenoMax; 03-27-2014, 04:14 AM.
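For anyone who wants to try the counting approach on a small input first, here is a minimal sketch. The three sample rows and the variable names `thr` and `min` are made up for illustration and are not from the attached file. It assumes whitespace-separated columns and uses `>=` so that rows with all six values below the threshold are kept too.

```shell
# Hypothetical sample data: function name, then six p-values per row
data=$(mktemp)
cat > "$data" <<'EOF'
function1 0.01 0.01 0.01 0.01 0.01 0.1
function2 0.01 0.01 0.01 0.01 0.01 0.01
function3 0.2 0.3 0.01 0.01 0.01 0.01
EOF

# thr (threshold) and min (minimum hits) are illustrative names passed in with -v
result=$(awk -v thr=0.05 -v min=5 '{
    count = 0
    for (i = 2; i <= NF; i++)      # walk every p-value column
        if ($i < thr) count++
    if (count >= min) print $1     # keep functions with at least min hits
}' "$data")

echo "$result"
rm -f "$data"
```

Change `count >= min` to `count == min` if exactly five out of six is what you need.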



      • #4
        Originally posted by GenoMax
        If you need 5 out of 6 then the following should work:

        Code:
        $ awk '{count = 0; for (i=2; i<=NF; i++) if ($i < 0.05) count++; if (count == 5) print $0}' data.txt
        Note: Richard beat me to it, but I will leave this here since, as always, there is more than one way of doing things.
        Hi GenoMax,

        "i<=NF": what does NF mean?

        Thanks



        • #5
          Originally posted by GenoMax
          If you need 5 out of 6 then the following should work:

          Code:
          $ awk '{count = 0; for (i=2; i<=NF; i++) if ($i < 0.05) count++; if (count == 5) print $0}' data.txt
          Note: Richard beat me to it, but I will leave this here since, as always, there is more than one way of doing things.
          Hi GenoMax,

          Can you help me with the attached dataset? I want to extract all the rows that meet this condition:

          either column 4 or column 6 is >= 1

          I tried this but it doesn't work:

          awk '{n=0;if($4>=1)n++; if($6>=1)n++; if (n>=1) print $0}' t-test-story.txt

          Thank you.
          Attached Files
          Last edited by SDPA_Pet; 03-29-2014, 12:50 PM.



          • #6
            Originally posted by SDPA_Pet
            Hi GenoMax,

            "i<=NF": what does NF mean?

            Thanks
            NF is the total number of fields in a record (a line).
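A quick way to see this for yourself is to print `NF` for a few lines of different lengths. The input lines below are made up for illustration; `$NF` (with the dollar sign) is the content of the last field, as opposed to `NF`, which is the count.

```shell
# Three made-up lines with different numbers of whitespace-separated fields
nf_counts=$(printf 'a b c\nfunction1 0.01 0.01 0.01 0.01 0.01 0.1\nx\n' |
    awk '{print NF, $NF}')   # field count, then the last field itself

echo "$nf_counts"
```

This is why a loop written as `for (i=2; i<=NF; i++)` visits every column after the first, however many columns the line has.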



            • #7
              Hi GenoMax,

              Can you help me with the new file? I don't know why my script doesn't work.



              • #8
                Originally posted by SDPA_Pet
                Hi GenoMax,

                Can you help me with the attached dataset? I want to extract all the rows that meet this condition:

                either column 4 or column 6 is >= 1

                Thank you.
                Try this:

                Code:
                $ awk -F'\t' '{if ($4 >= 1 || $6 >= 1) print $0}' t-test-story.txt
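One thing to watch with tab-separated files is where the field separator is set. In POSIX-conforming awk implementations, assigning `FS` inside the main action only takes effect from the next record, because the current line has already been split. The sketch below shows the difference on two made-up rows (not the attached file) whose first field contains a space; the variable names `rows`, `inside` and `upfront` are just for illustration.

```shell
# Hypothetical tab-separated rows whose first field contains a space
rows=$(printf 'gene one\t5\ngene two\t7\n')

# FS assigned inside the action: the first record was already split on whitespace
inside=$(printf '%s\n' "$rows" | awk '{FS="\t"; print $2}')

# FS set with -F before any input is read: every record is split on tabs
upfront=$(printf '%s\n' "$rows" | awk -F'\t' '{print $2}')

printf 'inside action: %s\n' "$inside"
printf 'with -F:       %s\n' "$upfront"
```

Setting the separator with `-F'\t'` (or in a `BEGIN` block) avoids the first line being treated differently from the rest.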



                • #9
                  Need help converting tab to fasta

                  Hello all! I need help converting tab to fasta. I am a newbie and know only a little bash scripting. I need to convert a tab-delimited SNP file into either a single fasta file or multiple fasta files, one for each column, using the first line as the identifier.

                  The closest I got was a script that enters a loop and can only be stopped by ctrl-C


                  #!/bin/bash


                  echo ">" > carat.txt

                  counter=1
                  #My tab file has 64 columns
                  while : [ $counter -lt 64]
                  do

                  less <SNP.txt |awk "{print$"$counter"}"| cat carat.txt - >$counter.fa
                  counter=$(($counter +1))

                  done

                  exit



                  SNP_001 SNP_002 SNP_003....
                  T T T T
                  C C C C
                  C C C C
                  C C C C
                  A A A A
                  A A A A
                  T T T T
                  T T T T
                  C C C C
                  G G G G
                  G G G G
                  C C C C



                  • #10
                    Moved

                    This post has also been copied to a new thread to increase exposure. Thanks.



                    • #11
                      Originally posted by musta1234
                      This post has also been copied to a new thread to increase exposure. Thanks.
                      Are you basically looking to transpose the matrix you posted and convert it to a multi-fasta file?
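If the answer is yes, a single awk pass can do the transpose without a shell loop. As a side note on the script in #9: `while : [ ... ]` runs the `:` builtin, which always succeeds, so the test after it is never evaluated and the loop never ends. The sketch below is only one possible approach under stated assumptions: the file is tab-separated, the header row has one name per column, and each column should become one fasta record. The three-column input is a made-up stand-in for SNP.txt.

```shell
# Hypothetical stand-in for SNP.txt: header row of names, one base per cell
snp=$(mktemp)
printf 'SNP_001\tSNP_002\tSNP_003\nT\tT\tC\nC\tC\tC\nA\tG\tA\n' > "$snp"

fasta=$(awk -F'\t' '
    # header row supplies the identifiers
    NR == 1 { ncol = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
    # append each base to the sequence of its column
    { for (i = 1; i <= ncol; i++) seq[i] = seq[i] $i }
    # one fasta record per column
    END { for (i = 1; i <= ncol; i++) printf ">%s\n%s\n", name[i], seq[i] }
' "$snp")

echo "$fasta"
rm -f "$snp"
```

To get one file per column instead of a single multi-fasta, the `END` block could redirect each record to its own file name, for example `> (name[i] ".fa")`.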
