  • Mug_4
    Junior Member
    • Aug 2013
    • 2

    Extracting the pattern not matching

    Hi there,

    I have a list of sequence IDs, and I am trying to extract the corresponding sequences from the organism's proteome using these IDs. I used grep with -w and -f and retrieved some of the sequences. I now want to get the remaining sequences on my list from UniProt. To do that, I need to identify the sequence IDs in my list file that were not found in the proteome set. How can I do this with grep/awk/sed or any other one-liner?

    Thanks
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Well, you'd need to show the exact format of everything to get a one-liner, so I can't help you there (at the moment at least). However, what it sounds like you want is an inverted match grep (usually grep -v). You can likely combine that in with your previous grep command to get the output that you want.
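
To illustrate what an inverted match returns, here is a minimal sketch on made-up data. The directory, file names and contents are hypothetical and exist only for this demo. The point it shows is that grep -v selects the lines of the searched file that match none of the patterns.

```shell
# Toy data for illustration only; not the poster's real files.
demo_dir=$(mktemp -d)
printf '%s\n' AVD DFT > "$demo_dir/list.txt"
printf '%s\n' '>AVD toy protein' 'MKTAYIAK' '>QRS toy protein' 'MSHJKLLV' > "$demo_dir/toy.fasta"

# -f reads patterns from list.txt, -w matches whole words, -i ignores case,
# -v keeps only the lines of toy.fasta that match none of the patterns.
inverted=$(grep -wiv -f "$demo_dir/list.txt" "$demo_dir/toy.fasta")
printf '%s\n' "$inverted"
```

On this toy input, every line except the ">AVD toy protein" header is printed, so the output is made of FASTA lines, not entries from the list file.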


    • Mug_4
      Junior Member
      • Aug 2013
      • 2

      #3
      Thanks, but grep -v is not the solution for me. I have a list file,
      list.txt, that contains multiple strings, one per line:
      AVD
      HJK
      DFT
      XXT
      MNZ
      LRB
      ..

      I have a protein sequence file from UniProt, uniprot.fasta. I ran:
      grep -wi -f list.txt uniprot.fasta > out

      out contains the lines that matched the patterns in list.txt. What I actually want is the strings from list.txt that did not match anything in uniprot.fasta.


      • Apexy
        Member
        • Apr 2011
        • 62

        #4
        Try:
        grep -wiv -f list.txt uniprot.fasta > out
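
Note that -v inverts which lines of uniprot.fasta are printed, so the command above writes the FASTA lines that contain none of the IDs, not the IDs that went unmatched. One possible way to list the unmatched IDs is to test each ID separately. This is a sketch under assumptions, not a confirmed solution from the thread: IDs are literal strings, one per line, and the toy files below are made up so the example can be run on its own.

```shell
# Toy stand-ins for list.txt and uniprot.fasta (hypothetical contents).
work_dir=$(mktemp -d)
printf '%s\n' AVD HJK DFT XXT MNZ LRB > "$work_dir/list.txt"
printf '%s\n' '>AVD toy protein' 'MKTAYIAK' '>dft toy protein' 'MSHJKLLV' \
    '>MNZ toy protein' 'MAVDLRB' > "$work_dir/uniprot.fasta"

# For each ID, ask grep quietly (-q) whether it occurs as a whole word (-w),
# ignoring case (-i), as a fixed string rather than a regex (-F).
# IDs for which grep finds nothing are written to unmatched.txt.
while IFS= read -r id; do
    [ -n "$id" ] || continue    # skip blank lines in the list
    grep -qwiF -- "$id" "$work_dir/uniprot.fasta" || printf '%s\n' "$id"
done < "$work_dir/list.txt" > "$work_dir/unmatched.txt"

cat "$work_dir/unmatched.txt"
```

This runs grep once per ID, which is simple but may be slow for very long lists. It also searches sequence lines as well as headers, the same as the original grep -wi -f command does.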
