  • How to Extract Multiple Sequences from a Multi-FASTA File by ID List

    Hi,
    I have a list of IDs in .txt format and a multi-FASTA file with sequences. I need to extract the sequences whose IDs are in the list.

    Can you help me, please?

  • #2
    I think you can do that with seqret, which is part of EMBOSS. According to the documentation, the parameter -iquery1 can be used to specify a list of IDs, although probably not a file of IDs...

    • #3
      Do you program? You can do that with a few lines using a library like Biopython.

      Alternatively, if you have a local Galaxy you could ask your admin to install one of these tools: http://toolshed.g2.bx.psu.edu/view/p...q_filter_by_id or http://toolshed.g2.bx.psu.edu/view/p...q_select_by_id
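To give an idea of what "a few lines" might look like, here is a minimal sketch in plain Python. It deliberately avoids Biopython so that it runs with the standard library alone; with Biopython installed, the parsing part would typically be handled by its FASTA parser instead. The function names (`read_ids`, `iter_fasta`, `extract_records`) and the file names in the usage comment are hypothetical, and the sketch assumes that a record's ID is the first whitespace-delimited token of its header line.

```python
import io


def read_ids(handle):
    """Return the set of IDs listed one per line (a leading '>' is tolerated)."""
    ids = set()
    for line in handle:
        token = line.strip().lstrip(">")
        if token:
            ids.add(token.split()[0])
    return ids


def iter_fasta(handle):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, parts = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def extract_records(ids_handle, fasta_handle, out_handle, width=60):
    """Write records whose ID is in the list; return how many were written."""
    wanted = read_ids(ids_handle)
    written = 0
    for header, seq in iter_fasta(fasta_handle):
        tokens = header.split()
        record_id = tokens[0] if tokens else ""
        if record_id in wanted:
            out_handle.write(f">{header}\n")
            # Re-wrap the sequence at a fixed width (assumed default: 60).
            for start in range(0, len(seq), width):
                out_handle.write(seq[start:start + width] + "\n")
            written += 1
    return written


# Hypothetical usage with files on disk:
# with open("id.txt") as ids, open("seqFile.fasta") as fa, \
#         open("output.fasta", "w") as out:
#     extract_records(ids, fa, out)
```

Because the IDs are held in a set, the FASTA file is read once and each lookup is cheap, which matters when the ID list is long. Records are written in the order they appear in the FASTA file, not in the order of the ID list.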

      • #4
        If there are no linebreaks in the sequences, then

        Code:
        grep -A1 -w -f id.txt seqFile.fasta > output.fasta
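        should work. The IDs in id.txt must match the FASTA headers exactly, including the leading greater-than sign (>). Note that with GNU grep, -A1 also writes a "--" separator line between non-adjacent matches; adding --no-group-separator suppresses it.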
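The grep approach above only works when every sequence sits on a single line. If the FASTA file is line-wrapped, one option is to unwrap it first. The following is a rough sketch of that preprocessing step in plain Python; `unwrap_fasta` is a hypothetical helper name, and the sketch assumes header lines start with `>` and that blank lines can be dropped.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    seq_parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Flush the previous record's sequence before the next header.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        elif line.strip():
            seq_parts.append(line.strip())
    # Flush the final record.
    if seq_parts:
        yield "".join(seq_parts)


# Hypothetical usage as a filter (stdin to stdout):
# import sys
# for out_line in unwrap_fasta(sys.stdin):
#     print(out_line)
```

After unwrapping, each header is followed by exactly one sequence line, which is what `grep -A1` relies on. A record with an empty sequence would still break that assumption, so it is worth checking for such records first.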

        • #5
          faSomeRecords from the Kent utilities is the simplest solution (http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/)

          More here: http://seqanswers.com/forums/showpos...0&postcount=13
