  • BLAST a multi-FASTA protein file against a genome and get the proteins that match the genome.

    Hello,

    I want to BLAST a multi-FASTA protein file against a genome and get back a multi-FASTA file containing all the proteins that aligned to the genome. Is there any way to do this?

    Even just the headers of all the proteins that aligned to the genome would be fine.

    Thanks in advance
    Sandesh

  • #2
    If this multifasta file comes from the same genome (or a closely related one), then BLAT may be the fastest option to get the alignments you need. BLAT can write several output formats.

    You will need to do some additional parsing to pare down your original multifasta file so that it retains only the sequences that had a "hit" in the genome.
    Last edited by GenoMax; 01-25-2015, 11:20 AM.
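    A minimal Python sketch of that parsing step, assuming BLAT was run with protein queries against the genome (for example `blat -t=dnax -q=prot genome.fa proteins.fa hits.psl`) and wrote its default PSL output. It collects the query names from column 10 (qName) and keeps only the matching records from the protein file. The function and file names are illustrative, and it assumes BLAT's query names match the first word of each FASTA header.

```python
def psl_query_names(psl_lines):
    """Return the set of query names (qName, 10th column) in BLAT PSL output."""
    names = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows have 21 tab-separated columns and start with the numeric
        # 'matches' count; this skips the optional psLayout header block.
        if len(fields) >= 21 and fields[0].isdigit():
            names.add(fields[9])
    return names


def filter_fasta(fasta_lines, keep_ids):
    """Yield the lines of every FASTA record whose ID (first word after '>') is in keep_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            keep = bool(words) and words[0] in keep_ids
        if keep:
            yield line


def write_hit_proteins(psl_path, fasta_path, out_path):
    """Write a multi-FASTA file containing only the proteins that have a PSL hit."""
    with open(psl_path) as psl:
        hits = psl_query_names(psl)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.writelines(filter_fasta(fasta, hits))
    return hits  # the headers (IDs) of all proteins with at least one hit
```

    For example, `write_hit_proteins("hits.psl", "proteins.fa", "hit_proteins.fa")` writes the filtered protein file, and the returned set is the list of IDs that had a hit.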



    • #3
      Use makeblastdb to turn your genome into a nucleotide BLAST database. Then use tblastn, since you have protein queries and a nucleotide database (with a suitable score threshold). Filter your protein file according to whether any hits were found. Done?
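      A rough sketch of those three steps in Python, assuming the BLAST+ command-line tools are on the PATH and tabular output (`-outfmt 6`) is used. The e-value cutoff, bit-score threshold, thread count and file names are placeholder assumptions to tune, not recommended values. The resulting set of query IDs can be used to filter the protein FASTA file, or simply listed if only the headers are needed.

```python
import subprocess


def blast_commands(genome_fa, proteins_fa, db_name="genome_db", out_tsv="hits.tsv",
                   evalue=1e-5, threads=4):
    """Build the makeblastdb and tblastn argument lists (values are placeholders)."""
    makedb = ["makeblastdb", "-in", genome_fa, "-dbtype", "nucl", "-out", db_name]
    search = ["tblastn", "-query", proteins_fa, "-db", db_name,
              "-evalue", str(evalue), "-outfmt", "6",
              "-num_threads", str(threads), "-out", out_tsv]
    return makedb, search


def query_ids_with_hits(tsv_lines, min_bitscore=0.0):
    """Collect query IDs from -outfmt 6 rows whose bit score passes the threshold."""
    ids = set()
    for line in tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Standard columns: query id, subject id, pident, length, mismatch, gapopen,
        # qstart, qend, sstart, send, evalue, bitscore (bit score is index 11).
        if float(fields[11]) >= min_bitscore:
            ids.add(fields[0])
    return ids


def run_search(genome_fa, proteins_fa, out_tsv="hits.tsv"):
    """Run both BLAST+ steps, then return the IDs of proteins with at least one hit."""
    makedb, search = blast_commands(genome_fa, proteins_fa, out_tsv=out_tsv)
    subprocess.run(makedb, check=True)
    subprocess.run(search, check=True)
    with open(out_tsv) as tsv:
        return query_ids_with_hits(tsv)
```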



      • #4
        Thanks for the responses. I am trying to align the UniRef90 proteins to the genome to get all protein hits, and then run Exonerate only on those hits, because running Exonerate against all of UniRef90 is really slow. So I want to filter the proteins first and run Exonerate afterwards.

        There are around 22 million proteins in uniref90.fasta.

        I will try both of your suggestions. Let me see how it goes.
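        With roughly 22 million query proteins, one common tactic (an assumption here, not something suggested in the thread) is to split the query file into chunks and run the searches in parallel. A minimal Python sketch that distributes whole FASTA records round-robin across N output files so the chunks stay similar in size; the `<prefix>.<i>.fa` naming scheme is hypothetical. It streams the input line by line rather than loading it into memory, but keeps all N output files open at once.

```python
from contextlib import ExitStack


def split_records(fasta_lines, writers):
    """Send each FASTA record (header plus sequence lines) to writers[i % len(writers)]."""
    record = -1
    for line in fasta_lines:
        if line.startswith(">"):
            record += 1
        if record >= 0:  # ignore anything before the first header
            writers[record % len(writers)](line)
    return record + 1  # number of records seen


def split_fasta_file(fasta_path, out_prefix, n_chunks):
    """Split a FASTA file into n_chunks files named <out_prefix>.<i>.fa (hypothetical scheme)."""
    paths = [f"{out_prefix}.{i}.fa" for i in range(n_chunks)]
    with ExitStack() as stack, open(fasta_path) as fasta:
        outs = [stack.enter_context(open(p, "w")) for p in paths]
        split_records(fasta, [out.write for out in outs])
    return paths
```

        Each chunk can then be searched independently (for example with one tblastn job per chunk) and the resulting hit lists merged.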



        • #5
          I was assuming you had a few thousand proteins (or a few tens of thousands of proteins), for example the predicted protein set of one organism, not 22 million proteins (!).

          How big is your genome (base pairs)? And is it nicely assembled into a few chromosomes, or in many contigs? How many contigs?

          The relative size of the protein set and the genome will strongly influence the best approach (e.g. which to use as the query and which as the database), and even whether BLAST is suitable at all.
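          For reference, a small Python sketch that computes the numbers asked about here from a genome FASTA file: sequence count, total length, N50, and the fraction of bases in the k longest sequences. It also reports a rough translated size: a six-frame translation yields about 6 × (length / 3) = 2 × length residues, which can be compared with the total length of the protein set. The function names are illustrative.

```python
def sequence_lengths(fasta_lines):
    """Return the length of each sequence in a FASTA file, in file order."""
    lengths = []
    for line in fasta_lines:
        if line.startswith(">"):
            lengths.append(0)
        elif lengths:
            lengths[-1] += len(line.strip())
    return lengths


def assembly_summary(lengths, top_k):
    """Summarize sequence lengths: count, total bp, N50 and share of the top_k longest."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    n50, running = 0, 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # first length at which half the bases are covered
            n50 = length
            break
    return {
        "sequences": len(ordered),
        "total_bp": total,
        "n50": n50,
        "top_k_fraction": sum(ordered[:top_k]) / total if total else 0.0,
        # six frames of about total/3 codons each: roughly 2 * total amino acids
        "six_frame_residues": 2 * total,
    }
```

          Setting `top_k` to the expected number of chromosomes or linkage groups shows how much of the assembly they contain.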



          • #6
            @Peter: Sandesh probably has a few thousand proteins. He has been trying to align them to the UniProt Reference Clusters (UniRef) data.

            @Sandesh: This is probably not the most efficient way to try and annotate a new genome/proteome. As Peter asked above, can you tell us how you arrived at this set of proteins? Is there a related genome available?



            • #7
              Actually, I am trying to annotate a genome of about 65 Mb. It has 18 linkage groups plus the remaining scaffolds (probably around 800 small ones). Most of the sequence is in the LGs.

              I got the protein data from http://www.ebi.ac.uk/uniprot/database/download.html.
              Yes, it is UniProt Reference Clusters (UniRef). I was using these proteins for alignment with Exonerate, but it took a long time.

              Maybe I am doing this wrong.
              Other related species of this organism have been sequenced and annotated.

              Please correct me if I am doing something wrong.
              Last edited by sandesh; 01-26-2015, 07:16 AM.

