  • Protein ID that blast could not identify

    Hi,
    I downloaded a proteome in FASTA format, which contains hundreds of proteins (http://labs.umassmed.edu/chlamyfp/in...p?content=help). I want to BLAST my data against these proteins using BLAST+. However, when I ran makeblastdb on the proteome dataset, an error occurred:
    *******************************************************************
    Error: NCBI C++ Exception:
    "/am/ncbiapdata/release/blast/src/2.2.26/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/objects/seq/../seqloc/Seq_id.cpp", line 1679: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type C_1150005
    *******************************************************************
    I think there must be something wrong with the proteome data, because BLAST+ worked fine when I used data downloaded directly from NCBI.

    I therefore opened the proteome file in TextEdit. The header of each sequence looks like this, for example:
    *****************************************************************
    >C_680011|168600 FAP45, Flagellar Associated Protein Weakly Similar to Nasopharyngeal Epithelium Specific Protein 1
    MPQTPPRSGGYRSGKQSYVDESLFGGSKRTGAAQVETLDSLKLTAPTRTISPKDRDVVTLTKGDLTRMLKASPIMTAEDVAAAKREAEAKREQLQAVSKA
    RKEKMLKLEEEAKKQAPPTETEILQRQLNDATRSRATHMMLEQKDPVKHMNQMMLYSKCVTIRDAQIEEKKQMLAEEEEEQRRLDLMMEIERVKALEQYE
    ARERQRVEERRKGAAVLSEQIKERERERIRQEELRDQERLQMLREIERLKEEEMQAQIEKKIQAKQLMEEVAAANSEQIKRKEGMKVREKEEDLRIADYI
    LQKEMREQ
    *****************************************************************

    Here "C_680011|168600" should be the protein ID, I think, but nothing was found when I searched for it at NCBI. I wonder what kind of ID it is, and what I should do to make BLAST+ recognise it.

    Thanks!

  • #2
    Are you using the -parse_seqids option? If so, try it without this. I only ever use this if my FASTA file identifiers follow the NCBI naming conventions.

    It would be useful to show the command you used to run makeblastdb as that might help us to understand what you are doing.
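
    If you do need -parse_seqids, one workaround is to rewrite the identifiers first. As far as I understand it, with that option makeblastdb tries to read a bar-separated identifier such as C_680011|168600 as an NCBI-style ID, and the part before the bar is not a database tag it recognises. Below is a minimal Python sketch that replaces the '|' in each identifier with '_'. The function names are my own, and whether the renamed IDs suit your downstream analysis is an assumption you should check.

```python
def sanitize_header(header: str) -> str:
    """Return a FASTA header whose identifier contains no '|' characters.

    Assumption: replacing '|' with '_' in the first whitespace-delimited
    token is acceptable for whatever you do with the IDs afterwards.
    """
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = header[1:].split(None, 1)  # [identifier, description]
    if not fields:
        return header  # a bare '>' line, nothing to rename
    safe_id = fields[0].replace("|", "_")
    description = fields[1] if len(fields) > 1 else ""
    return ">" + safe_id + (" " + description if description else "")


def sanitize_fasta(lines):
    """Yield FASTA lines with sanitized headers; sequence lines are unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        yield sanitize_header(line) if line.startswith(">") else line
```

    Write the sanitized lines to a new file and run makeblastdb on that file instead of the original.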



    • #3
      Originally posted by maubp View Post
      Are you using the -parse_seqids option? If so, try it without this. I only ever use this if my FASTA file identifiers follow the NCBI naming conventions.

      It would be useful to show the command you used to run makeblastdb as that might help us to understand what you are doing.
      Dear Maubp,
      Thanks for your reply.
      Yes, I used -parse_seqids. Following your suggestion, I ran the command without -parse_seqids, and another error showed up:
      *******************************************************************
      Error: (CArgException::eNoArg) Argument "dbtype". Mandatory value is missing: `String, `nucl', `prot''
      Error: (CArgException::eNoArg) Application's initialization failed
      *****************************************************************

      The command I used was
      makeblastdb -in CrFP.fasta -out CrFP

      Thanks



      • #4
        That error is clear, isn't it? You have to tell makeblastdb whether your FASTA file contains protein or nucleotide sequences, i.e. either:

        Code:
        makeblastdb -in CrFP.fasta -out CrFP -dbtype nucl
        or,

        Code:
        makeblastdb -in CrFP.fasta -out CrFP -dbtype prot
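
        If you are ever unsure which one applies, a rough check of the residue composition can help. The Python sketch below is only a heuristic of my own (the 0.9 threshold and the function names are assumptions, not anything makeblastdb does itself): it calls a sequence nucleotide if nearly all of its letters are A, C, G, T, U or N.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_dbtype(sequence: str, threshold: float = 0.9) -> str:
    """Guess 'nucl' or 'prot' from the letters in one sequence.

    Heuristic only: the threshold is an arbitrary assumption, and short or
    low-complexity protein sequences could be misclassified.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucl_fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucl" if nucl_fraction >= threshold else "prot"


def makeblastdb_args(fasta_path: str, out_name: str, dbtype: str) -> list:
    """Build the argument list for the makeblastdb call shown above."""
    return ["makeblastdb", "-in", fasta_path, "-out", out_name, "-dbtype", dbtype]


# First residues of the FAP45 record quoted at the top of this thread.
dbtype = guess_dbtype("MPQTPPRSGGYRSGKQSYVDESLFGGSKRTGAAQVETLDSLK")
print(" ".join(makeblastdb_args("CrFP.fasta", "CrFP", dbtype)))
```

        The printed command can be pasted into a terminal, or the list passed to subprocess.run if you prefer to stay in Python.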



        • #5
          Originally posted by maubp View Post
          That error is clear, isn't it? You have to tell makeblastdb whether your FASTA file contains protein or nucleotide sequences, i.e. either:

          Code:
          makeblastdb -in CrFP.fasta -out CrFP -dbtype nucl
          or,

          Code:
          makeblastdb -in CrFP.fasta -out CrFP -dbtype prot
          YES!
          What a stupid mistake I made. It succeeded now!

          Thank you!



          • #6
            Originally posted by Tsuyoshi View Post
            It succeeded now!
            Oh good. Understanding the NCBI BLAST+ error messages gets easier with practice



            • #7
              Originally posted by maubp View Post
              Oh good. Understanding the NCBI BLAST+ error messages gets easier with practice
              Yep!

              I couldn't agree more. Many thanks!



              • #8
                Originally posted by maubp View Post
                Oh good. Understanding the NCBI BLAST+ error messages gets easier with practice
                Hi Maubp,
                I still have a question about the protein IDs. No database seems to name proteins in this way. Take these proteins as examples:

                C_1620015|156900
                C_10830001|152917
                C_2020008|159281
                C_510029|166481
                C_510029|166481
                C_510029|166481
                C_510029|166481

                I do not think these are NCBI accession numbers for Chlamydomonas, but I would like to find the real NCBI accession numbers for these proteins. Do you have any idea how to do that?



                • #9
                  That's a different question - the only way your sequences would have real NCBI accession numbers would be if they have already been submitted to one of the databases. I would explore the NCBI databases for this using Entrez search term "chlamydomonas[orgn]" and see if anything matches your dataset:


                  (square brackets in the URL confuse the forum software)

                  Or you could try BLAST'ing some of your sequences against the NR database to see if any give perfect matches?
                  Last edited by maubp; 09-10-2012, 03:10 AM. Reason: Trying to fix link
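
                  For scripted searches, the square brackets can be percent-encoded so that they survive being pasted. A minimal Python sketch, assuming the NCBI E-utilities esearch endpoint and the protein database (my choice for this example, not the link from my post); it only builds the URL and does not send a request:

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities esearch endpoint is the service wanted here.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_search_url(term: str, db: str = "protein", retmax: int = 20) -> str:
    """Build an esearch URL; urlencode percent-encodes '[' and ']'."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


url = entrez_search_url("chlamydomonas[orgn]")
print(url)
```

                  The brackets appear as %5B and %5D in the printed URL, which forum software and shells tend to leave alone.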



                  • #10
                    Originally posted by maubp View Post
                    That's a different question - the only way your sequences would have real NCBI accession numbers would be if they have already been submitted to one of the databases. I would explore the NCBI databases for this using Entrez search term "chlamydomonas[orgn]" and see if anything matches your dataset:

                    http://www.ncbi.nlm.nih.gov/sites/gq...=chlamydomonas[orgn]

                    Or you could try BLAST'ing some of your sequences against the NR database to see if any give perfect matches?
                    The sequences themselves perfectly match the submitted Chlamydomonas data. I just have no idea what kind of IDs the authors used.



                    • #11
                      If you can work out how to get the data from the NCBI with their accessions, that might be simpler than working with the original author's private identifiers.
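
                      Since the sequences match exactly, one way to link the two sets of names is to pair records by identical sequence. A rough Python sketch, assuming both files are plain FASTA; the function names and the accessions in the usage example are made up for illustration:

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def normalise(sequence: str) -> str:
    """Upper-case and drop a trailing stop '*' so equal proteins compare equal."""
    return sequence.upper().rstrip("*")


def map_ids_by_sequence(private_lines, ncbi_lines):
    """Map each private ID to the NCBI IDs that have an identical sequence."""
    by_sequence = {}
    for accession, seq in read_fasta(ncbi_lines):
        by_sequence.setdefault(normalise(seq), []).append(accession)
    return {
        private_id: by_sequence.get(normalise(seq), [])
        for private_id, seq in read_fasta(private_lines)
    }


# Usage with toy data; ACC_0001 and ACC_0002 are hypothetical accessions.
private = [">C_680011|168600 FAP45", "MPQTPP", "RSGG"]
ncbi = [">ACC_0001 made-up record", "MPQTPPRSGG", ">ACC_0002", "MAAA"]
print(map_ids_by_sequence(private, ncbi))
```

                      An empty list for a private ID means no identical sequence was found, in which case a BLAST search against NR is the fallback.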



                      • #12
                        Originally posted by maubp View Post
                        If you can work out how to get the data from the NCBI with their accessions, that might be simpler than working with the original author's private identifiers.
                        That's right.
                        Anyway, I will try to extract the accession numbers from NCBI.
                        Thank you very much, Maubp!
