Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem with blastdbcmd. “Error: [blastdbcmd] Skipped TRANSCRIPT/4865”

    Hello, I am looking for help with running a Reciprocal BLAST. I cannot query any of my transcripts using blastdbcmd because I get this error message: “Error: [blastdbcmd] Skipped TRANSCRIPT_4865”. I appreciate any insight that can help resolve this issue!

    I am using high quality FASTA files with compiled transcripts of the subspecies I am working with. For context, I ran LORDEC on my transcript files to reduce redundancy. Now, I am working on Reciprocal Blasting to identify comparable contigs between subspecies.


    This is the script I am using for the Reciprocal BLAST (macOS):

    #Making a database

    makeblastdb -in ~/Desktop/tegula_blast/Hq_transcripts_2.fa -dbtype 'nucl' -out ~/Desktop/tegula_blast/Tfunebralis_DB -parse_seqids


    #Output message

    Building a new DB, current time: 03/06/2024 11:23:46

    New DB name: /Users/lanigleason/Desktop/tegula_blast/Tfunebralis_DB

    New DB title: /Users/lanigleason/Desktop/tegula_blast/Hq_transcripts_2.fa

    Sequence type: Nucleotide

    Keep MBits: T

    Maximum file size: 3000000000B

    Adding sequences from FASTA; added 99388 sequences in 2.75653 seconds.



    #Blast for all possible pairwise comparisons

    blastn -query ~/Desktop/tegula_blast/Hq_transcripts_1.fa -db ~/Desktop/tegula_blast/Tfunebralis_DB -out ~/Desktop/tegula_blast/eiseni_to_funebralis.txt -evalue 1E-20 -outfmt 6 -max_target_seqs 1


    #Output message

    Warning: [blastn] Examining 5 or more matches is recommended

    (See attached photo for output file)


    #Retrieve subset of assembly using blastdbcmd

    Here, I took the queried transcripts with matches, from previous blastn output (eiseni_to_funebralis.txt) and made a list of the transcript names (eiseni_names.txt).

    An observation I made here is that there are multiple matches listed in the blastn output file. My original fasta file with transcripts does not include multiple copies, so this has to be a product of the blastn command. I am wondering if this is a possible reason for the error in the next step.



    blastdbcmd -db ~/Desktop/tegula_blast/Tfunebralis_DB -dbtype 'nucl' -entry_batch ~/Desktop/tegula_blast/eiseni_names.txt -out ~/Desktop/tegula_blast/eiseni_to_funebralis.subset.reciprocal.fasta


    #Output message

    Error: [blastdbcmd] Skipped transcript_4866

    Error: [blastdbcmd] Skipped transcript_9439

    Error: [blastdbcmd] Skipped transcript_9439

    Error: [blastdbcmd] Skipped transcript_13586

    Error: [blastdbcmd] Skipped transcript_17909

    Error: [blastdbcmd] Skipped transcript_38053

    Error: [blastdbcmd] Skipped transcript_38053

    Error: [blastdbcmd] Skipped transcript_22088

    Error: [blastdbcmd] Skipped transcript_34418

    Error: [blastdbcmd] Skipped transcript_45393

    Error: [blastdbcmd] Skipped transcript_45393

    Error: [blastdbcmd] Skipped transcript_13587

    Error: [blastdbcmd] Skipped transcript_13587

    Error: [blastdbcmd] Skipped transcript_13587

    etc…


    Most of the transcripts are skipped in this step. As seen, all duplicates of the same transcript are skipped too.

    I have tried using a smaller subset of transcripts and removing the duplicate matches to run the reciprocal blast but I encounter the same errors.


    I also tried looking up database identifiers that I could be missing but I am not sure how to use these identifiers to query with blastdbcmd.

    blastdbcmd -db ~/Desktop/tegula_blast/Tfunebralis_DB -dbtype 'nucl' -entry all -out ~/Desktop/tegula_blast/eiseni_to_funebralis.subset.reciprocal.all.fasta -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"

    #Example output
    OID: 0 GI: N/A ACC: transcript_3756 IDENTIFIER: lcl|transcript_3756
    OID: 1 GI: N/A ACC: transcript_9791 IDENTIFIER: lcl|transcript_9791
    OID: 2 GI: N/A ACC: transcript_7816 IDENTIFIER: lcl|transcript_7816
    OID: 3 GI: N/A ACC: transcript_1853 IDENTIFIER: lcl|transcript_1853​


    Please let me know any ways to reduce the amount of transcripts skipped by blastdbcmd. Any recommended parameter changes and other explanations are welcomed. Thank you!

    Attached Files
    Last edited by arredondoea; 03-06-2024, 12:43 PM.

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 11:49 AM
0 responses
15 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-24-2024, 08:47 AM
0 responses
16 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
62 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
60 views
0 likes
Last Post seqadmin  
Working...
X