  • Makeblastdb from paired end reads

    Hello everyone. I’m analyzing some transcriptome data. I’m not quantifying expression at all; I only want to search for sequences matching a query file of my genes of interest. I have paired-end reads but no assembled transcriptome. Since I’m new to this area, I’d appreciate some advice.

    What I have done so far is merge the R1/R2 files and convert them from FASTQ to FASTA using the FASTX-Toolkit. I then used the makeblastdb command to build a BLAST database from the resulting FASTA file. I know I’m supposed to get .nhr, .nin, and .nsq files, but I think that because the database is so big, I got something like this:
    DB.00.nsq, DB.00.nin, DB.00.nhr
    DB.01.nsq, DB.01.nin, DB.01.nhr
    and so on.
    So here’s the first question: is this a problem? Or will I just have to blast my query file against each database (00, 01), one at a time?

    Also, before I get too far into this, I’d like to know whether there is any reason I shouldn’t be merging the read files and creating a database from them.
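    For anyone without the FASTX-Toolkit, the FASTQ-to-FASTA conversion described above can be sketched in plain Python. This is a minimal sketch, not the FASTX-Toolkit's own code: it assumes unwrapped four-line FASTQ records, and the function name `fastq_records_to_fasta` and the optional mate suffix (e.g. "/1", "/2") are hypothetical choices, meant to keep R1 and R2 read IDs distinct if the two files are combined into one database.

```python
from typing import Iterable, Iterator


def fastq_records_to_fasta(lines: Iterable[str], mate_suffix: str = "") -> Iterator[str]:
    """Yield FASTA lines from unwrapped four-line FASTQ records.

    Each record is: '@' header, sequence, '+' separator, quality string.
    The quality line is dropped because FASTA stores no qualities.
    `mate_suffix` (hypothetical option, e.g. "/1") is appended to the read ID.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines, e.g. at the end of a file
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            next(it)  # quality line, discarded
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Split '@read_id description' so the suffix attaches to the ID only.
        read_id, _, description = header[1:].partition(" ")
        fasta_header = ">" + read_id + mate_suffix
        if description:
            fasta_header += " " + description
        yield fasta_header + "\n"
        yield seq + "\n"
```

    A file handle can be passed directly as `lines` (e.g. `open("reads_R1.fastq")`, a placeholder name), and the result written out with `writelines()`.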

    Thank you for taking the time to read this!

  • #2
    Originally posted by sp24:
    I know that I’m supposed to get .nhr .nin and .nsq, but I think that the database is so big that I got something like this:
    DB.00.nsq, DB.00.nin, DB.00.nhr
    DB.01.nsq, DB.01.nin, DB.01.nhr
    and so on.
    So here’s the first question: is this a problem? Or will I just have to blast my query file against each database (00, 01), one at a time?
    No, that's normal for a very large database; have a look at the NCBI-provided nr or nt databases for an example.
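    As far as I know you shouldn't need to search the volumes one at a time: when makeblastdb splits a database into volumes it normally also writes an alias file (here DB.nal) listing them, and giving the base name DB to -db searches every volume in one run. A minimal sketch in Python, assuming BLAST+ is installed and on PATH; the helper names and file names are hypothetical, and tabular output (-outfmt 6) and the E-value cutoff are just illustrative choices.

```python
import subprocess
from typing import List


def blastn_command(query: str, db: str, out: str, evalue: float = 1e-5) -> List[str]:
    """Build a blastn argument list that searches every volume of `db`.

    `db` is the base name given to makeblastdb (e.g. "DB"), not a single
    volume such as "DB.00"; BLAST+ resolves the volumes via the alias file.
    """
    return [
        "blastn",
        "-query", query,      # FASTA file with the genes of interest
        "-db", db,            # database base name
        "-out", out,
        "-outfmt", "6",       # tabular: one hit per line
        "-evalue", str(evalue),
    ]


def run_blastn(query: str, db: str, out: str, evalue: float = 1e-5) -> None:
    """Run blastn; raises CalledProcessError if blastn exits with an error."""
    subprocess.run(blastn_command(query, db, out, evalue), check=True)
```

    For example, `run_blastn("genes.fasta", "DB", "hits.tsv")` would cover DB.00, DB.01, and so on, rather than pointing -db at DB.00 alone.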
    Originally posted by sp24:
    Also, before I get too far into this, I also would like to know if for some reason I shouldn’t be merging the read files and creating a database from it.
    That is a very sensible question. You might get something out of your planned analysis, but this is not the normal approach. I would do a transcriptome assembly to get putative transcripts and then analyse those, for example with BLAST against sister species.



    • #3
      Thanks for your response. I decided not to merge the R1/R2 files, since I've read on here that the R2 file needs to be reverse complemented. I've been creating BLAST databases from the individual read files and searching them with my query, but now I'm trying to figure out how to interpret the output.
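      For reference, reverse complementing a read is a simple string transformation, sketched below in Python assuming IUPAC nucleotide codes (the function name is hypothetical). For a BLAST search it is usually unnecessary, though: blastn searches both strands by default (-strand both), so R2 reads will still be found if they match the reverse strand of a query.

```python
# Complement table for IUPAC nucleotide codes (upper and lower case).
# U is mapped to A; ambiguity codes map to their complements, N stays N.
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```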

      Within the scope of this project I won't be able to assemble a transcriptome; I'm just trying to figure out how to retrieve these sequences. If anyone reading this has done this before, I would appreciate your input. Pointers to any papers or online resources would be great as well.
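      If the searches are run with tabular output (-outfmt 6), each line is one hit with twelve tab-separated default columns: query ID, subject ID, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. A minimal Python sketch, assuming that default column order, groups the matching read IDs per gene of interest so the reads can then be pulled out (for example with blastdbcmd -entry_batch, if the database was built with -parse_seqids). The function names and the E-value and identity thresholds are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Default -outfmt 6 column order; the names here are local labels.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    qseqid: str    # query (gene of interest)
    sseqid: str    # subject (read ID in the database)
    pident: float
    length: int
    sstart: int
    send: int
    evalue: float
    bitscore: float

    @property
    def minus_strand(self) -> bool:
        # blastn reports sstart > send when the read matches in reverse-complement orientation
        return self.sstart > self.send


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tabular blastn output into Hit records."""
    hits: List[Hit] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        row = line.split("\t")
        if len(row) < len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        rec = dict(zip(FIELDS, row))
        hits.append(Hit(
            qseqid=rec["qseqid"], sseqid=rec["sseqid"],
            pident=float(rec["pident"]), length=int(rec["length"]),
            sstart=int(rec["sstart"]), send=int(rec["send"]),
            evalue=float(rec["evalue"]), bitscore=float(rec["bitscore"]),
        ))
    return hits


def reads_per_query(hits: Iterable[Hit], max_evalue: float = 1e-10,
                    min_pident: float = 90.0) -> Dict[str, List[str]]:
    """Group distinct read IDs by query, keeping first-seen order.

    The default thresholds are illustrative, not recommendations.
    """
    grouped: Dict[str, List[str]] = {}
    seen: Set[Tuple[str, str]] = set()
    for h in hits:
        if h.evalue > max_evalue or h.pident < min_pident:
            continue
        key = (h.qseqid, h.sseqid)
        if key in seen:
            continue  # the same read can produce several HSPs
        seen.add(key)
        grouped.setdefault(h.qseqid, []).append(h.sseqid)
    return grouped
```

      The minus_strand flag also shows which reads, R2 or otherwise, matched a gene in the opposite orientation, which bears on the reverse-complement question above.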
