
  • extract reads from Blast output

    Hi...

    I hope that someone can help me out with this one.

    I ran BLAST against a database I built from an organism whose sequences I want to remove from my reads. Now I would like to extract all the reads (names and sequences) that didn't have hits.

    Is this possible, and how can I extract the reads I want from the BLAST output?

    Thanks

    F.

  • #2
    I suggest that you add the option "-outfmt 7" when running blast. This would make the result look like this:

    # Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
    # 157 hits found
    sp|Q91G65|032R_IIV6 gi|15078745|ref|NP_149495.1| 100.00 100 0 0 1 100 1 100 6e-64 198
    sp|Q91G65|032R_IIV6 gi|291396174|ref|XP_002714758.1| 36.56 93 58 1 9 100 60 152 4e-15 74.7
    sp|Q91G65|032R_IIV6 gi|296198107|ref|XP_002746567.1| 35.48 93 59 1 9 100 60 152 5e-15 74.3
    sp|Q91G65|032R_IIV6 gi|116004445|ref|NP_001070581.1| 36.56 93 58 1 9 100 60 152 8e-15 73.9
    sp|Q91G65|032R_IIV6 gi|126309825|ref|XP_001370260.1| 36.56 93 58 1 9 100 60 152 1e-14 73.6

    Then it would be easier to extract the reads from the output.
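One reason -outfmt 7 helps: in BLAST+ tabular-with-comments output each query gets its own comment header (including a "# Query:" line), and a query without hits should still get a "# 0 hits found" line. Assuming that layout, the IDs of reads with no hits can be collected straight from the comment lines. This is a minimal sketch, not a BLAST tool: the helper name no_hit_queries is hypothetical, and it assumes the read ID is the first whitespace-separated token after "# Query:".

```python
from typing import Iterable, List


def no_hit_queries(lines: Iterable[str]) -> List[str]:
    """Return query IDs that are followed by '# 0 hits found' in -outfmt 7 output.

    Hypothetical helper. Assumes each query block contains a
    '# Query: <id> [description]' comment line before its hit count.
    """
    no_hits: List[str] = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# Query:"):
            # Keep only the first token (the read ID) and drop any description.
            fields = line[len("# Query:"):].split()
            current = fields[0] if fields else None
        elif line.strip() == "# 0 hits found" and current is not None:
            no_hits.append(current)
    return no_hits
```

For example, with open("results.txt") as fh: ids = no_hit_queries(fh) gives the read IDs to keep. Those still have to be pulled out of the original read file, since BLAST does not echo query sequences in this format.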



    • #3
      I used a script to obtain the same output... the issue is extracting them.



      • #4
        Originally posted by flacchy:
        I used a script to obtain the same output... the issue is extracting them.

        blastdbcmd: read the manual.
        savetherhino.org



        • #5
          I've read the manual, but I found the script ncbi_parse.pl more useful because I needed the description of the organism, and the tabular output didn't give that information (I also saw on some blogs that other people have the same issue). I was only wondering whether there is a way to extract the reads and sequences with no hits from the output.



          • #6
            Originally posted by flacchy:
            I've read the manual, but I found the script ncbi_parse.pl more useful because I needed the description of the organism, and the tabular output didn't give that information (I also saw on some blogs that other people have the same issue). I was only wondering whether there is a way to extract the reads and sequences with no hits from the output.

            With BLAST 2.2.28+ you can get the organism name in the output, provided you have set up taxdb properly and use, e.g., -outfmt '6 std sscinames'.
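As a minimal sketch of reading that format, assuming tab-separated rows where std expands to the usual 12 BLAST+ columns and sscinames is appended as a 13th column, each row can be split into named fields. The column names follow the BLAST+ -outfmt keywords; the helper name parse_row is hypothetical.

```python
from typing import Dict

# The 12 columns that "std" expands to in BLAST+ tabular output.
STD_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
# Column order assumed for -outfmt '6 std sscinames'.
FIELDS = STD_FIELDS + ["sscinames"]


def parse_row(line: str) -> Dict[str, str]:
    """Split one tab-separated row into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))
```

Splitting on tabs rather than any whitespace matters here, because scientific names such as "Homo sapiens" contain spaces.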



            • #7
              flacchy wants to extract the reads that did not have a BLAST hit in the results. I'm not sure blastdbcmd allows that.



              • #8
                Originally posted by GenoMax:
                flacchy wants to extract the reads that did not have a BLAST hit in the results. I'm not sure blastdbcmd allows that.

                Ah yeah, my bad. So, for TSV/CSV output:

                cut -f 1 yourResultFile | sort -u > seqIdsYouWantToRemove.txt

                (Add -d ',' for CSV output.) Then, for ideas on the next step, see for example here.
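A rough Python equivalent of the cut | sort -u step, offered as a sketch rather than a replacement for the shell one-liner. Unlike plain cut, it skips lines starting with '#', so it also works on -outfmt 7 files, whose comment lines would otherwise end up in the ID list. The helper name unique_query_ids is hypothetical, and the result is in Python's default string order, which can differ from sort under some locales.

```python
from typing import Iterable, List


def unique_query_ids(lines: Iterable[str], delimiter: str = "\t") -> List[str]:
    """Collect the unique first-column values, like `cut -f 1 [-d ','] | sort -u`.

    Blank lines and '#' comment lines (as in -outfmt 7) are ignored.
    """
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # First column only; strip() removes a trailing newline on 1-column rows.
        ids.add(line.split(delimiter, 1)[0].strip())
    return sorted(ids)
```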
                Last edited by rhinoceros; 07-25-2013, 06:28 AM.



                • #9
                  Once you have the list of reads you want to remove, you can use the comm command (both input lists must be sorted) to get the names of the remaining reads, and then extract those reads very easily. fastqselect.tcl from the MIRA software package works very well for this.
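The comm-then-extract approach can be sketched in Python as a stand-in for comm plus fastqselect.tcl (it is not a reimplementation of either tool). A set difference plays the role of comm -23 and needs no pre-sorting. The helper names remaining_ids and filter_fastq are hypothetical; the sketch assumes unwrapped FASTQ with exactly four lines per record, and that the read ID is the first whitespace-separated token after '@', which is usually what BLAST reports as the query ID.

```python
from typing import Iterable, Iterator, List, Set


def remaining_ids(all_ids: Iterable[str], hit_ids: Iterable[str]) -> Set[str]:
    """IDs present in all_ids but not in hit_ids (like `comm -23`)."""
    return set(all_ids) - set(hit_ids)


def filter_fastq(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield the lines of every 4-line FASTQ record whose read ID is in keep."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            header = record[0]
            if not header.startswith("@"):
                raise ValueError(f"malformed FASTQ header: {header!r}")
            fields = header[1:].split()
            read_id = fields[0] if fields else ""
            if read_id in keep:
                yield from record
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

Streaming the FASTQ record by record keeps memory use down to the ID sets, which matters for large read files.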

