  • BLAT - uniquely mapped reads/multiple hits

    Hi all,

    I wanted to know if there was a simple way to filter the output.psl from BLAT to obtain a file containing only the uniquely mapped reads.
    Concerning the multiple hits, does BLAT sort them in any particular order, by probability or otherwise? I couldn't find this information in the documentation, and looking at the output file makes me think it doesn't.
    Even so, can I tell the software to output only the alignments that are the most likely to be true?

    Thanks in advance.

  • #2
    "Most likely to be true" is a nebulous standard. You can, however, filter a psl file to report only the best and nearly the best hits for a given query. The program pslReps, which should be distributed with BLAT, filters .psl files. There are a number of parameters to adjust the stringency of filtering. Here is a link to some tips given by Jim Kent (author of BLAT and pslReps) on the parameters they use at UCSC. Of course that was in the context of aligning ESTs or full length cDNAs. He makes the point in his response that it is not possible to force pslReps to only report a single alignment for a query (even when using the "-singleHit" option) if there are multiple hits with the same or nearly the same score.
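
To illustrate the "best and nearly the best hits" idea, here is a minimal Python sketch. It is not pslReps and does not reproduce its algorithm; it only assumes the standard 21-column tab-separated PSL layout. The score used here (matches plus half the repeat matches, minus mismatches and insert counts) is a simplification, and the function names and the `near_top` value of 0.005 are illustrative choices rather than anything taken from the thread.

```python
from collections import defaultdict

def parse_psl(lines):
    """Yield (fields, score) for each alignment row, skipping header lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows have 21 columns and start with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, mismatches, rep_matches = (int(fields[i]) for i in (0, 1, 2))
        q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
        # Simplified score (an assumption for this sketch).
        score = matches + rep_matches // 2 - mismatches - q_num_insert - t_num_insert
        yield fields, score

def near_top_hits(lines, near_top=0.005):
    """Keep, per query, every hit scoring within near_top of that query's best."""
    by_query = defaultdict(list)
    for fields, score in parse_psl(lines):
        by_query[fields[9]].append((score, fields))  # column 10 is qName
    kept = {}
    for query, hits in by_query.items():
        best = max(score for score, _ in hits)
        kept[query] = [f for score, f in hits if score >= best * (1 - near_top)]
    return kept
```

Note that a query with two equally good hits still keeps both, which matches the point above that ties cannot be reduced to a single alignment by score alone.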

    • #3
      Yes, "most likely to be true" is a very fuzzy notion here.
      I hadn't seen that pslReps/Sort... was distributed with BLAT. I'm still a newbie and I'm quite confused by all the different software tools that have been developed...
      This is a real mess for someone as new to this field as I am!

      Thank you very much for your help, I'll try to run pslReps and the other psl tools.

      • #4
        You may find the git repo helpful, here is the link:


        I used BLAT recently in an RNA-seq splice junction detection project. Here are
        some Perl scripts for running BLAT and parsing the psl results, which might be of help to you:
        Yet another bioinformatics tool to detect de novo splice junctions from paired-end RNA-seq reads (human genome only) - lifengtian/SplicePL


        I tried pslReps for exactly the same problem, but it was not designed for it.



        Originally posted by Adamo
        Yes, "most likely to be true" is a very fuzzy notion here.
        I hadn't seen that pslReps/Sort... was distributed with BLAT. I'm still a newbie and I'm quite confused by all the different software tools that have been developed...
        This is a real mess for someone as new to this field as I am!

        Thank you very much for your help, I'll try to run pslReps and the other psl tools.
        Last edited by lifeng.tian; 07-02-2010, 04:01 PM.

        • #5
          Thank you, I think it can be very helpful!

          However, I have some questions about how to use the scripts (I'm completely new to biology and bioinformatics...):

          Why should I mask the genome? (actually, I haven't understood this notion yet). I'll work on a bacterial one, do I have to mask it too?

          I only have single-end reads, is that OK anyway? Will it work if I just use the "--forward=..." option?

          As I understand it, I'll have my alignment stored in the "temp" directory after running Blat. Then, what is the command to filter the output.psl so that I obtain only uniquely mapped reads?

          Sorry if some questions are a little bit naive...!
          Last edited by Adamo; 07-05-2010, 12:56 AM.

          • #6
            Please check out this perl script at
            Yet another bioinformatics tool to detect de novo splice junctions from paired-end RNA-seq reads (human genome only) - lifengtian/SplicePL


            It will run BLAT on N processes and generate temp/unique and temp/unique.psl
            LMK if you have more questions at [email protected]

            BTW, you don't need to mask the genome.
            Last edited by lifeng.tian; 07-05-2010, 03:39 PM.
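
For readers who want to see what "uniquely mapped" filtering can look like, here is a minimal Python sketch. It is not the SplicePL script and makes no claim about how that script decides uniqueness; it assumes a standard 21-column PSL file, treats a read as unique when exactly one of its alignments passes a score threshold, and uses a simplified score of matches minus mismatches. The name `min_score` is a hypothetical parameter mirroring the minscore setting discussed in this thread.

```python
from collections import Counter

def unique_hits(psl_lines, min_score=30):
    """Return alignment rows for reads with exactly one hit at or above min_score."""
    rows = []
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header and anything that is not an alignment row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        # Simplified score (an assumption): matches minus mismatches.
        if int(fields[0]) - int(fields[1]) >= min_score:
            rows.append(fields)
    # Count surviving hits per query name (column 10, index 9).
    counts = Counter(fields[9] for fields in rows)
    return [fields for fields in rows if counts[fields[9]] == 1]
```

A typical use would be `unique_hits(open("output.psl"))`, then writing the returned rows back out joined by tabs.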

            • #7
              Thank you again, I'm having a look at your script. It seems quite approachable, even for me!
              I'll let you know if I need some more help.

              • #8
                Just a reminder: the minscore will determine the final number of unique reads. The default value of 30 is way too low for a bacterial genome and long reads. Assuming the read length is 200bp, a 90% match requires a minscore of 180.
                Last edited by lifeng.tian; 07-06-2010, 05:50 AM.
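
The arithmetic above can be written as a tiny helper. This is only a sketch: the function name and the integer-percent argument are illustrative, and rounding up to the next whole base is an assumption.

```python
def min_score_for(read_length, identity_percent=90):
    """Smallest whole-number score that is at least identity_percent of the read length."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (read_length * identity_percent + 99) // 100

print(min_score_for(200))  # 180, the value quoted for a 200bp read at 90%
```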

                • #9
                  The thing is, I have reads of different lengths, from 100bp to 300bp. Can't I specify a percentage instead of a precise score?
                  Last edited by Adamo; 07-06-2010, 06:42 AM.

                  • #10
                    Oops, mistake.
                    Last edited by Adamo; 07-06-2010, 06:40 AM.

                    • #11
                      I modified the blat_singleend.pl.
                      Try running it with --minidentity=90
                      It will require the match score to be larger than individual_read_length * 0.9.

                      Originally posted by Adamo
                      The thing is, I have reads of different lengths, from 100bp to 300bp. Can't I specify a percentage instead of a precise score?
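
The per-read threshold described above can be sketched in Python as follows. This is not the code of blat_singleend.pl; it assumes a standard 21-column PSL file, uses the raw match count (column 1) as the "match score", and compares it against the query size (column 11). The function names are hypothetical, and whether the real script uses a strict or non-strict comparison is not stated in the thread, so the `>=` here is an assumption.

```python
def passes_min_identity(fields, min_identity=90):
    """True if matches are at least min_identity percent of this read's length."""
    matches = int(fields[0])   # number of matching bases
    q_size = int(fields[10])   # length of the query read
    # Integer comparison: matches / q_size >= min_identity / 100.
    return matches * 100 >= q_size * min_identity

def filter_by_identity(psl_lines, min_identity=90):
    """Keep alignment rows whose match count meets the per-read threshold."""
    kept = []
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header and malformed lines.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        if passes_min_identity(fields, min_identity):
            kept.append(fields)
    return kept
```

Because the threshold scales with each read's own length, a 100bp read needs 90 matching bases while a 300bp read needs 270, which is the behavior asked for in the quoted question.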
