  • milo0615
    Member
    • Dec 2012
    • 39

    Comparative Genomics - BLAT

    Hello,

    I have assembled the genomes of two related species using ABySS, and I would like to know whether there is a significant difference between the two genomes. One way of doing this is to align the two assemblies to each other. I used BLAT for the alignment, but now I need help visualizing and interpreting the output.psl file. How can I get the total percentage coverage of the alignment, i.e. that assembly A covers x% of assembly B? Is there a better way of doing this? Please let me know. I would really appreciate your help.


    Thank you,

    -Milo
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    How big are these genomes? How many contigs are there in each?


    • milo0615
      Member
      • Dec 2012
      • 39

      #3
      Hi GenoMax,

      Thank you for replying.

      Assembly A is 444.5 MB and contains 1.4M contigs.
      Assembly B is 526.1 MB and contains 1.6M contigs.

      Please advise.

      Thank you,

      -Milo


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Looks like you are a ways away from having a real assembly. You likely will need some custom scripting to get a meaningful answer since I assume your blat result file (even in PSL format) is probably pretty large.

        If the blat results were in blast format and you just wanted to visualize them then http://bioinformatics.oxfordjournals...31/8/1305.full or http://www.biomedcentral.com/1471-2105/15/128 would have been useful.

        Do you have a related reference genome available? What is the expected genome size for your samples?


        • milo0615
          Member
          • Dec 2012
          • 39

          #5
          Hi GenoMax,

          I do not have a related reference genome, so I aligned the two assemblies against each other. My species is a diploid plant with a 2C DNA value estimated at 5.1 pg (about 5.0 Gb).

          I am going to try the ones that you suggested. Is there a better way of comparing both assemblies? How can I get an alignment percentage coverage? Please let me know.

          Thank you,

          -Milo
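For context on the sizes quoted in this thread, a 2C value in picograms can be converted to base pairs with the commonly used factor of about 0.978 Gb per pg. The following is a minimal sketch of that arithmetic, not something posted by either participant. It assumes that "MB" in the assembly sizes means megabases rather than file size, and that each assembly represents a collapsed haploid (1C) genome. The name `genome_fraction` is made up for illustration.

```python
# Assumed conversion factor: base pairs per picogram of DNA.
PG_TO_BP = 0.978e9


def genome_fraction(assembly_bp, c2_picograms):
    """Compare an assembly size with the size expected from a 2C DNA value.

    Returns (diploid_bp, haploid_bp, fraction), where fraction is the
    assembly size divided by the expected haploid (1C) genome size.
    """
    diploid_bp = c2_picograms * PG_TO_BP
    haploid_bp = diploid_bp / 2
    return diploid_bp, haploid_bp, assembly_bp / haploid_bp
```

Under those assumptions, `genome_fraction(444.5e6, 5.1)` gives an expected haploid size of roughly 2.49 Gb and a fraction of roughly 0.18, which would be consistent with the assemblies still being far from complete.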


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Perhaps you should concentrate on the largest contigs first and see if they are related. What is the size range of the largest contigs? Are there any that are 10 kb and above? That would make your search space smaller.
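Pulling out the largest contigs, as suggested above, can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper names `read_fasta` and `longest_contigs` are hypothetical, the input is assumed to be an ordinary FASTA file, and only the records that pass the length filter are held in memory while selecting.

```python
import heapq


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contigs(lines, n=10, min_length=0):
    """Return the n longest (header, sequence) records of at least min_length bp,
    ordered from longest to shortest."""
    records = (rec for rec in read_fasta(lines) if len(rec[1]) >= min_length)
    return heapq.nlargest(n, records, key=lambda rec: len(rec[1]))
```

An open file handle can be passed directly as `lines`, for example `longest_contigs(open("assembly.fa"), n=10, min_length=10000)`, where the file name is a placeholder.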


            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              @Milo: With 1.4M+ input sequences those viewer programs are not going to be useful (and your results are not in blast format either).

              You can get a coverage estimate by using Brian's suggestion in this thread: http://seqanswers.com/forums/showthread.php?t=44035 You will have to use the raw data for this. This suggestion is not directly related to the question you asked, but it may be worthwhile since you are working with an unknown genome.
              Last edited by GenoMax; 04-16-2015, 05:19 PM.


              • milo0615
                Member
                • Dec 2012
                • 39

                #8
                Hi GenoMax,

                I took the 10 longest contigs from each assembly (>8500 bp, with a maximum length of 120 kb for assembly 1 and 104 kb for assembly 2) and aligned them against each other with BLAT. They seem to be pretty similar, but I would like more in-depth information about their differences. I am going to try what you suggested.

                Thank you for your help and please let me know if you have any other ideas.


                Thank you,

                -Emilio


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  For those larger contigs that BLAT reports as similar, try using Mauve. You can get additional information from Mauve alignments: http://darlinglab.org/mauve/user-guide/files.html


                  • milo0615
                    Member
                    • Dec 2012
                    • 39

                    #10
                    Hi GenoMax,

                    I did use Mauve. Attached is a PDF with the Mauve results for the 10 largest contigs of both assemblies (gerbera_alignment.pdf). However, Mauve does not work when I align the 10 largest contigs from assembly 1 to the whole of assembly 2. I guess it is mostly designed for bacterial genomes.

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      You should align only 2 contigs that are most similar to each other. If you look closely at the PDF you posted you can probably make out which contigs are most similar to each other as pairs.

                      Again, this is not going to help you a lot since you have 1.4M contigs.

                      If you don't know Perl/Python, find a friend who can help parse the BLAT result file.
                      Last edited by GenoMax; 04-17-2015, 02:06 PM.
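As a rough illustration of the kind of custom scripting suggested in this thread, the sketch below estimates how much of the target assembly is covered by BLAT alignments in a PSL file. It is a minimal sketch under stated assumptions, not a method given by either poster. It assumes the standard 21-column nucleotide PSL layout (tName, tSize, blockSizes and tStarts in columns 14, 15, 19 and 21) and merges overlapping blocks per target so that no base is counted twice. The function names are hypothetical, and all blocks are held in memory, which may not be practical for a very large result file.

```python
from collections import defaultdict


def parse_psl_blocks(lines):
    """Yield (tName, tSize, start, end) for every aligned block in PSL lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional psLayout header and anything that is not a 21-column record.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        t_name, t_size = fields[13], int(fields[14])
        block_sizes = [int(x) for x in fields[18].split(",") if x]
        t_starts = [int(x) for x in fields[20].split(",") if x]
        for start, size in zip(t_starts, block_sizes):
            yield t_name, t_size, start, start + size


def merged_length(intervals):
    """Total length of the union of half-open (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def target_coverage(lines, total_target_bases=None):
    """Return (covered_bases, denominator, percent) for the target sequences.

    If total_target_bases is None, the denominator only includes targets
    that have at least one alignment in the PSL input.
    """
    blocks = defaultdict(list)
    sizes = {}
    for t_name, t_size, start, end in parse_psl_blocks(lines):
        blocks[t_name].append((start, end))
        sizes[t_name] = t_size
    covered = sum(merged_length(iv) for iv in blocks.values())
    denom = total_target_bases if total_target_bases is not None else sum(sizes.values())
    percent = 100.0 * covered / denom if denom else 0.0
    return covered, denom, percent
```

Contigs with no hit at all never appear in the PSL file, so to answer "assembly A covers x% of assembly B" the total number of bases in assembly B should be passed as `total_target_bases`. Which assembly is the target depends on the order of the database and query arguments given to BLAT.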
