  • Scaffolding/re-assembly of MIRA output using quality values

    Hi guys,
    I've been assembling NGS data (454 in the past and Ion Torrent now, unpaired data) with MIRA for quite some time, and I usually re-assemble the contigs generated by MIRA with Lasergene SeqMan 7.

    This gives really good results by finding overlaps between contigs. The problem is that SeqMan can't load quality data from the *.fasta.qual file written by MIRA: it only loads the unpadded.fasta output, even though it can handle other formats that carry quality values, such as ab1.

    Can you suggest a way to import quality values into SeqMan, or another program that will do the job?

    Thank you so much for your help

  • #2
    If SeqMan can take FASTQ, combine MIRA's FASTA + QUAL files into FASTQ files.
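For anyone who wants to try this, here is a minimal sketch of the FASTA + QUAL to FASTQ merge in plain Python. The function names `read_records` and `fasta_qual_to_fastq` are made up for illustration. It assumes both files list the same records in the same order, that the QUAL file holds whitespace-separated integer Phred scores, and that the target is Phred+33 FASTQ; check those assumptions against your own MIRA output before relying on it.

```python
import io


def read_records(handle):
    """Yield (name, body_lines) for each '>'-headed record in a FASTA-style file."""
    name, lines = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, lines
            # Keep only the first word of the header as the record name.
            name, lines = (line[1:].split() or [""])[0], []
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, lines


def fasta_qual_to_fastq(fasta_handle, qual_handle, out_handle):
    """Write Phred+33 FASTQ from paired FASTA and QUAL handles; return the record count."""
    count = 0
    # Assumption: both inputs hold the same records in the same order.
    # zip() stops at the shorter input, so compare record counts separately if unsure.
    for (seq_name, seq_lines), (qual_name, qual_lines) in zip(
        read_records(fasta_handle), read_records(qual_handle)
    ):
        if seq_name != qual_name:
            raise ValueError(f"record order differs: {seq_name!r} vs {qual_name!r}")
        seq = "".join(seq_lines)
        quals = [int(q) for q in " ".join(qual_lines).split()]
        if len(seq) != len(quals):
            raise ValueError(f"{seq_name}: {len(seq)} bases but {len(quals)} scores")
        # Clamp to 0..93 so every score maps to a printable Phred+33 character.
        qual_string = "".join(chr(min(max(q, 0), 93) + 33) for q in quals)
        out_handle.write(f"@{seq_name}\n{seq}\n+\n{qual_string}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo; with real data, open the FASTA and QUAL files instead.
    fasta = io.StringIO(">contig1\nACGT\nAC\n")
    qual = io.StringIO(">contig1\n40 40 30 30\n20 10\n")
    out = io.StringIO()
    fasta_qual_to_fastq(fasta, qual, out)
    print(out.getvalue(), end="")
```

With real files, pass the opened unpadded FASTA and its matching .fasta.qual file as the first two handles and an output file opened for writing as the third.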



    • #3
      SeqMan 7 can't load FASTQ files.



      • #4
        Does anyone have any suggestions?



        • #5
          Does it really matter if the quality values are not included when you join contigs?
          I would assume you already apply quality thresholds for contig assembly and quality clipping during your MIRA assembly.
          The end result should be high-quality contigs. MIRA is pretty good at producing these (at least for 454, as far as I know), and you can then align them.

          Take a look at SeqManPro, or ask the developers. I would be surprised if that program could not use the quality values embedded in your assembly ACE file to join consensus sequences.



          • #6
            Originally posted by JackieBadger
            Does it really matter if the quality values are not included when you join contigs?
            I would assume you already apply quality thresholds for contig assembly and quality clipping during your MIRA assembly.
            The end result should be high-quality contigs. MIRA is pretty good at producing these (at least for 454, as far as I know), and you can then align them.

            Take a look at SeqManPro, or ask the developers. I would be surprised if that program could not use the quality values embedded in your assembly ACE file to join consensus sequences.

            The contigs generated by MIRA often have low-quality ends, which results in overlaps with SNPs where contigs join. At the moment I'm working around this by checking every one of them in consed and manually trimming the low-quality contig ends. Obviously, since SeqMan supports quality-based trimming, the assembly would improve a lot if it accepted .fasta.qual or FASTQ files (imagine an overlap that isn't recognized because of a long low-quality end on a contig).



            • #7
              Hard to believe that commercial software like this cannot load sequence formats with quality values (except for "read data").

              Hmm, after reading http://www.dnastar.com/t-fileformats.aspx I suspect you are right :-)
              Have you asked Lasergene support about it?

              Maybe you should look for another solution. You could give gap4/gap5 from the Staden package a try. It works smoothly with MIRA: MIRA -> CAF -> GAP.

              My 2p, Sven



              • #8
                You don't need to trim low-quality ends manually.
                Trimmomatic will trim your ends based on either the absolute quality value of each base or the mean quality across a sliding window of a designated size.
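To make the two trimming ideas concrete, here is a rough Python sketch of both: dropping end bases below an absolute quality threshold, and keeping only the span covered by sliding windows whose mean quality is high enough. This is not Trimmomatic's code or its exact algorithm, only an illustration of the principle applied to both ends of a contig. The function names, the window size and the thresholds are all hypothetical.

```python
def trim_ends_by_threshold(quals, min_q):
    """Return (start, end) after dropping leading/trailing bases with quality < min_q."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return start, end


def trim_ends_by_window(quals, window, min_mean):
    """Return (start, end) spanning the first to the last window with mean >= min_mean.

    Returns (0, 0) when no window passes, meaning nothing is kept.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(quals)
    if n < window:
        # Sequence shorter than one window: judge it as a single window.
        return (0, n) if n and sum(quals) >= min_mean * n else (0, 0)
    # Start positions of all windows whose mean quality reaches the cutoff.
    good = [
        i for i in range(n - window + 1)
        if sum(quals[i:i + window]) >= min_mean * window
    ]
    if not good:
        return (0, 0)
    return good[0], good[-1] + window


if __name__ == "__main__":
    # Hypothetical contig with low-quality bases at both ends.
    seq = "ACGTACGTAC"
    quals = [5, 8, 30, 35, 40, 38, 36, 12, 6, 3]
    start, end = trim_ends_by_threshold(quals, min_q=20)
    print(seq[start:end], quals[start:end])
    start, end = trim_ends_by_window(quals, window=3, min_mean=25)
    print(seq[start:end], quals[start:end])
```

Note that the window version can keep an isolated low-quality base when its neighbours pull the window mean above the cutoff, which is the usual trade-off against the stricter per-base threshold.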
