Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • bioman1
    Member
    • May 2012
    • 80

    Extracting mitochondrial reads

    We have sequenced plant through whole-genome short gun sequencing approach with illumina paired-end reads. I am thinking to extract mitochondiral reads by mapping against available mitochondrial genomes in NCBI. Two options comes to my mind

    1. Denovo genome assembly using velvet and blast against genbank mitochondrial genomes to extract mitochondrial reads and proceed for denovo mitochondrial genome assembly.
    2. To align the reads to each mitochondiral genome using tophat or any other aligner like bowtie or bwa and extract mitochondiral related reads and then proceed for denovo mitochondrial genome assembly

    I like the option two, because we can extract without denovo genome assembly. But the daunting task will be mapping raw reads and extracting mapped reads for each genome. Are there any script available already to extract mapped reads?

    Which method is the best one you think?
    Last edited by bioman1; 03-31-2014, 08:30 PM.
  • yueluo
    Member
    • Aug 2013
    • 82

    #2
    A common aligner such as bowtie2 can output 'mapped' reads.
    If you want reads mapped to different mitochondrial genomes going to seperate files. You can either:
    1) Make seperate indexes for each mitochondrial genome, then run a mapping process seperately. This way you get a seperate set of reads that align to different genomes.
    2) Brian Bushnell recently announced a set of tools including BBMap and BBSplit. If they work like he says, then you should be able to get what you want with just a single run. I can't be sure though, since I haven't tested these tools...

    Comment

    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      You could try MITObim,

      Comment

      • sphil
        Senior Member
        • Apr 2010
        • 192

        #4
        Originally posted by yueluo View Post
        1) Make seperate indexes for each mitochondrial genome, then run a mapping process seperately. This way you get a seperate set of reads that align to different genomes.
        Since MT-Genomes aren't that large, imo he could also 'cat' all MTgenomes.fa he wants to into one genome. Map against this, since the mapping location will give rise to which genome the reads map you also get the information in just one mapping run. Need to cat before though.

        Comment

        • bioman1
          Member
          • May 2012
          • 80

          #5
          Thanks maubp..this tools seems to interesting to try out.
          @sphil: do you think is that right approach to combine all plant mitochondrial genomes (93 genome) in to one MTgenome.fa using 'cat' and mapping reads to them will be right approach?. If it works, it will be good idea.

          I am also aiming to separate chloroplast reads by using similar approach. Any tools available for this like MITobim for chloroplast?

          Comment

          • JackieBadger
            Senior Member
            • Mar 2009
            • 385

            #6
            If you have a good reference you can map directly to it. I have done this with Newbler before and got quick and good mtDNA genomes out of total DNA pools

            Comment

            • sphil
              Senior Member
              • Apr 2010
              • 192

              #7
              Originally posted by bioman1 View Post
              Thanks maubp..this tools seems to interesting to try out.
              @sphil: do you think is that right approach to combine all plant mitochondrial genomes (93 genome) in to one MTgenome.fa using 'cat' and mapping reads to them will be right approach?. If it works, it will be good idea.

              I am also aiming to separate chloroplast reads by using similar approach. Any tools available for this like MITobim for chloroplast?
              I acutally don't see any problem with that. However, I actually have no clue how much sequence similarity there is between those MT-Genomes. But anyways I would definitely give it a try!!!

              Comment

              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #8
                Originally posted by yueluo View Post
                2) Brian Bushnell recently announced a set of tools including BBMap and BBSplit. If they work like he says, then you should be able to get what you want with just a single run. I can't be sure though, since I haven't tested these tools...
                That's correct, BBSplit was designed specifically for this scenario.

                bbsplit.sh ref=x.fa,y.fa,z.fa in=reads.fq basename=o%.fq

                That will create 3 output files, "ox.fq", "oy.fq", and "oz.fq". The reads that map best to x.fa will go to ox.fq, and so forth.

                Some reads can map ambiguously to multiple references, if the sequences are highly conserved; you can control that with the "ambig2" flag. "ambig2=best" will map the read to the first best-matching reference; "toss" will discard away ambiguous reads; "all" will send the read to the output for every reference to which it maps; and "split" will put the ambiguous reads in a separate file (i.e. AMBIGUOUS_ox.fq) for each reference to which it maps.

                Comment

                • bioman1
                  Member
                  • May 2012
                  • 80

                  #9
                  @Brian Bunshell, I think BBsplit will helpful in my work. Thanks.
                  regarding command line

                  bbsplit.sh ref=x.fa,y.fa,z.fa in=reads.fq basename=o%.fq

                  for me ref will be 93 genomes, should I need to give individually like ref= r1.fa, r2.fa, r3.fa..r93.fa or can I combine all reference genomes in one file using 'cat' as sphil suggested.

                  I will be paired end reads, should I need make interleaved as single file?. Did the output best match ox.fa will be single file. Can I use this reads for further denovo assembly?

                  Comment

                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    #10
                    Originally posted by bioman1 View Post
                    @Brian Bunshell, I think BBsplit will helpful in my work. Thanks.
                    regarding command line

                    bbsplit.sh ref=x.fa,y.fa,z.fa in=reads.fq basename=o%.fq

                    for me ref will be 93 genomes, should I need to give individually like ref= r1.fa, r2.fa, r3.fa..r93.fa or can I combine all reference genomes in one file using 'cat' as sphil suggested.

                    I will be paired end reads, should I need make interleaved as single file?. Did the output best match ox.fa will be single file. Can I use this reads for further denovo assembly?
                    bioman,

                    If your reads are paired in two files, you should use "in1=reads1.fq in2=reads2.fq". For paired input, the output will be interleaved, but the bbtools package includes another program, "reformat.sh", which can interleave, de-interleave, change quality offset, file format, and do various other things like subsampling and trimming, very fast.

                    To interleave:
                    reformat.sh in1=read1.fq in2=read2.fq out=reads.fq

                    To de-interleave:
                    reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

                    By default, if there is a single input file, BBMap will autodetect whether the reads are interleaved or single-ended based on the names, assuming normal Hiseq/Miseq naming conventions. You can override this with the "interleaved=t" or "interleaved=f" flag. All flags are in any order.

                    If you have any trouble with RAM regarding bbsplit, you can set the amount of memory to use with the flag "-Xmx1g" for (for example) 1 GB RAM. This should be set to about 85% of physical memory, but don't bother unless you encounter a problem.

                    -Brian

                    P.S. I forgot to answer your question... No, you MUST specify "ref=r1.fa,r2.fa,r3.fa..r93.fa" (without any spaces). If you use cat, it will not work.
                    Last edited by Brian Bushnell; 04-01-2014, 07:56 PM.

                    Comment

                    Latest Articles

                    Collapse

                    • SEQadmin2
                      From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                      by SEQadmin2


                      Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                      The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                      ...
                      06-02-2026, 10:05 AM
                    • SEQadmin2
                      Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                      by SEQadmin2


                      With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                      Introduction

                      Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                      05-22-2026, 06:42 AM
                    • SEQadmin2
                      Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                      by SEQadmin2

                      Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                      Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                      05-06-2026, 09:04 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by SEQadmin2, Today, 08:59 AM
                    0 responses
                    7 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 12:03 PM
                    0 responses
                    21 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 11:40 AM
                    0 responses
                    14 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 05-28-2026, 11:40 AM
                    0 responses
                    29 views
                    0 reactions
                    Last Post SEQadmin2  
                    Working...