  • sdriscoll
    I like code
    • Sep 2009
    • 436

    building a Mosaik reference for Mouse

    Can anybody point me in the right direction for building a single, full-genome reference for the mouse that I can then align my Illumina read data to? I have FASTA reference files for the mouse, one per chromosome, which I downloaded from UCSC. I can pass those one at a time to MosaikBuild to produce a .dat file for each chromosome, but that seems a little crazy because it means I'd have to run a single lane of data against each chromosome, one at a time.

    If this is how other people do it then that's totally fine - it just seems like I should be able to build a single reference file for the entire genome.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
  • snownebula
    Junior Member
    • Oct 2009
    • 9

    #2
    Hi there,

    All you have to do is create a concatenated FASTA file and you'll be all set with MOSAIK.

    For example, if I wanted to combine the first four mouse chromosomes into one file, I could type:

    cat mm_ref_chr1.fa >> mouse_ref.fa
    cat mm_ref_chr2.fa >> mouse_ref.fa
    cat mm_ref_chr3.fa >> mouse_ref.fa
    cat mm_ref_chr4.fa >> mouse_ref.fa

    You could keep doing this for all of the mouse chromosomes or, if you're savvy with bash, automate the above in a small script.
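The small script mentioned above could look something like the sketch below. The function name `concat_fasta` is made up for illustration, and the file names in the usage note simply follow the `mm_ref_chrN.fa` pattern from the example. One thing it does differently from the `>>` commands: it empties the output file first, so running it twice does not append a second copy of every chromosome.

```shell
#!/usr/bin/env bash
# Concatenate per-chromosome FASTA files into one reference file.
# Usage: concat_fasta OUTPUT INPUT [INPUT...]
# (concat_fasta is a hypothetical helper name, not part of MOSAIK.)
concat_fasta() {
    local out="$1"
    shift
    # Start from an empty file so a re-run does not append duplicates.
    : > "$out"
    local f
    for f in "$@"; do
        # Stop early if an input is missing or empty.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
        cat "$f" >> "$out"
        # If the input did not end with a newline, add one so the next
        # ">" header line does not get glued onto the last sequence line.
        if [ -n "$(tail -c1 "$f")" ]; then
            echo >> "$out"
        fi
    done
}
```

For the four chromosomes above this would be called as `concat_fasta mouse_ref.fa mm_ref_chr1.fa mm_ref_chr2.fa mm_ref_chr3.fa mm_ref_chr4.fa`, and the list can be extended to the remaining chromosomes.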

    Cheers,

    // Michael

    • sdriscoll
      I like code
      • Sep 2009
      • 436

      #3
      Cool, thanks. It just wasn't clear in the documentation that you could simply cat files together to make one larger reference. Now I just need to figure out this jump database thing and I'll be off and running. Mosaik chews up some serious RAM and I've only got 16 GB on the system I'm running it on. It looks like a jump database will help for running full-genome alignments on this system.

      • sdriscoll
        I like code
        • Sep 2009
        • 436

        #4
        So I made this cat'd reference file (2.6 GB) and compiled it down. Then I made a jump database and started a run with MosaikAligner using the jump database of this full-genome reference. I ran pretty much the default settings listed in the manual, except with only 4 CPU cores. It looks like it munched up about 19 GB of RAM to load the jump database files into memory, but once the alignment actually started it wasn't using all 4 cores: it was using only about 4% of the CPU and processing only 3.5 reads per second, with an ETA of 53 DAYS. What could be going wrong?

        • donniemarco
          Member
          • Aug 2009
          • 17

          #5
          cat all files

          Concatenating the files one at a time might get a little tedious. I tried:
          cat chr*.fa >> human_ref.fa

          It worked well.
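One thing to keep in mind with the glob: the shell expands `chr*.fa` in lexicographic order, so `chr10.fa` is placed before `chr2.fa` in the combined file. If the chromosome order in the reference matters for later steps, a sketch like the following lists the files in natural numeric order instead. It assumes GNU `sort` (for the `-V` option) and file names without spaces or newlines; `list_chromosomes` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Print the chr*.fa files in DIR in natural (version) order, one per line.
# Assumes GNU sort (for -V); list_chromosomes is a hypothetical helper.
list_chromosomes() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/chr*.fa; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done | sort -V
}
```

The combined file could then be built with `list_chromosomes . | xargs cat > human_ref.fa`. Using `>` rather than `>>` means a second run overwrites the file instead of appending to it.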

          • mkeehan
            Member
            • Feb 2010
            • 13

            #6
            Are you still using your system with 16GB of RAM?
            You are probably swapping if it's using 19GB...

            I found that reading the manual to get the right parameters made a huge difference to the reads per second. The magic parameters I found were:
            -bw 13 -act 20 -mm 4 -mhp 100
            That took me from a few reads per second to 700-800 per second.

            I also needed around 20GB of RAM for the jump database.
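A quick way to check the swapping suspicion before starting a long run is to compare the expected memory need against the machine's physical RAM. The sketch below is Linux-only, since it reads `MemTotal` from `/proc/meminfo`, and `check_ram` is a made-up helper name. It compares against total RAM only and ignores whatever other processes are already using, so an "ok" is optimistic.

```shell
#!/usr/bin/env bash
# Report whether NEED_GB of memory fits into physical RAM.
# Usage: check_ram NEED_GB [MEMINFO_FILE]
# MEMINFO_FILE defaults to /proc/meminfo (Linux only).
# check_ram is a hypothetical helper, not part of MOSAIK.
check_ram() {
    local need_gb="$1"
    local meminfo="${2:-/proc/meminfo}"
    local total_kb
    # MemTotal is reported in kB.
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    local need_kb=$(( need_gb * 1024 * 1024 ))
    if [ "$need_kb" -le "$total_kb" ]; then
        echo "ok"
    else
        echo "likely to swap"
    fi
}
```

With the numbers from this thread, `check_ram 20` on a 16 GB machine would print `likely to swap`.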

            • sdriscoll
              I like code
              • Sep 2009
              • 436

              #7
              thanks for sharing the magic.
