  • FixMateInformation after GATK realignment

    I'm using GATK to analyze some paired-end Illumina exome data and have been running into a problem after realignment with IndelRealigner. After realignment, I have been trying to use Picard's FixMateInformation to take the realigned, query-name-sorted BAMs and produce mate-fixed, coordinate-sorted BAMs. It creates a bunch of temp files and does so fairly quickly. However, the process of merging these temp files together takes an extraordinarily long time. Not sure what I'm doing wrong here.

    Is it possible to simply use the fixmate and sort functions in SAMtools to perform the same task? I've tried this, and it runs relatively quickly.
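A minimal sketch of the samtools route described above, assuming samtools 1.x command-line syntax (older 0.1.x releases took an output prefix for `sort` instead of `-o`). The file names and the `prefix` argument are hypothetical. Because `samtools fixmate` expects reads grouped by query name, a name sort is included unless the input is already query-name sorted; a final coordinate sort restores the order downstream tools expect. Whether the result is equivalent to Picard's FixMateInformation output is not established here.

```python
import subprocess
from typing import List


def samtools_fixmate_steps(in_bam: str, out_bam: str, prefix: str,
                           threads: int = 1,
                           already_name_sorted: bool = False) -> List[List[str]]:
    """Build the samtools commands for: name sort -> fixmate -> coordinate sort.

    Intermediate file names are derived from the hypothetical `prefix`.
    """
    fixed = f"{prefix}.fixmate.bam"
    steps: List[List[str]] = []
    if already_name_sorted:
        name_sorted = in_bam
    else:
        name_sorted = f"{prefix}.namesorted.bam"
        # fixmate needs mates adjacent, i.e. reads grouped by query name
        steps.append(["samtools", "sort", "-n", "-@", str(threads),
                      "-o", name_sorted, in_bam])
    # Fill in mate coordinates, insert size and mate-related flags
    steps.append(["samtools", "fixmate", name_sorted, fixed])
    # Restore coordinate order for downstream tools
    steps.append(["samtools", "sort", "-@", str(threads), "-o", out_bam, fixed])
    return steps


def run_steps(steps: List[List[str]]) -> None:
    """Run each command in order; raises CalledProcessError if one fails."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands; call run_steps(...) to execute them.
    for cmd in samtools_fixmate_steps("realigned.bam", "fixed.sorted.bam",
                                      "sample1", threads=4):
        print(" ".join(cmd))
```

Building the commands as lists (rather than shell strings) avoids quoting problems with unusual file names and makes each step easy to inspect before running it.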

    Lastly, one other quick question: after realignment with IndelRealigner, a single exome BAM file that is only about 7.3 GB (following initial alignment with BWA) becomes approximately 32.8 GB. Is this supposed to happen?

    Thanks, Toast

  • #2
    Originally posted by MolecularToast
    However, the process of merging these temp files together takes an extraordinarily long time. Not sure what I'm doing wrong here.
    I thought it was normal :-) I had to process 6 bam files and it took slightly less than 3 weeks (from recalibration to final rmdup).

    Originally posted by MolecularToast
    Lastly, one other quick question: after realignment with IndelRealigner, a single exome BAM file that is only about 7.3 GB (following initial alignment with BWA) becomes approximately 32.8 GB. Is this supposed to happen?

    Thanks, Toast
    The BAM produced by IndelRealigner is huge; that is supposed to happen, and I suspect it is related to the BAM file structure. Sorted files are smaller than unsorted ones, and after IndelRealigner you have a "less sorted" BAM file.



    • #3
      As for FixMateInformation, I did not find it took excessively long. This was on only around 50,000,000 reads, but it took less than two hours. You can definitely try increasing the MAX_RECORDS_IN_RAM parameter to speed it up, assuming you have a lot of RAM. My guess, based on what you said, is that you are spending a lot of time reading and writing temp files on disk, which you can reduce by using more RAM. Remember to give the JVM more memory (its -Xmx heap setting) when you run it.
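The advice above can be sketched as follows. Roughly, the number of temp files that must later be merged scales like (records ÷ MAX_RECORDS_IN_RAM), so raising the parameter reduces the file count, but only if the JVM heap (-Xmx) is large enough to hold that many records. The default of 500,000 is an assumption to check against your Picard version's help output, the spill-file count is an order-of-magnitude simplification rather than Picard's exact behavior, and `picard.jar`, the heap size and the file names are hypothetical. The option names (INPUT, OUTPUT, SORT_ORDER, MAX_RECORDS_IN_RAM, TMP_DIR) are Picard's KEY=VALUE style; older Picard releases shipped one jar per tool, in which case the tool-name argument is dropped.

```python
import math
from typing import List

# Assumed Picard default for MAX_RECORDS_IN_RAM; confirm against your version.
DEFAULT_MAX_RECORDS_IN_RAM = 500_000


def estimate_spill_files(n_records: int,
                         max_records_in_ram: int = DEFAULT_MAX_RECORDS_IN_RAM) -> int:
    """Order-of-magnitude count of temp files written while sorting n_records.

    Simplification: one temp file each time the in-memory buffer fills.
    """
    if max_records_in_ram <= 0:
        raise ValueError("max_records_in_ram must be positive")
    return math.ceil(n_records / max_records_in_ram)


def picard_fixmate_cmd(in_bam: str, out_bam: str, heap_gb: int,
                       max_records_in_ram: int, tmp_dir: str,
                       picard_jar: str = "picard.jar") -> List[str]:
    """Assemble a FixMateInformation call (hypothetical jar path)."""
    return [
        "java", f"-Xmx{heap_gb}g",  # heap must hold the records buffered in RAM
        "-jar", picard_jar, "FixMateInformation",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        "SORT_ORDER=coordinate",    # write the output coordinate-sorted
        f"MAX_RECORDS_IN_RAM={max_records_in_ram}",
        f"TMP_DIR={tmp_dir}",       # where spill files are written
    ]


if __name__ == "__main__":
    reads = 50_000_000  # the read count mentioned in this post
    print(estimate_spill_files(reads))             # 100 with the assumed default
    print(estimate_spill_files(reads, 5_000_000))  # 10
    print(" ".join(picard_fixmate_cmd("realigned.namesorted.bam", "fixed.bam",
                                      16, 5_000_000, "/tmp")))
```

If the run is limited by disk I/O rather than by the number of files, fewer but larger temp files may not help much, which is consistent with the experience reported later in the thread.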

      As for the huge BAM file, is your final BAM file uncompressed? You can compress it further with multiple tools (does IndelRealigner have the option to output compressed BAM?).
      Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
      Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
      Projects: U87MG whole genome sequence [Website] [Paper]



      • #4
        Thank you both for your replies. I tried increasing MAX_RECORDS_IN_RAM once, and all it seemed to do was create larger temp files that merged just as slowly. I'll try passing even more RAM to the JVM and see if that speeds it up, but I'm nearly maxed out already.

        I completely missed that before, but yes, it is an uncompressed BAM file, and IndelRealigner does have an additional argument you can use to compress the output. That makes sense now.
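To illustrate why an uncompressed BAM can be several times larger: BAM files are stored as BGZF, a series of gzip-compatible blocks compressed with DEFLATE, and an "uncompressed" BAM, as we understand it, is written at compression level 0, where each block is simply stored. The sketch below compares level 0 with level 5 using plain `zlib` rather than real BGZF, and uses synthetic, highly repetitive SAM-like text in place of binary BAM records, so the ratio it prints is not a prediction of the 7.3 GB vs 32.8 GB difference in this thread.

```python
import zlib


def synthetic_sam_text(n_lines: int) -> bytes:
    """Hypothetical, highly repetitive SAM-like records (real reads compress less)."""
    template = ("read{i}\t99\tchr1\t{pos}\t60\t100M\t=\t{mpos}\t300\t"
                + "ACGT" * 25 + "\t" + "I" * 100 + "\n")
    return "".join(template.format(i=i, pos=1000 + i, mpos=1200 + i)
                   for i in range(n_lines)).encode("ascii")


def compare_levels(data: bytes):
    """Return (raw size, size at level 0, size at level 5) in bytes."""
    stored = zlib.compress(data, 0)  # level 0: DEFLATE "stored" blocks, no compression
    packed = zlib.compress(data, 5)  # a moderate compression level
    return len(data), len(stored), len(packed)


if __name__ == "__main__":
    raw, stored, packed = compare_levels(synthetic_sam_text(10_000))
    print(f"raw={raw} level0={stored} level5={packed} ratio={stored / packed:.1f}x")
```

Level 0 output is never smaller than the input (it only adds framing), which is why a level-0 BAM can end up larger than the compressed BAM it was produced from.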

        Does anyone know whether the SAMtools fixmate and sort functions can accomplish the same thing as Picard's FixMateInformation?

        Thanks, Toast



        • #5
          If you are using Fedora/Ubuntu Linux *and* you have tons of RAM (128 or 256 GB), you might want to use /dev/shm/, the RAM-backed temporary filesystem, as your tmp directory.
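A small helper for the suggestion above, as a sketch: it picks the first writable directory with enough free space, so /dev/shm is used only when it can actually hold the temp files. On most Linux systems /dev/shm is a tmpfs whose size is typically capped at half of physical RAM by default, and anything written there consumes memory the JVM heap cannot use. The `/scratch/tmp` candidate and the 40 GB requirement are hypothetical; the chosen path would then be passed as Picard's TMP_DIR.

```python
import os
import shutil
import tempfile
from typing import Iterable, Optional


def pick_tmp_dir(candidates: Iterable[str], required_bytes: int) -> Optional[str]:
    """Return the first writable directory with at least required_bytes free, else None."""
    for path in candidates:
        # Skip paths that do not exist or that we cannot write to
        if not os.path.isdir(path) or not os.access(path, os.W_OK):
            continue
        if shutil.disk_usage(path).free >= required_bytes:
            return path
    return None


if __name__ == "__main__":
    # /dev/shm is RAM-backed on most Linux systems; "/scratch/tmp" is hypothetical.
    needed = 40 * 1024 ** 3  # hypothetical: ~40 GB of temp files
    choice = pick_tmp_dir(["/dev/shm", "/scratch/tmp", tempfile.gettempdir()], needed)
    print(choice or "no candidate has enough free space")
```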



          • #6
            Thank you. I did end up running it on a Linux box with about 4x the compute power and 4x the RAM, and it sped things up quite a bit (at least to within my tolerable range).

            Thank you everyone for your suggestions. As near as I can tell, the samtools alternative did allow for downstream processing, but I haven't compared the calls.



            • #7
              Originally posted by MolecularToast
              Thank you. I did end up running it on a Linux box with about 4x the compute power and 4x the RAM, and it sped things up quite a bit (at least to within my tolerable range).

              Thank you everyone for your suggestions. As near as I can tell, the samtools alternative did allow for downstream processing, but I haven't compared the calls.
              Could you post your parameters for FixMateInformation? I have the same problem; it is really slow.
