  • BWA building index of full human (ensembl) fails

    I need to build a color space index for the complete human genome build GRCh37 including the haplotypic chromosomes.

    On the complete set (35 chromosomes) the BWA indexing fails: it seems to hang forever (>20 hrs) while converting the nt PAC to color PAC. I used the command 'bwa index -c -a bwtsw fasta_file'.

    After some experimenting, I found that removing 1 or 2 chromosomes from this set resolves the issue. When the nt.pac file is 1009 MB the indexing works, but adding a chromosome, which raises the nt.pac file size to 1049 MB, makes the indexing fail completely.

    So there are no errors or crashes with the complete build; it just stays stuck building the cs version of the PAC file. We've tried this on several systems with 64 GB RAM and plenty of disk space. Both versions 0.5.1 and 0.5.5 seem to have this issue.

    Hopefully this can be fixed.

  • #2
    Use the reference genome here:

    ftp://ftp.ncbi.nih.gov/1000genomes/f...cal/reference/

    The toplevel fasta contains multiple haplotypes for the same chromosome. You should not use that.


    • #3
      Thanks for your reply, Heng. However, I would like to have the haplotypic data included, since we have genes covered in these regions and SNP combinations clearly affect mapping.

      Your suggestion is a workaround but not a solution to the issue. I imagine working with bigger plant genomes will run into the same problem, and a real fix would be very welcome.


      • #4
        In any case, you should not include the multiple copies of the entire chromosomes 6 and 17. You should identify highly divergent regions first and then do the alignment; otherwise you will get no reads mapped to chr6 and chr17. If you want to minimize misalignment in HLA, you should use the HLA database from EBI.

        Bwa will not support genomes longer than 4GB. See the bwa homepage.

        EDIT: Alternatively, you may consider the mapper from 1001genomes.org. It aligns reads to multiple genomes simultaneously. I have not tried it, though.
        Last edited by lh3; 12-23-2009, 05:58 AM.
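
The 4GB limit can be compared against the nt.pac sizes reported in the first post. The sketch below is only a rough check under stated assumptions: that nt.pac stores 2 bits per base (4 bases per byte), that "MB" in the first post means MiB, and that "4GB" means 2^32 bases. None of these is confirmed in this thread, and the names `BWA_MAX_BASES`, `bases_from_pac_size` and `fits_limit` are made up for illustration.

```python
# Assumption: "4GB" is read here as 2**32 bases. The thread only says "4GB".
BWA_MAX_BASES = 2**32

MIB = 1024 * 1024  # assumption: the "MB" figures in the first post are MiB


def bases_from_pac_size(pac_bytes: int) -> int:
    """Approximate base count of a 2-bit packed file (4 bases per byte).

    Any trailing bookkeeping bytes in the file are ignored, so this is
    an estimate, not an exact count.
    """
    return pac_bytes * 4


def fits_limit(pac_bytes: int) -> bool:
    """True if the estimated base count stays below the assumed limit."""
    return bases_from_pac_size(pac_bytes) < BWA_MAX_BASES


if __name__ == "__main__":
    for size_mib in (1009, 1049):  # the two nt.pac sizes from the first post
        n_bases = bases_from_pac_size(size_mib * MIB)
        status = "below" if fits_limit(size_mib * MIB) else "above"
        print(f"{size_mib} MiB -> ~{n_bases:,} bases ({status} the assumed limit)")
```

Under these assumptions the 1009 MB file falls just below the limit and the 1049 MB file just above it, which would be consistent with the behaviour described in the first post.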


        • #5
          OK, we will split off the variable part and treat it separately.
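
Splitting off the variable part could look roughly like the sketch below, which separates FASTA records into a primary set and a haplotype set based on the record name. The rule in `is_haplotype` (names starting with `HSCHR`) is an assumption about how the haplotype sequences are named in the toplevel file and should be adjusted to the actual headers; all function names here are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header, sequence_lines) for each FASTA record."""
    header = None
    body: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = line, []
        elif header is not None:
            body.append(line)
    if header is not None:
        yield header, body


def is_haplotype(header: str) -> bool:
    """Hypothetical rule: record names starting with 'HSCHR' are haplotypes."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    return name.startswith("HSCHR")


def split_records(
    lines: Iterable[str],
    is_alt: Callable[[str], bool] = is_haplotype,
) -> Tuple[List[str], List[str]]:
    """Return (primary_lines, haplotype_lines), preserving record order."""
    primary: List[str] = []
    alt: List[str] = []
    for header, body in read_fasta(lines):
        target = alt if is_alt(header) else primary
        target.append(header)
        target.extend(body)
    return primary, alt


if __name__ == "__main__":
    # Tiny made-up records, for illustration only.
    demo = [
        ">6 dna:chromosome", "ACGT",
        ">HSCHR6_MHC_COX dna:chromosome", "ACGA",
        ">17 dna:chromosome", "TTGA",
    ]
    primary, alt = split_records(demo)
    print("\n".join(primary))
    print("---")
    print("\n".join(alt))
```

The primary set can then be indexed on its own, and the haplotype records handled in a separate step. For a real genome, the lines would be streamed from an open file handle rather than held in a list.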
