
  • reference size for BWA index

    Dear all,
    I would like to map reads with bwa mem against a 10 GB genomic reference, but I am having a problem with bwa index.
    Is there a maximum reference size for bwa index?
    How can I build the index for my 10 GB reference?
    Thanks for your help
    Morgane ARDISSON

  • #2
    Code:
    bwa index -a bwtsw genome.fasta
    Code:
    Usage:   bwa index [options] <in.fasta>
    
    Options: -a STR    BWT construction algorithm: bwtsw or is [auto]
             -p STR    prefix of the index [same as fasta name]
             -b INT    block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
             -6        index files named as <in.fasta>.64.* instead of <in.fasta>.*
    
    Warning: `-a bwtsw' does not work for short genomes, while `-a is' and
             `-a div' do not work not for long genomes.
    You could also try BBMap as I have used it on the 32Gbp axolotl genome.


    • #3
      Hello,
      Thank you for your answer, but I now have a new problem.
      When I try to create the BAM index, I get an error. The BAM index format does not seem to support reference sequences longer than 512 Mbp.
      With Picard tools I get this error:
      picard.sam.MergeSamFiles INPUT=[./resultat_mapping.Tc3423_tmp/Tc3423.paired.bam, ./resultat_mapping.Tc3423_tmp/Tc3423.single.bam] OUTPUT=resultat_mapping.Tc3423.bam SORT_ORDER=coordinate MERGE_SEQUENCE_DICTIONARIES=true VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true ASSUME_SORTED=false USE_THREADING=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
      [Tue Nov 06 13:37:53 CET 2018] Executing as ardisson@cc2-n7 on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_76-b13; Picard version: 1.130(8b3e8abe25f920f5aa569db482bb999f29cc447b_1427207353) IntelDeflater
      INFO 2018-11-06 13:37:54 MergeSamFiles Input files are in same order as output so sorting to temp directory is not needed.
      [Tue Nov 06 13:37:57 CET 2018] picard.sam.MergeSamFiles done. Elapsed time: 0.05 minutes.
      Runtime.totalMemory()=2058354688
      To get help, see http://broadinstitute.github.io/pica...ml#GettingHelp
      Exception in thread "main" htsjdk.samtools.SAMException: Exception when processing alignment for BAM index ST-J00115:130:HMNN3BBXX:4:1112:8044:48386 2/2 144b aligned read.
      at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:124)
      at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:178)
      at picard.sam.MergeSamFiles.doWork(MergeSamFiles.java:158)
      at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
      at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
      at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
      Caused by: htsjdk.samtools.SAMException: Exception creating BAM index for record ST-J00115:130:HMNN3BBXX:4:1112:8044:48386 2/2 144b aligned read.
      at htsjdk.samtools.BAMIndexer.processAlignment(BAMIndexer.java:92)
      at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:121)
      ... 5 more
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 32775
      at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:136)
      at htsjdk.samtools.BAMIndexer$BAMIndexBuilder.processAlignment(BAMIndexer.java:195)
      at htsjdk.samtools.BAMIndexer.processAlignment(BAMIndexer.java:90)
      ... 6 more
      Is there a way to solve this?

      Thank you for your help
      Morgane ARDISSON


      • #4
        I am not sure I understand what you are trying to do. Are you trying to merge the mapped paired-end reads BAM file with the mapped single-end reads BAM file? Perhaps a newer version of Picard might help or perhaps using
        Code:
        samtools merge
        ?


        • #5
          Hi,
          I mapped the paired-end and single-end reads separately. Then I sorted the BAMs with Picard tools using CREATE_INDEX=FALSE, and that went well. Then I tried to merge them, but with CREATE_INDEX=TRUE. I have done this plenty of times on smaller references without any problems.

          The error message seems to be related to BAM index creation, which cannot handle chromosomes longer than 512 Mbp. As I have chromosomes of more than 1 Gbp, I am stuck.

          Do you know any way to work around this limitation of the BAM index?
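
For reference, the 512 Mbp figure can be checked with a little arithmetic. The sketch below uses the constants of the BAI scheme as described in the SAM specification (coordinates up to 2^29, linear index in 16,384 bp windows). Reading the `32775` in the stack trace as a linear-index window number is an assumption, not something confirmed from the htsjdk source; the helper names are hypothetical.

```python
# Illustrative arithmetic only; constants follow the BAI scheme in the SAM specification.
BAI_MAX_COORD = 1 << 29      # 536,870,912 bp: largest coordinate a .bai index can address
LINEAR_WINDOW_SHIFT = 14     # the linear index uses 2**14 = 16,384 bp windows


def linear_window(pos0: int) -> int:
    """Return the 16 kbp linear-index window that contains a 0-based position."""
    return pos0 >> LINEAR_WINDOW_SHIFT


def fits_in_bai(contig_length: int) -> bool:
    """True if every position of a contig of this length is addressable by .bai."""
    return contig_length <= BAI_MAX_COORD


# Number of windows available per reference sequence: indices 0..32767.
n_windows = BAI_MAX_COORD >> LINEAR_WINDOW_SHIFT

# ASSUMPTION: the out-of-bounds value 32775 is a window number.
# Under that reading, the failing read starts at or after this position.
first_pos_in_window_32775 = 32775 << LINEAR_WINDOW_SHIFT

if __name__ == "__main__":
    print("windows per contig:", n_windows)
    print("window 32775 starts at:", first_pos_in_window_32775)
    print("1 Gbp contig fits in BAI:", fits_in_bai(1_000_000_000))
```

Under that assumption, the first read that fails lies just past position 2^29 on a chromosome, which is consistent with a per-chromosome length limit rather than a limit on total reference size.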

          Comment


          • #6
            Sorry, I don't know of a solution. The best I can suggest is to report this to the issues page on the Picard github repo.
            A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. - Issues · broadinstitute/picard


            Again, maybe a newer version of Picard, the newest version of samtools (samtools merge), or perhaps sambamba (sambamba merge) might work?

            Another possibility is to do the merge with CREATE_INDEX=FALSE. Then see if samtools index or sambamba index can make the index of the merged BAM.
            Last edited by Gopo; 11-13-2018, 11:26 PM. Reason: clarity
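
A minimal sketch of the "merge without an index, then index separately" idea, assuming samtools is installed and that the two input BAMs are already coordinate-sorted. It uses `samtools index -c`, which writes a CSI index (`.csi`) instead of a BAI; CSI is designed for longer reference sequences, but whether every downstream tool accepts it is not guaranteed. The function names and the output file name are hypothetical; the input file names are the ones from the log above.

```python
import shutil
import subprocess


def merge_then_csi_index_cmds(inputs, merged_bam):
    """Build the two samtools command lines; nothing is executed here."""
    merge = ["samtools", "merge", merged_bam, *inputs]   # merge sorted BAMs, no index
    index = ["samtools", "index", "-c", merged_bam]      # -c: write a CSI index
    return [merge, index]


def run_all(cmds):
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = merge_then_csi_index_cmds(
        ["Tc3423.paired.bam", "Tc3423.single.bam"],
        "Tc3423.merged.bam",  # hypothetical output name
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # run_all(cmds)  # uncomment to actually run samtools
```

Building the commands separately from running them makes it easy to inspect what will be executed before touching large files.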
