Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #31
    Hi Brain,

    I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

    Code:
    $ bbmap.sh --version
    java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
    Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]
    
    BBMap version 37.10
    
    $ java -version
    openjdk version "1.8.0_232"
    OpenJDK Runtime Environment (build 1.8.0_232-b09)
    OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
    
    $ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
    ...
    Loaded Reference:	0.020 seconds.
    Loading index for chunk 1-7, build 1
    No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
    Indexing threads started for block 0-3
    Indexing threads finished for block 0-3
    No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
    Indexing threads started for block 4-7
    /usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
    
    $ echo $?
    137
    Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

    Comment


    • #32
      /usr/local/bin/bbmap.sh: line 349: 9039 Killed

      Hi Brain,

      I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

      Code:
      $ bbmap.sh --version
      java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
      Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]
      
      BBMap version 37.10
      
      $ java -version
      openjdk version "1.8.0_232"
      OpenJDK Runtime Environment (build 1.8.0_232-b09)
      OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
      
      $ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
      ...
      Loaded Reference: 0.020 seconds.
      Loading index for chunk 1-7, build 1
      No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
      Indexing threads started for block 0-3
      Indexing threads finished for block 0-3
      No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
      Indexing threads started for block 4-7
      /usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
      
      $ echo $?
      137
      Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

      Comment


      • #33
        /usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...

        Hi Brain,

        I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

        Code:
        $ bbmap.sh --version
        java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
        Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]
        
        BBMap version 37.10
        
        $ java -version
        openjdk version "1.8.0_232"
        OpenJDK Runtime Environment (build 1.8.0_232-b09)
        OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
        
        $ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
        ...
        Loaded Reference: 0.020 seconds.
        Loading index for chunk 1-7, build 1
        No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
        Indexing threads started for block 0-3
        Indexing threads finished for block 0-3
        No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
        Indexing threads started for block 4-7
        /usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
        
        $ echo $?
        137
        Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

        Comment


        • #34
          how may I create a masked genome reference for mice?

          Hi Brian,
          I would love to have a masked version of Mus musculus genome. Could you probably share the details how you created the human masked genome, such that I can replicate this for mice?
          Bests,
          Ulrike

          Comment


          • #35
            Originally posted by Brian Bushnell View Post
            ... and the entire silva ribosomal database. .
            Since then there were some SILVA updates, latest (V138) end of last year. Would you mind to update the human masked hg19? Much appreciated your effort in the community!
            Thank you so much!

            Comment


            • #36
              @uloeber: @Brian described the method of masking in first post of this thread. It uses "bbmask.sh" which is a component of BBMap suite. Take a look at in-line help for bbmask.sh to get an idea of what parameters you can use.

              1) Areas containing multiple repeat short kmers.
              2) Windows of low entropy, calculated using pentamer frequencies.
              3) Via mapping from sam files:
              The mapping is the more interesting approach. I used every fungal assembly on JGI's Mycocosm, every plant genome on JGI's phytozome, and a handful of others, including zebra danio (the closest relative to human I included), and the entire silva ribosomal database. First, I completely removed the assemblies that contained human contamination (a handful). Then, I shredded the remaining data into 70-80bp pieces, mapped it to human requiring a minimum of ~85% identity, and used BBMask to mask everything the shreds hit.

              Comment


              • #37
                Hi Everyone!

                Thank you for existing masked databases and the guidance on how to create your own database. I think it would be good if there would be a more formalised guide that includes the actual commands to execute so that there are no misunderstandings. I will try to write it up and would be happy if you would correct me if something is missing or wrong.

                The goal is to mask all bases from a host that could also come from the microbiome. I will focus only on Archaea and Prokaryotes.

                1. Input:
                - A genome/gene database with a as complete as possible representation of bacterial/archaeal species. E.g. 100k species from Progenomes2 (now referred to as progrenomes.fasta)
                - The host genome (now referred to as host.fasta)
                - The silva 138 database (now referred to as silva.fasta)
                - bbmap.sh/bbmask.sh

                2. Output:
                - A masked host genome (now referred to as host.masked.fasta)

                How to get there:

                - Take progenomes.fasta + silva.fasta and split all sequences into subsequences of 80bp. Should they be overlapping sequences extracted like in a sliding window or do you just chop each sequence into subsequences of equal length? (Output now referred to as progrenomes.silva.chopped.fasta). Does bbmap have a tool for this shredding?
                - Align the progenomes.silva.chopped.fasta against the host genome and store as sam file. command: bbmap.sh ref=host.fasta in=progrenomes.silva.chopped.fasta minid=0.85 out=progenomes.silva.chopped.sam mappedonly=t secondary=t idfilter=0.85 cigar=t
                - Use bbmask.sh with standard parameters and the progenomes.silva.chopped.sam as input. command: bbmask.sh in=host.fasta sam=progenomes.silva.chopped.sam out=host.masked.fasta

                Does this sound about right?

                Thanks a lot,

                Hans

                Comment


                • #38
                  Hello everyone,

                  I have a small problem in understanding the working of 'removehuman.sh' script from the BBTools (Version 39.01). Maybe I did something wrong.

                  I was trying to remove human reads from paired-end fastq.gz files with a cmd similar to this:

                  bash $BBMAP_dir/removehuman.sh in1=${R1_File} in2=${R2_File} out1=${sample_name}_cleaned_1.fastq.gz out2=${sample_name}_cleaned_2.fastq.gz t=12

                  It invoked various commands in the backend and I got 2 output files as expected. The output files are lesser in size than the original files, so I assumed the difference is due to human read removal.

                  For eg. one of the file sizes were as was such:

                  266K SRR1X_1.fastq.gz 252K SRR1X_cleaned_1.fastq.gz
                  249K SRR1X_2.fastq.gz 235K SRR1X_cleaned_2.fastq.gz
                  But when I looked into the read count of all 4 files (2 input & 2 output files) with FASTQC (Version 11.9) s/w, they were all the same.
                  I confirmed the read count with a simple cmd to count the number of lines and divide it by 4. It was the same read count even then.

                  Cmd to count reads: echo $(zcat ${FILENAME}.fastq.gz|wc -l)/4|bc
                  So my question is,
                  a. Is this normal behavior? (Are the reads not removed but retained & just masked in the original file somehow?) OR
                  b. Did the code not work the way I was expecting it to work? (i.e, to remove human reads and give me all other reads only)

                  Clearly, I am overthinking this thing and ran the code in a wrong way. But still, can you help me understand what went wrong?


                  The only difference that I saw that
                  1. The reads in the final files were in a different order than before
                  2. The id variable and other things (from the 3rd line of each read) were removed from the output file.

                  For eg. The green highlighted part was missing form the output file.
                  One read from the original file:
                  @SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
                  GAACGCTGGGCAGGGGGGGCGTCATGAGGCTACC
                  +
                  SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
                  DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIII

                  The same read from the output file:
                  @SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
                  GAACGCTGGGCAGGGGGGGCGTCATGAGGCTACC
                  +
                  DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                  I think the code ran fine without any errors. But maybe it was the wrong way of running it.
                  java -ea -Xmx15000m -Xms15000m -cp /home/gpfs/o_shinde/Softwares_Tejus/bbmap/current/ align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=$path_human_genome/GRCh38_human_genome pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount ztd=2 kfilter=25 maxsites=1 k=14 bloomfilter in1=$path/SRR1X_1.fastq.gz
                  in2=$path/SRR1X_2.fastq.gz
                  out1=SRR1X_cleaned_1.fastq.gz out2=SRR1X_cleaned_2.fastq.gz t=12

                  Executing align2.BBMap [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/home/gpfs/o_shinde/References_local/GRCh38_human_genome, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, ztd=2, kfilter=25, maxsites=1, k=14, bloomfilter,
                  in1=$path/SRR1X_1.fastq.gz,
                  in2=$path/SRR1X_2.fastq.gz,
                  out1=SRR1X_cleaned_1.fastq.gz, out2=SRR1X_cleaned_2.fastq.gz, t=12]

                  Version 39.00
                  P.S. Attaching the fast files for reference.
                  Attached Files

                  Comment


                  • #39
                    Any chance someone generated a bacterial genome masked human genome database using the latest human genome?

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Choosing Between NGS and qPCR
                      by seqadmin



                      Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                      10-18-2024, 07:11 AM
                    • seqadmin
                      Non-Coding RNA Research and Technologies
                      by seqadmin




                      Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                      Nobel Prize for MicroRNA Discovery
                      This week,...
                      10-07-2024, 08:07 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 11-01-2024, 06:09 AM
                    0 responses
                    15 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 10-30-2024, 05:31 AM
                    0 responses
                    17 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 10-24-2024, 06:58 AM
                    0 responses
                    24 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 10-23-2024, 08:43 AM
                    0 responses
                    53 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X