Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • GenoMax
    replied
    I think you are running out of memory (@Brian will confirm). How much RAM do you have (looks like you are asking for ~18G)?

    Leave a comment:


  • Bioinform
    replied
    Hi Brian,

    I have a some error with BBsplit.sh (BBmap35.66). I get the following error.,

    Module slurm/15.08.1 loaded
    java -Djava.library.path=/home/software/BBMap/bbmap/jni/ -ea -Xmx18891m -cp /home/software/BBMap/bbmap/current/ align2.BBSplitter ow=t fastareadlen=500 minhits=1 minratio=0.56 maxindel=20 qtrim=rl untrim=t trimq=6 in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq ref=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta,/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta basename=PST_407_%.fq outu=PST_407_unmapped.fq
    Executing align2.BBSplitter [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, ref=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta,/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta, basename=PST_407_%.fq, outu=PST_407_unmapped.fq]

    Converted arguments to [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, basename=PST_407_%.fq, outu=PST_407_unmapped.fq, ref_Cmm382_genome=/home/Whole_genome_analysis/Reference/NCPPB382/Cmm382_genome.fasta, ref_Xcitri_jx6=/home/Whole_genome_analysis/Reference/NCPPB382/Xcitri_jx6.fasta]
    Creating merged reference file ref/genome/1/merged_ref_1588668246460521.fa.gz
    Ref merge time: 1.421 seconds.
    Executing align2.BBMap [ow=t, fastareadlen=500, minhits=1, minratio=0.56, maxindel=20, qtrim=rl, untrim=t, trimq=6, in=/home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/GGAGC_trimmed1.fastq, outu=PST_407_unmapped.fq, ref=ref/genome/1/merged_ref_1588668246460521.fa.gz, out_Cmm382_genome=PST_407_Cmm382_genome.fq, out_Xcitri_jx6=PST_407_Xcitri_jx6.fq]

    BBMap version 35.66
    Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.560
    Retaining first best site only for ambiguous mappings.
    NOTE: Deleting contents of ref/genome/1 because reference is specified and overwrite=true
    Writing reference.
    Executing dna.FastaToChromArrays2 [ref/genome/1/merged_ref_1588668246460521.fa.gz, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

    Set genScaffoldInfo=true
    Writing chunk 1
    Set genome to 1

    Loaded Reference: 0.005 seconds.
    Loading index for chunk 1-1, build 1
    No index available; generating from reference genome: /home/Whole_genome_analysis/HiseQ/HiseQ_data/PST/Trimmed_Seq/ref/index/1/chr1_index_k13_c7_b1.block
    Indexing threads started for block 0-1
    Indexing threads finished for block 0-1
    Generated Index: 4.556 seconds.
    Analyzed Index: 2.561 seconds.
    Started output stream: 0.526 seconds.
    Cleared Memory: 0.209 seconds.
    Processing reads in single-ended mode.
    Started read stream.
    Started 16 mapping threads.
    java.lang.ArrayIndexOutOfBoundsException: 0
    at stream.FASTQ.quadToRead(FASTQ.java:723)
    at stream.FASTQ.toReadList(FASTQ.java:653)
    at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:111)
    at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:96)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:659)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:638)
    Exception in thread "Thread-22" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-23" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-17" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-21" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-25" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-30" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-16" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Detecting finished threads: 0, 1Exception in thread "Thread-19" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-20" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-28" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-29" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-26" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-18" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    , 2, 3, 4, 5, 6, 7Exception in thread "Thread-27" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    Exception in thread "Thread-24" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    , 8, 9, 10, 11, 12, 13, 14Exception in thread "Thread-31" java.lang.NullPointerException
    at align2.AbstractMapThread.run(AbstractMapThread.java:563)
    , 15

    **************************************************************************
    Warning! 16 mapping threads did not terminate normally.
    Check the error log; the output may be corrupt or incomplete.
    Please submit the full stderr output as a bug report, not just this message.
    **************************************************************************




    ------------------ Results ------------------

    Genome: 1
    Key Length: 13
    Max Indel: 20
    Minimum Score Ratio: 0.56
    Mapping Mode: normal
    Reads Used: 3200 (446131 bases)

    Mapping: 2.454 seconds.
    Reads/sec: 1303.92
    kBases/sec: 181.79


    Read 1 data: pct reads num reads pct bases num bases

    mapped: 0.9688% 31 0.7738% 3452
    unambiguous: 0.3438% 11 0.3373% 1505
    ambiguous: 0.6250% 20 0.4364% 1947
    low-Q discards: 0.0000% 0 0.0000% 0

    perfect best site: 0.0313% 1 0.0043% 19
    semiperfect site: 0.0313% 1 0.0043% 19

    Match Rate: NA NA 85.1230% 2941
    Error Rate: 17.0455% 30 14.8770% 514
    Sub Rate: 17.0455% 30 14.4718% 500
    Del Rate: 1.7045% 3 0.0868% 3
    Ins Rate: 3.4091% 6 0.3184% 11
    N Rate: 0.0000% 0 0.0000% 0
    Exception in thread "main" java.lang.RuntimeException: BBMap terminated in an error state; the output may be corrupt.
    at align2.BBMap.testSpeed(BBMap.java:489)
    at align2.BBMap.main(BBMap.java:34)
    at align2.BBSplitter.main(BBSplitter.java:46)

    Is there any thing you help with?

    Thanks

    Leave a comment:


  • Brian Bushnell
    replied
    Thanks for finding the specific part that triggers the error, that will be really helpful! I'll try to address that as soon as possible (hopefully tomorrow).

    Edit: It's fixed now; I'll upload the new version soon.
    Last edited by Brian Bushnell; 11-12-2015, 03:42 PM.

    Leave a comment:


  • vmikk
    replied
    cutprimers.sh error

    Hello Brian!
    I tried to extract a region of interest from the SILVA database with cutprimers.sh (from BBMap v.35.66) and encountered the following error:

    Code:
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
    	at java.lang.System.arraycopy(Native Method)
    	at java.util.Arrays.copyOfRange(Arrays.java:2551)
    	at jgi.CutPrimers.process(CutPrimers.java:166)
    	at jgi.CutPrimers.main(CutPrimers.java:35)
    I split the database and found that only a small part causes this error, but I can not figure out what is wrong with it. I would appreciate your help.

    Reproducible example:
    Code:
    msa.sh in=ERR.fasta out=pr1.sam literal="CCGTTCTTAGTTGGTGG"
    msa.sh in=ERR.fasta out=pr2.sam literal="TAACAAGGTTTCCGTAGGTG"
    cutprimers.sh in=ERR.fasta out=ERR_cut.fasta sam1=pr1.sam sam2=pr2.sam
    File is in the attachment.

    Edit: Thanks a lot for the fix! With v.35.68 everything works fine!
    Attached Files
    Last edited by vmikk; 11-17-2015, 09:49 PM.

    Leave a comment:


  • Brian Bushnell
    replied
    BBSplit is ideal for purpose of getting one file per organism, and there's no upper limit to the number of genomes it can handle. Note that BBSplit needs to have each reference in its own file (e.g. ref=virus1.fa,virus2.fa, etc), and also it should not be used to produce sam files, just fasta/fastq. There's also Seal, which can function similarly to BBSplit, and produces either one output file per reference file or one per reference sequence, so all your organisms can be in the same reference file if you want. It also has no upper limit on the number of reference sequences.

    I'll add a flag to tell BBMap/Pileup to ignore deletions when calculating coverage. It does seem fairly useful, so I'll probably do that soon.

    ...OK, I added it. I'll upload the new version next week, probably on Monday.
    Last edited by Brian Bushnell; 10-30-2015, 05:11 PM.

    Leave a comment:


  • rocksd
    replied
    Thanks Brian!

    What if I use BBSplit? How many genomes can BBsplit handle? What I ultimately want to do is to figure out how many reads mapped to each genome and separate those reads by genome so that we can evaluate them later if necessary. For instance, we generate a fastq file with MiSeq, we bbsplit on say 1000 viral genomes, and see which virus has more hits. We then somehow report what percentage of the genome is covered with reads and on average at what fold coverage.

    Your kind input is highly appreciated.

    James

    Leave a comment:


  • Brian Bushnell
    replied
    It looks like you are doing it correctly. Bases covered by deletion events are treated as covered for "percent covered", but "fold coverage" is calculated by the total number of mapped bases divided by the length, so that is not affected by long deletions. So, only the percent covered column is affected by long deletions. You simply ban long deletions with a flag like "strictmaxindel=10" if you don't want those reads.

    Alternately, produce a sam file, and then run it through a different program that treats bases covered by deletion events as not covered. I'm not sure which tool would do that, but probably one exists. Maybe samtools or bedtools?

    Leave a comment:


  • rocksd
    replied
    Hi Brian,

    Thanks a lot for your quick response!

    Perhaps you can give me some advice: I have a metagenomic sequencing fast file. I want to know how many reads are mapped to each of the viral genomes and what % of each viral genome is covered at what fold. What is the best way to do that?

    Thanks in advance!

    James

    Leave a comment:


  • Brian Bushnell
    replied
    Hi James,

    BBMap and Pileup consider bases covered by a deletion event as covered. So presumably one or both of your reads mapped with a long deletion ("D" symbol in the cigar string). Please let me know if that's not the case.

    Covered_percent is simply 100*Covered_bases/Length.

    Leave a comment:


  • rocksd
    replied
    Hi Brian,

    I am using bbmap.sh to evaluate coverage on a collection of viral genomes (viral all.fna downloaded from NCBI). The command I am using is as follow:

    bbmap.sh in=XXX.fastq ref=XXX/all.fna nodisk covstats=xxx.txt

    I got results in the XXX.txt file but I am not sure if I understand it correctly. For instance, for one of the gi| entry, I got the following information:

    #ID Avg_fold Length Ref-GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC

    gi|XXXXXX 0.0926 195859 0.6661 9.2561 18129 1 1 0.9444

    How come with one Plus_reads and one minus_reads you can get covered_base=18129? How does the Covered_percent calculated?

    Your kind help is highly appreciated.

    James

    Leave a comment:


  • Brian Bushnell
    replied
    Unfortunately, I'm not sure how this could be accomplished without changing the scoring matrix, which is currently hard-coded, or else doing custom post-processing the output.

    It's an odd request, though, as PacBio reads have substitution errors as well as indels. If not for the indels, the "semiperfectmode" might accomplish what you want - it requires perfect mappings, aside from the parts that go off the ends of contigs.

    Perhaps the best bet would be to align with "bbmapskimmer.sh", which tries to report all alignments above a certain score cutoff, rather than looking for the single best alignment like normal BBMap (and it is designed for PacBio reads). Then, subfilter=0 would get rid of sites with substitutions, but shorter sub-free alignments would be retained.

    Leave a comment:


  • lankage
    replied
    mapPacBio.sh - weighting soft clipped mappings higher than mappings with substitution

    Hi Brian,
    I was wondering if there was a way to adjust the mapping score weights such as to favor a mapping that perfectly maps to a reference with soft clipped ends over a mapping with a number of substitutions. Our problem is that we have allele databases of variable length sequences that closely resemble one another (in this case Major Histocompatibility complex class II - macaques) where our pacbio reads end up extending well past the reference sequence, but map perfectly to the reference. We want to include gapped mappings but not mappings with substitutions for the purpose of counting reads to individual ref sequences taking into account pacbio's tendency to introduce random indels.

    We have been using bbmap - with the pacbio option and the subfilter=0 option as packaged within Geneious 9.2. This has the result of discarding perfect mappings to allele references where there is a higher scoring hit to a longer reference sequence with substitutions that subsequently gets filtered out post-mapping with the subfilter=0 option.

    This looks to give the correct result if we use a full length allele reference file, however many of the alleles we need to count for genotyping don't have a full-length ref sequence identified as of yet.

    Leave a comment:


  • mhgseq
    replied
    Thank you very much. After specify K in both command, it works.

    Leave a comment:


  • Brian Bushnell
    replied
    BBMap needs a fairly large amount of memory dependent on kmer size, regardless of how small the reference is. You can estimate it like this:

    Code:
    C:\temp\ecoli>java jgi.AssemblyStats2 in=ecoli_K12.fa k=13
    A       C       G       T       N       IUPAC   Other   GC      GC_stdev
    0.2462  0.2542  0.2537  0.2459  0.0000  0.0000  0.0000  0.5079  0.0000
    
    Main genome scaffold total:             1
    Main genome contig total:               1
    Main genome scaffold sequence total:    4.640 MB
    [I](stuff)...[/I]
    
    [B]BBMap minimum memory estimate at k=13:     [COLOR="DarkRed"]-Xmx900m[/COLOR]     (at least 1000 MB physical RAM)[/B]
    So, it looks like K=13 with a tiny genome like E.coli still needs 900MB! Let's try K=12:

    Code:
    C:\temp\ecoli>java jgi.AssemblyStats2 in=ecoli_K12.fa k=12
    A       C       G       T       N       IUPAC   Other   GC      GC_stdev
    0.2462  0.2542  0.2537  0.2459  0.0000  0.0000  0.0000  0.5079  0.0000
    
    Main genome scaffold total:             1
    Main genome contig total:               1
    Main genome scaffold sequence total:    4.640 MB
    [I](stuff)...[/I]
    
    [B]BBMap minimum memory estimate at k=12:     [COLOR="DarkRed"]-Xmx480m[/COLOR]     (at least 540 MB physical RAM)[/B]
    That's more reasonable; K=12 will work with -Xmx480m. Note that you must specify K both when building the index AND mapping.

    Leave a comment:


  • mhgseq
    replied
    out of memory error

    even with a very small ref and input file, I got the out of memory error. Would you check what is happen?


    bbmap.sh in=~/Documents/align/test.fa out=~/Documents/align/outputu.sam

    java -Djava.library.path=/home/local/bbmap/jni/ -ea -Xmx700m -cp /home/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=/home/Documents/align/HLAtest.fa out=/home/Documents/align/outputu.sam
    Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in=/home/Documents/align/test.fa, out=/home/Documents/align/outputu.sam]


    BBMap version 35.51
    Retaining first best site only for ambiguous mappings.
    Set genome to 1

    Loaded Reference: 0.023 seconds.
    Loading index for chunk 1-1, build 1
    Generated Index: 0.817 seconds.
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at align2.BBIndex.analyzeIndex(BBIndex.java:106)
    at align2.BBMap.loadIndex(BBMap.java:410)
    at align2.BBMap.main(BBMap.java:32)

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM
  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
25 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
29 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
25 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-04-2024, 09:00 AM
0 responses
52 views
0 likes
Last Post seqadmin  
Working...
X