Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • sschavan
    replied
    Thank you Brian, this information certainly helps. I will try it out.

    SSC

    Leave a comment:


  • Brian Bushnell
    replied
    BBSplit supports all the flags supported by BBMap, including these:

    Code:
    idfilter=0              Independant of minid; sets exact minimum identity 
                            allowed for alignments to be printed.  Range 0 to 1.
    subfilter=-1            Ban alignments with more than this many substitutions.
    insfilter=-1            Ban alignments with more than this many insertions.
    delfilter=-1            Ban alignments with more than this many deletions.
    indelfilter=-1          Ban alignments with more than this many indels.
    editfilter=-1           Ban alignments with more than this many edits.
    inslenfilter=-1         Ban alignments with an insertion longer than this.
    dellenfilter=-1         Ban alignments with a deletion longer than this.
    Those are postfiltering operations - they take place after alignment, after ambiguity was decided, etc. and just suppress alignments failing the filters.

    You can alternately map with the "semiperfectmode" flag, which will only allow alignments in which there are no mismatches or indels (this is different from "perfectmode" in that semiperfectmode allows a read to go off the end of a contig; perfectmode requires all bases to match the reference). Or, you could use the flag "minid=0.99" which will require 99% identity. These latter options are faster than mapping at default sensitivity, then imposing a specific filter.
    Last edited by Brian Bushnell; 06-15-2015, 12:07 PM.

    Leave a comment:


  • sschavan
    replied
    Hello Brian,

    Thank you for the detailed explanation, I appreciate your response.

    I was wondering if I can set BBSplit to allow (a) maximum one mismatch and (b) 0 mismatch. If I could set it in this way probably I could use it to separate Mouse and Human reads, at a high stringency level, to start with.

    About the error in filterbyname.sh, the result output (OnlyHumReads_BBv4.sam) was an incomplete file with only two header lines, presumably its truncated output. I am re-running it with -da flag, I will update if that worked for me.

    Thanks,
    SSC

    Leave a comment:


  • Brian Bushnell
    replied
    By default BBSplit uses fairly strict mapping parameters; you can get the same sensitivity as BBMap by adding the flags "minid=0.76 maxindel=16k minhits=1". With those parameters it is extremely sensitive.

    BBSplit has different ambiguity settings for dealing with reads that map to multiple genomes. In any case, if the alignment score is higher to one genome than another, it will be associated with that genome only (this considers the combined scores of read pairs - pairs are always kept together). But when a read or pair has two identically-scoring mapping locations, on different genomes, the behavior is controlled by the "ambig2" flag - "ambig2=toss" will discard the read, "all" will send it to all output files, and "split" will send it to a separate file for ambiguously-mapped reads (one per genome to which it maps).

    As for your specific error - that seems to be possibly related to a read with an invalid cigar string with more clipped bases than actual bases, or a read of zero length, or something like that. The program may have problems with secondary alignments in which the bases are suppressed (with a * symbol instead) - I have not extensively tested it on sam files, just fasta/fastq. I'll see if I can replicate and fix it (if it's a bug), and at least make it print out the actual line that causes the problem, since the current error message is not very useful. In the meantime, the error message appears to be harmless and you can skip it by disabling assertions with the -da flag:

    bbmap/filterbyname.sh in=HumReads_BBv4.sam names=MouReads_BBv4.sam out=OnlyHumReads_BBv4.sam include=f -da

    Leave a comment:


  • sschavan
    replied
    Hello GenoMax,

    Thank you for the suggestion, yes I am trying to separate Human and Mouse reads, I will look into BBSplit as well, however, I was wondering about the alignment stringency that is considered by BBSplit to call a read as mapped to Human or Mouse reference, and how much that would differ as compared to BWA/Stampy.

    SSC

    Leave a comment:


  • GenoMax
    replied
    @sschavan: It appears that you are trying to separate human/mouse data. While Brian responds to your specific question you can do the separation using BBSplit with original reads: http://seqanswers.com/forums/showthread.php?t=41288

    Leave a comment:


  • sschavan
    replied
    Hello Brian,

    Thanks for making this tool available. I am trying to use it to compare two sam files and find the reads that are unique to one of the bam files. However, I encountered the below error. Could you possibly point out the problem here?

    Your help is highly appreciated.

    Thank you,
    SSC

    Command:
    bbmap/filterbyname.sh in=HumReads_BBv4.sam names=MouReads_BBv4.sam out=OnlyHumReads_BBv4.sam include=f

    Console:
    java -ea -Xmx55181m -cp /Align2MouseGenome/bbmap/current/ driver.FilterReadsByName in=HumReads_BBv4.sam names=MouReads_BBv4.sam out=OnlyHumReads_BBv4.sam include=f
    Executing driver.FilterReadsByName [in=HumReads_BBv4.sam, names=MouReads_BBv4.sam, out=OnlyHumReads_BBv4.sam, include=f]

    java.lang.AssertionError: 24056740, 24056739
    at stream.SamLine.toRead(SamLine.java:1490)
    at stream.SamLine.toRead(SamLine.java:1451)
    at stream.SamReadInputStream.toReadList(SamReadInputStream.java:125)
    at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:98)
    at stream.SamReadInputStream.nextList(SamReadInputStream.java:81)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:646)
    Time: 1900.461 seconds.
    Reads Processed: 17281200 9.09k reads/sec
    Bases Processed: 1732191375 0.91m bases/sec
    Reads Out: 0
    Bases Out: 0
    Exception in thread "main" java.lang.RuntimeException: driver.FilterReadsByName terminated in an error state; the output may be corrupt.
    at driver.FilterReadsByName.process(FilterReadsByName.java:363)
    at driver.FilterReadsByName.main(FilterReadsByName.java:42)

    Leave a comment:


  • Marisa_Miller
    replied
    Originally posted by Brian Bushnell View Post
    OK, I've fixed that too, in v34.48 I had never actually run it before on sam files, just fasta and fastq, so thanks for finding those bugs!

    FYI, the reason it does not complain if the filename is wrong for "names" is because if it is a filename, it will use the names in the file; if it is not a valid filename, it will use that as a literal list of names to filter against.
    Thank you! Just tested the new version and it's working now. And that makes sense about the "names" input. I will keep that in mind for the future.

    Leave a comment:


  • Brian Bushnell
    replied
    OK, I've fixed that too, in v34.48 I had never actually run it before on sam files, just fasta and fastq, so thanks for finding those bugs!

    FYI, the reason it does not complain if the filename is wrong for "names" is because if it is a filename, it will use the names in the file; if it is not a valid filename, it will use that as a literal list of names to filter against.

    Leave a comment:


  • Marisa_Miller
    replied
    Originally posted by Brian Bushnell View Post
    Wow, I'm just full of fail! Yes, that would work. I will fix that bug in the next release.
    Sorry to bug you with another bug! I thought filterbyname.sh was working fine but I realized made an error in my command. Due to a typing error I was using a file that doesn't exist as my "names" file for filterbyname.sh. I realized I can input any made up file name for the "names" file, but if I put in a made up or wrong file name for the "in" file I will get an error message saying no such file exists.

    When I run the command using both the "in" and "names" sam files with no headers, I am now getting the following error:

    Code:
     java -ea -Xmx5534m -cp /home/kianians/millerm2/scripts/bbmap/current/ driver.FilterReadsByName in=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/mitochondria/chinese_spring/2_cs_mt_mapped_noheader.sam names=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/chloroplast/chinese_spring/2_cs_chlr_mapped_noheader.sam out=2_cs_mt_chlr_shared.sam include=t
    Executing driver.FilterReadsByName [in=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/mitochondria/chinese_spring/2_cs_mt_mapped_noheader.sam, names=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/chloroplast/chinese_spring/2_cs_chlr_mapped_noheader.sam, out=2_cs_mt_chlr_shared.sam, include=t]
    
    Input is being processed as unpaired
    Exception in thread "Thread-3" java.lang.AssertionError: Sam format cannot be used as input to this program when no genome build is loaded.
    Please index the reference first and rerun with e.g. 'build=1', or use a different input format.
    	at stream.SamLine.<init>(SamLine.java:93)
    	at stream.ReadStreamByteWriter.writeSam(ReadStreamByteWriter.java:442)
    	at stream.ReadStreamByteWriter.processJobs(ReadStreamByteWriter.java:86)
    	at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:41)
    	at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:27)
    Exception in thread "main" java.lang.RuntimeException: Writing to a terminated thread.
    	at stream.ConcurrentGenericReadOutputStream.write(ConcurrentGenericReadOutputStream.java:198)
    	at stream.ConcurrentGenericReadOutputStream.addDisordered(ConcurrentGenericReadOutputStream.java:193)
    	at stream.ConcurrentGenericReadOutputStream.add(ConcurrentGenericReadOutputStream.java:97)
    	at driver.FilterReadsByName.process(FilterReadsByName.java:354)
    	at driver.FilterReadsByName.main(FilterReadsByName.java:42)
    Any suggestions? And thanks again for all the support you provide for your package!

    Leave a comment:


  • Brian Bushnell
    replied
    Wow, I'm just full of fail! Yes, that would work. I will fix that bug in the next release.

    Leave a comment:


  • Marisa_Miller
    replied
    Originally posted by Brian Bushnell View Post
    Hi Marisa,

    I think you're using an old version; Reformat did not use to be able to input and output sam files, but it can now. Just kill that and rerun it with the latest version (34.46).

    -Brian
    Hi Brian,

    I downloaded the new version and I was able to successfully filter out the mapped reads only.

    When I try to get the intersection between the two files I am getting this error:

    Code:
    Exception in thread "main" java.lang.AssertionError: Tried to make a SamLine from a header: @HD
    	at stream.SamLine.<init>(SamLine.java:420)
    	at stream.SamLine.<init>(SamLine.java:49)
    	at driver.FilterReadsByName.<init>(FilterReadsByName.java:144)
    	at driver.FilterReadsByName.main(FilterReadsByName.java:41)
    Can I fix this by removing the headers from the sam files containing only mapped reads?

    Thank you,
    Marisa

    Leave a comment:


  • Brian Bushnell
    replied
    Hi Marisa,

    I think you're using an old version; Reformat did not use to be able to input and output sam files, but it can now. Just kill that and rerun it with the latest version (34.46).

    -Brian
    Last edited by Brian Bushnell; 02-11-2015, 10:23 AM.

    Leave a comment:


  • Marisa_Miller
    replied
    Originally posted by Brian Bushnell View Post
    It's possible to do so with BBTools like this:

    reformat.sh in=file1.sam out=mapped1.sam mappedonly
    reformat.sh in=file2.sam out=mapped2.sam mappedonly


    That gets you the mapped reads only. Then:

    filterbyname.sh in=mapped1.sam names=mapped2.sam out=shared.sam include=t

    ...which gets you the set intersection;

    filterbyname.sh in=mapped1.sam names=mapped2.sam out=only1.sam include=f
    filterbyname.sh in=mapped2.sam names=mapped1.sam out=only2.sam include=f


    ...which get you the set subtractions.
    Hi Brian,

    Thanks again for the help. I am currently running the program now and it gave me an error message:

    Code:
    java -ea -Xmx200m -cp /home/kianians/millerm2/scripts/bbmap/current/ jgi.ReformatReads in=1_trimq10_100bpminlen_PE_cs_mt.sam out=1_trimq10_100bpminlen_PE_cs_mt_mapped.sam mappedonly
    Executing jgi.ReformatReads [in=1_trimq10_100bpminlen_PE_cs_mt.sam, out=1_trimq10_100bpminlen_PE_cs_mt_mapped.sam, mappedonly]
    
    Input is being processed as unpaired
    Exception in thread "Thread-3" java.lang.AssertionError: Sam format cannot be used as input to this program when no genome build is loaded.
    Please index the reference first and rerun with e.g. 'build=1', or use a different input format.
    	at stream.SamLine.<init>(SamLine.java:138)
    	at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:117)
    	at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:19)
    The program is still running so I'm not sure if it will complete normally or not. An output file has been created but it is 0 bytes currently.

    Leave a comment:


  • dpryan
    replied
    This is a simple enough sort of thing to write in python with pysam or C with htslib that I'd personally just do that. For example, here is a small C program that will accept two BAM files and tell you which alignments have different mapping positions or CIGAR strings. It'd be simple to modify that to compare only mapped vs unmapped and keep a count.

    Note: That's not production code, just an ad hoc solution with no error checking.
    Last edited by dpryan; 02-11-2015, 12:18 AM.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Latest Developments in Precision Medicine
    by seqadmin



    Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

    Somatic Genomics
    “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
    Today, 01:16 PM
  • seqadmin
    Recent Advances in Sequencing Analysis Tools
    by seqadmin


    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
    05-06-2024, 07:48 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 07:15 AM
0 responses
10 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 10:28 AM
0 responses
15 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 07:35 AM
0 responses
16 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-22-2024, 02:06 PM
0 responses
8 views
0 likes
Last Post seqadmin  
Working...
X