Thank you Brian, this information certainly helps. I will try it out.
SSC
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
BBSplit supports all the flags supported by BBMap, including these:
Code:idfilter=0 Independant of minid; sets exact minimum identity allowed for alignments to be printed. Range 0 to 1. subfilter=-1 Ban alignments with more than this many substitutions. insfilter=-1 Ban alignments with more than this many insertions. delfilter=-1 Ban alignments with more than this many deletions. indelfilter=-1 Ban alignments with more than this many indels. editfilter=-1 Ban alignments with more than this many edits. inslenfilter=-1 Ban alignments with an insertion longer than this. dellenfilter=-1 Ban alignments with a deletion longer than this.
You can alternately map with the "semiperfectmode" flag, which will only allow alignments in which there are no mismatches or indels (this is different from "perfectmode" in that semiperfectmode allows a read to go off the end of a contig; perfectmode requires all bases to match the reference). Or, you could use the flag "minid=0.99" which will require 99% identity. These latter options are faster than mapping at default sensitivity, then imposing a specific filter.Last edited by Brian Bushnell; 06-15-2015, 12:07 PM.
Leave a comment:
-
Hello Brian,
Thank you for the detailed explanation, I appreciate your response.
I was wondering if I can set BBSplit to allow (a) maximum one mismatch and (b) 0 mismatch. If I could set it in this way probably I could use it to separate Mouse and Human reads, at a high stringency level, to start with.
About the error in filterbyname.sh, the result output (OnlyHumReads_BBv4.sam) was an incomplete file with only two header lines, presumably its truncated output. I am re-running it with -da flag, I will update if that worked for me.
Thanks,
SSC
Leave a comment:
-
By default BBSplit uses fairly strict mapping parameters; you can get the same sensitivity as BBMap by adding the flags "minid=0.76 maxindel=16k minhits=1". With those parameters it is extremely sensitive.
BBSplit has different ambiguity settings for dealing with reads that map to multiple genomes. In any case, if the alignment score is higher to one genome than another, it will be associated with that genome only (this considers the combined scores of read pairs - pairs are always kept together). But when a read or pair has two identically-scoring mapping locations, on different genomes, the behavior is controlled by the "ambig2" flag - "ambig2=toss" will discard the read, "all" will send it to all output files, and "split" will send it to a separate file for ambiguously-mapped reads (one per genome to which it maps).
As for your specific error - that seems to be possibly related to a read with an invalid cigar string with more clipped bases than actual bases, or a read of zero length, or something like that. The program may have problems with secondary alignments in which the bases are suppressed (with a * symbol instead) - I have not extensively tested it on sam files, just fasta/fastq. I'll see if I can replicate and fix it (if it's a bug), and at least make it print out the actual line that causes the problem, since the current error message is not very useful. In the meantime, the error message appears to be harmless and you can skip it by disabling assertions with the -da flag:
bbmap/filterbyname.sh in=HumReads_BBv4.sam names=MouReads_BBv4.sam out=OnlyHumReads_BBv4.sam include=f -da
Leave a comment:
-
Hello GenoMax,
Thank you for the suggestion, yes I am trying to separate Human and Mouse reads, I will look into BBSplit as well, however, I was wondering about the alignment stringency that is considered by BBSplit to call a read as mapped to Human or Mouse reference, and how much that would differ as compared to BWA/Stampy.
SSC
Leave a comment:
-
@sschavan: It appears that you are trying to separate human/mouse data. While Brian responds to your specific question you can do the separation using BBSplit with original reads: http://seqanswers.com/forums/showthread.php?t=41288
Leave a comment:
-
Hello Brian,
Thanks for making this tool available. I am trying to use it to compare two sam files and find the reads that are unique to one of the bam files. However, I encountered the below error. Could you possibly point out the problem here?
Your help is highly appreciated.
Thank you,
SSC
Command:
bbmap/filterbyname.sh in=HumReads_BBv4.sam names=MouReads_BBv4.sam out=OnlyHumReads_BBv4.sam include=f
Console:
java -ea -Xmx55181m -cp /Align2MouseGenome/bbmap/current/ driver.FilterReadsByName in=HumReads_BBv4.sam names=MouReads_BBv4.sam out=OnlyHumReads_BBv4.sam include=f
Executing driver.FilterReadsByName [in=HumReads_BBv4.sam, names=MouReads_BBv4.sam, out=OnlyHumReads_BBv4.sam, include=f]
java.lang.AssertionError: 24056740, 24056739
at stream.SamLine.toRead(SamLine.java:1490)
at stream.SamLine.toRead(SamLine.java:1451)
at stream.SamReadInputStream.toReadList(SamReadInputStream.java:125)
at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:98)
at stream.SamReadInputStream.nextList(SamReadInputStream.java:81)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:646)
Time: 1900.461 seconds.
Reads Processed: 17281200 9.09k reads/sec
Bases Processed: 1732191375 0.91m bases/sec
Reads Out: 0
Bases Out: 0
Exception in thread "main" java.lang.RuntimeException: driver.FilterReadsByName terminated in an error state; the output may be corrupt.
at driver.FilterReadsByName.process(FilterReadsByName.java:363)
at driver.FilterReadsByName.main(FilterReadsByName.java:42)
Leave a comment:
-
Originally posted by Brian Bushnell View PostOK, I've fixed that too, in v34.48I had never actually run it before on sam files, just fasta and fastq, so thanks for finding those bugs!
FYI, the reason it does not complain if the filename is wrong for "names" is because if it is a filename, it will use the names in the file; if it is not a valid filename, it will use that as a literal list of names to filter against.
Leave a comment:
-
OK, I've fixed that too, in v34.48I had never actually run it before on sam files, just fasta and fastq, so thanks for finding those bugs!
FYI, the reason it does not complain if the filename is wrong for "names" is because if it is a filename, it will use the names in the file; if it is not a valid filename, it will use that as a literal list of names to filter against.
Leave a comment:
-
Originally posted by Brian Bushnell View PostWow, I'm just full of fail! Yes, that would work. I will fix that bug in the next release.
When I run the command using both the "in" and "names" sam files with no headers, I am now getting the following error:
Code:java -ea -Xmx5534m -cp /home/kianians/millerm2/scripts/bbmap/current/ driver.FilterReadsByName in=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/mitochondria/chinese_spring/2_cs_mt_mapped_noheader.sam names=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/chloroplast/chinese_spring/2_cs_chlr_mapped_noheader.sam out=2_cs_mt_chlr_shared.sam include=t Executing driver.FilterReadsByName [in=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/mitochondria/chinese_spring/2_cs_mt_mapped_noheader.sam, names=/home/kianians/millerm2/mitochondria_sequencing/samples1_14/mapping/chloroplast/chinese_spring/2_cs_chlr_mapped_noheader.sam, out=2_cs_mt_chlr_shared.sam, include=t] Input is being processed as unpaired Exception in thread "Thread-3" java.lang.AssertionError: Sam format cannot be used as input to this program when no genome build is loaded. Please index the reference first and rerun with e.g. 'build=1', or use a different input format. at stream.SamLine.<init>(SamLine.java:93) at stream.ReadStreamByteWriter.writeSam(ReadStreamByteWriter.java:442) at stream.ReadStreamByteWriter.processJobs(ReadStreamByteWriter.java:86) at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:41) at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:27) Exception in thread "main" java.lang.RuntimeException: Writing to a terminated thread. at stream.ConcurrentGenericReadOutputStream.write(ConcurrentGenericReadOutputStream.java:198) at stream.ConcurrentGenericReadOutputStream.addDisordered(ConcurrentGenericReadOutputStream.java:193) at stream.ConcurrentGenericReadOutputStream.add(ConcurrentGenericReadOutputStream.java:97) at driver.FilterReadsByName.process(FilterReadsByName.java:354) at driver.FilterReadsByName.main(FilterReadsByName.java:42)
Leave a comment:
-
Wow, I'm just full of fail! Yes, that would work. I will fix that bug in the next release.
Leave a comment:
-
Originally posted by Brian Bushnell View PostHi Marisa,
I think you're using an old version; Reformat did not use to be able to input and output sam files, but it can now. Just kill that and rerun it with the latest version (34.46).
-Brian
I downloaded the new version and I was able to successfully filter out the mapped reads only.
When I try to get the intersection between the two files I am getting this error:
Code:Exception in thread "main" java.lang.AssertionError: Tried to make a SamLine from a header: @HD at stream.SamLine.<init>(SamLine.java:420) at stream.SamLine.<init>(SamLine.java:49) at driver.FilterReadsByName.<init>(FilterReadsByName.java:144) at driver.FilterReadsByName.main(FilterReadsByName.java:41)
Thank you,
Marisa
Leave a comment:
-
Hi Marisa,
I think you're using an old version; Reformat did not use to be able to input and output sam files, but it can now. Just kill that and rerun it with the latest version (34.46).
-BrianLast edited by Brian Bushnell; 02-11-2015, 10:23 AM.
Leave a comment:
-
Originally posted by Brian Bushnell View PostIt's possible to do so with BBTools like this:
reformat.sh in=file1.sam out=mapped1.sam mappedonly
reformat.sh in=file2.sam out=mapped2.sam mappedonly
That gets you the mapped reads only. Then:
filterbyname.sh in=mapped1.sam names=mapped2.sam out=shared.sam include=t
...which gets you the set intersection;
filterbyname.sh in=mapped1.sam names=mapped2.sam out=only1.sam include=f
filterbyname.sh in=mapped2.sam names=mapped1.sam out=only2.sam include=f
...which get you the set subtractions.
Thanks again for the help. I am currently running the program now and it gave me an error message:
Code:java -ea -Xmx200m -cp /home/kianians/millerm2/scripts/bbmap/current/ jgi.ReformatReads in=1_trimq10_100bpminlen_PE_cs_mt.sam out=1_trimq10_100bpminlen_PE_cs_mt_mapped.sam mappedonly Executing jgi.ReformatReads [in=1_trimq10_100bpminlen_PE_cs_mt.sam, out=1_trimq10_100bpminlen_PE_cs_mt_mapped.sam, mappedonly] Input is being processed as unpaired Exception in thread "Thread-3" java.lang.AssertionError: Sam format cannot be used as input to this program when no genome build is loaded. Please index the reference first and rerun with e.g. 'build=1', or use a different input format. at stream.SamLine.<init>(SamLine.java:138) at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:117) at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:19)
Leave a comment:
-
This is a simple enough sort of thing to write in python with pysam or C with htslib that I'd personally just do that. For example, here is a small C program that will accept two BAM files and tell you which alignments have different mapping positions or CIGAR strings. It'd be simple to modify that to compare only mapped vs unmapped and keep a count.
Note: That's not production code, just an ad hoc solution with no error checking.Last edited by dpryan; 02-11-2015, 12:18 AM.
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...-
Channel: Articles
07-08-2024, 03:19 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 06:46 AM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
Yesterday, 06:46 AM
|
||
Started by seqadmin, 07-24-2024, 11:09 AM
|
0 responses
24 views
0 likes
|
Last Post
by seqadmin
07-24-2024, 11:09 AM
|
||
Started by seqadmin, 07-19-2024, 07:20 AM
|
0 responses
159 views
0 likes
|
Last Post
by seqadmin
07-19-2024, 07:20 AM
|
||
Started by seqadmin, 07-16-2024, 05:49 AM
|
0 responses
127 views
0 likes
|
Last Post
by seqadmin
07-16-2024, 05:49 AM
|
Leave a comment: