Thanks for your quick reply. I did a test run with both references in the same command.
-
Hi Brian,
I'm trying to use BBSplit to separate RNA-seq reads from two mixed fungal samples, using the individual transcriptomes as references. I was getting some unexpected results: more reads seemed to map unambiguously to whichever reference is listed first, so I swapped the order of the references, and the results changed dramatically. I have ambiguous2=toss set, but it seems like it's still using the first best site. Below are my commands and refstats output. Is there anything I'm doing wrong?
Thanks,
Brian
Code:
bbsplit.sh ref=53.fasta,17.fasta \
  in=53_30_r1_S7_R1_001.fastq.gz in2=53_30_r1_S7_R2_001.fastq.gz \
  out_17=map17_53_30_r1_S7_R#_001.fastq.gz \
  out_53=map53_53_30_r1_S7_R#_001.fastq.gz \
  refstats=53_30_r1_S7.stats ambiguous2=toss

#name %unambiguousReads unambiguousMB %ambiguousReads ambiguousMB unambiguousReads ambiguousReads
53 41.51013 1625.01508 57.30665 2219.25878 11241396 15519266
17 1.13394 44.03152 57.30665 2219.25878 307084 15519266

bbsplit.sh ref=17.fasta,53.fasta \
  in=53_30_r1_S7_R1_001.fastq.gz in2=53_30_r1_S7_R2_001.fastq.gz \
  out_17=map17_53_30_r1_S7_R#_001.fastq.gz \
  out_53=map53_53_30_r1_S7_R#_001.fastq.gz \
  refstats=53_30_r1_S7.stats2 ambiguous2=toss

#name %unambiguousReads unambiguousMB %ambiguousReads ambiguousMB unambiguousReads ambiguousReads
53 21.37940 838.36051 67.54242 2623.22348 5789774 18291224
17 11.02890 426.72088 67.54242 2623.22348 2986746 18291224
Last edited by GenoMax; 08-20-2018, 08:03 AM.
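To put a number on how much the reference order changed the result, the two refstats tables above can be compared column by column. The following is a minimal Python sketch, not part of BBTools; the function names `parse_refstats` and `unambiguous_shift` are made up for illustration, and the parser assumes the whitespace-separated layout shown in the post.

```python
# Hypothetical helpers (not part of BBTools) for comparing two refstats tables.

RUN_53_FIRST = """#name %unambiguousReads unambiguousMB %ambiguousReads ambiguousMB unambiguousReads ambiguousReads
53 41.51013 1625.01508 57.30665 2219.25878 11241396 15519266
17 1.13394 44.03152 57.30665 2219.25878 307084 15519266"""

RUN_17_FIRST = """#name %unambiguousReads unambiguousMB %ambiguousReads ambiguousMB unambiguousReads ambiguousReads
53 21.37940 838.36051 67.54242 2623.22348 5789774 18291224
17 11.02890 426.72088 67.54242 2623.22348 2986746 18291224"""


def parse_refstats(text):
    """Parse refstats text into {reference_name: {column_name: value}}."""
    header = None
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.startswith("#"):
            # Header line: drop the leading '#' from the first column name.
            header = [f.lstrip("#") for f in fields]
            continue
        if header is None:
            raise ValueError("refstats data found before the header line")
        rows[fields[0]] = {
            col: float(val) for col, val in zip(header[1:], fields[1:])
        }
    return rows


def unambiguous_shift(first_run, second_run):
    """Change in unambiguousReads per reference (second run minus first run)."""
    return {
        name: second_run[name]["unambiguousReads"] - first_run[name]["unambiguousReads"]
        for name in first_run
        if name in second_run
    }


if __name__ == "__main__":
    shift = unambiguous_shift(parse_refstats(RUN_53_FIRST), parse_refstats(RUN_17_FIRST))
    for name, delta in shift.items():
        print(f"reference {name}: unambiguous reads changed by {delta:+.0f}")
```

With the numbers from the post, reference 53 loses 5,451,622 unambiguous reads and reference 17 gains 2,679,662 when the order is swapped, while the ambiguous count rises from 15,519,266 to 18,291,224.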
-
Contamination from human genome?
Hi,
I am working on RNA-seq data from a non-model fish and am considering removing human contamination from the reads. Is this feasible, given the number of orthologs shared between human and fish?
Is there any recommendation regarding the choice of "-minratio" for this case? It seems that 0.56 may be too low. (I don't have a reference genome for this non-model fish, by the way.)
P.S. I think the sensitivity/specificity strategy should differ between binning (two references, e.g. host vs. contaminant, so the alignment scores can be compared) and decontamination (only the contaminant reference is available, so the judgment rests solely on the alignment to that reference).
Thank you very much for your suggestions!
-
Question about BBSplit ambig2=toss and BAM files
Hello!
I am using BBSplit to separate reads from a paired-end, three-species bacterial RNA-seq project. I set the flag ambig2=toss but then see this sentence in the program's printed output:
"Retaining first best site only for ambiguous mappings."
To me, that looks like the default ambiguous=best. Is that what I should be seeing? How do I know whether the ambiguous reads are being tossed?
Additionally, I am mapping directly into a BAM file. From earlier posts, it looks like BBSplit BAM files are incompatible with IGV, but would they be okay with a feature counter like HTSeq or edgeR?
Thanks very much,
Amanda
-
@Amanda: I will need to dig through some past correspondence with Brian, but I think he had recommended splitting first and then mapping, to avoid the problem of having all references present in the BAM file, which does indeed cause issues with visualization programs.
If you look at the in-line help for "ambiguous2" you can see what it is doing:
Code:
ambiguous2=<best>   Set behavior only for reads that map ambiguously to
                    multiple different references.
                    Normal 'ambiguous=' controls behavior on all ambiguous
                    reads; Ambiguous2 excludes reads that map ambiguously
                    within a single reference.
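The distinction in that help text can be illustrated with a toy classifier. This is only a sketch of the two categories the help text describes, not BBSplit's actual implementation; the function `classify_read`, the category names, and the input format (a list of equally scoring best sites) are all assumptions made for the example.

```python
# Toy model of the help text above; NOT how BBSplit is implemented internally.

# Which read categories each flag governs, per the in-line help:
# 'ambiguous=' covers all ambiguous reads, 'ambiguous2=' only those that are
# ambiguous between different references.
FLAG_SCOPE = {
    "ambiguous": {"ambiguous_within_reference", "ambiguous_between_references"},
    "ambiguous2": {"ambiguous_between_references"},
}


def classify_read(best_sites):
    """Classify a read from its equally scoring best sites.

    best_sites is a hypothetical list of (reference_name, position) tuples.
    """
    if not best_sites:
        return "unmapped"
    if len(best_sites) == 1:
        return "unambiguous"
    references = {ref for ref, _ in best_sites}
    if len(references) == 1:
        # Several equally good sites, but all inside one reference.
        return "ambiguous_within_reference"
    return "ambiguous_between_references"


def governed_by(flag, best_sites):
    """True if the given flag's setting applies to this read."""
    return classify_read(best_sites) in FLAG_SCOPE[flag]


if __name__ == "__main__":
    repeat_in_one_ref = [("17", 100), ("17", 900)]
    shared_between_refs = [("17", 100), ("53", 40)]
    print(governed_by("ambiguous2", repeat_in_one_ref))    # within one reference
    print(governed_by("ambiguous2", shared_between_refs))  # across references
```

Under this reading, a message about "ambiguous mappings" in general would refer to the `ambiguous=` setting, while `ambiguous2=toss` would only concern reads whose best sites fall in more than one reference.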
-
Hi there,
I am trying to run BBSplit on a huge chromosome-level reference assembly (~24 Gb) together with its unplaced contigs (ca. 1 Gb), using the following command on a remote server (I specify the maximum memory use on the server as 64G).
bbsplit.sh -Xmx40g ambiguous=toss ambiguous2=toss \
  in1=HKs_fq/HK002_L1_1_trimmed.fastq.gz in2=HKs_fq/HK002_L1_2_trimmed.fastq.gz \
  ref=P.tabuliformis_V1.0_contig.fa,P.tabuliformis_V1.0_chr.fa \
  basename=out_%_#.fq.gz
But the reference-merging step produces a much smaller (8 Gb) FASTA, and the mapping step also fails with the following error:
Exception in thread "main" java.lang.AssertionError: Resizing to an non-longer array (2147483627); probable array size overflow.
at structures.ByteBuilder.expand(ByteBuilder.java:606)
at structures.ByteBuilder.append(ByteBuilder.java:379)
at dna.FastaToChromArrays2.nextScaffold(FastaToChromArrays2.java:539)
at dna.FastaToChromArrays2.makeNextChrom(FastaToChromArrays2.java:460)
at dna.FastaToChromArrays2.makeChroms(FastaToChromArrays2.java:345)
at dna.FastaToChromArrays2.main2(FastaToChromArrays2.java:153)
at align2.RefToIndex.makeIndex(RefToIndex.java:147)
at align2.BBMap.setup(BBMap.java:280)
at align2.AbstractMapper.<init>(AbstractMapper.java:58)
at align2.BBMap.<init>(BBMap.java:42)
at align2.BBMap.main(BBMap.java:30)
at align2.BBSplitter.main(BBSplitter.java:48)
---------------------------------
Is there any way for me to handle this large genome so that the merging and mapping steps complete properly?
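One thing that may be worth checking first: the value in the assertion, 2147483627, is just under 2^31, and the trace fails while reading a single scaffold (FastaToChromArrays2.nextScaffold). That suggests, as an assumption rather than a confirmed diagnosis, that at least one sequence in the reference is longer than what fits in a single array. The Python sketch below is not part of BBTools; the function names are made up for illustration. It lists the length of each FASTA record and flags any that reach the size reported in the error.

```python
# Hypothetical diagnostic (not part of BBTools): find very long FASTA records.

# Size taken from the error message in the post; treating it as the
# per-sequence ceiling is an assumption, not documented BBMap behavior.
LIMIT_FROM_TRACE = 2147483627


def sequence_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def oversized(lines, limit=LIMIT_FROM_TRACE):
    """Return the records whose length is at least `limit` bases."""
    return [(n, size) for n, size in sequence_lengths(lines) if size >= limit]


if __name__ == "__main__":
    import sys

    # Usage: python check_lengths.py P.tabuliformis_V1.0_chr.fa
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for name, size in oversized(handle):
                print(f"{name}\t{size}")
```

If any chromosome is reported, that would be consistent with the overflow in the trace; whether splitting such sequences is an acceptable workaround for this analysis is a separate question.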