I would caution against blindly filtering your metagenomic data against any database of contaminants. Ideally, you would have negative controls run alongside your samples that could be checked for the presence of these contaminants instead....
-
I won't say either one of you is right or wrong, and negative controls are always a good idea. But the reason I put together the bacterial contaminant file is that JGI cannot distinguish between genuine sample organisms and the contaminants in that file. With sufficient amplification (as with single cells), some wells may have high levels of a contaminant that is absent from other wells, since it only takes one particle. Some of them, like Pseudomonas, are present in reagents. Others, like E. coli, are present on human skin and often make their way into the libraries.
I have seen dozens of posters that incorrectly claim Pseudomonas or various other common contaminants are endemic to some environment, when it is likely an artifact of poor quality control. So, I encourage you to be very cautious.
*Edit: For reference, JGI no longer sequences anything on that list.
Last edited by Brian Bushnell; 07-17-2017, 08:58 AM.
-
RE: filtering soil metagenomics data
Thank you, fanli and Brian! I guess one possibility is not to filter initially, but then check the final assembly against the contaminant files just to find out if some of the species that I am detecting are known contaminants?
-
Hi Brian,
This might be a silly question, but do we need to index the reference every time for every sample?
Thanks!
Last edited by yang zhang; 08-27-2017, 11:27 AM.
-
Originally posted by yang zhang:
"Hi Brian, this might be a silly question, but do we need to index the reference every time for every sample? Thanks!"
No, you only need to build the index once.
Code:
bbmap.sh ref=ref.fa
In future, when you want to use this index, replace "ref=" with "path=/path_to_directory_containing_ref_folder" in your command line.
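The build-once, reuse-later pattern described above can be sketched as follows. The sample names and output file names are placeholders (assumptions, not from the thread), and the `run` wrapper is a hypothetical helper that only prints each command unless `DRY_RUN=0` is set:

```shell
# Placeholder paths: adjust to your own reference and index location.
REF=ref.fa
INDEX_DIR=/path_to_directory_containing_ref_folder

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# 1) Build the index once. BBMap writes it into a "ref" folder under path=.
run bbmap.sh ref="$REF" path="$INDEX_DIR"

# 2) Reuse the same index for every sample via path= (no ref=, so no re-indexing).
for sample in sample1 sample2; do
  run bbmap.sh path="$INDEX_DIR" in="${sample}.fastq.gz" \
    outu="${sample}.clean.fastq.gz" outm="${sample}.matched.fastq.gz"
done
```

Passing `ref=` again on later runs is what triggers a rebuild, so the per-sample commands only carry `path=`.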
-
Originally posted by Brian Bushnell
I downloaded the files above and have successfully indexed the cat and dog file. I tried the command:
bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits qtrim=rl trimq=10 untrim Xmx23g in=cleanAR1.fastq.gz outu=clean2AR1.fastq.gz outm=catAR1.fq
and got the following error:
Exception in thread "main" java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:542)
at java.lang.Integer.parseInt(Integer.java:615)
at align2.AbstractMapper.parse(AbstractMapper.java:449)
at align2.AbstractMapper.<init>(AbstractMapper.java:54)
at align2.BBMap.<init>(BBMap.java:42)
at align2.BBMap.main(BBMap.java:30)
The same happened when I tried the dog masked file. I was able to successfully remove the human contamination using the masked file you provided and the commands above. Is this command not applicable to the cat, dog, and mouse files you provided? Is there an extra step I am missing? I am not versed in java so I don't know how to interpret the error.
coyk
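One possible cause, offered as a guess from the stack trace rather than a confirmed diagnosis: `minhits` appears in the command without a value, so the argument parser has nothing to convert to an integer (the `removehuman.sh` log later in this thread shows `minhits=2`). The command also has no `path=` pointing at the cat or dog index, and the memory flag is normally written `-Xmx23g` with a leading dash. A corrected sketch, with the index directory as a placeholder and a hypothetical `run` wrapper that only prints the command unless `DRY_RUN=0` is set:

```shell
# Placeholder: directory that contains the indexed "ref" folder for the cat genome.
CAT_INDEX=/path/to/cat_index

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Same settings as the failing command, with minhits given a value,
# path= added, and the JVM memory flag written with a leading dash.
run bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 \
  path="$CAT_INDEX" qtrim=rl trimq=10 untrim -Xmx23g \
  in=cleanAR1.fastq.gz outu=clean2AR1.fastq.gz outm=catAR1.fq
```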
-
Hi Brian,
Just reviving an old thread here. I have been testing out a lot of different methods to clean human reads and I really love BBMap because it's such a well thought-out program. However, when I try to clean human reads with the settings you have specified, I routinely get a ton of reads remaining - upwards of 70% (so only 30% are cleaned). I have tried to adjust the various parameters, but the only thing that seems to make a difference for depletion is the 'minid' setting. Setting that at 0.50 (which is *very* low) depletes around 95% of reads. As a comparison, a default run with bwa mem depletes 100%.
Any idea how I might get BBMap to more accurately deplete human reads?
-
"bbsplit.sh" is a general-purpose tool that bins reads into any number of bins, one per reference sequence file provided (you can provide as many as you want). In this case you would provide human_genome.fa (and any other reference you want to use). If you provide only the human reference, reads that do not map to the human genome are collected in a separate bin.
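A hedged sketch of that binning step. The read and output file names are placeholders, and `basename=` and `outu=` are used here as described in the BBSplit help text (`%` is replaced by each reference's name); check `bbsplit.sh --help` for your version. The `run` wrapper is a hypothetical helper that only prints the command unless `DRY_RUN=0` is set:

```shell
# Comma-separated list of references; add more files to get more bins.
REFS=human_genome.fa

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Reads mapping to a reference go to out_<refname>.fastq.gz;
# reads mapping to none of them go to the outu= file.
run bbsplit.sh ref="$REFS" in=reads.fastq.gz \
  basename=out_%.fastq.gz outu=unmapped.fastq.gz
```

With only the human reference supplied, the `outu=` file is the human-depleted read set.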
-
Hello Brian,
I am trying to use removehuman.sh on MSU HPCC.
The inputs (*_filtered.fastq.gz files) are R1 and R2 files that were PhiX-filtered with BBDuk as follows:
[leejooy5@dev-intel18 filtered_reads]$ bbduk.sh -Xmx10g in1=NFW_R1_trimmed.fastq.gz in2=NFW_R2_trimmed.fastq.gz out1=NFW_R1_filtered.fastq.gz out2=NFW_R2_filtered.fastq.gz ref=/opt/software/BBMap/37.93-foss-2018a/resources/phix174_ill.ref.fa.gz k=31 hdist=1 stats=GR25_stats.txt threads=8
As shown below, the error message "Exception in thread "main" java.lang.RuntimeException: Can't find file /global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt" popped up when I tried to run "removehuman.sh". I tried without additional parameters such as -Xmx and threads, but the same error happened. I also tried to find the file "/global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt", but I couldn't. Could you tell me what mistake I made or let me know where I can find a solution? Thank you for your time and consideration.
Cheers,
Joo-Young
=======================
[leejooy5@dev-intel18 filtered_reads]$ removehuman.sh -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8
removehuman.sh -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8
java -Djava.library.path=/opt/software/BBMap/37.93-foss-2018a/jni/ -ea -Xmx10g -cp /opt/software/BBMap/37.93-foss-2018a/current/ align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=/global/projectb/sandbox/gaag/bbtools/hg19 pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount usejni ztd=2 kfilter=25 maxsites=1 k=14 -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8
Executing align2.BBMap [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/global/projectb/sandbox/gaag/bbtools/hg19, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, usejni, ztd=2, kfilter=25, maxsites=1, k=14, -Xmx10g, in1=NFW_R1_filtered.fastq.gz, in2=NFW_R2_filtered.fastq.gz, out1=NFW_R1_clean.fastq.gz, out2=NFW_R2_clean.fastq.gz, threads=8]
Version 37.93 [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/global/projectb/sandbox/gaag/bbtools/hg19, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, usejni, ztd=2, kfilter=25, maxsites=1, k=14, -Xmx10g, in1=NFW_R1_filtered.fastq.gz, in2=NFW_R2_filtered.fastq.gz, out1=NFW_R1_clean.fastq.gz, out2=NFW_R2_clean.fastq.gz, threads=8]
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.650
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.900
Set threads to 8
Retaining first best site only for ambiguous mappings.
Exception in thread "main" java.lang.RuntimeException: Can't find file /global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt
at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:906)
at fileIO.ReadWrite.getInputStream(ReadWrite.java:871)
at fileIO.TextFile.open(TextFile.java:227)
at fileIO.TextFile.<init>(TextFile.java:71)
at dna.Data.setGenome2(Data.java:822)
at dna.Data.setGenome(Data.java:768)
at align2.BBMap.loadIndex(BBMap.java:313)
at align2.BBMap.main(BBMap.java:32)
-
@jylee: "/global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt" appears to be a location on JGI's servers, not on your own system, which is why the file cannot be found. You will need to download the hg19 reference sequence and provide it yourself. You can either pre-index the genome with BBMap and then use path=, or use the ref= option to point to the location of the genome's multi-FASTA file.
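A minimal sketch of checking for a local index before mapping. Both paths are placeholders (assumptions), and the `ref/genome/1/summary.txt` layout is taken from the error message above; `index_ready` is a hypothetical helper name:

```shell
# Placeholders: your downloaded hg19 FASTA and a local directory for the index.
HG19_FASTA=/path/to/hg19.fa.gz
HG19_DIR=/path/to/hg19_index

# True when the file BBMap complained about exists under the given directory.
index_ready() {
  [ -f "$1/ref/genome/1/summary.txt" ]
}

if index_ready "$HG19_DIR"; then
  echo "Index found; pass path=$HG19_DIR to bbmap.sh"
else
  echo "No index under $HG19_DIR; build it once with:"
  echo "bbmap.sh ref=$HG19_FASTA path=$HG19_DIR"
fi
```

Once the index exists locally, the mapping command needs only `path=` set to that directory instead of the JGI location.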