Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Thanks for noting that; I've corrected it. Originally, for error-correction, you had to say "oute" for the output but I changed that a while ago so the current syntax is "out1" and "out2".
-
Since you are pulling the kmer information from randomly-sheared metagenomic reads, and using the variable region as a seed, it should be safe. However, there is a possibility of running into a tandem repeat region during extension when you give it a very high extension limit, so I recommend also adding the flag "ibb=f". It probably won't make any difference though.
Leave a comment:
-
I just saw this comment from Brian Bushnell:
P.S. DO NOT use read-extension or error-correction for metagenomic 16S or other amplicon studies! It is intended only for randomly-sheared fragment libraries. Error-correction or read-extension using any algorithm are a bad idea for any amplicon library with a long primer. For normal metagenomic fragment libraries, these operations should be useful and safe if you specify a sufficiently long K.
I recently used Tadpole for contig extension. After reading this comment, I am not sure Tadpole is a reliable tool for this. I have full-length 16S rRNA gene sequences from clone libraries. I took the variable region of 16S rRNA gene sequences (first 300 nt nucleotides) and mapped on metagenome contigs which are taxonomically classified as taxa of interest. These metagenome contigs were assembled from randomly-sheared metagenomic fragment libraries (sequencing platfrom: HiSeq). 300 nt nucleotide of 16S rRNA gene sequence perfectly mapped on a contig. I then run Tadpole to extend this contig using all pair-end reads in this metagenome. I could extend the contig up to 6500 nucleotide.
Is Tadpole reliable for such usages?
Leave a comment:
-
Hi gringer, thanks, I didn't realize the multiple fastq files wasn't concatenated properly before inputting them to tadpole. I'll fix it and try again.
Leave a comment:
-
Originally posted by gringer View PostFrom the error messages, your input read files are corrupted:
Do you have any other program that can read the files to check for errors? At the very least, a simple count of lines would be useful:
Code:wc -l read1.corrected.fq.gz; wc -l read2.corrected.fq.gz
Code:zcat read1.corrected.fq.gz | wc -l; zcat read2.corrected.fq.gz | wc -l
Code:gzip -t read1.corrected.fq.gz; gzip -t read2.corrected.fq.gz
Code:gzip: x.gz: unexpected end of file
Leave a comment:
-
From the error messages, your input read files are corrupted:
...Unexpected end of ZLIB input stream
...Unexpected end of ZLIB input stream
...The pairing may have been corrupted by an upstream process.
Code:wc -l read1.corrected.fq.gz; wc -l read2.corrected.fq.gz
Leave a comment:
-
Hi, I am using tadpole from BBMap v36.11 and received some java error messages. The command I used is:
tadpole.sh -Xmx1000g in1=read1.corrected.fq.gz in2=read2.corrected.fq.gz out=contigs.fa.gz
Code:java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117) at fileIO.ByteFile1.fillBuffer(ByteFile1.java:217) at fileIO.ByteFile1.nextLine(ByteFile1.java:136) at fileIO.ByteFile2$BF1Thread.run(ByteFile2.java:264) open=true java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117) at fileIO.ByteFile1.fillBuffer(ByteFile1.java:217) at fileIO.ByteFile1.nextLine(ByteFile1.java:136) at fileIO.ByteFile2$BF1Thread.run(ByteFile2.java:264) open=true Exception in thread "Thread-4" java.lang.AssertionError: There appear to be different numbers of reads in the paired input files. The pairing may have been corrupted by an upstream process. It may be fixable by running repair.sh. at stream.ConcurrentGenericReadInputStream.pair(ConcurrentGenericReadInputStream.java:480) at stream.ConcurrentGenericReadInputStream.readLists(ConcurrentGenericReadInputStream.java:345) at stream.ConcurrentGenericReadInputStream.run(ConcurrentGenericReadInputStream.java:189) at java.lang.Thread.run(Thread.java:745)
Leave a comment:
-
Originally posted by JeanLove View PostHi! Although this is a very helpful post, there is still so much unclarities about Tadpole.
As input, I have fasta file with ~11 mil reads, and I would like for Tadpole to take 50% random reads from this fasta file.
I see that there is reads=-1 input parameter, where -1 (default) stands for all reads, so I am curious what could stand for 50% etc? Cannot find any info on this parameter, so I would appreciate any help!
If you want to independently subsample the reads you could also use reformat.sh or seqtk (sample) from Heng Li.
Leave a comment:
-
Hi! Although this is a very helpful post, there is still so much unclarities about Tadpole.
As input, I have fasta file with ~11 mil reads, and I would like for Tadpole to take 50% random reads from this fasta file.
I see that there is reads=-1 input parameter, where -1 (default) stands for all reads, so I am curious what could stand for 50% etc? Cannot find any info on this parameter, so I would appreciate any help!
Leave a comment:
-
Originally posted by Brian Bushnell View PostThis works well with error-corrected PacBio reads, which have a very unbiased coverage distribution. It was able to isolate and assemble the mitochondria in 4/4 fungi tested. I'm not sure how well it works on Illumina reads, though. Also, note that the default k=100 is designed for reads of at least 150bp.
Leave a comment:
-
Sure:
Code:#First link reference as ref.fa and reads as reads.fa /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/kmercountexact.sh in=reads.fq.gz khist=khist_raw.txt peaks=peaks_raw.txt primary=`grep "haploid_fold_coverage" peaks_raw.txt | sed "s/^.*\t//g"` cutoff=$(( $primary * 3 )) bbnorm.sh in=reads.fq.gz out=highpass.fq.gz pigz passes=1 bits=16 min=$cutoff target=9999999 reformat.sh in=highpass.fq.gz out=highpass_gc.fq.gz maxgc=0.45 kmercountexact.sh in=highpass_gc.fq.gz khist=khist_100.txt k=100 peaks=peaks_100.txt smooth ow smoothradius=1 maxradius=1000 progressivemult=1.06 maxpeaks=16 prefilter=2 mitopeak=`grep "main_peak" peaks_100.txt | sed "s/^.*\t//g"` upper=$((mitopeak * 6 / 3)) lower=$((mitopeak * 3 / 7)) mcs=$((mitopeak * 3 / 4)) mincov=$((mitopeak * 2 / 3)) tadpole.sh in=highpass_gc.fq.gz out=contigs78.fa prefilter=2 mincr=$lower maxcr=$upper mcs=$mcs mincov=$mincov k=78 tadpole.sh in=highpass_gc.fq.gz out=contigs100.fa prefilter=2 mincr=$lower maxcr=$upper mcs=$mcs mincov=$mincov k=100 tadpole.sh in=highpass_gc.fq.gz out=contigs120.fa prefilter=2 mincr=$lower maxcr=$upper mcs=$mcs mincov=$mincov k=120 bbduk.sh in=highpass.fq.gz ref=contigs100.fa outm=bbd005.fq.gz k=31 mm=f mkf=0.05 tadpole.sh in=bbd005.fq.gz out=contigs_bbd.fa prefilter=2 mincr=$((mitopeak * 3 / 8)) maxcr=$((upper * 2)) mcs=$mcs mincov=$mincov k=100 bm1=6 ln -s contigs_bbd.fa contigs.fa
Leave a comment:
-
Originally posted by Brian Bushnell View PostDon't feel like you have to use all aspects of Tadpole in order to use it effectively! I am currently using it for mitochondrial assembly also, because it's easy to set a specific depth band to assemble, and thus pull out the mito without the main genome after identifying it on a kmer frequency histogram (in fact, I wrote a script to do this automatically). But in that case I don't actually use the error-correction or extension capabilities, as they are not usually necessary as the coverage is already incredibly high and low-depth kmers are being ignored. I use those more for single-cell work, which has lots of very-low-depth regions.
Leave a comment:
-
Hi, Brian,
Thanks very much for your quick feedback. That does help, thank you. I was able to specify reads=1000 and that approximated the 10% for some test cases. I took a quick shot at normalizing, and I'm having some issues with it, but they may have more to do with Geneious than anything.
Thanks again!
Rob
Leave a comment:
-
Hi Rob,
Tadpole does not do subsampling; you'd have to first sample 10% of the reads with another tool (such as Reformat, with the "samplerate=0.1" flag). However, you CAN restrict the input to a certain number of reads. "reads=400k", for example, will only use the first 400,000 reads (or if the reads are paired, the first 400,000 pairs).
Tadpole also has "mincr" and "maxcr" flags which allow you to ignore kmers with counts outside of some band, and is sometimes useful for ignoring low-depth junk than can occur when you have too much data:
mincountretain=0 (mincr) Discard kmers with count below this.
maxcountretain=INF (maxcr) Discard kmers with count above this.
Good luck!
Leave a comment:
-
Hi, Brian,
Thanks for building this amazing assembler. I have been using it from within the Geneious software package for a few weeks now with great success. Quick question: is it possible to enter a command line under 'custom Tadpole options' that will cause the assembler to only use a specific percentage of the data? Also, is it possible to specify usage of maximum number of reads when performing the assembly (as opposed to a percent)?
I ask because I am trying to assembly a large number of plasmids from MiSeq FASTQ data, and I've found that the results are much better when only using 10% of the data. However, there is a bug wherein this option is missing from the workflow version of Tadpole; and another where it defaults to 100% when attempting to assemble each sequence list separately. They know about the first bug and are fixing it, but I thought there might be a quick workaround via the command line.
Thanks!
Rob
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.
[Article Coming Soon!]...-
Channel: Articles
Yesterday, 08:07 AM -
-
by seqadmin
Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...-
Channel: Articles
09-23-2024, 06:35 AM -
-
by seqadmin
During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.
Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...-
Channel: Articles
09-09-2024, 10:59 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 10-02-2024, 04:51 AM
|
0 responses
14 views
0 likes
|
Last Post
by seqadmin
10-02-2024, 04:51 AM
|
||
Started by seqadmin, 10-01-2024, 07:10 AM
|
0 responses
25 views
0 likes
|
Last Post
by seqadmin
10-01-2024, 07:10 AM
|
||
Started by seqadmin, 09-30-2024, 08:33 AM
|
1 response
31 views
0 likes
|
Last Post
by EmiTom
Yesterday, 06:46 AM
|
||
Started by seqadmin, 09-26-2024, 12:57 PM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
09-26-2024, 12:57 PM
|
Leave a comment: