Would it be possible to post an image from IGV, or some kind of text diagram? I can't quite visualize what you're describing.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Sure Brian, here it is! A screenshot from Geneious on an alignment obtained with BBMap as
Executing align2.BBMap [in1=177291_Brachy_300_Brachypodium_distachyon_2x_R1.fastq, in2=177291_Brachy_300_Brachypodium_distachyon_2x_R2.fastq, out=Bdi_1H_AK250130_mapped_bis.sam,
ref=.\resources\Brachypodium_distachyon_2x_AH_AK250130.fasta, K=9, build=3, maxindel=30, pairlen=600, minid=0.9, padding=20, overwrite=t]
"A picture is worth a thousand words"
BestAttached Files
Comment
-
@babine: What do you mean by "not-so-accurate" reference? Is the reference as depicted in the picture wrong (i.e. there are two ends that are joined that should not be together)? Note: I don't use geneious but I assume one of the lines at the top represents the reference (and the other a consensus?)
Comment
-
I wonder if you should manually put a break (stretch on N's in that position) to force the two ends apart. While that may resolve this "end" of the reads issue not sure if that is how things are in the real genome.
Are you certain your reads (R1/R2) do not overlap? If they could then you could merge them first before alignment.
Are the reads in the image corresponding pairs (R1/R2)?
Comment
-
I wonder if maybe a polishing program like Quiver could automatically fix the reference for you. That would be the ideal solution. The problem, of course, is that there are misassemblies or major structural variations with respect to the reference. Hmmm...
I think there are 3 main options.
1) Map with the "local" flag. This will soft-clip the part of the read that extends across the misassembly, so it will not result in spurious variations being called. That's certainly the easiest approach.
2) Try polishing the assembly with something like Quiver, to automatically fix these things.
3) Make a new assembly, if this genome differs substantially enough from the reference that this issue is common.
Comment
-
Hi everyone,
I wanted to mention that there is now some additional documentation for BBTools in the form of guides/tutorials, in the /docs/ folder. Currently there are guides for BBDuk, BBMerge, Seal, Tadpole, Reformat, Dedupe, BBNorm, and Taxonomy, and I plan to add more in the near future. There's also an overview of the general usage of all BBTools (UsageGuide.txt) and a list of all the commonly-used tools with a brief description (ToolDescriptions.txt).
I hope these will be useful, and please let me know if anything is unclear or needs to be expanded, or there is a common use that the guides don't address.
-Brian
Comment
-
Hello, I am working with genomic data belonging to mammals. I have been aligning raw reads to all the genes within an organism's chromosomes. While working with this data set, I have noticed some odd values when calculating coverage with BBTool's built-in pileup feature. I would really appreciate it if anyone knew how to interpret this.
Here is the issue:
Data being used:
-CDS (W/ Introns, not spliced) of genes extracted from a Chromosome Genbank file (found @ NCBI ftp://ftp.ncbi.nlm.nih.gov/genomes/P...odytes/CHR_01/).
-Raw reads collected from ENA (Adapter free WGS sequences)
BBMap Commandline used to calculate coverage:
bbmap.sh in1=Chimp_1.fastq in2=Chimp_2.fastq ref=ChimpChrom1.fa local=t nodisk covstats=Chrom1Stats.txt covhist=Chrom1Hist.txt
Results for the chimp. The results for all other mammals I've worked with are very similar.
Coverage - 43.97
Standard Deviation - 507.47
I proceeded to look at the statistical output produced by BBMap to see if the SD values were caused by specific gene sequences; there I saw that a lot of genes had median folds equal to 0, others that had surpassed 1,000, and even some with negative values (-1). Why does this happen, is this normal? If not, is there a way to fix this?
I have attached the stats file produced by BBMap on this post if anyone would like to see how the output looks like. I took the liberty of parsing the file based on median fold values (equal to or more than 100), only keeping the columns belonging to gene ID and Median Fold. On a follow-up post, I will add attach a text file containing the low median fold values (anything below 100,).
Note: The coverage and SD tendency remains, even when using a reference fasta containing only the longest isoform per gene (1 isoform per gene).Attached FilesLast edited by JulesWinchester; 12-19-2015, 02:46 AM.
Comment
-
@Brian: I am having trouble using dedupe.sh with paired end reads (using BBMap v.35.59).
Code:Unknown parameter out1=sampleID_L001_R1_001.fastq.gz at jgi.Dedupe.<init>(Dedupe.java:383) at jgi.Dedupe.main(Dedupe.java:80)
Code:$ dedupe.sh in1=read1.fq in2=read2.fq out1=x1.fq out2=x2.fq ac=f
Comment
-
Originally posted by JulesWinchester View PostHello, I am working with genomic data belonging to mammals. I have been aligning raw reads to all the genes within an organism's chromosomes. While working with this data set, I have noticed some odd values when calculating coverage with BBTool's built-in pileup feature. I would really appreciate it if anyone knew how to interpret this.
Comment
-
Originally posted by Brian Bushnell View PostThanks for finding that! Looks like a typo crept in; I'll fix that in the next release. In the mean time, you can use "out=" instead of "out1=".
Comment
-
Originally posted by GenoMax View PostFor now I only want an estimation of duplicates for some HS4000 paired end data so I will just omit "out=". I assume a single "out=" will result in an interleaved file (for PE input) that would have to be resolved afterwards?
Comment
Latest Articles
Collapse
-
by seqadmin
Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...-
Channel: Articles
10-18-2024, 07:11 AM -
-
by seqadmin
Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.
Nobel Prize for MicroRNA Discovery
This week,...-
Channel: Articles
10-07-2024, 08:07 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks
by seqadmin
Started by seqadmin, Yesterday, 05:31 AM
|
0 responses
10 views
0 likes
|
Last Post
by seqadmin
Yesterday, 05:31 AM
|
||
Started by seqadmin, 10-24-2024, 06:58 AM
|
0 responses
20 views
0 likes
|
Last Post
by seqadmin
10-24-2024, 06:58 AM
|
||
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types
by seqadmin
Started by seqadmin, 10-23-2024, 08:43 AM
|
0 responses
50 views
0 likes
|
Last Post
by seqadmin
10-23-2024, 08:43 AM
|
||
Started by seqadmin, 10-17-2024, 07:29 AM
|
0 responses
58 views
0 likes
|
Last Post
by seqadmin
10-17-2024, 07:29 AM
|
Comment