You may simply need longer "reads". For example, PacBio reads, or MiSeq 300bpx2 with an average insert of around 500bp merged with bbmerge. Since you already have 300bp reads, it may be useful to merge them into consensi before using them, assuming your insert size is appropriate. Of course, reads with low-complexity sequence will not merge very well. Fortunately... low-complexity sequence (like ACACACAC...ACAC) is not very important for most purposes.
Have you tried ALE? If it doesn't work I can write something that will plot the pairing distance as a function of position, but ALE should already do this extremely well.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Additional information
Thank you for the suggestions. Let me give you some more detail, maybe somebody encountered similar problem. We are trying to fill in some gaps in the chicken genome sequence. Those gaps do contain genes but are extremely GC rich. EVEN WITH ~1000x COVERAGE (!) OF THE CHICKEN GENOME we cannot extend our sequence into the gaps by using the paired reads. What happens that as we get closer to the boundary of the difficult sequence, the paired distances diminish and at the border the paired reads basically overlap each other, so there is not chance to walk into the gap. Just by inspecting few genomic regions, I can see this happening repeatedly. So I wanted to visualize this better by plotting the paired distances (not on the entire genome, just on selected genomic regions, on the order of several kbp).
Leave a comment:
-
ALE (Assembly Likelihood Estimator) may be useful here. It can detect areas where the apparent insert size differs from usual, and generates a histogram that you can view, as well as probabilities that the reference represents the reads. It accepts an assembly (fasta) file and set of constituent mapped reads (sam), to generate output.
Leave a comment:
-
To save a bit of coding time trying to get around the intricacies of pysam, I would use samtools view and awk to generate the numbers:
Code:samtools view input.bam | \ awk -v 'OFS=,' 'BEGIN{print "RName,RPos,TLen"} {if($9>0){print $3,$4,$9}}' | \ gzip > template_lengths.csv.gz
Code:data.df <- read.csv("template_lengths.csv.gz"); pdf("MyTemplateLengthPlots.pdf",width=20,height=8); for(rName in levels(data.df$RName)){ sub.df <- subset(data.df, RName == rName); plot(sub.df$Rpos, sub.df$TLen, main = rName); } graphics.off()
Last edited by gringer; 04-02-2014, 02:18 AM.
Leave a comment:
-
I don't know of any premade packages to do that. However, you could use pysam or Rsamtools to iterate through alignments crossing each position and average the TLEN values. I don't recall if the underlying bam_fetch function will return both mates in a pair, but it so make sure to include only one (based on sign of TLEN), and if not you'll have to handle things accordingly.
You could also do this in C or java if you prefer, though it seems most people tend to prefer python or R.
Leave a comment:
-
Plot of NGS paired end distance at each genome position
Hi,
could anybody help in the following: standard Illumina paired end 300 bp NGS library is mapped on human genome. I would like FOR EACH GENOMIC POSITION to plot the average size of fragments that map across that position. The reason is, we see a huge decrease of fragments from the regular 300 down to ~100 in difficult GC rich areas.
Thank you,
DanielTags: None
Latest Articles
Collapse
-
by seqadmin
Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...-
Channel: Articles
09-23-2024, 06:35 AM -
-
by seqadmin
During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.
Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...-
Channel: Articles
09-09-2024, 10:59 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 10-02-2024, 04:51 AM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
10-02-2024, 04:51 AM
|
||
Started by seqadmin, 10-01-2024, 07:10 AM
|
0 responses
22 views
0 likes
|
Last Post
by seqadmin
10-01-2024, 07:10 AM
|
||
Started by seqadmin, 09-30-2024, 08:33 AM
|
0 responses
26 views
0 likes
|
Last Post
by seqadmin
09-30-2024, 08:33 AM
|
||
Started by seqadmin, 09-26-2024, 12:57 PM
|
0 responses
18 views
0 likes
|
Last Post
by seqadmin
09-26-2024, 12:57 PM
|
Leave a comment: