-
Hi Zee,
Thanks for the suggestion. However, I am reluctant to automate anything about the process, as it very much depends on the quality and variability of the user data, the organism, etc. I think it's best for me to improve the documentation and give thorough explanations of why things are done the way they are.
Aaron
-
Aaron,
This is a samtools question, but since the output is going into Hydra, I thought I would post it here.
Following the workflow at http://code.google.com/p/hydra-sv/wiki/TypicalWorkflow, I extract the discordant reads using
samtools view -uF 2 sample.tier1.bam | \
    bamToFastq -bam stdin \
    -fq1 sample.tier1.disc.1.fq \
    -fq2 sample.tier1.disc.2.fq
Should we see exactly the same number of reads (and identical read IDs, pair by pair) in the 1.fq and 2.fq files?
In the 1.fq and 2.fq files I have, the read IDs do not match up. Do you require the BAM to be sorted by read ID?
Thanks in advance.
Last edited by wdt; 02-08-2011, 12:17 PM.
-
I have updated the workflow to indicate that bamToFastq expects query-ordered BAM files.
Best,
Aaron
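As a rough way to check whether two extracted FASTQ files really pair up record by record, something like the following Python sketch could be used. It assumes plain 4-line FASTQ records and read names that differ only by a trailing /1 or /2; the function names `read_ids` and `first_mismatch` are made up for this illustration and are not part of bamToFastq or Hydra.

```python
from itertools import zip_longest


def read_ids(fastq_lines):
    """Yield read IDs from FASTQ lines, assuming exactly 4 lines per record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # only the first line of each record is a header
        fields = line.split()
        if not fields:
            continue  # tolerate a stray blank header line
        name = fields[0]
        if name.startswith("@"):
            name = name[1:]
        # Assumption: mates are distinguished only by a /1 or /2 suffix.
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name


def first_mismatch(fq1_lines, fq2_lines):
    """Return (record_index, id1, id2) for the first unpaired record, else None."""
    pairs = zip_longest(read_ids(fq1_lines), read_ids(fq2_lines))
    for n, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return (n, id1, id2)
    return None


# Hypothetical usage with the file names from the post above:
# with open("sample.tier1.disc.1.fq") as f1, open("sample.tier1.disc.2.fq") as f2:
#     print(first_mismatch(f1, f2))
```

If the input BAM was not ordered by query name, a check like this would be expected to report a mismatch early in the files.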
-
missing reads
Aaron,
I have a merged BAM file resulting from multiple lanes of paired-end alignments. When I extracted FASTQ from the alignment, I got unequal numbers of reads in pair 1 and pair 2. When I examined the reads in the BAM file, I could see that for some pairs only one mate (either forward or reverse) is aligned and the other is missing. Do I need to append the missing reads from the raw read lanes, or exclude them from the analysis?
-
get discordant pairs
Originally posted by gpcr:
Aaron,
I have a merged BAM file resulting from multiple lanes of paired-end alignments. When I extracted FASTQ from the alignment, I got unequal numbers of reads in pair 1 and pair 2. When I examined the reads in the BAM file, I could see that for some pairs only one mate (either forward or reverse) is aligned and the other is missing. Do I need to append the missing reads from the raw read lanes, or exclude them from the analysis?
With
samtools view -hb -F 1038 orig.bam > discordant.bam
you get reads that have none of the following flags set: 2 (proper pair), 4 (read itself unmapped), 8 (mate unmapped), 1024 (duplicate).
If the numbers of read 1 and read 2 records are still not equal, maybe your aligner messed up? As Aaron wrote above, you need to have exactly one alignment per read.
By the way, the above also works for coordinate-sorted BAMs. Afterwards you just have to name-sort discordant.bam using samtools sort with the -n option.
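The value 1038 is just the sum of the four flag bits listed above (2 + 4 + 8 + 1024). A small Python sketch of that arithmetic, using the bit values from the SAM specification, is shown here; the helper names `decode_flag` and `passes_exclude_filter` are invented for this example, and the second one only imitates what the -F option does to a single FLAG value.

```python
# Bit values of the FLAG field as defined in the SAM specification.
SAM_FLAGS = {
    1: "read paired",
    2: "proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "secondary alignment",
    512: "fails quality checks",
    1024: "duplicate",
    2048: "supplementary alignment",
}


def decode_flag(value):
    """List the names of all bits set in a FLAG (or filter) value."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]


def passes_exclude_filter(read_flag, exclude=1038):
    """Imitate `samtools view -F`: keep a read only if no excluded bit is set."""
    return (read_flag & exclude) == 0


print(decode_flag(1038))
print(passes_exclude_filter(97))   # paired, mate reverse, first in pair
print(passes_exclude_filter(99))   # same, but flagged as a proper pair
```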
-
breakpoints
Does Hydra also use partially mapped reads?
I see a lot of soft-clipped alignments in my BWA-aligned SAM file. I am wondering whether this information is used when searching for breakpoints.
Edit1:
It seems that soft-clipped reads are used implicitly, since their edit distance is often abnormal after clipping. (Depending on which end is clipped, the outer or the inner distance changes...)
To me it seems that simply by looking at soft-clipped reads it might be possible to detect the exact breakpoint position (at single-nucleotide resolution). Why does nobody use this? Am I missing something?
Last edited by plichel; 05-19-2011, 08:28 AM.
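To make the soft-clipping idea concrete, here is a minimal Python sketch that turns a SAM POS and CIGAR string into candidate breakpoint positions and counts how many reads support each one. It is only an illustration of the idea in the post, not how Hydra or any other tool works; `clip_breakpoints` and `tally_support` are hypothetical names, and the input is assumed to be (chromosome, POS, CIGAR) tuples already parsed from a SAM file.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(pos, cigar):
    """Return [(side, ref_pos)] for soft clips; pos is the 1-based SAM POS."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips sit outside soft clips
    points = []
    if core and core[0][1] == "S":
        # Clip on the left: the candidate breakpoint is the first aligned base.
        points.append(("left", pos))
    if len(core) > 1 and core[-1][1] == "S":
        # Clip on the right: the candidate breakpoint is the last aligned base.
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        points.append(("right", pos + ref_len - 1))
    return points


def tally_support(reads):
    """Count soft-clipped reads per (chrom, side, position)."""
    support = Counter()
    for chrom, pos, cigar in reads:
        for side, bp in clip_breakpoints(pos, cigar):
            support[(chrom, side, bp)] += 1
    return support
```

Positions supported by many independently placed reads would be the interesting ones; a sketch like this says nothing about mapping uniqueness, which would still need to be checked separately.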
-
identify breakpoints with softclipped reads
Originally posted by plichel:
To me it seems that simply by looking at soft-clipped reads it might be possible to detect the exact breakpoint position (at single-nucleotide resolution). Why does nobody use this? Am I missing something?
-
I guess it will depend greatly on coverage and on the uniqueness of the alignment.
Suppose you have 30x coverage and indeed see 30 or more soft-clipped reads supporting the breakpoint at a particular position, and the mapper/aligner doesn't report multiple hits. What could lead to a false-positive conclusion here?
-
The CREST algorithm described here:
Uses soft-clipped reads, but requires reads of 75bp or longer (so is out for my SOLiD project). It doesn't appear to use discordance information though, so there may be some benefit in using multiple tools as a part of a workflow.
-
I am trying to run the typical Hydra workflow, but after tier 3 I am stuck at the script "pairDiscordants.py".
I am getting:
Traceback (most recent call last):
  File "/usr/local/bin/pairDiscordants.py", line 294, in <module>
    sys.exit(main())
  File "/usr/local/bin/pairDiscordants.py", line 290, in main
    pairReads(opts.inFile, opts.numMappings, opts.order, opts.dist, opts.minSpan, opts.minConcRange, opts.maxConcRange, opts.mode, opts.anchorThresh, opts.multiThresh, opts.editSlop)
  File "/usr/local/bin/pairDiscordants.py", line 28, in pairReads
    printHydraMappings(pairs, editDistance, editSlop)
UnboundLocalError: local variable 'editDistance' referenced before assignment
The input file looks like:
CHROMOSOME_II 9172809 9172833 000_1000_1326_R3/2 0 -
CHROMOSOME_IV 1641291 1641314 000_1000_171_F3/1 0 +
The command was:
> cat result | pairDiscordants.py -i stdin -m hydra -z 800
What am I doing wrong?
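For readers unfamiliar with this error class: Python raises UnboundLocalError when a function uses a local variable that is only assigned on a code path that was never taken. The sketch below is purely hypothetical and is not the code of pairDiscordants.py; the function names and the idea that a value is read from a seventh column are assumptions made up to reproduce the same kind of failure on six-column lines like the ones shown above.

```python
def pair_reads_sketch(lines, min_fields=7):
    """Hypothetical stand-in, NOT the real pairReads from pairDiscordants.py."""
    for line in lines:
        fields = line.split()
        if len(fields) >= min_fields:
            # Assigned only when a line has enough columns (an assumption).
            edit_distance = int(fields[6])
    # If no line entered the branch above, the name was never bound.
    return edit_distance


def pair_reads_guarded(lines, min_fields=7):
    """Same sketch, but failing with an explicit message about the input."""
    edit_distance = None
    for line in lines:
        fields = line.split()
        if len(fields) >= min_fields:
            edit_distance = int(fields[6])
    if edit_distance is None:
        raise ValueError("no input line had at least %d columns" % min_fields)
    return edit_distance


sample = [
    "CHROMOSOME_II 9172809 9172833 000_1000_1326_R3/2 0 -",
    "CHROMOSOME_IV 1641291 1641314 000_1000_171_F3/1 0 +",
]
try:
    pair_reads_sketch(sample)
except UnboundLocalError as err:
    print("reproduced:", type(err).__name__)
```

Whether the real script fails for a comparable reason can only be confirmed by reading line 28 of pairDiscordants.py and the code around it.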