**Nino** · 12-11-2013, 09:07 AM

Hello Kaiye,

I am running Pindel on whole-exome DNA-seq data aligned with BWA. Of my four samples, one gives a segmentation fault; the other three run fine without any issues. Any idea why?

Thanks,
Nino

**KaiYe** · 12-11-2013, 09:15 AM

Which Pindel version are you using?

**Nino** · 12-11-2013, 09:19 AM

Pindel version 0.2.5a2, September 17 2013.

**KaiYe** · 12-11-2013, 09:26 AM

Can you either provide your BAM file or isolate the regions causing the error? [email protected]

**Nino** · 12-11-2013, 09:53 AM

I have sent you an email; please look out for a message from a med.cornell.edu address.

Thanks,
Nino

**AlisonF** · 02-10-2014, 09:36 AM

Hi,

I am running Pindel version 0.2.5 on a set of samples with the command:
pindel -T 18 -f human_g1k_v37.fa.txt -i config070.txt -c ALL -L TESTconfig070.txt.log -o TESTconfig070.txt.out

It ran for a couple of minutes and gave an error message:
Error: chromosome with name : NC_007605 not yet loaded into memory. Aborting.

I am not sure what it means. Does anyone have a suggestion on how to solve the problem? Thank you.

**KaiYe** · 02-10-2014, 11:09 AM

Your BAM was aligned against a different reference genome: NC_007605 is in your BAM file but not in the reference you provided. Pindel looks for interchromosomal split reads but cannot find the chromosome sequence named in the mapping data.
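
One way to spot such a mismatch before rerunning is to compare the contig names in the BAM header with the sequence names in the reference FASTA. The following is only a rough Python sketch, not part of Pindel: the function names are made up for illustration, the header text is assumed to come from something like `samtools view -H`, and the example data are invented.

```python
def bam_header_contigs(header_text):
    """Return contig names from the @SQ lines of SAM/BAM header text."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(fasta_text):
    """Return sequence names from FASTA '>' lines (first token only)."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def missing_from_reference(header_text, fasta_text):
    """List contigs named in the BAM header but absent from the FASTA."""
    reference = set(fasta_contigs(fasta_text))
    return [name for name in bam_header_contigs(header_text)
            if name not in reference]


# Invented example: the header names NC_007605, the FASTA does not.
header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:1000\n"
    "@SQ\tSN:NC_007605\tLN:500\n"
)
fasta = ">1 some description\nACGT\n>2\nACGT\n"

print(missing_from_reference(header, fasta))  # prints ['NC_007605']
```

Any name this reports is a contig the aligner knew about but the reference passed with -f does not contain, which is the situation described above.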

**Topulaneus-Hattum** · 02-10-2014, 12:12 PM

I am new to Pindel and am running version 0.2.5a3. I have large-insert Illumina data, with inserts in the range of 6 kb to 10 kb. I have indicated 8000 in my config file, but my first question is: how does Pindel know what the distribution is? Does it recover the distribution from the alignments?

My second question is that I am getting MANY warnings that look exactly like this:
warning: currentState.Reads_RP_Discovery[read_index].InsertSize 8000
Can I ignore these or are they telling me something is wrong?

T.Hattum

**KaiYe** · 02-10-2014, 12:29 PM

Are you working with mate-pair data? If so, you had better extract the reads with the provided sam2pindel first and then run Pindel on them.

Please ignore the warnings.

**Topulaneus-Hattum** · 02-10-2014, 12:49 PM

Hi, thanks for your very fast reply!

I'm not sure about your question. The data are read pairs: two 150 bp reads expected to be about 8 kb apart. Does that fit the description of mate-pair?

My reference is human, and before running the entire genome I tried a single-chromosome test on chr22. My input is BWA-MEM mappings in a single coordinate-sorted BAM file (≈30 GB). Looking at the user manual, I am following step 1, option 1, which appears to indicate that I can use my BAM file directly. Option 3 discusses sam2pindel, but the context there is aligners other than BWA. Are you suggesting that I should use option 3 because I have long inserts?

**KaiYe** · 02-10-2014, 01:33 PM

The orientation of the reads differs between paired-end and mate-pair. Mate-pair data normally has longer inserts, in a range like that of your data. Please make sure it is a paired-end library; Pindel assumes paired-end data.

**Topulaneus-Hattum** · 02-10-2014, 02:05 PM

"orientation of the reads differ between paired-end and mate-pair"

I tried to find something online that explains the difference. I found a SEQanswers post that described some physical difference in the process but didn't elaborate on how this would affect orientations.

I understand that reads from opposite ends of the same molecule, read along different strands, will result in pairs that have opposite orientation (in the absence of any SV). And examining the orientations of the alignments for my pairs, my pairs have opposite orientations.

But I still don't know whether my data is paired-end or mate-pair (because I don't understand what the terms mean, except that they mean two things). Am I off base in thinking this is something I can determine just from looking at my data? Or do I need to go back to the people that did the sequencing and ask them? OR is it more than just that the orientations are opposite, that +- is different than -+?

Apologies for my ignorance in this matter, and thanks very much for educating me.
T.Hattum

**Topulaneus-Hattum** · 02-11-2014, 08:09 AM

"orientation of the reads differ between paired-end and mate-pair"

That statement confuses me now, because these two descriptions from Illumina appear to indicate that read orientations are the same in both paired end and mate pair.

http://www.illumina.com/technology/paired_end_sequencing_assay.ilmn

http://www.illumina.com/technology/mate_pair_sequencing_assay.ilmn

The figures on those two pages show complementary reads for both.

Am I misinterpreting those two pages? Are the terms used inconsistently, with different meaning for different sequencers?

For pindel, does it expect reads from the same pair to be complementary or non-complementary? And can it handle inserts in the 8K range if the reads are correctly oriented?

Thanks,
T.Hattum

**HESmith** · 02-11-2014, 09:00 AM

[Note: Illumina-specific explanation] The confusion is due to ambiguity in usage. Paired-end is the type of sequencing, in contrast to single-end. These terms are also used to describe the types of library, since the early versions of Illumina sample prep were different depending upon whether you wanted to sequence one end or two. In both cases, the insert is a contiguous fragment of gDNA (or cDNA). The insert is sequenced from the end(s), and sequencing is 5'->3', which means that read two of paired-end sequencing is the reverse complement of read one. Alignment of each read produces the following orientation (sometimes referred to as head-to-head):

read1----> <----read2

Mate-pair libraries are not constructed from a contiguous segment of gDNA, but from a circular permutation that produces tail-to-tail orientation of the aligned reads:

read1<---- ---->read2

Note that the orientation is different for alignment only. Paired-end sequencing always reads into the insert and from the opposite strands.

HTH

**Topulaneus-Hattum** · 02-11-2014, 11:06 AM

Ahhh... (the sound of enlightenment on my end). Thanks HESmith.

So this looks like something I can deduce from a small sample from my data. I should see either an abundance of head-to-head (which I now believe is what pindel wants), or an abundance of tail-to-tail. If I see the latter then I should consider my data as "mate-pair". Please correct me if I am wrong.

In KaiYe's first reply to me, he indicates I should use sam2pindel if I have mate-pair data. But the pindel user manual (gmt.genome.wustl.edu/pindel/current/user-manual.html) only indicates sam2pindel for when my alignments weren't created by BWA. It's not at all clear to me how sam2pindel knows whether I am giving it alignments from mate-pair instead of alignments from paired-end. My alignments are in BAM, created by BWA. The advice to use sam2pindel seems contrary to the workflow picture at the top of the manual page. Clearly I am missing something.

Again, thanks for any help. I very much appreciate the help I've received so far.
T.Hattum
