  • Hello Kaiye,

    I am running Pindel on whole-exome DNA-seq data aligned with BWA. Of my four samples, one gives a segmentation fault, while the other three run without any issues. Any idea why?

    Thanks,
    Nino


    • Replying to Nino: which version of Pindel are you using?


      • Pindel version 0.2.5a2, September 17 2013.


        • Replying to Nino: can you either provide your BAM file or isolate the regions causing the error? [email protected]


          • I have sent you an email; please look out for one from a med.cornell.edu address.

            Thanks,
            Nino


            • Hi,

              I am running Pindel version 0.2.5 on a set of samples with the command:
              pindel -T 18 -f human_g1k_v37.fa.txt -i config070.txt -c ALL -L TESTconfig070.txt.log -o TESTconfig070.txt.out

              It ran for a couple of minutes and gave an error message:
              Error: chromosome with name : NC_007605 not yet loaded into memory. Aborting.

              I am not sure what it means. Does anyone have a suggestion on how to solve the problem? Thank you.


              • Your BAM was aligned against a different reference genome: NC_007605 is in your BAM file but not in the reference you provided. Pindel is looking for interchromosomal split reads but cannot find the chromosome sequence specified in the mapping data.
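A quick way to confirm this kind of mismatch is to compare the contig names in the BAM header with the sequence names in the reference FASTA. The sketch below is a minimal, hypothetical helper (not part of Pindel); it assumes you pass it the header text printed by `samtools view -H your.bam` and the lines of the FASTA file, and the function names are made up for illustration.

```python
def sam_header_contigs(header_text):
    """Contig names from the @SQ lines of a SAM header (e.g. `samtools view -H` output)."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(fasta_lines):
    """Sequence names from FASTA '>' lines (name = first whitespace-delimited token)."""
    return [
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    ]


def missing_from_reference(header_text, fasta_lines):
    """Contigs listed in the BAM header but absent from the reference FASTA."""
    reference = set(fasta_contigs(fasta_lines))
    return [name for name in sam_header_contigs(header_text) if name not in reference]


# Toy header and reference (lengths are made-up values):
header = "@HD\tVN:1.4\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:NC_007605\tLN:500\n"
print(missing_from_reference(header, [">1 toy chromosome", "ACGT"]))
```

A non-empty result means the BAM refers to sequences that the `-f` reference does not contain, which is the situation behind the "not yet loaded into memory" error above.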


                • I am new to Pindel and am running version 0.2.5a3. I have large-insert Illumina data, with inserts in the range of 6 kb to 10 kb. I have specified 8000 as the insert size in my config file, but my first question is: how does Pindel know what the distribution is? Does it recover the distribution from the alignments?

                  My second question is that I am getting MANY warnings that look exactly like this:
                  warning: currentState.Reads_RP_Discovery[read_index].InsertSize 8000
                  Can I ignore these or are they telling me something is wrong?

                  T.Hattum
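Whatever Pindel does internally with the configured value (the thread does not say), you can check what your own alignments imply about the insert-size distribution. The sketch below is a hypothetical helper, not something shipped with Pindel: it reads plain SAM records (for example from `samtools view`), collects |TLEN| for first-in-pair reads whose mate maps to the same contig, and summarizes the result. `config_line` assumes the three-column Pindel config layout of BAM path, insert size and sample label; the file and sample names in the usage line are placeholders.

```python
import statistics


def insert_sizes(sam_lines):
    """|TLEN| of first-in-pair reads whose mate is mapped to the same contig.

    Expects plain SAM text records, one per line; header lines are skipped.
    """
    sizes = []
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rnext, tlen = int(fields[1]), fields[6], int(fields[8])
        first_of_pair = bool(flag & 0x1) and bool(flag & 0x40)
        both_mapped = not (flag & 0x4) and not (flag & 0x8)
        if first_of_pair and both_mapped and rnext == "=" and tlen != 0:
            sizes.append(abs(tlen))
    return sizes


def summarize(sizes):
    """Median and quartiles of the observed insert sizes (needs at least 2 values)."""
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    return {"median": statistics.median(sizes), "q1": q1, "q3": q3}


def config_line(bam_path, insert_size, label):
    """One config line: BAM path, insert size, sample label (tab-separated)."""
    return f"{bam_path}\t{insert_size}\t{label}"


# Usage with placeholder names:
# with open("chr22_sample.sam") as fh:
#     stats = summarize(insert_sizes(fh))
# print(config_line("chr22.bam", round(stats["median"]), "SAMPLE1"))
```

A wide spread between `q1` and `q3` is a hint that a single 8000 bp value is only a rough description of a 6 to 10 kb library.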


                  • Replying to T.Hattum: are you working with mate-pair data? If so, you'd better extract the reads with the provided sam2pindel and then run Pindel on them.

                    Please ignore the warnings.


                    • Hi, thanks for your very fast reply!

                      I'm not sure about your question. The data are pairs: two 150 bp reads expected to be about 8 kb apart. Does that fit the description of mate-pair?

                      My reference is human, and before running the entire genome I tried a single-chromosome test on chr22. My input is the BWA-MEM mappings, as a single position-sorted BAM file (≈30 GB). Looking at the user manual, I am following step 1, option 1, which appears to indicate that I can use my BAM file directly. Option 3 discusses sam2pindel, but in the context of aligners other than BWA. Are you suggesting that I should use option 3 because I have long inserts?


                      • The orientation of the reads differs between paired-end and mate-pair data. Mate-pair data normally has longer inserts, in a range like yours. Please make sure it is a paired-end library; Pindel assumes paired-end data.


                        • "orientation of the reads differ between paired-end and mate-pair"

                          I tried to find something online that explains the difference. I found a SeqAnswers thread that described a physical difference in the process but didn't elaborate on how it would affect orientations.

                          I understand that reads from opposite ends of the same molecule, read along different strands, will result in pairs with opposite orientations (in the absence of any SV). Examining the orientations of the alignments for my pairs, they do have opposite orientations.

                          But I still don't know whether my data is paired-end or mate-pair (because I don't understand what the terms mean, beyond the fact that they mean two different things). Am I off base in thinking this is something I can determine just from looking at my data? Or do I need to go back to the people who did the sequencing and ask them? Or is there more to it than the orientations being opposite, i.e., is +- different from -+?

                          Apologies for my ignorance in this matter, and thanks very much for educating me.
                          T.Hattum


                          • "orientation of the reads differ between paired-end and mate-pair"

                            That statement confuses me now, because these two descriptions from Illumina appear to indicate that read orientations are the same in both paired end and mate pair.


                            The figures on those two pages show complementary reads for both.

                            Am I misinterpreting those two pages? Are the terms used inconsistently, with different meanings for different sequencers?

                            For pindel, does it expect reads from the same pair to be complementary or non-complementary? And can it handle inserts in the 8K range if the reads are correctly oriented?

                            Thanks,
                            T.Hattum


                            • [Note: Illumina-specific explanation] The confusion is due to ambiguity in usage. Paired-end is the type of sequencing, in contrast to single-end. These terms are also used to describe the types of library, since the early versions of Illumina sample prep were different depending upon whether you wanted to sequence one end or two. In both cases, the insert is a contiguous fragment of gDNA (or cDNA). The insert is sequenced from the end(s), and sequencing is 5'->3', which means that read two of paired-end sequencing is the reverse complement of read one. Alignment of each read produces the following orientation (sometimes referred to as head-to-head):

                              read1----> <----read2

                              Mate-pair libraries are not constructed from a contiguous segment of gDNA, but from a circular permutation that produces tail-to-tail orientation of the aligned reads:

                              read1<---- ---->read2

                              Note that the orientation is different for alignment only. Paired-end sequencing always reads into the insert and from the opposite strands.

                              HTH
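To make the two layouts concrete, here is a minimal sketch that classifies one aligned pair from its SAM fields. It assumes 1-based leftmost positions (POS for the read, PNEXT for its mate) and the FLAG strand bits 0x10 (read reversed) and 0x20 (mate reversed). Comparing leftmost positions rather than 5' ends is a simplification that is reasonable when the insert is much longer than the reads. The function name is hypothetical.

```python
def pair_orientation(pos, is_reverse, mate_pos, mate_is_reverse):
    """Classify how the two mates of one pair lie on the reference.

    Returns "inward"  for read----> <----mate  (the paired-end layout above),
            "outward" for <----read mate---->  (the mate-pair layout above),
            "same-strand" if both mates aligned to the same strand.
    """
    if is_reverse == mate_is_reverse:
        return "same-strand"
    # Whichever mate starts further left determines the picture:
    # a forward-strand left mate points inward, a reverse-strand one outward.
    left_is_reverse = is_reverse if pos <= mate_pos else mate_is_reverse
    return "outward" if left_is_reverse else "inward"


print(pair_orientation(100, False, 8100, True))  # left mate forward, right mate reverse
print(pair_orientation(100, True, 8100, False))  # left mate reverse, right mate forward
```

The first call describes the read1----> <----read2 picture and the second the read1<---- ---->read2 picture.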


                              • Ahhh... (the sound of enlightenment on my end). Thanks HESmith.

                                So this looks like something I can deduce from a small sample from my data. I should see either an abundance of head-to-head (which I now believe is what pindel wants), or an abundance of tail-to-tail. If I see the latter then I should consider my data as "mate-pair". Please correct me if I am wrong.

                                In KaiYe's first reply to me, he indicates I should use sam2pindel if I have mate-pair data. But the pindel user manual (gmt.genome.wustl.edu/pindel/current/user-manual.html) only indicates sam2pindel for when my alignments weren't created by BWA. It's not at all clear to me how sam2pindel knows whether I am giving it alignments from mate-pair instead of alignments from paired-end. My alignments are in BAM, created by BWA. The advice to use sam2pindel seems contrary to the workflow picture at the top of the manual page. Clearly I am missing something.

                                Again, thanks for any help. I very much appreciate the help I've received so far.
                                T.Hattum
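Checking a small sample of the alignments can be scripted. The sketch below is a hypothetical helper (not part of Pindel or sam2pindel) that reads plain SAM records, for example the first lines of `samtools view your.bam`, counts each pair once through its first-in-pair read, and tallies "inward" pairs (read----> <----mate, the paired-end layout) against "outward" pairs (<----read mate---->, the mate-pair layout). It compares leftmost positions (POS vs PNEXT), a simplification that assumes the insert is much longer than the reads; file names in the usage line are placeholders.

```python
from collections import Counter
from itertools import islice


def orientation_counts(sam_lines, max_records=100_000):
    """Tally pair orientations over the first `max_records` SAM records."""
    counts = Counter()
    records = (line for line in sam_lines if line.strip() and not line.startswith("@"))
    for line in islice(records, max_records):
        f = line.rstrip("\n").split("\t")
        flag, pos, rnext, pnext = int(f[1]), int(f[3]), f[6], int(f[7])
        # Keep first-in-pair reads with both mates mapped to the same contig.
        if not (flag & 0x1 and flag & 0x40) or flag & 0xC or rnext != "=":
            continue
        rev, mate_rev = bool(flag & 0x10), bool(flag & 0x20)
        if rev == mate_rev:
            counts["same-strand"] += 1
        else:
            left_rev = rev if pos <= pnext else mate_rev
            counts["outward" if left_rev else "inward"] += 1
    return counts


# Usage with a placeholder file of SAM text:
# with open("chr22_sample.sam") as fh:
#     print(orientation_counts(fh).most_common())
```

An abundance of `inward` counts is consistent with the paired-end orientation; an abundance of `outward` counts points to mate-pair orientation.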
