  • Compute paired-end distance distribution?

    I'm guessing one of the existing packages out there will do this, but haven't quite found it. If I have aligned paired-end / mate-pair data, what open source tool(s) will compute the mean & stddev of the distances between pairs?

    I may just write a Perl one as an exercise, but even then it will be useful to have a comparator.

  • #2
    Are you talking about when you have a reference genome, mapping the paired end reads onto it, and then tallying up the observed separations?


    • #3
      Originally posted by krobison View Post
      I'm guessing one of the existing packages out there will do this, but haven't quite found it. If I have aligned paired-end / mate-pair data, what open source tool(s) will compute the mean & stddev of the distances between pairs?

      I may just write a Perl one as an exercise, but even then it will be useful to have a comparator.
      You can use the "dnaa/dutil/dbampairedenddist" C program in the DNAA package to compute a histogram of the paired end distribution. It takes as input a BAM file.

      Nils


      • #4
        Thanks -- apologies if it was a bit unclear in the original posting -- I was looking for something that would take aligned reads as input and give the distribution as output.

        I'm not having much luck with the sourceforge pages -- the links to the files don't seem to be there.


        • #5
          Originally posted by krobison View Post
          Thanks -- apologies if it was a bit unclear in the original posting -- I was looking for something that would take aligned reads as input and give the distribution as output.

          I'm not having much luck with the sourceforge pages -- the links to the files don't seem to be there.
          The input is an aligned file (in binary SAM format, i.e. BAM). You can check out the latest code using git, as there are no releases yet:
          Code:
           git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
          Nils


          • #6
            Originally posted by krobison View Post
            I'm guessing one of the existing packages out there will do this, but haven't quite found it. If I have aligned paired-end / mate-pair data, what open source tool(s) will compute the mean & stddev of the distances between pairs?
            Using the mean and standard deviation implicitly assumes that the variable you are measuring is normally distributed. This is not true for insert lengths obtained by mapping to a reference genome, as genomic rearrangements and ambiguous mappings give the distribution a fat right tail of disruptive outliers.

            The mode, or maybe the median, would be a better estimate of the center, and the mean absolute deviation a better measure of spread than the standard deviation.
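
As a rough illustration of the point above, the following Python sketch compares the mean and standard deviation with the median, mode, and absolute-deviation measures on a made-up set of pair distances containing one outlier. The function name `summarize` and the sample values are assumptions for illustration and do not come from any package mentioned in this thread. The median absolute deviation is included as an extra comparison that the post does not mention.

```python
import statistics


def summarize(distances):
    """Return classical and outlier-resistant summaries of pair distances.

    `distances` is a list of observed distances between mates
    (hypothetical input; how they are extracted is up to the caller).
    """
    center = statistics.median(distances)
    abs_dev = [abs(d - center) for d in distances]
    return {
        "mean": statistics.mean(distances),
        "stdev": statistics.stdev(distances),
        "median": center,
        "mode": statistics.mode(distances),
        # Mean absolute deviation, measured from the median.
        "mean_abs_dev": statistics.mean(abs_dev),
        # Median absolute deviation: an addition not named in the post.
        "median_abs_dev": statistics.median(abs_dev),
    }


# Made-up distances: seven near 200 bp plus one outlier such as a
# rearrangement or a mis-mapped mate might produce.
example = [200, 205, 195, 200, 210, 190, 200, 5000]
summary = summarize(example)
for name, value in summary.items():
    print(f"{name}: {value}")
```

On this toy input the single outlier pulls the mean far away from the median and mode, which both stay at the bulk of the data. The mean absolute deviation is also inflated by the outlier, whereas the median absolute deviation is not, so the latter may be worth considering when the tail is heavy.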


            • #7
              Novoalign maps reads to reference sequences and by default outputs a statistical distribution of the fragment lengths in the standard alignment report.

              Additionally you can output SAM format when running novoalign ("-o S" option) and use the DNAA package as suggested by nilshomer.

              Send me a private message or see www.novocraft.com wiki for more information.
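
For anyone who wants a comparator that works directly on SAM output, here is a minimal Python sketch that tallies pair distances from the TLEN column (column 9) of text SAM records. It is not how DNAA or novoalign compute their distributions; the function names `template_lengths` and `histogram`, the filtering choices, and the sample records are all assumptions for illustration. It reads text SAM only, so a BAM file would first need converting, for example with `samtools view`.

```python
from collections import Counter


def template_lengths(sam_lines):
    """Yield one positive TLEN per pair whose mates map to the same reference."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        tlen = int(fields[8])
        if not flag & 0x1:  # read is not part of a pair
            continue
        # Skip unmapped reads, unmapped mates, secondary and
        # supplementary alignments.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if rnext not in ("=", rname):  # mate is on another reference
            continue
        # The leftmost mate carries the positive TLEN, so keeping only
        # positive values counts each pair once.
        if tlen > 0:
            yield tlen


def histogram(lengths):
    """Return a Counter mapping distance -> number of pairs."""
    return Counter(lengths)


# Made-up records: two usable pairs and one read whose mate is unmapped.
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t99\tchr1\t500\t60\t50M\t=\t710\t260\t*\t*",
    "r2\t147\tchr1\t710\t60\t50M\t=\t500\t-260\t*\t*",
    "r3\t73\tchr1\t900\t60\t50M\t=\t900\t0\t*\t*",
]
for distance, count in sorted(histogram(template_lengths(example_sam)).items()):
    print(distance, count)
```

The resulting list of distances can be passed to whatever summary you prefer (mean and standard deviation, or the median-based measures suggested earlier in the thread). Whether to require the "properly paired" flag (0x2) as well is a design choice left out here, since that flag depends on the aligner's own idea of the expected distance.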


              Originally posted by krobison View Post
              I'm guessing one of the existing packages out there will do this, but haven't quite found it. If I have aligned paired-end / mate-pair data, what open source tool(s) will compute the mean & stddev of the distances between pairs?

              I may just write a Perl one as an exercise, but even then it will be useful to have a comparator.


              • #8
                DNAA pair-end (mate pair) distance distribution

                Originally posted by krobison View Post
                I'm guessing one of the existing packages out there will do this, but haven't quite found it. If I have aligned paired-end / mate-pair data, what open source tool(s) will compute the mean & stddev of the distances between pairs?

                I may just write a Perl one as an exercise, but even then it will be useful to have a comparator.
                Did you manage to get dnaa running?
                After adding the needed libraries from the samtools source and the bfast tools, I get the following error message:

                ~/NGSSOFT/dnaa/dutil$ gcc -o prdist dbampairedenddist.c
                /tmp/ccu8EfwQ.o: In function `main':
                dbampairedenddist.c: (.text+0x115): undefined reference to `PrintError'
                dbampairedenddist.c: (.text+0x18d): undefined reference to `samopen'
                dbampairedenddist.c: (.text+0x1c5): undefined reference to `PrintError'
                dbampairedenddist.c: (.text+0x34d): undefined reference to `samread'
                dbampairedenddist.c: (.text+0x3f9): undefined reference to `samclose'
                collect2: ld returned 1 exit status

                Is there any alternative solution available?

                Thanks in advance,

                Regards,
                Ramzi
                Last edited by ramouz87; 11-12-2009, 06:34 AM. Reason: bug
                Research Scientist - Bioinformatics
                Sidra Medical and Research Center


                • #9
                  Hi,
                  I am trying to download the DNAA package from http://sourceforge.net/projects/dnaa/files/ but the link doesn't seem to work.


                  • #10
                    Ah, I have already found the download link git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa. But I couldn't download it via Mozilla Firefox.
                    Firefox doesn't know how to open this address, because the protocol (git) isn't associated with any program.


                    • #11
                      Originally posted by ramouz87 View Post
                      Did you manage to get dnaa running?
                      After adding the needed libraries from the samtools source and the bfast tools, I get the following error message:

                      ~/NGSSOFT/dnaa/dutil$ gcc -o prdist dbampairedenddist.c
                      /tmp/ccu8EfwQ.o: In function `main':
                      dbampairedenddist.c: (.text+0x115): undefined reference to `PrintError'
                      dbampairedenddist.c: (.text+0x18d): undefined reference to `samopen'
                      dbampairedenddist.c: (.text+0x1c5): undefined reference to `PrintError'
                      dbampairedenddist.c: (.text+0x34d): undefined reference to `samread'
                      dbampairedenddist.c: (.text+0x3f9): undefined reference to `samclose'
                      collect2: ld returned 1 exit status

                      Is there any alternative solution available?

                      Thanks in advance,

                      Regards,
                      Ramzi
                      Try
                      Code:
                      ./configure
                      make
                      make install
                      Use autotools when available instead of trying gcc!


                      • #12
                        Originally posted by MerFer View Post
                        Ah, I have already found the download link git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa. But I couldn't download it via Mozilla Firefox.
                        Firefox doesn't know how to open this address, because the protocol (git) isn't associated with any program.
                        You must use "git" to retrieve the source code. Once git is installed, type:

                        Code:
                        git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
