  • Mysteries of the Bioscope pairing pipeline

    Hello to all,

    I'm currently trying to extract some reasonable data from the Bioscope pairing tool.

    We have paired-end reads from a library size-selected for 200 bp.

    We have not set a value for the parameters
    insert.start and insert.end in the pairing.ini file.

    The description in the Bioscope manual says: "The minimum(maximum) insert size to define a good mate. If a value is not set, the tool tries to measure the best value"

    My question is: where can I see this value afterwards? It would be quite interesting to know what measure defines a good mate/pair.

    Can somebody give a hint?

    Many thanks

  • #2
    The BAM file should have the insert ranges. For a quick overview, though, my understanding is that the lower and upper ranges in the pairing.dat.freq file (not the full file) give the insert range. On the other hand, the recent LifeTech 'pairing_stats_n_clean_bam', which supposedly generates 'official' statistics, gives a different (and smaller) range than the pairing.dat.freq file. Since the new program is looking through the BAM file (and taking forever to do so!) I'd trust it more.
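
If you want to look at the insert sizes actually recorded in the BAM file yourself, here is a minimal sketch in Python. It assumes the BAM has been dumped to SAM text (for example with `samtools view`), that the template length sits in column 9 (TLEN) as in the SAM specification, and that Bioscope sets the proper-pair flag bit (0x2) on what it calls good mates. That last point is an assumption, not something confirmed by the manual. The function name `insert_size_range` is made up for this example.

```python
def insert_size_range(sam_lines, require_proper_pair=True):
    """Return (min, max, n_pairs) of template lengths found in SAM text lines.

    Each pair is counted once, via the mate that carries a positive TLEN.
    Returns (None, None, 0) if no qualifying pair is found.
    """
    lo = hi = None
    n_pairs = 0
    for line in sam_lines:
        if line.startswith("@"):        # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & 0x4 or flag & 0x8:    # read or its mate is unmapped
            continue
        if require_proper_pair and not flag & 0x2:
            continue
        if tlen <= 0:                   # the other mate carries the positive value
            continue
        lo = tlen if lo is None else min(lo, tlen)
        hi = tlen if hi is None else max(hi, tlen)
        n_pairs += 1
    return lo, hi, n_pairs
```

Pass it any iterable of SAM lines, such as an open file handle on the text dump. The observed minimum and maximum are only a rough cross-check against the range reported in the stats file; they do not tell you how the tool chose its cut-offs.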


    • #3
      Thank you westerman. Yes that was exactly my confusion. The pairing.stats gives:
      Insert range 62-207 in the header, while the pairing.dat.freq file gives values from 35-207. If I want to take the 'official' numbers for AAA pairs from the pairing.stats which range do you think is used?

      What do you mean by 'new program'?

      best regards

      Julia


      • #4
        Ah, you have a 'pairing.stats' file. This indicates that you ran your analysis with Bioscope version 1.2, the version before LifeTech took away the stats file. In v.1.3 they did away with the stats file but, within the last couple of weeks, they issued a program called 'pairing_stats_n_clean_bam' which restores the stats file as well as cleans up the mapped reads BAM file (which, erroneously, has unmapped reads in it). You should not run the 'pairing_stats_n_clean_bam' program on v.1.2 and earlier files.

        In your case just take the range from 'pairing.stats'.


        • #5
          Ah, most interesting! This might also explain another observation I made:

          I used Picard to remove duplicate reads in the BAM file. Picard reported a number of 'records' that did not match any of the numbers reported in the pairing.stats. I was scratching my head about this, too. Maybe it's best to switch to v. 1.3. - too much muddle here.
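
One way to see where such a mismatch might come from is to tally the records in the BAM by SAM flag, since unmapped reads left in the file would inflate a plain record count. The following is only a sketch: it assumes the BAM has been dumped to SAM text (for example with `samtools view`), and it relies solely on the standard SAM flag bits 0x4 (unmapped) and 0x400 (duplicate). The function name `tally_records` is made up for this example, and which of these counts corresponds to which number in pairing.stats is not something I know.

```python
from collections import Counter


def tally_records(sam_lines):
    """Count alignment records in SAM text lines, split by flag bits."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:           # blank or malformed line
            continue
        flag = int(fields[1])
        counts["total"] += 1
        if flag & 0x4:                # read is unmapped
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
        if flag & 0x400:              # marked as PCR/optical duplicate
            counts["duplicate"] += 1
    return counts
```

Comparing "total", "mapped" and "unmapped" against both the Picard record count and the stats file should at least show whether unmapped reads account for the difference.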

          THX J


          • #6
            Hey, it's me again.

            In the meantime things cleared up. The faulty BAM files were introduced by v1.3. So it's better to stick with v1.2 right now. Right?

            I take the size range from the pairing.stats! BTW, it matches the values in the pairing.dat.freq; it did not before because I took the wrong file, from another library :-( Sorry for the confusion.

            The problem with the 'records' count according to Picard still remains. But I will try to figure this out next week.

            I'm off for the weekend now.

            THX J


            • #7
              Originally posted by jbeck
              ... The faulty BAM files were introduced by v1.3. So it's better to stick with v1.2 right now. Right?
              You can use v.1.3 but should run the 'pairing_stats_n_clean_bam' program if you want a clean BAM file. Be aware that said program takes a long time to run. I hesitate to tell someone to not use the latest and greatest version since there should be bug fixes and speed-ups between v1.2 and v1.3.
