  • amurocw
    Junior Member
    • Jul 2010
    • 3

    Low pairing rate in SOLiD 4 paired-end transcriptome sequencing

    Dear all,

    We have conducted a transcriptome sequencing project on zebrafish using the SOLiD 4 PE protocol (50X35). After analysis with the Bioscope 1.3.1 WTA pipeline, we found that of the 80% of reads that mapped, only 31% of pairs are located on the same chromosome, and more than 45% of paired ends are located on different chromosomes. Is this rate normal, or did something go wrong in our experiment or analysis?

    Best wishes
    Chen
    Last edited by amurocw; 03-14-2011, 06:25 PM.
  • zee
    NGS specialist
    • Apr 2008
    • 249

    #2
    Perhaps you should confirm your alignment results with BFAST or NovoalignCS. Both tools can write out SAM/BAM format from which you could generate some useful statistics on proper pairs, etc.
    The scenario you are describing sounds like a sample with a huge number of structural variations.
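
As a rough sketch of the kind of statistics meant here, assuming the aligner wrote SAM text: the flag bits and the RNEXT column below follow the SAM specification, while the function name `pair_stats` and the example records are made up for illustration and are not output from BFAST or NovoalignCS.

```python
from collections import Counter

# SAM flag bits (from the SAM specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def pair_stats(sam_lines):
    """Classify read pairs from SAM text lines, counting each pair once."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Use only the primary record of the first read so a pair counts once.
        if not flag & PAIRED or not flag & FIRST_IN_PAIR:
            continue
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        counts["pairs"] += 1
        if flag & UNMAPPED or flag & MATE_UNMAPPED:
            counts["one_or_both_unmapped"] += 1
        elif rnext == "=" or rnext == rname:
            counts["same_chromosome"] += 1
            if flag & PROPER_PAIR:
                counts["proper_pair"] += 1
        else:
            counts["different_chromosome"] += 1
    return counts


# Made-up records, for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",    # first read, proper pair
    "r1\t147\tchr1\t300\t60\t35M\t=\t100\t-250\t*\t*",  # second read, skipped
    "r2\t65\tchr1\t500\t60\t50M\tchr2\t900\t0\t*\t*",   # mate on another chromosome
    "r3\t73\tchr1\t700\t60\t50M\t=\t700\t0\t*\t*",      # mate unmapped
]
print(dict(pair_stats(example)))
```

To run it on real output, something like `pair_stats(open("aln.sam"))` should work (the file name is hypothetical); a BAM file would first need converting to SAM text, for example with `samtools view -h`.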


    • amurocw
      Junior Member
      • Jul 2010
      • 3

      #3
      Thanks for your suggestion. I will try BFAST again.

      However, when using TopHat, we obtained nearly the same pairing rates. Although the zebrafish genome assembly is still preliminary, that alone cannot explain the low pairing rate.

      Zee, do you have any SOLiD 4 PE data? What does the pairing rate look like?


      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        Stats from a recent partial SOLiD 4 PE run mapped to Arabidopsis. 1st read is F3, 2nd read is F5. As usual the SOLiD gives all reads -- high quality or not -- and relies on the mapping to discard poorer reads.
        Note that "proper paired" reads (e.g., same chromosome, within a good insert distance) are 39% of the total reads and 66% of the mapped reads.


        81398098 the read is paired in sequencing
        40699049 Total first read (50.00% total read)
        40699049 Total second read (50.00% total read)
        33107767 the query sequence itself is unmapped (40.67% total read)
        10726487 Unmapped first read (26.36% total first read)
        22381280 Unmapped second read (54.99% total second read)
        48290331 Total mapped reads (59.33% total read)
        29972562 mapped first read (62.07% total mapped, 73.64% total first read)
        18317769 mapped second read (37.93% total mapped, 45.01% total second read)
        35022988 both reads mapped (72.53% total mapped, 43.03% total read)
        31774598 the read is mapped in a proper pair (65.80% total mapped, 39.04% total reads)
        33107767 singletons (mates unmapped) (40.67%)
        22613871 strand of the query is reverse
        22614187 strand of the mate is reverse
        0 the alignment is not primary
        0 the read fails platform/vendor quality checks
        274708 the read is either a PCR or an optical duplicate
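
For anyone checking the percentages, each one is a count divided by either the total reads or the mapped reads. A small sketch using three of the counts listed above (the helper `pct` is made up here and is not part of Bioscope):

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts copied from the listing above.
total_reads = 81398098
mapped_reads = 48290331
proper_pair_reads = 31774598

mapped_of_total = pct(mapped_reads, total_reads)
proper_of_mapped = pct(proper_pair_reads, mapped_reads)
proper_of_total = pct(proper_pair_reads, total_reads)

print(f"mapped / total reads:       {mapped_of_total:.2f}%")
print(f"proper pair / mapped reads: {proper_of_mapped:.2f}%")
print(f"proper pair / total reads:  {proper_of_total:.2f}%")
```

The two proper-pair figures differ only in the denominator, which is why the same count shows up as both roughly 66% and roughly 39%.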


        • Sheila
          Member
          • Jun 2009
          • 17

          #5
          Hi Westerman,
          Are your results from transcriptomics or resequencing studies? Which mapping tool did you use for the analysis?

          Thanks in advance.
          Best regards,

          S.


          • westerman
            Rick Westerman
            • Jun 2008
            • 1104

            #6
            Those statistics were from a resequencing run, so perhaps they are not as applicable to a transcriptome study. The tool is LifeTech's Bioscope software.

            Here are some statistics from a recent transcriptome run mapped to maize. This was a partial run, thus the "small" number of reads. I'm still always amazed by the numbers we get from NGS machines compared to the Sanger methods of 5 years ago.

            You can see that the mapping went well enough at ~50%. As is normal with SOLiD, these are from the raw reads without consideration of quality, thus that percentage is more-or-less expected ... although we have seen better. The amount of RNA we had was very low and we suspect that this may be a contributing factor.


            11742104 the read is paired in sequencing
            5871052 Total first read (50.00% total read)
            5871052 Total second read (50.00% total read)

            4766137 the query sequence itself is unmapped (40.59% total read)
            2530271 Unmapped first read (43.10% total first read)
            2235866 Unmapped second read (38.08% total second read)

            6975967 Total mapped reads (59.41% total read)
            3340781 mapped first read (47.89% total mapped, 56.90% total first read)
            3635186 mapped second read (52.11% total mapped, 61.92% total second read)
            5301952 both reads mapped (76.00% total mapped, 45.15% total read)
            2385930 the read is mapped in a proper pair (34.20% total mapped, 20.32% total reads)

            4766137 singletons (mates unmapped) (40.59%)
            2640283 strand of the query is reverse
            2640358 strand of the mate is reverse

            0 the alignment is not primary
            0 the read fails platform/vendor quality checks
            0 the read is either a PCR or an optical duplicate

            402 Mean Insert Size
            50 - 14999 Insert Size Range
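
One way to put the two runs in this thread side by side is the share of proper pairs among reads whose mate also mapped. This ratio is not part of the listings themselves; it is derived here from the "both reads mapped" and "proper pair" counts, on the assumption that both are counts of reads (as their percentages of total reads suggest). The names `runs` and `proper_pair_share` are made up for this sketch.

```python
# Counts copied from the two statistics listings in this thread.
runs = {
    "Arabidopsis resequencing": {"both_mapped": 35022988, "proper_pair": 31774598},
    "maize transcriptome": {"both_mapped": 5301952, "proper_pair": 2385930},
}


def proper_pair_share(run):
    """Percent of reads with a mapped mate that are also flagged as proper pairs."""
    return 100.0 * run["proper_pair"] / run["both_mapped"]


shares = {name: proper_pair_share(run) for name, run in runs.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f}% of both-mapped reads are properly paired")
```

By this measure the transcriptome run sits at roughly half the resequencing run, which is at least in the same direction as the low pairing rate reported in the first post.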


            • Sheila
              Member
              • Jun 2009
              • 17

              #7
              Hi Westerman,
              Thanks very much for the info!
              One last thing: what was the read length in this experiment? Did you trim any of your reads before mapping?

              Our statistics for resequencing studies are very similar to yours but still not so similar to your transcriptomics results in maize in terms of the number of properly paired reads.

              Regards,

              S.


              • westerman
                Rick Westerman
                • Jun 2008
                • 1104

                #8
                F3 is 50 bases, F5 is 35 bases. No trimming.
