  • Low pairing rate in SOLiD 4 paired-end transcriptome sequencing

    Dear all,

    We have conducted a transcriptome sequencing project on zebrafish using the SOLiD 4 PE protocol (50X35). After analysis with the Bioscope 1.3.1 WTA pipeline, we found that of the 80% of reads that mapped, only 31% of pairs had both ends on the same chromosome, while more than 45% had their ends on different chromosomes. Is this rate normal, or did something go wrong in our experiment or analysis?

    Best wishes
    Chen
    Last edited by amurocw; 03-14-2011, 06:25 PM.

  • #2
    Perhaps you should confirm your alignment results with BFAST or NovoalignCS. Both tools can write out SAM/BAM format from which you could generate some useful statistics on proper pairs, etc.
    The scenario you are describing sounds like a sample with a huge number of structural variations.
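    Following the suggestion above, once alignments are available as SAM text, the pair categories from the original question (same chromosome, different chromosomes, mate unmapped, proper pair) can be tallied from the FLAG, RNAME and RNEXT columns. The sketch below is a minimal illustration, not part of BFAST, NovoalignCS or Bioscope: `pair_categories` and the toy records are hypothetical, and it assumes uncompressed SAM with the flag bits defined in the SAM specification, counting each pair once via its primary first-in-pair record. For real data, `samtools flagstat` reports comparable totals, including mates mapped to a different chromosome.

```python
# Minimal sketch: tally mate-pair placement from SAM text records.
# Assumes uncompressed SAM and the FLAG bits from the SAM specification.
# The helper name and toy records are hypothetical.

PAIRED = 0x1          # read has a mate
PROPER_PAIR = 0x2     # aligner flagged the pair as properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped
FIRST_IN_PAIR = 0x40  # first read of the pair (e.g. F3)
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary alignment


def pair_categories(sam_lines):
    """Count pair placements, using one primary first-in-pair record per pair."""
    counts = {
        "both_unmapped": 0,
        "one_mate_unmapped": 0,
        "same_chromosome": 0,
        "different_chromosome": 0,
        "proper_pair": 0,
    }
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, rnext = int(fields[1]), fields[2], fields[6]
        if not (flag & PAIRED) or not (flag & FIRST_IN_PAIR):
            continue  # unpaired reads and second-in-pair records
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep primary alignments only
        read_unmapped = bool(flag & UNMAPPED)
        mate_unmapped = bool(flag & MATE_UNMAPPED)
        if read_unmapped and mate_unmapped:
            counts["both_unmapped"] += 1
        elif read_unmapped or mate_unmapped:
            counts["one_mate_unmapped"] += 1
        else:
            # RNEXT is "=" (or repeats RNAME) when the mate is on the same reference.
            if rnext in ("=", rname):
                counts["same_chromosome"] += 1
            else:
                counts["different_chromosome"] += 1
            if flag & PROPER_PAIR:
                counts["proper_pair"] += 1
    return counts


# Hypothetical toy records (QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL).
TOY_RECORDS = [
    "@HD\tVN:1.6",
    "\t".join(["r1", "99", "chr1", "100", "60", "50M", "=", "300", "235", "*", "*"]),
    "\t".join(["r1", "147", "chr1", "300", "60", "35M", "=", "100", "-235", "*", "*"]),
    "\t".join(["r2", "65", "chr1", "500", "60", "50M", "chr5", "900", "0", "*", "*"]),
    "\t".join(["r3", "73", "chr2", "700", "60", "50M", "=", "700", "0", "*", "*"]),
    "\t".join(["r4", "77", "*", "0", "0", "*", "*", "0", "0", "*", "*"]),
]

if __name__ == "__main__":
    print(pair_categories(TOY_RECORDS))
```

    On the toy records this reports one pair in each category. The proper-pair count is tallied in addition to the placement category, so the five numbers are not meant to sum to the number of pairs.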



    • #3
      Originally posted by zee:
      Perhaps you should confirm your alignment results with BFAST or NovoalignCS. Both tools can write out SAM/BAM format from which you could generate some useful statistics on proper pairs, etc.
      The scenario you are describing sounds like a sample with a huge number of structural variations.

      Thanks for your suggestion. I will try BFAST again.

      However, TopHat gave nearly the same pairing rates. Although the zebrafish genome assembly is still preliminary, that alone cannot explain such a low pairing rate.

      Zee, do you have any SOLiD 4 PE data? What does the pairing rate look like?



      • #4
        Stats from a recent partial SOLiD 4 PE run mapped to Arabidopsis. The 1st read is F3, the 2nd read is F5. As usual, SOLiD reports all reads, high quality or not, and relies on the mapping to discard the poorer ones.
        Note that "properly paired" reads (i.e., same chromosome, within a reasonable insert distance) make up 39% of the total reads and 66% of the mapped reads.


        81398098 the read is paired in sequencing
        40699049 Total first read (50.00% total read)
        40699049 Total second read (50.00% total read)
        33107767 the query sequence itself is unmapped (40.67% total read)
        10726487 Unmapped first read (26.36% total first read)
        22381280 Unmapped second read (54.99% total second read)
        48290331 Total mapped reads (59.33% total read)
        29972562 mapped first read (62.07% total mapped, 73.64% total first read)
        18317769 mapped second read (37.93% total mapped, 45.01% total second read)
        35022988 both reads mapped (72.53% total mapped, 43.03% total read)
        31774598 the read is mapped in a proper pair (65.80% total mapped, 39.04% total reads)
        33107767 singletons (mates unmapped) (40.67%)
        22613871 strand of the query is reverse
        22614187 strand of the mate is reverse
        0 the alignment is not primary
        0 the read fails platform/vendor quality checks
        274708 the read is either a PCR or an optical duplicate
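        The two percentages quoted for proper pairs differ only in their denominator. As a quick arithmetic check using only the counts listed above (the variable names are illustrative), the proper-pair count is divided once by all reads and once by mapped reads:

```python
# Recompute the proper-pair percentages from the Arabidopsis counts above.
# Only the counts come from the post; variable names are illustrative.

total_reads = 81_398_098   # "the read is paired in sequencing"
total_mapped = 48_290_331  # "Total mapped reads"
both_mapped = 35_022_988   # "both reads mapped"
proper_pair = 31_774_598   # "the read is mapped in a proper pair"


def pct(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


proper_of_total = pct(proper_pair, total_reads)    # denominator: every read sequenced
proper_of_mapped = pct(proper_pair, total_mapped)  # denominator: reads that aligned
both_of_mapped = pct(both_mapped, total_mapped)

if __name__ == "__main__":
    print(f"proper pairs: {proper_of_total}% of total reads, {proper_of_mapped}% of mapped reads")
    print(f"both reads mapped: {both_of_mapped}% of mapped reads")
```

        This reproduces the 39.04% and 65.80% figures in the statistics, which round to the 39% and 66% mentioned in the post.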



        • #5
          Originally posted by westerman:
          Stats from a recent partial SOLiD 4 PE run mapped to Arabidopsis. The 1st read is F3, the 2nd read is F5. As usual, SOLiD reports all reads, high quality or not, and relies on the mapping to discard the poorer ones.
          Note that "properly paired" reads (i.e., same chromosome, within a reasonable insert distance) make up 39% of the total reads and 66% of the mapped reads.



          Hi Westerman,
          Are your results from transcriptomics or resequencing studies? Which mapping tool did you use for the analysis?

          Thanks in advance.
          Best regards,

          S.



          • #6
            Originally posted by Sheila:
            Hi Westerman,
            Are your results from transcriptomics or resequencing studies? Which mapping tool did you use for the analysis?

            Thanks in advance.
            Best regards,

            S.

            Those statistics were from a resequencing run, so they may not be as applicable to a transcriptome study. The tool was LifeTech's Bioscope software.

            Here are some statistics from a recent transcriptome run mapped to maize. This was a partial run, hence the "small" number of reads. I'm still always amazed by the numbers we get from NGS machines compared to the Sanger methods of 5 years ago.

            You can see that the mapping went well enough at ~50%. As is normal with SOLiD, these are raw reads with no quality filtering, so that percentage is more or less expected, although we have seen better. The amount of RNA we had was very low, and we suspect this may have contributed.


            11742104 the read is paired in sequencing
            5871052 Total first read (50.00% total read)
            5871052 Total second read (50.00% total read)

            4766137 the query sequence itself is unmapped (40.59% total read)
            2530271 Unmapped first read (43.10% total first read)
            2235866 Unmapped second read (38.08% total second read)

            6975967 Total mapped reads (59.41% total read)
            3340781 mapped first read (47.89% total mapped, 56.90% total first read)
            3635186 mapped second read (52.11% total mapped, 61.92% total second read)
            5301952 both reads mapped (76.00% total mapped, 45.15% total read)
            2385930 the read is mapped in a proper pair (34.20% total mapped, 20.32% total reads)

            4766137 singletons (mates unmapped) (40.59%)
            2640283 strand of the query is reverse
            2640358 strand of the mate is reverse

            0 the alignment is not primary
            0 the read fails platform/vendor quality checks
            0 the read is either a PCR or an optical duplicate

            402 Mean Insert Size
            50 - 14999 Insert Size Range
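            How Bioscope computes its "Mean Insert Size" and "Insert Size Range" is not stated here. One plausible way to get comparable figures from SAM output, sketched below under that assumption, is to take |TLEN| from primary, properly paired first-in-pair records. `insert_size_summary` and the toy records are hypothetical and not part of any tool named in this thread.

```python
# Hypothetical sketch: summarise insert sizes from SAM TLEN values.
# Assumes uncompressed SAM; how Bioscope defines its own insert-size
# statistics is not known here, so this is one plausible definition only.

PROPER_PAIR = 0x2     # aligner flagged the pair as properly aligned
FIRST_IN_PAIR = 0x40  # count each pair once, via its first read
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary alignment


def insert_size_summary(sam_lines):
    """Return (mean, min, max) of |TLEN| for primary first-in-pair proper pairs.

    Returns None when no qualifying record is found.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep primary alignments only
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            sizes.append(abs(tlen))  # TLEN is negative for the rightmost mate
    if not sizes:
        return None
    return sum(sizes) / len(sizes), min(sizes), max(sizes)


# Hypothetical toy records (QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL).
TOY_RECORDS = [
    "\t".join(["r1", "99", "chr1", "100", "60", "50M", "=", "300", "235", "*", "*"]),
    "\t".join(["r1", "147", "chr1", "300", "60", "35M", "=", "100", "-235", "*", "*"]),
    "\t".join(["r2", "83", "chr2", "900", "60", "50M", "=", "550", "-400", "*", "*"]),
    "\t".join(["r2", "163", "chr2", "550", "60", "35M", "=", "900", "400", "*", "*"]),
    "\t".join(["r3", "65", "chr1", "500", "60", "50M", "chr5", "900", "0", "*", "*"]),
]

if __name__ == "__main__":
    print(insert_size_summary(TOY_RECORDS))
```

            On the toy records this averages the two proper pairs (235 and 400) and skips the discordant r3. Counting only first-in-pair records avoids counting each pair twice, since both mates carry the same |TLEN|.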



            • #7
              Originally posted by westerman:
              Those statistics were from a resequencing run, so they may not be as applicable to a transcriptome study. The tool was LifeTech's Bioscope software.


              Hi Westerman,
              Thanks very much for the info!
              One last thing: what was the read length in this experiment? Did you trim any of your reads before mapping?

              Our statistics for resequencing studies are very similar to yours, but not as similar to your maize transcriptome results in terms of the number of properly paired reads.

              Regards,

              S.



              • #8
                Originally posted by Sheila:
                Hi Westerman,
                Thanks very much for the info!
                One last thing: what was the read length in this experiment? Did you trim any of your reads before mapping?

                F3 is 50 bases, F5 is 35 bases. No trimming.
