**zee** · 03-14-2011, 05:41 PM

Perhaps you should confirm your alignment results with BFAST or NovoalignCS. Both tools can write out SAM/BAM format from which you could generate some useful statistics on proper pairs, etc.
The scenario you are describing sounds like a sample with a huge number of structural variations.
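As a rough illustration of the kind of statistics that can be pulled from SAM output, here is a minimal Python sketch that tallies pairing categories from the FLAG column. The flag bit values come from the SAM format specification; the function name `pairing_stats` and the category labels are made up for this example, and it reads plain SAM text rather than BAM.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1          # read is paired in sequencing
PROPER_PAIR = 0x2     # read is mapped in a proper pair
UNMAPPED = 0x4        # the read itself is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped
SECONDARY = 0x100     # alignment is not primary


def pairing_stats(sam_lines):
    """Tally pairing categories from an iterable of SAM text lines.

    Hypothetical helper for illustration; counts reads, not pairs.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        # Ignore secondary alignments and reads that are not paired.
        if flag & SECONDARY or not flag & PAIRED:
            continue
        counts["paired"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
            continue
        counts["mapped"] += 1
        if not flag & MATE_UNMAPPED:
            counts["both_mapped"] += 1
        if flag & PROPER_PAIR:
            counts["proper_pair"] += 1
    return counts
```

To use it on a BAM file, the SAM text could be piped in, for example from `samtools view`, and the lines of standard input passed to `pairing_stats`.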

**amurocw** · 03-14-2011, 06:22 PM

Thanks for your suggestion. I will try BFAST again.

However, when using TopHat, I got nearly the same pairing rates. Although the zebrafish genome assembly is still preliminary, that alone cannot explain the low pairing rate.

Zee, do you have any SOLiD 4 PE data? What does the pairing rate look like?

**westerman** · 03-16-2011, 12:24 PM

Stats from a recent partial SOLiD 4 PE run mapped to Arabidopsis. The 1st read is F3, the 2nd read is F5. As usual, SOLiD reports all reads, high quality or not, and relies on the mapping to discard the poorer ones.
Note that "proper paired" reads (i.e., same chromosome, within a good insert distance) are 39% of the total reads and 66% of the mapped reads.

81398098 the read is paired in sequencing
40699049 Total first read (50.00% total read)
40699049 Total second read (50.00% total read)
33107767 the query sequence itself is unmapped (40.67% total read)
10726487 Unmapped first read (26.36% total first read)
22381280 Unmapped second read (54.99% total second read)
48290331 Total mapped reads (59.33% total read)
29972562 mapped first read (62.07% total mapped, 73.64% total first read)
18317769 mapped second read (37.93% total mapped, 45.01% total second read)
35022988 both reads mapped (72.53% total mapped, 43.03% total read)
31774598 the read is mapped in a proper pair (65.80% total mapped, 39.04% total reads)
33107767 singletons (mates unmapped) (40.67%)
22613871 strand of the query is reverse
22614187 strand of the mate is reverse
0 the alignment is not primary
0 the read fails platform/vendor quality checks
274708 the read is either a PCR or an optical duplicate
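For anyone checking the two proper-pair percentages, they differ only in the denominator. A small Python sketch using the counts from the report above (the variable names and the `pct` helper are just for illustration):

```python
# Counts copied from the report above (Arabidopsis resequencing run).
total_reads = 81398098
mapped_reads = 48290331
proper_pair_reads = 31774598


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Same numerator, two different denominators.
print(f"proper pairs / mapped reads: {pct(proper_pair_reads, mapped_reads):.2f}%")
print(f"proper pairs / total reads:  {pct(proper_pair_reads, total_reads):.2f}%")
```

This reproduces the 65.80% (of mapped reads) and 39.04% (of total reads) figures in the report.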

**Sheila** · 04-13-2011, 03:25 AM

Hi Westerman,
Are your results from a transcriptomics or a resequencing study? Which mapping tool did you use for the analysis?

Thanks in advance.
Best regards,

S.

**westerman** · 04-13-2011, 06:57 AM

Those statistics were from a resequencing run, so perhaps they are not as applicable to a transcriptome study. The tool is LifeTech's Bioscope software.

Here are some statistics from a recent transcriptome run mapped to maize. This was a partial run, thus the "small" number of reads. I'm still always amazed by the numbers we get from NGS machines compared to the Sanger methods of 5 years ago.

You can see that the mapping went well enough, with ~59% of reads mapped. As is normal with SOLiD, these are the raw reads without consideration of quality, so that percentage is more or less expected, although we have seen better. The amount of RNA we had was very low, and we suspect that this may be contributing.

11742104 the read is paired in sequencing
5871052 Total first read (50.00% total read)
5871052 Total second read (50.00% total read)

4766137 the query sequence itself is unmapped (40.59% total read)
2530271 Unmapped first read (43.10% total first read)
2235866 Unmapped second read (38.08% total second read)

6975967 Total mapped reads (59.41% total read)
3340781 mapped first read (47.89% total mapped, 56.90% total first read)
3635186 mapped second read (52.11% total mapped, 61.92% total second read)
5301952 both reads mapped (76.00% total mapped, 45.15% total read)
2385930 the read is mapped in a proper pair (34.20% total mapped, 20.32% total reads)

4766137 singletons (mates unmapped) (40.59%)
2640283 strand of the query is reverse
2640358 strand of the mate is reverse

0 the alignment is not primary
0 the read fails platform/vendor quality checks
0 the read is either a PCR or an optical duplicate

402 Mean Insert Size
50 - 14999 Insert Size Range
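One way to put the two runs in this thread on a comparable footing is to look at proper pairs as a fraction of reads whose mate also mapped, which takes the overall mapping rate out of the picture. A quick Python sketch with the counts from the two reports (the dictionary layout and names are assumptions for the example, not Bioscope output):

```python
# (both reads mapped, proper pair) counts taken from the two reports in this thread.
runs = {
    "Arabidopsis resequencing": (35022988, 31774598),
    "Maize transcriptome": (5301952, 2385930),
}


def proper_pair_rate(both_mapped, proper_pair):
    """Percentage of both-mapped reads that are in a proper pair."""
    return 100.0 * proper_pair / both_mapped


for name, (both_mapped, proper_pair) in runs.items():
    print(f"{name}: {proper_pair_rate(both_mapped, proper_pair):.2f}% "
          "of both-mapped reads are properly paired")
```

By this measure the resequencing run comes out at roughly 91% and the transcriptome run at roughly 45%. The calculation only quantifies the gap; it says nothing about its cause.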

**Sheila** · 04-13-2011, 07:18 AM

Hi Westerman,
Thanks very much for the info!
One last thing: what was the read length in this experiment? Did you trim any of your reads before mapping?

Our statistics for resequencing studies are very similar to yours, but they differ from your maize transcriptome results in the number of properly paired reads.

Regards,

S.

**westerman** · 04-13-2011, 07:32 AM

F3 is 50 bases, F5 is 35 bases. No trimming.

Low pairing rate in SOLiD 4 pair-end transcriptome sequencing