**colindaven** · 08-24-2012, 03:20 AM

You could try a number of things.

I don't know of any splice aware aligners that will work well with PE SOLiD data.

Firstly, your PE reads are probably bad quality - try trimming to perhaps 20bp. You can check their quality using FastQC or similar.
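As a rough illustration of the trimming suggestion, here is a minimal Python sketch that cuts color-space reads in a `.csfasta` file down to a fixed number of colors. It assumes the usual csfasta layout (comment lines starting with `#`, header lines starting with `>`, and read lines consisting of one primer base followed by color calls); the function name `trim_csfasta` and the default of 20 colors are only illustrative. The matching `.qual` file would need to be trimmed to the same number of values, which is not shown here.

```python
def trim_csfasta(lines, n_colors=20):
    """Yield csfasta lines with each read cut to the primer base plus n_colors colors."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", ">")):
            # Comment lines, header lines and blank lines pass through unchanged.
            yield line
        else:
            # The first character is the primer base, so keep it plus n_colors calls.
            yield line[: 1 + n_colors]


# Hypothetical record: a 40-color F3 read.
records = ["# example comment", ">1_1_1_F3", "T" + "0123" * 10]
for out_line in trim_csfasta(records, n_colors=20):
    print(out_line)
```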

In terms of aligners, you could try LifeScope/Bioscope and CLC trial version. Bioscope has some RNA specific alignment tools which I haven't tried, and CLC seems to do a rather good job with SOLiD on bacterial genomes at least (more alignments than NovoalignCS, and more good SNPs called apparently, but I can't quantify this globally yet).

NovoalignCS is a capable aligner for SOLiD.

Also, have a look at this section of the Seqanswers site for further comments on alignment.

Lastly, can you get a spliced reference dataset of some sort from TopHat to align against?

**endether** · 08-24-2012, 11:02 AM

Thank you so much for the suggestions.

I ran FastQC and did indeed find a problem with the quality of our reads.

Here is what it looks like in the F3 end of one library.

Basically, the quality seems to drop every 5 bases. I am wondering if this is an indication that there's something wrong with the sequencing machine. I randomly picked several libraries and lanes and they all have the same pattern. Other than discarding this "noisy" data, do you have any suggestions?

As for aligners, I am wondering why SOLiD reads generally seem to have a lower mapping rate with Bowtie. From what I have read on the forum, the best case with Bowtie is only around 50%. Is this a result of generally low read quality, or of the aligner's 'incompatibility' with color-space encoding? Mismatch tolerance seems to be one issue: Bowtie allows at most 3 mismatches, but a single base mismatch translates into 2 mismatches in color-space.
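The point about one base mismatch becoming two color mismatches can be checked with a small Python sketch of the di-base color code. It assumes the standard SOLiD encoding (each color is determined by a pair of adjacent bases, equivalent to XOR on a 2-bit base code); the helper names and the example sequences are made up for illustration.

```python
# 2-bit base codes; XOR of adjacent bases gives the standard di-base color
# (AA=0, AC=1, AG=2, AT=3, CG=3, CT=2, GT=1, and so on).
_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colorspace(bases):
    """Encode a base sequence as colors, one color per adjacent base pair."""
    return "".join(str(_BITS[a] ^ _BITS[b]) for a, b in zip(bases, bases[1:]))


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


ref = "ACGTACGT"
snp = "ACGAACGT"  # one base changed (T -> A at the fourth position)

print(to_colorspace(ref))  # 1313131
print(to_colorspace(snp))  # 1320131
print(count_mismatches(to_colorspace(ref), to_colorspace(snp)))  # 2
```

A single SNP in the middle of a read therefore uses up two of the allowed color mismatches, whereas an isolated sequencing error changes only one color.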

I will definitely try LifeScope next. As for the splice junctions, I had not noticed that TopHat can generate the junction database separately. I think there might be some overhead in translating the alignments from junction coordinates back to the genome, but if an existing tool deals with this automatically, I will try not to reinvent the wheel.

Again, thanks a lot for your help!

Best regards,
Zheng

**snetmcom** · 08-24-2012, 12:37 PM

This pattern is normal for SOLiD chemistry. If LifeScope is an option, you should always start there.

**morellr** · 11-08-2012, 07:43 AM

Outcome?

I was hoping to see an update of this thread -- Can you give us some details on how your 75-35 PE reads turned out? I'm interested in knowing what percentage of the 35 (F5) reads mapped to the same chromosome as the 75 (F3) reads.

**endether** · 11-08-2012, 09:44 PM

The results actually aren't so good. We ended up using LifeScope for the alignment because it gave a much better mapping rate (>70%). However, the reported mapping quality is really low: more than 80% of alignments had a mapping quality of 0. We later found that this was because those reads potentially map to multiple loci. We are still trying to find the actual cause. Right now it looks like ribosomal RNA contamination, even though our protocol contains a step to remove rRNAs.

**hildebs** · 11-14-2012, 12:48 PM

Hey endether,

I have observed similar issues with PE SOLiD data. You may want to ask your library prep group which kit they used for rRNA depletion. The sequencing core I work with has used both the ribo-minus and ribo-zero kits for depletion. The ribo-minus kit is very hit-or-miss, and you may need to do it twice. When I map those libraries, they may still contain up to 50% contamination. The ribo-zero kit, however, gets consistently low (<5%) ribosomal levels.

If you use a filter fasta for LifeScope mapping, you should be able to quantify rRNA levels as well as mapping levels in the same step. If you still have a high number of low-quality reads after mapping, you may need to remove those (with samtools or some such) before transcript assembly (if you have any reads left).
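For the "remove those" step, a minimal Python sketch of filtering SAM records by mapping quality is given below. It works on plain SAM text lines and relies only on the SAM convention that MAPQ is the fifth tab-separated column; the function name `filter_sam_by_mapq`, the threshold, and the example records are illustrative assumptions. On BAM files, `samtools view` with its `-q` option does the same kind of thresholding.

```python
def filter_sam_by_mapq(lines, min_mapq=1):
    """Return (kept_lines, n_dropped); header lines starting with '@' are always kept."""
    kept, dropped = [], 0
    for line in lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # MAPQ is the 5th SAM column
        if mapq >= min_mapq:
            kept.append(line)
        else:
            dropped += 1
    return kept, dropped


# Made-up records: one uniquely placed read and one multi-mapped read (MAPQ 0).
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept, dropped = filter_sam_by_mapq(sam, min_mapq=1)
print(len(kept), dropped)  # 2 1
```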

**endether** · 11-14-2012, 03:17 PM

Hi hildebs,

Thank you so much for the information. Our library prep group did indeed use the ribo-minus kit for depletion. We actually tried different rRNA filter files, because there is no official rRNA annotation for the genome we are working on. With some filter files we get >70% of reads filtered, but with others only around 10%, so the filter files themselves may be the problem. After applying the filter fasta, we are left with <1M reads per library that map to exon regions, which makes our downstream analysis really hard.

We did some further analysis by BLASTing the genes that attracted a large number of mapping-quality-0 reads against a repeats database. We found that those genes are somehow associated with 45S rRNAs. I think it is now clear that this is rRNA contamination. We are now considering how to "rescue" the data and materials in addition to redoing the library preparation, and I will definitely let our group know about the ribo-zero option.

Thank you so much!

Best,
Zheng

**snetmcom** · 11-15-2012, 03:40 PM

Was this ribominus or ribominus v2? I have only just heard about the v2 kits.

**carmeyeii** · 12-06-2012, 03:54 PM

Hello,

I am analyzing 9 RNA-seq libraries which were sequenced on SOLiD. It seems like the best aligners for SOLiD data are SHRiMP, Novoalign and LifeScope. And from what I've read, LifeScope seems to be the only color-space aligner with splicing capabilities.

I've worked with Illumina data before and mapped using TopHat, but I don't really use the novel junction discovery option. I supply a reference transcriptome against which it maps during the first round, and the remaining reads are then mapped against the genome [without novel junction discovery].

Is there something similar that can be done using any of these three color-space aligners?

What have you found best when working with RNA SOLiD libraries?

Thanks a lot for your help,

Carmen

**hildebs** · 12-07-2012, 07:27 AM

Hello Carmen,

If you have access to LifeScope, that is what I have used in the past. You can specify a .gtf file describing the transcriptome and use this for mapping, similar to what you have used tophat for in the past.

LifeScope first aligns to a "filter" fasta, if specified, to filter out the reads that map to "junk" sequences (adapters, rRNA sequences). The reads that map to this filter are excluded from further analysis. Then it maps to exon junctions (pulled from the gtf file, F3 read only) and to exons (F5 read only), and then finally to the genome, for those reads not mapped to the other references. It then merges all of the mapped reads into one file, pairs them and creates a .bam.
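The ordering described above can be pictured as a cascade in which each read is assigned to the first reference tier that accepts it. The following Python toy only mirrors that description; it is not LifeScope's implementation or API. The tier names, the `assign_tier` helper, and the assumption that the filter tier applies to both F3 and F5 reads are all hypothetical.

```python
# Tiers in the order described: (tier name, read tags that are tried against it).
# Assumption: the filter and genome tiers are tried for both F3 and F5 reads.
TIERS = [
    ("filter", {"F3", "F5"}),
    ("junctions", {"F3"}),
    ("exons", {"F5"}),
    ("genome", {"F3", "F5"}),
]


def assign_tier(read_tag, hits):
    """Return the first tier, in order, that accepts this read tag and has a hit.

    `hits` is the set of tier names the read would align to if tried.
    Reads assigned to "filter" would be excluded from further analysis.
    """
    for name, tags in TIERS:
        if read_tag in tags and name in hits:
            return name
    return "unmapped"


print(assign_tier("F3", {"junctions", "genome"}))  # junctions
print(assign_tier("F5", {"junctions", "genome"}))  # genome
print(assign_tier("F3", {"filter", "genome"}))     # filter
```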

I personally do not have much experience with the other mappers. LifeScope came with our 5500 install and I decided it was the best way to go.

I hope this helps!

**carmeyeii** · 12-12-2012, 05:40 PM

Thanks hildebs!

I'm analyzing a "second-hand" dataset generated using SOLiD 4. It is a transcriptome mate pair library that is 52 x 37 nt, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size selection parameters were, or how the circles were cut to produce the final fragments. This info would be very valuable for more accurate mapping.

Any knowledge would be greatly appreciated!

Thanks a lot,

Carmen

75+35 Pair-end SOLiD RNA-seq data analysis
