Seqanswers Leaderboard Ad

**sheltonj** · 09-16-2013, 08:09 AM

What the difference between the library type single and paired, DeSeq

I am preparing read counts for DeSeq. I have no reference genome for these transcriptomes. I am writing a script to extract counts from sam files produced by bowtie2 https://github.com/i5K-KINBRE-script...anscriptome.pl.

It seems to me that Deseq assumes each pair counts as one alignment for "paired" data. Bowtie2 and many other packages attempt aligning single reads when the pair fails. I also have reads that have no pairs (one set failed my base quality filters).

It seems like addition, useful information may be gained using the pairs with a quality issue in only one end or an unexpected insert length. It also seems that feeding the pairing information to Bowtie2 could increase the quality of alignment for some ambiguously mapping reads. It also seems to me DeSeq makes no use of pairs other than to normalize between samples. Does this sound correct to you? If that is the case why label samples "paired" or "single."

My main questions are:

1) If I use "paired" in DeSeq should I force Bowtie2 to only report pairs where both align as a pair and divide the counts by 2?

2) If I include my single end alignments, should I count each single read as +1 and each pair as +1? Should I use "paired" or "single" in DeSeq and why does this setting matter?

Thanks

**dpryan** · 09-16-2013, 09:17 AM

The "paired" or "single" in the DESeq vignette is just an example of (1) how to deal with multiple factors and (2) that single-end and paired-end libraries can produce slightly different results (likely due to mappability). If you just have one type of library, then there's no reason to include that in your model. As you suspected, both read pairs and single reads count as 1, since they each represent a single original molecule.

Perhaps one of the DESeq authors has a different opinion, but I would only really worry about including orphaned reads if the rate of them was very different between samples (you'd see if this is causing an issue in a principal components graph or by doing clustering). This would likely indicate a general difference in overall quality between the two

BTW, your perl script could be replaced by htseq-count, where you make an annotation file wherein each contig is a feature (this has the benefit of not counting multimappers and allowing filtering by MAPQ).

**sheltonj** · 09-16-2013, 10:17 AM

DeSeq paired-end and orphaned reads

That you for your reply. I have no reference genome. I am assuming you mean that I create a gff where each feature is a contig. If this is correct, what software do you use to create the gff from contigs.

I am adding filtering steps to my script currently:
1) For Bowtie2 column 5 is map quality.

2) Column 7 indicates whether the read has a mate and which contig the mate aligns to. Keeping reads aligning with a value other than "=" or "*" in column 7 would remove pairs with reads that align to different contigs.

3) The optional field "XS:i:<N>" only appears for reads that aligns ambiguously. (Would you recommend filtering a pair if one read aligns ambiguously or only if both align ambiguously?)

4) I could also filter by by problematic pairing using the optional field "YT:Z:<S>" where a value of UU indicates the read was not part of a pair. Value of CP indicates the read was part of a pair and the pair aligned concordantly. Value of DP indicates the read was part of a pair and the pair aligned discordantly. Value of UP indicates the read was part of a pair but the pair failed to aligned either concordantly or discordantly.

Any thoughts on the filters listed above.

Thanks,
Jennifer

**dpryan** · 09-16-2013, 10:34 AM

Yeah, you'd need to make the GFF or GTF file. A GTF file would be easy to make from the header of one of your SAM/BAM files (since you just need the contig names and lengths). It sounds like the additions to your script should take care of most of the issues

)

Regarding (2), that will also filter any read with a secondary alignment (XS score) equivalent to that of the primary alignment and where the primary alignment contains mismatches. Regardless, you'll certainly want to get rid of those!

Regarding (3), the presence of an XS score actually just means that there's a secondary alignment with a score of at least that specified by --score-min. If AS=0 and XS=-60, I wouldn't call the alignment ambiguous. This is generally why it's better to just filter at a MAPQ (say 10 or 20).

Regarding (4), I don't recall off-hand what bowtie2's metric is for calling something unique or not, so I can't advise on the utility of this.

Best of luck,
Devon

**sheltonj** · 09-17-2013, 03:46 PM

Hi thanks,

I drafted a perl script to count paired end reads and broken pairs that align to de novo transcriptomes. The script takes Bowtie2 "best match" sam files as input.

The script "count_reads_aligned_to_de_novo_transcriptome.pl" is briefly described at the bottom of this README here https://github.com/i5K-KINBRE-script...ster/README.md.

The script is available here https://github.com/i5K-KINBRE-script...anscriptome.pl

Comments, questions, or suggestions would be appreciated

-Jennifer

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Using htseq-count with paired-end and orphaned reads

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News