**drio** · 06-07-2010, 02:02 PM

I don't think there is anything like that out there. You need alignments to detect duplicates.
About the SOLiD instrument filtering, perhaps you are talking about dropping reads with low quality?

**Bueller_007** · 06-07-2010, 02:18 PM

> Originally posted by drio:
>
> I don't think there is anything like that out there. You need alignments to detect duplicates.
> About the SOLiD instrument filtering, perhaps you are talking about dropping reads with low quality?

I don't think I need alignments, as I'm talking about identical *reads*. Removing these duplicates can be performed by Corona prior to data output using the --noduplicates option. However, I can't find an equivalent for data that has already been output by the SOLiD system.

There are multiple programs available for filtering out low-quality reads. That's not what I need.

**nilshomer** · 06-07-2010, 02:50 PM

> Originally posted by Bueller_007:
>
> I don't think I need alignments, as I'm talking about identical *reads*. Removing these duplicates can be performed by Corona prior to data output using the --noduplicates option. However, I can't find an equivalent for data that has already been output by the SOLiD system.
>
> There are multiple programs available for filtering out low-quality reads. That's not what I need.

A few lines of your favorite programming language should be able to do it. Lexicographically sort by sequence and remove duplicates.
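
A minimal Python sketch of the sort-and-remove idea, assuming a .csfasta layout of `#` comment lines followed by alternating `>` header and sequence lines. The function names are made up for illustration, and everything is held in memory, so a multi-gigabyte file may need an external sort that follows the same logic.

```python
from itertools import groupby


def read_csfasta(lines):
    """Yield (header, sequence) pairs, skipping blank and '#' comment lines."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            header = line
        elif header is not None:
            yield header, line
            header = None


def dedupe_by_sequence(records):
    """Sort records lexicographically by sequence and keep one per sequence.

    sorted() is stable, so the first record seen in the input is the one
    kept for each distinct sequence.
    """
    ordered = sorted(records, key=lambda rec: rec[1])
    return [next(group) for _, group in groupby(ordered, key=lambda rec: rec[1])]
```

Usage might look like `dedupe_by_sequence(read_csfasta(open("reads.csfasta")))`, where `reads.csfasta` is a placeholder name. The matching `_QV.qual` entries would then have to be filtered by the surviving headers.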

**drio** · 06-07-2010, 05:19 PM

> Originally posted by nilshomer:
>
> A few lines of your favorite programming language should be able to do it. Lexicographically sort by sequence and remove duplicates.

Something like this: http://github.com/drio/dups.fasta.qual

**Bueller_007** · 06-25-2010, 09:58 AM

Thanks. I didn't get email notifications that people had replied to my post, so I didn't find these until just now.

For what it's worth, I believe that fastx_collapser ( http://hannonlab.cshl.edu/fastx_toolkit/ ) can also do this, with the caveat that your .csfasta and _QV.qual have to be merged into a .fastq first (with the .csfasta double-encoded) if you also want to remove the duplicates from your _QV.qual file.
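
A rough Python sketch of the merge step for a single read, under several assumptions that are not from this thread: the leading primer base is dropped, colors are double-encoded as 0/1/2/3/. to A/C/G/T/N, missing quality values (-1) are clamped to 0, and qualities are written as Phred+33. The function name is hypothetical, and other converters may treat the primer base and first color differently.

```python
# Assumed double-encoding table: color calls to pseudo-bases, '.' to N.
DOUBLE_ENCODE = str.maketrans("0123.", "ACGTN")


def to_fastq_record(header, csfasta_seq, qual_line):
    """Build one FASTQ record from a .csfasta read and its _QV.qual line."""
    colors = csfasta_seq[1:]  # assumption: drop the leading primer base
    # Assumption: negative values mark missing calls; clamp them to 0.
    quals = [max(0, int(q)) for q in qual_line.split()]
    if len(colors) != len(quals):
        raise ValueError("color and quality counts differ for " + header)
    seq = colors.translate(DOUBLE_ENCODE)
    # Phred+33 encoding, capped at 93 so the character stays printable ASCII.
    qual = "".join(chr(min(q, 93) + 33) for q in quals)
    name = header.lstrip(">")
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Pairing each header in the .csfasta with the same header in the _QV.qual file is left out here; the sketch assumes the two lines have already been matched.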

**Chipper** · 06-26-2010, 02:57 PM

Wouldn't removing all identical reads result in enrichment of reads with errors? Perhaps filtering on the first part of each read and allowing some duplicates would work better.
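
One way this could look in Python, as a sketch only: reads are keyed on their first `prefix_len` characters and up to `max_copies` reads are kept per key. Both parameter names and their default values are arbitrary placeholders, not recommendations.

```python
from collections import Counter


def filter_by_prefix(records, prefix_len=25, max_copies=2):
    """Keep at most max_copies reads sharing the same leading prefix.

    records is an iterable of (header, sequence) pairs; input order is kept.
    """
    seen = Counter()
    kept = []
    for header, seq in records:
        key = seq[:prefix_len]  # compare only the first part of the read
        if seen[key] < max_copies:
            seen[key] += 1
            kept.append((header, seq))
    return kept
```

Because only the prefix is compared, reads that differ solely through errors later in the read are counted as copies of the same fragment.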

**Bueller_007** · 06-26-2010, 03:07 PM

> Originally posted by Chipper:
>
> Wouldn't removing all identical reads result in enrichment of reads with errors? Perhaps filtering on the first part of each read and allowing some duplicates would work better.

Probably true. That's why it's better to remove duplicates after alignment/assembly. Unfortunately, I'm feeding the end-product to CLC Genomics Workbench and they don't have duplicate removal yet. The dupes are messing up my SNP discovery pretty badly.

I'd turn on a maximum coverage limit, but since it's a transcriptome, the coverage varies with expression level, so I'm hesitant to omit highly covered regions. I've tried exporting to BAM, removing dupes with Picard and importing back in, but the reimport didn't work for whatever reason.


Removing duplicate reads from multigig .csfasta
