FastX tool for removing duplicates

archie.chauhan

Junior Member

Join Date: Nov 2011

Posts: 9
#1

05-15-2012, 05:40 AM

Hi,
I have gone through various SEQanswers posts regarding duplicate removal but could not find the answer I need. Since I am a molecular biologist new to bioinformatics, I have a few queries.
I have Illumina DNA 2x100 paired-end reads. FastQC analysis indicated a large number of duplicates, which seems to be correct. Since the dataset is too big, I wanted to remove the duplicates, so I used Galaxy. I first ran FASTQ Groomer, followed by the FASTX Collapse tool, on the R1 and R2 reads separately. My plan was to first remove duplicates, then filter and trim my sequences, and finally assemble them using Velvet. As far as I know, Velvet requires the paired-end reads to be shuffled prior to assembly. I therefore have a few questions about my approach:
1) The FASTX Collapse tool gives the sequences its own headers. It seems that the paired-end information is lost. Am I right, or is it just that the headers have changed but the information is still there? If so, where is it?
2) I used the R1 and R2 reads separately for grooming and FASTX Collapse. Should I first shuffle my reads using Velvet and then use the FASTX Collapse tool on the shuffled sequences, or
3) should I first join the paired-end data and then use the FASTX tool? But in that case, how do I do the shuffling with Velvet?
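
For illustration, the kind of pair-aware duplicate removal these questions are getting at could be sketched roughly as below. This is only a sketch under stated assumptions, not what the FASTX Collapse tool does: it assumes the R1 and R2 files list the reads in the same order, that every record is exactly four lines, and that a "duplicate" means a pair whose R1 and R2 sequences are both identical to an earlier pair. The function names `read_fastq` and `dedup_pairs` are hypothetical. The original headers are kept, and the output is interleaved (R1 record, then its R2 mate).

```python
from itertools import islice

def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from FASTQ lines (4 lines per record)."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if len(rec) < 4:  # end of input (or a truncated final record)
            return
        yield tuple(rec)

def dedup_pairs(r1_lines, r2_lines):
    """Keep the first pair for each (R1 sequence, R2 sequence) combination.

    Assumes both inputs hold mates in the same order. Yields records
    interleaved: R1 record followed by its R2 mate.
    """
    seen = set()
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        key = (rec1[1], rec2[1])  # a pair is a duplicate only if both mates match
        if key in seen:
            continue
        seen.add(key)
        yield rec1
        yield rec2

# Tiny in-memory example; real use would pass open file handles instead.
r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACGT", "+", "IIII", "@c/1", "TTTT", "+", "IIII"]
r2 = ["@a/2", "GGCC", "+", "IIII", "@b/2", "GGCC", "+", "IIII", "@c/2", "GGCC", "+", "IIII"]
for record in dedup_pairs(r1, r2):
    print("\n".join(record))
```

Because the two mates are compared together and written out together, the pairing is never broken, which is the concern with collapsing R1 and R2 independently. Keeping every distinct pair in memory may not be practical for a very large dataset.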

I would appreciate it if someone could answer these queries.

Regards,
Archana