remove duplicate reads 100% sequence identity and genomic coordinates

thiNGS

Member

Join Date: Sep 2014

Posts: 24
- Share
- Tweet
#1

remove duplicate reads 100% sequence identity and genomic coordinates

01-12-2015, 06:59 AM

Hi everyone. I have to analyze some paired end reads coming from a Illumina MiSeq experiment. What I want to do is removing duplicate reads that not only have the same start-end coordinates but also have 100% sequence identity. Is there any tool that can help me do that? I want to work with BAM files not with FastQ files. Thanks!
Tags: None
Brian Bushnell

Super Moderator

Join Date: Jan 2014

Posts: 2709
- Share
- Tweet
#2

01-12-2015, 09:53 AM

If sequences have 100% identity then they should have the same mapping coordinates, so there's no reason to work with bam files in this case. I wrote a program that can do this for fastq, but not for bam:

dedupe.sh in=reads.fq out=deduped.fq ac=f t=1

There should be tools that can do so on bam files by sorting by sequence, but I don't know what they are offhand.
Comment

Previous template Next

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, 04-25-2024, 11:49 AM	0 responses 19 views 0 likes	Last Post by seqadmin 04-25-2024, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 18 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 62 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad