**Richard Finney** · 08-12-2014, 08:44 AM

Are you trying to match a specific sequence?

samtools view x.bam | awk '$10 ~ /TTTTCTGCCTGTTGGGCTGGAG/ {print $10}' | sort | uniq -c

To list every sequence that is shared by more than one read, try this:

samtools view x.bam | awk '{print $10}' | sort --buffer-size=20G | uniq -c | awk '{if ($1!=1) print $0}'

where x.bam is your BAM file.
Tune sort's --buffer-size parameter to the memory available on your machine.

**swbarnes2** · 08-12-2014, 10:12 AM

It would take longer, but swap the second awk in Richard's command line with

Code:

sort -nr

to get the most common reads in order, starting with the most abundant. Tack on

Code:

| head -n 100

to get only the top 100.
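
Put together, the modified pipeline might look like the sketch below. The function name `top_sequences` is made up for illustration; it reads SAM records on stdin so the samtools step stays outside the function.

```shell
# top_sequences [N]: read SAM records on stdin and print the N most
# abundant read sequences (default 100), each prefixed by its count.
# Hypothetical helper name, not part of samtools.
top_sequences() {
    awk -F'\t' '{print $10}' |   # column 10 of a SAM record is the read sequence
        sort |                   # group identical sequences so uniq can count them
        uniq -c |                # prefix each distinct sequence with its count
        sort -nr |               # most abundant first
        head -n "${1:-100}"      # keep only the top N
}

# Usage (assumes x.bam exists):
#   samtools view x.bam | top_sequences 100
```

For large files, add --buffer-size to the first sort as in Richard's command.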

**Schelarina** · 08-20-2014, 08:11 AM

Thank you! It worked!
Can I do the same from a FASTQ file?

**Richard Finney** · 08-20-2014, 08:47 AM

Yes. Use awk to print only the sequence line of each record, which is the 2nd line of every 4 in a FASTQ file.
Note 1) the modulo operator (%) and 2) the built-in line-count variable (NR) in awk.
Perl, Python, C, or Java can also filter for just the sequence lines if you prefer.
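
A minimal sketch of the awk approach, assuming plain 4-line FASTQ records with unwrapped sequence lines. The function name `duplicate_fastq_seqs` and the file name `reads.fastq` are made up for illustration.

```shell
# duplicate_fastq_seqs: read FASTQ on stdin and print "count sequence"
# for every sequence that occurs more than once.
# Assumes each record is exactly 4 lines (header, sequence, +, quality).
duplicate_fastq_seqs() {
    awk 'NR % 4 == 2' |   # NR is the line count; lines 2, 6, 10, ... hold the sequences
        sort |            # group identical sequences
        uniq -c |         # count each distinct sequence
        awk '$1 > 1'      # keep only sequences seen more than once
}

# Usage (assumes reads.fastq exists):
#   duplicate_fastq_seqs < reads.fastq
```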

**Schelarina** · 08-24-2014, 06:32 AM

After extracting the identical reads as single sequences from the BAM file, I aligned them to the genome again. When I visualize the new alignment in IGV, all the sequences map in the sense orientation, even those that are supposed to be antisense to the genome. Why is that?

**Richard Finney** · 08-24-2014, 01:32 PM

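The sequence stored for a read in a BAM file is reverse-complemented when the read aligns to the reverse strand, so column 10 always shows the read in reference orientation. That is why sequences extracted from that column all realign in the sense orientation.
Reverse-strand reads are marked by the 0x10 (decimal 16) bit in the bitwise FLAG field (column 2) of each SAM/BAM record.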

You may wish to interrogate this flag for special processing.
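
One way to act on that flag, as a rough sketch: reverse-complement the sequence again whenever bit 0x10 is set, which recovers the read as it came off the sequencer. The function name `original_orientation` is made up for illustration, and any base other than A, C, G, T is written as N, which is an assumption of this sketch.

```shell
# original_orientation: read SAM records on stdin and print each read's
# sequence in its original (as-sequenced) orientation.
# Hypothetical helper name, not part of samtools.
original_orientation() {
    awk -F'\t' '
        BEGIN { comp["A"]="T"; comp["C"]="G"; comp["G"]="C"; comp["T"]="A" }
        /^@/ { next }                        # skip SAM header lines, if present
        {
            seq = $10
            if (int($2 / 16) % 2 == 1) {     # FLAG bit 0x10 set: stored reverse-complemented
                rc = ""
                for (i = length(seq); i >= 1; i--) {
                    base = toupper(substr(seq, i, 1))
                    rc = rc ((base in comp) ? comp[base] : "N")
                }
                seq = rc
            }
            print seq
        }'
}

# Usage (assumes x.bam exists):
#   samtools view x.bam | original_orientation | sort | uniq -c
```

If you only need to separate the strands, `samtools view -f 16 x.bam` keeps reverse-strand reads and `samtools view -F 16 x.bam` keeps the rest.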

extracting identical reads from bam file
