
**maasha** · 11-15-2010, 05:48 AM

My guess is that it's a plain tab-separated table with sequence and copy count.

You can use Biopieces (www.biopieces.org) to convert this table into something useful - like FASTA format:

Code:

read_tab -i test.tab -k SEQ,COUNT | add_ident -k SEQ_NAME | merge_vals -k SEQ_NAME,COUNT | write_fasta -x
>ID00000000_1
AAAACTGGTTCCAGAAGTTGAGAC
>ID00000001_1
AAAACTGGTTCTGGCAGGTAG
>ID00000002_5
AAAACTGGTTGGGCTTAAAACTGC
>ID00000003_2
AAAACTGGTTGTAAACGGAGGAGC
>ID00000004_2
AAAACTGGTTTTAGATGGATAGAA
>ID00000005_1
AAAACTGGTTTTGCACTATTGGGC
>ID00000006_1
AAAACTGTAAAACAGGTGGTT
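
If Biopieces is not at hand, roughly the same conversion can be sketched in plain Python. This is only a minimal sketch, assuming the table has no header row and holds one sequence and one count per line, separated by whitespace. The function name `table_to_fasta` is hypothetical, and the `ID<8 digits>_<count>` header layout simply imitates the output above.

```python
def table_to_fasta(lines):
    """Convert 'SEQUENCE<whitespace>COUNT' lines into FASTA text.

    The header layout (ID + 8-digit running index + '_' + count) imitates
    the Biopieces output shown in the post; it is an assumption, not a standard.
    """
    records = []
    index = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        seq, count = fields[0], int(fields[1])
        records.append(f">ID{index:08d}_{count}\n{seq}")
        index += 1
    return "\n".join(records)


# Two rows taken from the table discussed in this thread.
example = ["AAAACTGGTTCCAGAAGTTGAGAC\t1", "AAAACTGGTTGGGCTTAAAACTGC\t5"]
print(table_to_fasta(example))
```

Keeping the count in the header means each distinct sequence is mapped only once, and the copy number can be recovered from the read name afterwards.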

You can also use Biopieces for mapping with bowtie, BWA, BLAST, etc.

Cheers,

Martin

**fkrueger** · 11-15-2010, 05:52 AM

These reads seem to have been processed with an adapter-trimming program, which would explain the varying sequence lengths.

I assume
"AAAACTGGTTGGGCTTAAAACTGC 5"

needs to be interpreted as
the sequence: "AAAACTGGTTGGGCTTAAAACTGC" was present exactly "5" times.

What we have done with formats like this is transform it to FastA format like this:

>1
AAAACTGGTTGGGCTTAAAACTGC
>2
AAAACTGGTTGGGCTTAAAACTGC
>3
AAAACTGGTTGGGCTTAAAACTGC
>4
AAAACTGGTTGGGCTTAAAACTGC
>5
AAAACTGGTTGGGCTTAAAACTGC

(to reflect the quantitative aspect)

and then map it to a genome using Bowtie or something similar.
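
The expansion described above could be sketched in Python along these lines. This is a rough illustration rather than the script we actually used: it assumes one sequence and one count per line, separated by whitespace, and the function name `expand_counts` is hypothetical.

```python
def expand_counts(lines):
    """Yield one FASTA record per observed read, numbered from 1.

    A line such as 'AAAACTGGTTGGGCTTAAAACTGC 5' produces five records
    with the same sequence and consecutive numeric headers.
    """
    read_number = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        seq, count = fields[0], int(fields[1])
        for _ in range(count):
            read_number += 1
            yield f">{read_number}\n{seq}"


# Print the expanded FASTA for the example line from this post.
print("\n".join(expand_counts(["AAAACTGGTTGGGCTTAAAACTGC 5"])))
```

The trade-off is file size: the expanded FASTA has one record per original read, so highly abundant sequences are mapped many times over.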

Good luck!

edit: doh I was late!

**vebaev** · 11-15-2010, 05:53 AM

Thanks!

I will manage to convert it to FASTA. I was wondering whether bowtie can then take the small reads from this FASTA file for mapping, because every time I have seen it used, the input was FASTQ.

**fkrueger** · 11-15-2010, 06:14 AM

Yes, just specify -f (the index basename, shown here as the placeholder <ebwt>, goes before the reads file):

bowtie -f <ebwt> sequence_file.fa > output.txt

This is taken from the Bowtie manual:

-f The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale

**Torst** · 11-15-2010, 10:56 PM

Originally posted by vebaev:

AAAACTGGTTCCAGAAGTTGAGAC 1
AAAACTGGTTCTGGCAGGTAG 1
AAAACTGGTTGGGCTTAAAACTGC 5
AAAACTGGTTGTAAACGGAGGAGC 2
AAAACTGGTTTTAGATGGATAGAA 2
AAAACTGGTTTTGCACTATTGGGC 1
AAAACTGTAAAACAGGTGGTT 1

It looks like someone has SORTED the reads and COUNTED their frequency of occurrence, for example with something like:

% grep -v '^>' reads.fasta | sort | uniq -c > vebaev.out
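
Note that `uniq -c` prints the count before the sequence, whereas the quoted table has the sequence first. A small Python sketch of the same sort-and-count idea, writing the columns in the quoted order, might look like this. It assumes each FASTA record keeps its sequence on a single line, as the grep pipeline also does, and the function name `count_reads` is hypothetical.

```python
from collections import Counter


def count_reads(fasta_lines):
    """Return sorted 'SEQUENCE COUNT' lines from single-line FASTA records."""
    counts = Counter(
        line.strip()
        for line in fasta_lines
        if line.strip() and not line.startswith(">")  # drop headers and blanks
    )
    # Sort by sequence so the result matches the ordering of `sort | uniq -c`.
    return [f"{seq} {n}" for seq, n in sorted(counts.items())]


# Tiny made-up input: three reads, two of them identical.
for row in count_reads([">r1", "TTT", ">r2", "AAA", ">r3", "TTT"]):
    print(row)
```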


what is this format?
