**Endiel** · 09-20-2012, 07:02 AM

Just wanted to update: I think I found my own answer after digging a bit more on the NCBI website.

I dug into the Homo sapiens read data and found a run with about 60M spots:

Rrp 24h - SRA - NCBI

http://www.ncbi.nlm.nih.gov/sra/DRX000538

The sample is described as:

"BromoUridine-labeled HeLa TO ARE cells transfected with RRP46-1 siRNA were incubated for 24 hours in the absence of BrU (chasing), followed by isolation of BrU-RNA by anti-BrU antibody. Purified BrU-RNAs were analyzed."

"Species: Homo Sapien"

So at least now I know I'm doing something relevant. Here's the SRA download link:

ftp://ftp-trace.ncbi.nlm.nih.gov/sra.../DRR000886.sra

Is 60M spots a pretty high number? For my study, the more the better. If anybody knows of some that are much larger than this, please let me know.

Thanks.

**srasdk** · 09-21-2012, 07:31 PM

NA12878 is the most studied individual human genome so you may be able to cross-check your results if you need.
Look for "Whole genome sequencing" to get the anticipated even coverage (not so in real life). Also look for the HiSeq 2000 instrument to get a very large number of reads.
As an example, check this experiment out (I did not spend much time researching it):

SRA - NCBI

http://www.ncbi.nlm.nih.gov/sra/ERX069505

It has 4 runs and 170 Gbases, which is 60x+ coverage of the human genome. This is as high as it gets right now for sequencing of non-cancer human genomes.
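For anyone wanting to redo the coverage arithmetic, here is a minimal sketch. The 3.1 Gbase genome length is an assumption, not a figure from this thread, and `fold_coverage` is just an illustrative name. Under that assumption 170 Gbases works out to roughly 55x, so the same ballpark as the 60x quoted; a smaller effective genome size (for example, excluding gaps) pushes the number up.

```python
def fold_coverage(total_bases: float, genome_size: float = 3.1e9) -> float:
    """Average sequencing depth = total sequenced bases / genome length.

    genome_size defaults to ~3.1 Gbases, an assumed round figure for human.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# 170 Gbases is the total quoted for the ERX069505 experiment.
depth = fold_coverage(170e9)
print(f"{depth:.1f}x")  # prints "54.8x" with the assumed 3.1 Gbase genome
```

This is raw sequenced depth; the aligned depth will be lower once unmapped and filtered reads are dropped.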

**srasdk** · 09-21-2012, 07:36 PM

Even bigger one by Broad (MIT and Harvard):

Illumina whole genome shotgun sequencing of genomic DNA paired-end li... - SRA - NCBI

http://www.ncbi.nlm.nih.gov/sra/SRX176687

...
And it contains Broad's alignments, so you can compare the results.

**Endiel** · 09-23-2012, 12:51 PM

srasdk-- this is absolutely perfect and what I was looking for. I will post back what my numbers look like once I kick off a bunch of bowtie runs.

Question-- and you may or may not know how to answer this-- but according to the NCBI site for this project, these are paired-end reads, which I think means that when I run fastq-dump I have to specify "--split-files" so that it creates two paired FASTQ files (one per mate) to give to bowtie. Does that sound right?
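Once the two mate files exist, one sanity check before a long bowtie run is to confirm the records line up. The sketch below assumes plain 4-line FASTQ records and that mates share a read name, optionally with a trailing /1 or /2; the actual header layout from fastq-dump may differ, so treat the name parsing as an assumption to adjust. `read_names` and `mates_in_sync` are illustrative names.

```python
from itertools import zip_longest


def read_names(lines):
    """Yield the read name from each 4-line FASTQ record.

    `lines` is any iterable of text lines (an open file or a list).
    """
    for i, line in enumerate(lines):
        if i % 4 != 0:                 # only the first line of a record is a header
            continue
        fields = line[1:].split()      # drop the leading '@' and any description
        if not fields:                 # tolerate a stray blank line at the end
            continue
        name = fields[0]
        if name.endswith(("/1", "/2")):  # assumed mate suffix convention
            name = name[:-2]
        yield name


def mates_in_sync(lines1, lines2):
    """Return True if both inputs list the same read names in the same order."""
    for a, b in zip_longest(read_names(lines1), read_names(lines2)):
        if a != b:                     # also catches one file being shorter
            return False
    return True


# Tiny in-memory demo with made-up reads.
mate1 = ["@r1/1\n", "ACGT\n", "+\n", "IIII\n", "@r2/1\n", "TTTT\n", "+\n", "IIII\n"]
mate2 = ["@r1/2\n", "TGCA\n", "+\n", "IIII\n", "@r2/2\n", "AAAA\n", "+\n", "IIII\n"]
print(mates_in_sync(mate1, mate2))
```

With real data, pass two open file handles instead of the lists; the generator reads one line at a time, so file size should not matter for memory.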

My second question is how best to run bowtie with this data. This is how I was planning on running it to get maximum coverage and alignment:

bowtie -t --best --tryhard --chunkmbs 2048 -X 1000 -p 8 genome -1 file1.fastq -2 file2.fastq genome.map

Should I specify a -S and have it output SAM data? Sorry if these are dumb questions-- I've read the documentation and searched extensively for examples but it's good to have a sanity check.
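One practical argument for SAM output is that it is easy to post-process. As a rough sketch, the snippet below tallies mapped versus unmapped records using the FLAG column, where bit 0x4 marks an unmapped segment per the SAM format. `count_alignments` is an illustrative name and the demo records are made up, not bowtie output.

```python
def count_alignments(sam_lines):
    """Return (mapped, unmapped) record counts from SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        if flag & 0x4:                   # 0x4 set means the segment is unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Made-up records: one mapped pair (flags 99/147), one unmapped pair (77/141).
demo = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "r1\t99\tchr1\t100\t255\t4M\t=\t300\t204\tACGT\tIIII\n",
    "r1\t147\tchr1\t300\t255\t4M\t=\t100\t-204\tTGCA\tIIII\n",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tAAAA\tIIII\n",
]
print(count_alignments(demo))  # prints (2, 2)
```

Note this counts records rather than distinct reads, so the totals would change if the aligner is asked to report multiple alignments per read.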

Thanks again for your help.

**Endiel** · 10-09-2012, 07:32 AM

Just wanted to update with my progress. Got some pretty good results.

http://www.mkei.org/bowtie-profiling

**Chipper** · 10-09-2012, 12:12 PM

Thanks for the report. For 100 bp reads you are likely to get better results with bowtie2. If you have access to a decent graphics card you could test the soap3-dp aligner as well; it's 10x faster than bowtie2 on a GTX580:

http://www.cs.hku.hk/2bwt-tools/soap3-dp/

**Endiel** · 10-09-2012, 12:15 PM

Chipper-- good to know. I will check it out.

As for the soap3-dp aligner, I'm curious how well the memory transfer to/from the card would work with two 69 GB paired-end read files. I will look into it though. Thanks!

Profiling Bowtie Performance: Good Reference Dataset?
