Greetings all,
We are currently interested in performing read-depth analysis in order to investigate a possible CNV at a locus on human chr 10.
The data we want to analyze come from WGS of CHM1_1.1 (HiSeq 2000, paired-end reads). One issue is that the raw data amount to approximately 400 GB, stored as 35 different .sra files.
What is the procedure we need to follow in order to get to the point where we can perform the analysis?
From what I understand, we need to download the entire repository (since we don't know which files contain the reads covering the region — or is it possible to find out which of the .sra files the reads we're interested in are located in and only download those?) and then use the SRA Toolkit to convert the reads to FASTQ before we align them to a reference.
Then we can use software of our choice to analyze the aligned data.
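The steps described above could be sketched roughly as follows with the SRA Toolkit, BWA, and samtools. Note that `SRR_XXXXXX`, the reference path, and the `START-END` coordinates are placeholders, not the real CHM1 accessions; those would come from the SRA run selector for the CHM1 project. Also, if the SRA submission was deposited as aligned data, `sam-dump --aligned-region` may let you extract just the region of interest from each run without converting everything, which could answer the "only download those" question — worth checking before pulling all 400 GB.

```shell
# --- Hedged sketch; accessions, reference, and coordinates are placeholders ---

# 1. Download one run with the SRA Toolkit
prefetch SRR_XXXXXX

# 2. Convert to paired FASTQ files
fasterq-dump --split-files SRR_XXXXXX

# 3. Align to a reference (e.g. GRCh38) with BWA-MEM, then sort and index
bwa mem -t 8 GRCh38.fa SRR_XXXXXX_1.fastq SRR_XXXXXX_2.fastq \
    | samtools sort -o SRR_XXXXXX.sorted.bam -
samtools index SRR_XXXXXX.sorted.bam

# 4. Per-base read depth over the chr10 region of interest
samtools depth -r chr10:START-END SRR_XXXXXX.sorted.bam > chr10_region_depth.txt
```

Looping this over all 35 runs (or merging the sorted BAMs with `samtools merge` first) would give the combined coverage needed for the CNV analysis.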
Am I correct in my thinking so far, or are there any unnecessary or plainly wrong steps somewhere?
Very grateful for any advice.
Sincerely,
//Oscar