Inexperienced Question


  • cambridge101
    replied
    Originally posted by GenoMax:
    Look for studies that have > 10 samples (since you need 10) or take 10 samples from different cancer types.

    http://sra.dnanexus.com/?result_type...q=tumor+exome+

    http://sra.dnanexus.com/?result_type...q=cancer+exome
    Thank you. I think I want to get the same region from 10 different people.

    Access an online cancer related DNA database resource and select ten (10) DNA sequence strings of length at least 1Mb related to a cancer gene from ten different individuals. Make sure the sequence data is in FASTQ format and stored in one file “DNA.fas”.


  • Richard Finney
    replied
    The comments here have links to sequences for PUBLIC human cancers ...

    http://www.homolog.us/blogs/blog/201...able-from-bgi/

    BGI liver cancer
    Seoul Genomic Medicine Institute lung cancer
    Changhai Hospital prostate cancer
    MD Anderson Asian Gastric cancer

    I think the data is in NCBI's SRA.

    You'll need a lot of disk space and, if you're relatively new, a lot of patience.

    Sadly, a "bam slicer" that cuts out the reads for a region isn't available; though they say some folks are working on it.


  • GenoMax
    replied
    Look for studies that have > 10 samples (since you need 10) or take 10 samples from different cancer types.

    http://sra.dnanexus.com/?result_type...q=tumor+exome+

    http://sra.dnanexus.com/?result_type...q=cancer+exome
    Last edited by GenoMax; 12-20-2014, 02:07 PM.


  • cambridge101
    replied
    Alright... Let's say my oncogene of interest is in the region of 11:15000000..16000000. Therefore, all I need is that region from 10 different people.

    Problem:
    Where do I find that data???

    Any assistance is appreciated.
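
A region written as 11:15000000..16000000 can be checked and converted to the `chrom:start-end` form that tools such as samtools accept for indexed BAM files. The sketch below is only illustrative: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and whether the chromosome is called "11" or "chr11" depends on the reference the data was aligned to.

```python
import re

# Accepts "11:15000000..16000000" or "11:15000000-16000000" (commas allowed).
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)(?:\.\.|-)(?P<end>\d+)$")


def parse_region(text):
    """Split a region string into (chrom, start, end) with integer coordinates."""
    match = REGION_RE.match(text.replace(",", "").strip())
    if match is None:
        raise ValueError(f"cannot parse region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is after end in region: {text!r}")
    return match["chrom"], start, end


def region_length(start, end):
    """Length of a 1-based, inclusive interval (assumed convention)."""
    return end - start + 1


def to_samtools_region(chrom, start, end):
    """Format the interval as chrom:start-end."""
    return f"{chrom}:{start}-{end}"


chrom, start, end = parse_region("11:15000000..16000000")
print(to_samtools_region(chrom, start, end), region_length(start, end))
```

With inclusive coordinates this interval is 1,000,001 bases long, so the end would be 15999999 if exactly 1 Mb is required.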


  • GenoMax
    replied
    I like SNPsaurus' interpretation. 1 Mb (total amount of data or length of region covered) worth of fastq reads in and/or around a cancer gene makes sense.

    @SNPsaurus: Main input for the assignment is still in post #6. The rest of the assignment was informatics goals.

    This will entail a significant amount of work (data collection part) and I hope the assignment has an appropriate amount of credit (unless it is a PhD qualifier exam).


  • SNPsaurus
    replied
    I only vaguely remember the details of the project from your deleted post, but if I were to guess what a reasonable assignment would be, it would be to select a cancer gene, then extract 1 Mb of sequence around the gene from ten different individual genomes, then analyze those 1 Mb regions for the various things asked for in the post.

    You aren't going to find 1 Mb fastq reads, but you can find different individual genomes, or even different "cancer" genomes. You can definitely identify genes related to cancer. As others have said, I'd check back with the assigner of this project for clarification.

    edit: I teach an upper level course in genomic methods and analysis, so am definitely curious what this assignment is about!
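
The "1 Mb of sequence around the gene" step can be sketched as a small coordinate calculation. This is one possible reading of the assignment, not a confirmed one: the function name is hypothetical, coordinates are assumed to be 1-based and inclusive, and the gene position and chromosome length used in the example are placeholders rather than real annotation.

```python
def window_around(gene_start, gene_end, chrom_length, size=1_000_000):
    """Return a (start, end) window of `size` bases centred on a gene.

    The window is shifted, not shrunk, when it would run off either
    end of the chromosome.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not be after gene_end")
    if size > chrom_length:
        raise ValueError("window is longer than the chromosome")
    center = (gene_start + gene_end) // 2
    start = center - size // 2
    end = start + size - 1
    if start < 1:
        # Too close to the start of the chromosome: slide right.
        start, end = 1, size
    if end > chrom_length:
        # Too close to the end of the chromosome: slide left.
        end = chrom_length
        start = end - size + 1
    return start, end


# Placeholder values for illustration only.
HYPOTHETICAL_CHROM_LENGTH = 135_000_000
print(window_around(15_400_000, 15_600_000, HYPOTHETICAL_CHROM_LENGTH))
```

The same window would then be applied to each of the ten individual genomes, so that all ten sequences cover the same coordinates.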


  • cambridge101
    replied
    Originally posted by GenoMax:
    One place to look for long sequences (in fastq format) is http://www.pacificbiosciences.com/ne.../publications/, though I see only a couple of cancer-related publications there, and they are from 2012 (when the reads were not as long as they are today).

    If you drop the cancer requirement, you can find some really long reads at http://blog.pacificbiosciences.com/2...d-shotgun.html (they are not going to be 1 Mb, so you would still need to do some assembly).
    Thanks! I'm not sure I can drop the cancer requirement. I checked http://www.pacificbiosciences.com/ne.../publications/ but couldn't locate the data I need.

    I'm open to other suggestions if anyone has any.


  • GenoMax
    replied
    One place to look for long sequences (in fastq format) is http://www.pacificbiosciences.com/ne.../publications/, though I see only a couple of cancer-related publications there, and they are from 2012 (when the reads were not as long as they are today).

    If you drop the cancer requirement, you can find some really long reads at http://blog.pacificbiosciences.com/2...d-shotgun.html (they are not going to be 1 Mb, so you would still need to do some assembly).


  • cambridge101
    replied
    Thanks for your suggestion. I don't think I'll delete it. [Edit: I did decide to delete it.] I don't feel that I'm doing anything unethical. I hope that it's clear that I'm not asking anyone to complete the project for me. I'm just having a hard time finding the FASTQ data I need.
    Last edited by cambridge101; 12-19-2014, 08:10 AM. Reason: Inconsistent with edit to earlier post.


  • GenoMax
    replied
    I think you should delete the description from the post above. This appears to be a class project or homework?

    Perhaps you should ask whoever assigned the project if they are certain about the fastq requirement.


  • cambridge101
    replied
    Though I don't feel I was doing anything unethical, others may disagree. Therefore, I deleted the complete requirements to avoid any conflict. Thank you.
    Last edited by cambridge101; 12-19-2014, 08:08 AM. Reason: Information not necessary.


  • cambridge101
    replied
    GenoMax - Thanks again for your suggestions and assistance. I'm just a little reluctant to say more about this project right now. I hope you understand.


  • GenoMax
    replied
    Tell us more about what the entire project is about. Is the DNA sequence just a part of it?


  • cambridge101
    replied
    I have a few days to work on this project. I'm lost.


  • GenoMax
    replied
    Now you say

    You are not going to find DNA sequences in fastq format that are 1Mb in length unless you assemble them yourself preserving the Q-scores (assembly programs do not include Q-scores in final sequence).
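
To see why this matters, it helps to look at what a FASTQ record holds: a name, a sequence, and one quality character per base. The following is a minimal sketch of a reader for the plain four-line layout, assuming Phred+33 quality encoding; the function names and the two toy reads are made up for illustration and are not taken from any dataset mentioned in this thread.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred33(qual):
    """Decode a quality string into integer scores (Phred+33 assumed)."""
    return [ord(char) - 33 for char in qual]


# Toy reads, invented for this example.
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nGGCC\n+\n!!5I\n"

records = list(read_fastq(io.StringIO(EXAMPLE)))
lengths = {name: len(seq) for name, seq, _ in records}
print(lengths)
```

Running the same length check on a real sequencing run shows reads far shorter than 1 Mb, which is the point made above: a 1 Mb sequence would have to come from an assembly or a reference, and then the per-base quality line has no natural source.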
