  • Analyzing PacBio sequel data and SRA

    I am new to NCBI's SRA and to PacBio sequencing data analysis. I am trying to download PacBio sequence data from SRA, and my goal is to get CCS (circular consensus) reads, ideally in FASTQ or FASTA format. The data were generated on PacBio's Sequel instrument. What I have read so far covers the analysis methods for their previous instrument (RS II); I consulted online blogs and Biostars posts. My questions relate to both SRA and PacBio sequence data:

    1. What do runs mean in SRA? For example, searching for HG00733 and filtering (Source: DNA, Platform: PacBioSMRT) on the right yields 7 experiments, and some of them have multiple runs per experiment. One of them is a WGS experiment on Sequel II with 7 runs. The only difference I found between these runs is the library name, but for other experiments with multiple runs I could not find any difference between the runs.

    2. I am particularly interested in this experiment (https://www.ncbi.nlm.nih.gov/sra/SRX4480530[accn]) and want to get consensus reads for it. Other posts say it is easiest to start from the raw data or subreads files. I went to the 'Data access' page for this experiment's run, but there are multiple subreads.bam files listed under 'Original format' (https://trace.ncbi.nlm.nih.gov/Trace...run=SRR7615963). What does it mean when there are multiple subreads.bam files, and how do I get consensus reads from multiple BAM files? I have read about the SMRT Link software and Canu for processing subreads.bam into consensus reads. Which tool should I use here? Or is there a direct way to get consensus reads from SRA?

    3. For the experiment mentioned above (SRX4480530), the original-format BAM files are named subreads.bam, but for other experiments (such as https://trace.ncbi.nlm.nih.gov/Trace...run=ERR3822935) the BAM filename does not contain "subreads". In that case, what kind of file is it?
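
For question 2, one possible approach is sketched below. It is not a confirmed answer. It assumes that each subreads.bam holds the reads of one movie (SMRT cell), so each file can be processed independently, and that PacBio's `ccs` command-line tool is installed and called as `ccs <input.subreads.bam> <output.ccs.bam>`. The helper `ccs_commands`, the file names `movie_A`/`movie_B`, and the output directory `ccs_out` are hypothetical. The function only builds the command lines and does not run anything.

```python
from pathlib import Path

SUBREADS_SUFFIX = ".subreads.bam"


def ccs_commands(subread_bams, out_dir="ccs_out"):
    """Build one `ccs` command line per subreads BAM (nothing is executed).

    Assumption: each input file is an independent movie, so consensus
    calling can be done file by file rather than on a merged BAM.
    """
    commands = []
    for bam in sorted(subread_bams):
        name = Path(bam).name
        if not name.endswith(SUBREADS_SUFFIX):
            raise ValueError(f"not a subreads BAM: {bam}")
        # movie_A.subreads.bam -> movie_A.ccs.bam (hypothetical naming scheme)
        stem = name[: -len(SUBREADS_SUFFIX)]
        out_bam = Path(out_dir) / f"{stem}.ccs.bam"
        commands.append(["ccs", str(bam), str(out_bam)])
    return commands


if __name__ == "__main__":
    # Hypothetical file names; replace with the files downloaded from SRA.
    for cmd in ccs_commands(["movie_A.subreads.bam", "movie_B.subreads.bam"]):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the output directory exists. The resulting CCS BAM files could then be converted to FASTQ with a tool such as `samtools fastq`, if FASTQ output is what is needed.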

  • #2
    Cross-posted for reference: https://www.biostars.org/p/476158/
