  • Steps prior to and during de novo assembly (clcbio)

    Dear all,

    I have some questions concerning the steps prior to a de novo assembly. I have normalized cDNA MiSeq (paired-end) data from two marine nematode species (no reference genome is available for any marine nematode), which I want to assemble into a transcriptome. The sequencing company has already done some processing for me:

    1. Quality trimming: We trim low quality ends (< Q20) with FastX 0.0.13 [1].
    2. Adapter trimming: The adapters are trimmed only at the end (at least 10 bp overlap and 90% match) with cutadapt 1.2.1 [3].
    3. Quality filtering: Using FastX 0.0.13 and ShortRead 1.16.3, we remove in succession small reads (length < 50 bp), polyA-reads (more than 90% of the bases equal A), ambiguous reads (containing N), low quality reads (more than 50% of the bases < Q25) and artifact reads (all but 3 bases in the read equal one base type).
    4. Making pairing consistent: Filtering reads may remove one read of a pair and make paired fastq-files inconsistent. In this step we remove reads that belong to broken pairs and save them in separate fastq files.
    5. Removal of contaminants: Using bowtie 2.0.0-beta5, we identify reads that align to phixillumina and remove them.
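
    For readers who want to see what steps 3 and 4 amount to, here is a minimal Python sketch. It is not the FastX or ShortRead implementation; the names `Read`, `reject_reason` and `split_pairs` are hypothetical, the thresholds are taken from the list above, and "all but 3 bases equal one base type" is interpreted (as an assumption) as "at least length minus 3 bases are identical".

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: List[int]  # one Phred score per base


def reject_reason(read: Read, min_len: int = 50, polya_frac: float = 0.9,
                  lowq: int = 25, lowq_frac: float = 0.5,
                  artifact_slack: int = 3) -> Optional[str]:
    """Apply the step-3 filters in succession; return why a read fails, or None."""
    seq = read.seq.upper()
    n = len(seq)
    if n < min_len:                                   # small reads
        return "short"
    if seq.count("A") > polya_frac * n:               # polyA reads
        return "polyA"
    if "N" in seq:                                    # ambiguous reads
        return "ambiguous"
    if sum(q < lowq for q in read.qual) > lowq_frac * n:  # low quality reads
        return "low_quality"
    # Artifact reads: assumed to mean at least n - 3 bases are the same base.
    if max(seq.count(b) for b in "ACGT") >= n - artifact_slack:
        return "artifact"
    return None


def split_pairs(r1: Dict[str, Read], r2: Dict[str, Read]
                ) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    """Step 4: keep pairs whose mates both survived; set the rest aside."""
    shared = r1.keys() & r2.keys()
    paired = [(r1[k], r2[k]) for k in sorted(shared)]
    orphans = [r for d in (r1, r2) for k, r in d.items() if k not in shared]
    return paired, orphans
```

    Reads for which `reject_reason` returns `None` are kept; `split_pairs` then takes the surviving reads of each mate file (keyed by read name) and returns the intact pairs plus the reads from broken pairs.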

    This is where their processing ends and mine begins. I have imported my sequences into CLCbio and trimmed off the cDNA adapters, which were needed to amplify my normalized cDNA libraries and increase the amount of cDNA.

    My questions are:
    - Prior to a de novo assembly there is the option to merge paired-end reads, which gives two data sets: one with merged sequences and one with the reads that could not be merged. Is it a good idea to merge paired-end reads, or should the de novo assembly start from the original fastq files? Or should we do both, that is, merge the paired-end data and use the merged sequences together with the original data for the de novo assembly?

    - During de novo assembly there is the option of scaffolding. I'm not sure whether this option is a good one. It will indeed create longer contigs, but does it cause downstream problems during annotation? I mean: if two genes are in very close proximity (or even on opposite strands), there is a possibility that they will end up in one contig. When blasting this contig, won't you miss one of the two genes?

    - How is it possible that 10% of the reads were not mapped when I mapped them back to the transcriptome?


    Thanks in advance

  • #2
    Hi, I have exactly the same question. Did you find an answer?



    • #3
      Hi Jevcampe and rafaelbsvaladares,

      Could you tell me which library construction (e.g. 2 kb, 500 bp) you used for your Illumina cDNA sequencing?

      I am new to metatranscriptomics. I have done the RNA isolation and enrichment and converted the RNA to cDNA. Now I want to sequence it on an Illumina HiSeq, and for that I need to tell the company which library I want to use for sequencing.

      Thanks in advance
