  • GenoMax
    replied
    Look at it this way. If there is a problem with the sample/library itself (at this point, if the qualities are good, there is likely no technical issue with the sequencing), you would not be able to do much short of redoing the experiment.

    Why not press ahead and give the de novo assembly a try? It may fail, and you would be out some compute cycles and time. Since it is a mammalian genome, it is probably large(ish), so you are going to have to deal with a number of other computational challenges. Do you have enough sequence (theoretically) with adequate depth (10-15x or more) for the assembly tests?
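    For a rough answer to the depth question, expected coverage is approximately total sequenced bases divided by genome size. A minimal Python sketch, assuming paired-end reads of one fixed length; the genome size, read length and read counts in the example call are illustrative placeholders, not numbers from this thread:

```python
import math


def expected_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size (two mates per pair)."""
    return read_pairs * 2 * read_length / genome_size


def pairs_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Read pairs required to reach target_depth, rounded up to a whole pair."""
    return math.ceil(target_depth * genome_size / (2 * read_length))


if __name__ == "__main__":
    genome_size = 2_700_000_000  # placeholder mammalian genome size (assumption)
    read_length = 100            # placeholder read length (assumption)
    print(f"{expected_coverage(150_000_000, read_length, genome_size):.1f}x")
    print(pairs_needed(15, read_length, genome_size))
```

    This is raw coverage; the effective coverage after trimming and duplicate removal will be lower.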


  • Fernando Seixas
    replied
    It is for de novo assembly. I understand what you said about FastQC's limits not being universally applicable, but shouldn't I still worry about the GC content and k-mer plots?

    And no, I'm still stuck at this step because I don't feel confident enough to move on to the next steps.

    Thanks!


  • GenoMax
    replied
    Originally posted by Fernando Seixas
    But is it really normal to have those slight fluctuations in the first 10 bp?
    Yes. Here is an example report for a "good" sample posted on the FastQC site: http://www.bioinformatics.babraham.a...qc_report.html

    But even when I removed PCR duplicates these problems persisted.
    Another thing I forgot to mention is that this is PE data.
    What is the aim of your experiment? Are you trying to do de novo assemblies or is there a closely related genome you can use as a reference?

    As Simon (the author of FastQC) has mentioned in past posts here, it is difficult for him to set "limits" for the various FastQC tests that are universally applicable. So a dataset getting a "fail" in one or more FastQC categories does not automatically mean that there is a problem with the sample.

    Have you tried doing analysis with the QC'ed data? How do those results look?
    Last edited by GenoMax; 01-31-2014, 04:28 AM.


  • Fernando Seixas
    replied
    But is it really normal to have those slight fluctuations in the first 10 bp? Regarding the GC content, this data is from a mammalian genome. But even when I removed PCR duplicates, these problems persisted.
    And yes, the quality scores are good across the entire read.

    Another thing I forgot to mention is that this is PE data.
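    One way to check whether exact duplicates remain in paired-end data is to count pairs whose R1 and R2 sequences both match an earlier pair. This is a simplified sketch, not FastQC's duplication module (which analyses each file separately and only tracks a subset of sequences); it assumes uncompressed FASTQ files with 4-line records in the same order in both files.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (no validation)."""
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            yield record[1].strip()


def duplicate_pair_fraction(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of read pairs whose (R1, R2) sequences were already seen."""
    seen = set()
    total = duplicates = 0
    for r1, r2 in pairs:
        total += 1
        key = (r1, r2)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / total if total else 0.0
```

    Usage, with hypothetical file names: duplicate_pair_fraction(zip(fastq_sequences("sample_R1.fastq"), fastq_sequences("sample_R2.fastq"))).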


  • GenoMax
    replied
    The first two plots look ok. Is this a "GC" rich organism? Looks like there is some kind of duplication of sequences. Are the qualities acceptable across the entire read?
    Last edited by GenoMax; 01-30-2014, 05:29 PM.


  • Fernando Seixas
    replied
    Thanks for the reply. One thing I forgot to mention is the kind of data I have: it's whole-genome sequencing data from a HiSeq 2000 using TruSeq library prep. And if I'm not wrong (my eyes are tired of so much reading xD), all the explanations I found for the behaviours I mentioned above refer to RNA-seq data, at least for the base-content instability in the first 10 bp.

    FastQC images of the problematic parameters are attached.

    For the k-mer analysis I attached both the 7-mer and 10-mer results. If I align the k-mers I can see a repeating 7-bp pattern (CCTGGCTCCTGGCT), so I looked for all possible 7-bp sequences within this pattern but still couldn't associate any of them with adapters/primers.
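    The manual check described above (take every 7-bp window of the aligned k-mer pattern and search for it in adapter/primer sequences) can be scripted, and it is worth searching both strands since an adapter can appear in either orientation. In this sketch the two adapter strings are the commonly quoted TruSeq 3' adapter sequences for R1 and R2; treat them as an assumption and substitute the exact sequences from your own kit documentation.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Assumed TruSeq 3' adapter sequences; check against your kit's documentation.
ADAPTERS = {
    "truseq_r1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    "truseq_r2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list:
    """All overlapping k-mers of seq, in order (duplicates kept)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hits(pattern: str, k: int, references: dict) -> dict:
    """Map each distinct k-mer of pattern to the reference names containing it
    on either strand (forward k-mer or its reverse complement)."""
    hits = {}
    for kmer in sorted(set(kmers(pattern, k))):
        rc = reverse_complement(kmer)
        hits[kmer] = [name for name, ref in references.items()
                      if kmer in ref or rc in ref]
    return hits


if __name__ == "__main__":
    for kmer, names in kmer_hits("CCTGGCTCCTGGCT", 7, ADAPTERS).items():
        print(kmer, names or "no adapter match")
```

    The same function can be pointed at any other primer or adapter sequences by adding entries to the dictionary.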

    Thanks!


  • GenoMax
    replied
    There are several posts here that cover illumina sequencing and FastQC. Search for "fastqc duplication".

    If none of those posts answers your question, can you post example plots?
    Last edited by GenoMax; 01-30-2014, 04:36 PM.


  • Fernando Seixas
    replied
    Hi all,

    I saw the first post of this thread and realized that I see exactly the same patterns described in point 1, NB2: increased 5-mer representation in the first 10 base pairs, and GC fluctuations in those first 10 bp as well (although very slight; the same happens in the per base sequence content). Even after adapter trimming with cutadapt at both the 5' and 3' ends and quality trimming with Trimmomatic, these 'problems' persist. Any ideas what might be causing this?
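    FastQC's per base sequence content plot shows the fraction of each base at each read position, and positional GC is the G and C fractions added together. A minimal sketch of that computation restricted to the first 10 positions, assuming reads are already available as plain strings (FASTQ parsing omitted):

```python
from collections import Counter


def per_position_composition(reads, positions: int = 10) -> list:
    """For each of the first `positions` bases, return the fraction of A, C, G, T.

    Positions beyond a read's length are skipped for that read; N and other
    symbols count toward the denominator but are not reported.
    """
    counts = [Counter() for _ in range(positions)]
    for read in reads:
        for i, base in enumerate(read[:positions]):
            counts[i][base.upper()] += 1
    table = []
    for counter in counts:
        total = sum(counter.values())
        table.append({b: (counter[b] / total if total else 0.0) for b in "ACGT"})
    return table


if __name__ == "__main__":
    example_reads = ["ACGTACGTAC", "AAGTACGTTT", "TCGAACGTGC"]  # toy data
    for pos, fractions in enumerate(per_position_composition(example_reads), start=1):
        gc = fractions["G"] + fractions["C"]
        print(pos, {b: round(f, 2) for b, f in fractions.items()}, "GC=%.2f" % gc)
```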

    Also, and I don't know if this is related to the previous question, the per sequence GC content doesn't have an exactly normal distribution: there's a slight bump on the right side of the distribution.
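    The per sequence GC content module plots the distribution of per-read GC percentages against a theoretical normal curve, so a bump on the right can indicate a subset of reads with higher GC than the bulk. A minimal sketch of the underlying histogram (reads as plain strings, 1% bins); it does not reproduce FastQC's fitted curve:

```python
from collections import Counter


def gc_percent(read: str) -> float:
    """GC percentage of one read, ignoring N and other non-ACGT symbols."""
    called = [b for b in read.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(b in "GC" for b in called) / len(called)


def gc_histogram(reads) -> Counter:
    """Count reads per integer GC-percentage bin (0..100).

    Python's round() rounds halves to even, which only affects exact .5 values.
    """
    return Counter(round(gc_percent(r)) for r in reads)


if __name__ == "__main__":
    toy_reads = ["ATATGCAT", "GGCCGCAT", "ATGCATGC"]  # toy data
    for gc_bin, n in sorted(gc_histogram(toy_reads).items()):
        print(gc_bin, n)
```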

    Thanks!


  • blanco
    replied
    Thanks for your quick reply, fkrueger; this looks really useful. I have already asked one question in the appropriate thread: http://seqanswers.com/forums/showthr...ht=trim+galore


  • fkrueger
    replied
    Originally posted by blanco
    I am also interested to know the answer to some of these questions.

    Perhaps to put it more simply: when trimming paired-end reads, should the cutadapt command be exactly the same for both forward and reverse reads?
    Using the same command on both reads will most likely cause your paired-end files to go out of sync. We have written a small wrapper that calls Cutadapt with (what we think are) sensible parameters (Trim Galore, available here). In its default setting, e.g. trim_galore --paired file1.fq file2.fq, it will trim Illumina adapters from both reads, quality-trim reads to a Phred score of 20, and handle paired-end files as you would expect.
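    Why running the same command separately on each file can desynchronise the pairs: if each file is trimmed and length-filtered on its own, a read dropped from one file leaves its mate behind in the other, so later records get paired with the wrong mate. The sketch below contrasts filtering each file separately with keeping or dropping each pair as a unit. The trimming function (stripping trailing Ns) and the minimum lengths are illustrative stand-ins, and this is not Trim Galore's or cutadapt's code.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def filter_independently(reads: Iterable[str], trim: Callable[[str], str],
                         min_length: int = 20) -> List[str]:
    """Trim and length-filter one file on its own (mate information is lost)."""
    return [t for t in (trim(r) for r in reads) if len(t) >= min_length]


def trim_pair_and_filter(pairs: Iterable[Pair], trim: Callable[[str], str],
                         min_length: int = 20) -> List[Pair]:
    """Trim both mates, then keep or drop the pair as a unit."""
    kept = []
    for r1, r2 in pairs:
        t1, t2 = trim(r1), trim(r2)
        if len(t1) >= min_length and len(t2) >= min_length:
            kept.append((t1, t2))
    return kept


if __name__ == "__main__":
    r1 = ["AAAAACCCCC", "NNNNNNNNNN", "GGGGGTTTTT"]
    r2 = ["CCCCCAAAAA", "TTTTTGGGGG", "NNNNNNNNNN"]

    def strip_trailing_n(seq: str) -> str:
        return seq.rstrip("N")  # hypothetical stand-in for a real trimmer

    # Same number of reads survive in each file, but the second "pair" is wrong.
    print(list(zip(filter_independently(r1, strip_trailing_n, 5),
                   filter_independently(r2, strip_trailing_n, 5))))
    print(trim_pair_and_filter(zip(r1, r2), strip_trailing_n, 5))
```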


  • blanco
    replied
    I am also interested to know the answer to some of these questions.

    Perhaps to put it more simply: when trimming paired-end reads, should the cutadapt command be exactly the same for both forward and reverse reads?


  • SEQnovice
    replied
    My apologies, this is my first post here! Thanks for the tip, and if you do have any feedback I would appreciate it though!


  • ECO
    replied
    Originally posted by SEQnovice
    My questions have not been answered. Could someone kindly reply to some of them or at least direct me to the proper threads where this may have been discussed? I am new to this field and any feedback would be much appreciated!
    Thank you,
    SEQNovice
    Patience, and searching. Please give your question more than 20 hours before bumping it.


  • SEQnovice
    replied
    My questions have not been answered. Could someone kindly reply to some of them or at least direct me to the proper threads where this may have been discussed? I am new to this field and any feedback would be much appreciated!
    Thank you,
    SEQNovice


  • SEQnovice
    started a topic Confusion regarding Illumina Adapter Trimming!

    Confusion regarding Illumina Adapter Trimming!

    Dear Experts,
    Please accept my apologies if this has been posted elsewhere. I am new to the analysis of RNA-seq data, and I am confused regarding trimming of my adapters from the FASTQ files using cutadapt. I have read through some of the posts but they have gotten me more confused!
    The details of my RNA-seq data are as follows:

    - The platform is Illumina, TruSeq
    - The FASTQ files are paired-end (so I have an R1.fastq and R2.fastq for each of my samples). It is unknown which of R1 and R2 represents the 'forward' or 'reverse' reads.
    - The files have been demultiplexed, so I have a barcode per sample which matches a specific barcode in a corresponding indexed adapter.
    - I have been provided with a Universal adapter and 5'-3' indexed adapters. I have checked the indexed adapters and they are all exactly identical except at the 6bp barcode in the middle of the sequence.

    Please kindly help me with the following:

    1. I am still trying to understand how Illumina TruSeq works, but in principle, should the trimming be done at the 3' end only, or also at the 5' end of the read? Or should only the Universal Adapter be trimmed at the 5' end, and the indexed adapters at the 3' end?

    NB1: Read length is 101 bp as observed in FastQC. This was expected in the experimental setup but makes me wonder if I have any adapters to begin with.
    NB2: I have used FastQC to look at a sample of my data (around 198,000 seqs). I didn't find any overrepresented sequences, but I did find increased 5-mer representation in the first 10 base pairs of my reads (which I am assuming to be the 5' end?). There are also more GC fluctuations in those first 10 bp.

    2. What is the minimum overlap that is effective to constitute a 'match' between the adapter and the read? Cutadapt has a default value of 3, but wouldn't that necessarily promote 'false matching' as well and lead to culling of sequences that don't have the adapter? I am considering a higher cutoff for the overlap, say 5 bp, given the k-mer overrepresentation observed in FastQC.

    3. When providing the adapter sequences, seeing that the indexed adapters only differ at the barcode, is it still prudent to provide the entire sequence of the indexed adapters, in addition to entire sequence of the universal adapter? What is the bare minimum sequence people have provided for their adapters, both indexed and universal? Does it make a difference?

    4. I am assuming that the same indexed 5'-3' adapter is provided when trimming from both the R1 and R2 reads. I have not attempted to trim the reverse complement or the reversed sequence from either R1 or R2. If I am mistaken in this approach please correct me!
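    Questions 1 to 4 can be illustrated with a small sketch, assuming the standard TruSeq adapter sequences (universal adapter, and indexed adapter with a 6-bp index; verify against your provider's sheet). With standard TruSeq libraries, adapter read-through happens at the 3' end of a read when the insert is shorter than the read length: R1 runs into the indexed adapter, while R2 runs into the reverse complement of the universal adapter, so the two mates need different adapter sequences. Because R1 reaches the indexed adapter from its 5' end, the sequence upstream of the barcode is enough to find it. The trimming function is a simplified exact-match version (cutadapt also tolerates errors), `min_overlap` plays the role of cutadapt's minimum overlap, and the barcodes are hypothetical examples.

```python
import os

# Assumed TruSeq adapter sequences (5'->3'); confirm against your provider's sheet.
UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEXED_TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC{index}ATCTCGTATGCCGTCTTCTGCTTG"
EXAMPLE_INDICES = ["ACAGTG", "GCCAAT"]  # hypothetical example barcodes

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# Everything upstream of the barcode is shared, so the common prefix suffices
# (it stops at the first base where the example barcodes differ).
INDEXED_PREFIX = os.path.commonprefix(
    [INDEXED_TEMPLATE.format(index=i) for i in EXAMPLE_INDICES]
)
# R1 reads through into the indexed adapter; the leading "A" is the A-tail
# added during library prep.
R1_ADAPTER = "A" + INDEXED_PREFIX
# R2 reads through into the reverse complement of the universal adapter.
R2_ADAPTER = reverse_complement(UNIVERSAL)


def trim_3prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut the read at the leftmost position where the rest of the read is
    either a prefix of the adapter (partial read-through at the read end) or
    starts with the full adapter. Exact matches only."""
    for start in range(len(read)):
        suffix = read[start:]
        if len(suffix) < min_overlap:
            break
        if adapter.startswith(suffix) or suffix.startswith(adapter):
            return read[:start]
    return read


def expected_spurious_matches(min_overlap: int, adapter_length: int) -> float:
    """Expected number of adapter-prefix lengths L >= min_overlap matched by
    chance at the end of a random read (independent bases, p = 1/4 each,
    read at least as long as the adapter). Upper-bounds the probability."""
    return sum(0.25 ** L for L in range(min_overlap, adapter_length + 1))


if __name__ == "__main__":
    print("R1 adapter:", R1_ADAPTER)
    print("R2 adapter:", R2_ADAPTER)
    for k in (3, 5):
        print(k, expected_spurious_matches(k, len(R1_ADAPTER)))
```

    Under these assumptions a 3-base minimum overlap trims a few bases from roughly 1 in 48 reads (about 2%) purely by chance, and 5 bases brings this to about 0.13%. A chance match removes only the matched bases rather than the whole read, although a later length filter could drop reads that end up very short.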

    My apologies for the multiple questions. Thank you in advance for your help with this!
    Much obliged!
    SEQNovice
    Last edited by SEQnovice; 11-29-2012, 11:02 AM.
