  • FastQC analysis

    Hello,
    Can someone help me understand my FastQC analysis?
    My questions are:
    Do I need to cut my index primers?
    Do the per base sequence content and per base GC content graphs tell me that there is something wrong with my samples?
    Also, I don't understand what could be the cause of the 10+ duplication level.
    Lastly, I don't understand the Kmer graph. I watched a video saying that one could easily figure out the adapters used for sequencing from it, but my Kmer graph is so confusing that I cannot understand anything!

    Your help would be very much appreciated.
    Parham

  • #2
    Your data looks OK in general, apart from the duplication level.

    Have a look at Simon Andrews' explanation of the duplication level plot:


    What type of experiment is your data from?

    It is quite common to get funny values for the first few bases in the Per base sequence content plot for RNA-Seq experiments. This is thought to be due to the random priming step not actually being quite so random.

    You can figure out how many of your reads contain adapters by using grep.
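As a rough alternative to a plain grep, the same count can be sketched in Python. This version only looks at the sequence line of each record, so a match in a header or quality line cannot inflate the count. The adapter string is an assumption (the prefix shared by Illumina TruSeq adapters); substitute the sequence for your own kit. The function name and the in-memory example are made up for illustration.

```python
import io

# Assumption: the 13-base prefix shared by Illumina TruSeq adapters.
# Replace with the adapter sequence from your own library prep kit.
ADAPTER = "AGATCGGAAGAGC"

def count_adapter_reads(handle, adapter=ADAPTER):
    """Return (reads_with_adapter, total_reads) for a 4-line-per-record FASTQ stream."""
    total = 0
    hits = 0
    for line_number, line in enumerate(handle):
        # The sequence is the second line of every 4-line record.
        if line_number % 4 == 1:
            total += 1
            if adapter in line.strip().upper():
                hits += 1
    return hits, total

# Tiny in-memory example; for a real file pass open("reads.fastq") instead.
example = io.StringIO(
    "@read1\nACGTACGTAGATCGGAAGAGCACAC\n+\n" + "I" * 25 + "\n"
    "@read2\nACGTACGTACGTACGTACGTACGTA\n+\n" + "I" * 25 + "\n"
)
hits, total = count_adapter_reads(example)
print(f"{hits} of {total} reads contain the adapter")
```

This only finds exact matches, so reads where the adapter is truncated at the 3' end or contains a sequencing error are missed; a dedicated adapter trimmer handles those cases.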


    • #3
      Thanks, mastal, for reviewing my data. Simon Andrews' explanation was very helpful to read.
      My experiment is RNA-seq, and I am trying to build a transcriptome. I am new to this field and very confused with many steps.
      Regarding grep, I found adapters in the middle of my reads, not at the beginning. Is that how it usually should be? FYI, I don't see indexes at the beginning of my reads. Is that correct?

      Thanks again!


      • #4
        In the case of Illumina sequencing, the tag (index) is read as a separate "read" and is not part of the actual sequence. Tag reads are used when the Illumina pipeline demultiplexes the data (the tag sequence is added at the end of the sequence read ID if the Illumina CASAVA pipeline was used for demultiplexing); ref: http://en.wikipedia.org/wiki/FASTQ_f...ce_identifiers.
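To check whether your own read IDs carry the index, a minimal sketch like the one below pulls the last field out of a CASAVA 1.8-style header. The field layout in the comment and the example header follow the format shown on the Wikipedia FASTQ page; the function name is hypothetical, and files produced by other pipeline versions may be laid out differently.

```python
def parse_casava_header(header):
    """Split a CASAVA 1.8-style read ID into its name and description fields."""
    # Layout assumed here:
    # @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    name, _, description = header.lstrip("@").strip().partition(" ")
    fields = description.split(":")
    if len(fields) != 4:
        raise ValueError(f"not a CASAVA 1.8-style header: {header!r}")
    read_number, is_filtered, control_bits, index = fields
    return {
        "name": name,
        "read": int(read_number),
        "filtered": is_filtered == "Y",
        "control": int(control_bits),
        "index": index,
    }

# Example header in the format shown on the Wikipedia FASTQ page.
record = parse_casava_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(record["index"])
```

If the last field is a short nucleotide string, that is the index for the read; it will not appear at the start of the sequence line itself.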

        You should not be seeing adapters in the middle of your reads. Are you sure they are real and not "grep" artifacts?


        • #5
          Most Illumina adapters should be towards the 3' ends of the reads.

          However, there are always some reads where the insert is very short, or where there is no insert at all, so you start reading into the adapter sequence at an earlier point in the read. With adapter-only reads or adapter dimers and no insert, that can be right at the 5' end of the read.
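One way to see this in your own data is to tabulate where the adapter starts in each read, rather than only whether it is present. The sketch below assumes the TruSeq adapter prefix and uses three made-up reads; the function name is hypothetical.

```python
from collections import Counter

# Assumption: the 13-base prefix shared by Illumina TruSeq adapters.
ADAPTER = "AGATCGGAAGAGC"

def adapter_start_positions(sequences, adapter=ADAPTER):
    """Count the 0-based position where the adapter first appears in each read."""
    positions = Counter()
    for seq in sequences:
        start = seq.upper().find(adapter)
        if start != -1:
            positions[start] += 1
    return positions

reads = [
    "AGATCGGAAGAGCACACGTCTGAAC",  # no insert: adapter starts at position 0
    "ACGTACGTAGATCGGAAGAGCACAC",  # 8 nt insert: adapter starts at position 8
    "ACGTACGTACGTACGTACGTACGTA",  # insert longer than the read: no adapter
]
for start, n in sorted(adapter_start_positions(reads).items()):
    print(f"adapter starts at position {start}: {n} read(s)")
```

The start position equals the insert length, so a spread of positions across the read points to short inserts rather than a problem with the search.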


          • #6
            Did your protocol use random hexamer priming? That could explain the per base sequence content and kmers (I have seen a similar thing with RNA-seq, but it showed 6-mers not 5-mers). If you need to do a de-novo assembly I would trim the start of the reads, maybe the first 8 nt. As for duplication level, you expect that sort of result with RNA-seq because some genes are in much higher copy number than others making it much more likely to get reads that are identical.
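The 5' trim suggested above can be sketched as follows. This is a minimal illustration that assumes plain 4-line FASTQ records; the function name and example read are made up, and an established trimming tool is the safer choice for real data.

```python
import io

def trim_five_prime(handle, n=8):
    """Yield FASTQ records (header, seq, plus, qual) with the first n bases removed."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        # Trim sequence and quality together so they stay the same length.
        yield header, seq[n:], plus, qual[n:]

# Tiny in-memory example; for a real file pass open("reads.fastq") instead.
example = io.StringIO("@read1\nGTCCGATAACGTACGTACGT\n+\n" + "I" * 20 + "\n")
for header, seq, plus, qual in trim_five_prime(example, n=8):
    print("\n".join((header, seq, plus, qual)))
```

The sequence and quality strings have to be cut by the same amount; a record whose two lengths differ is rejected by most downstream tools.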


            • #7
              No, I think I made a mistake about that. Thanks for considering it, GenoMax!


              • #8
                Jeremy, I used random hexamer priming! Could you explain why it should show 6-mers and not 5-mers? As a test I trimmed the seqs by 5N and 10N, and I am attaching their FastQC results. Can you have a look at them, please, and tell me what you think?


                • #9
                  By default FastQC shows 5-mers. You can change the k-mer size to anything between 2 and 10 using the -k flag. But you may need to give it more memory if you use a larger k-mer size.
