  • Isa0984
    Member
    • Aug 2014
    • 15

    FastQC

    Hello,
    We've done an RNA-Seq analysis (Illumina HiSeq2000, 50 bp, paired end) and I have checked the quality with FastQC. A few questions came up:

    1. Is FastQC an appropriate quality check for paired-end reads?
    2. The program gives all four of my samples a fail for the per-base quality of the second read (only the last three bases show a lower quartile below 5 or a median below 20). Is there a logical explanation?
    3. The sequencing and mapping were done by a company, and they told us they trimmed the adapters. But I get a fail for the duplication level, and the overrepresented sequences are all primer or adapter sequences. Did they do a bad job?

    Thanks for your help! Isabelle
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    FastQC is appropriate for QC of PE reads.

    It would be better if you post screenshots/images of the FastQC results instead of just descriptions. Having something marked as "fail" does not automatically fail the entire sample. It is possible that the analysis done by your provider may not have removed all adapter dimers etc.

    • Isa0984
      Member
      • Aug 2014
      • 15

      #3
      Ok, here are the images of FastQC...
      Attached Files

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Read 2 often has decreased quality at its 3' end. A bit of trimming can easily get rid of that.

        BTW, they likely sent you untrimmed sequences and aligned trimmed sequences, which is why FastQC is telling you that the raw sequences still have adapter contamination.

        Also, a fail on duplication level is pretty much expected for RNAseq data (that test is really only meant for whole-genome sequencing).

        • Isa0984
          Member
          • Aug 2014
          • 15

          #5
          Thanks a lot. So it looks to me like most of the quality check doesn't really make sense for this data, and it's mainly useful for the per-base quality...?

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Yeah, just do a bit of quality/adapter trimming (e.g., with trimmomatic or trim_galore) and you should be good to go.

            • Isa0984
              Member
              • Aug 2014
              • 15

              #7
              But can I be sure that the company used trimmed data for mapping? Maybe they didn't. How can I check this?

              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #8
                Just look at the read lengths in the BAM file:

                Code:
                samtools view some_file.bam | cut -f 10 | awk '{print length($1)}' | uniq | sort | uniq
                If they trimmed the reads prior to alignment, you should get more than one value.
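
The same check can be sketched in Python if you would also like a count per length. This is only an illustration: it assumes the input is SAM text such as `samtools view` prints, and the function name `read_length_counts` is made up for this example.

```python
from collections import Counter

def read_length_counts(sam_lines):
    """Tally the length of the SEQ field (column 10) over SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        seq = fields[9]
        if seq == "*":
            continue  # sequence not stored for this record
        counts[len(seq)] += 1
    return counts

# Two made-up records: one full-length read and one trimmed read.
demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t0\tchr1\t200\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]
for length, n in sorted(read_length_counts(demo).items()):
    print(length, n, sep="\t")
```

On real data you would pass `sys.stdin` instead of `demo` and pipe `samtools view some_file.bam` into the script. A single length in the output suggests the reads were not trimmed before alignment.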

                • Isa0984
                  Member
                  • Aug 2014
                  • 15

                  #9
                  I will, but unfortunately I can't do this from my private computer, so I have to wait until I am back at the institute... but many thanks already at this point.

                  • Isa0984
                    Member
                    • Aug 2014
                    • 15

                    #10
                    Hello, it's been a long time, but this has come up again for me... I wasn't able to check the data at the institute with samtools, but shouldn't I see the same thing (different read lengths) if I look at my data in IGV? That in fact shows the same length of 51 bases for all reads, which would mean the company didn't trim the data before mapping... am I right? Thanks for your help! Isabelle

                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      Yes, it sounds like they didn't trim them then. Scroll through IGV and see if there are any soft-clipped alignments (alignments that appear shorter but where the original sequence is 51). Using an aligner that does soft-clipping alleviates some of the issues surrounding adapter contamination and quality. If, however, they did end-to-end alignment (i.e., there are no soft-clipped alignments) on untrimmed data then I'd say they did a half-ass job.
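
If scrolling through IGV gets tedious, soft-clipping can also be read off the CIGAR string (column 6 of each SAM record), where it shows up as an `S` operation at either end. A minimal sketch, with a helper name (`soft_clipped_bases`) invented for this example:

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips; ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return (0, 0)  # e.g. "*" for a record without a CIGAR
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)

for cigar in ["51M", "5S46M", "40M11S"]:
    print(cigar, soft_clipped_bases(cigar))
```

A file where every mapped read reports `(0, 0)` and the full read length would point to end-to-end alignment of untrimmed reads.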

                      • Isa0984
                        Member
                        • Aug 2014
                        • 15

                        #12
                        Hey, thanks for the fast reply. I found some shorter ones... they did it with the -q option of BWA.
                        When I asked them for the mapping parameters, I got the following answer:

                        n NUM max #diff (int) or missing prob under 0.02 err rate
                        t:4 (number of threads)
                        M:3 (mismatch penalty)
                        q: (quality threshold for read trimming down to 35bp 0)

                        I am not sure I understand this 35bp thing, because I can find reads shorter than 35bp (maybe the 0 is a typo?).
                        Another question: how can I get alignments like that (see figure)? If n=0.02, shouldn't there be at most 2 mismatches per 50 bp? Isabelle
                        Attached Files

                        • dpryan
                          Devon Ryan
                          • Jul 2011
                          • 3478

                          #13
                          Can't say I'm overly familiar with bwa aln, since most people use bwa mem these days.

                          The -n option has to have one of the more confusing descriptions I've seen. If it's an integer then the explanation is simple. I assume that it uses a Poisson distribution with fractional -n, so a value of 0.02 with 50bp reads would correspond to a maximal edit distance of 3 (in R: qpois(0.98, 50*0.02)).
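
For anyone without R at hand, the qpois(0.98, 50*0.02) call can be reproduced with a few lines of Python. This only mirrors the R expression quoted in the post; whether bwa aln derives its cutoff in exactly this way is an assumption, and `poisson_quantile` is a name made up for this sketch.

```python
import math

def poisson_quantile(p, lam):
    """Smallest k with P(X <= k) >= p for X ~ Poisson(lam), like R's qpois."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    while cdf < p:
        k += 1
        term *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return k

read_length = 50
error_rate = 0.02
print(poisson_quantile(0.98, read_length * error_rate))  # 3
```

With a mean of 1 expected error per read, the cumulative probabilities are roughly 0.37, 0.74, 0.92 and 0.98 for 0 to 3 errors, so 3 is the first value to reach the 0.98 mark.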

                          The -q option in bwa aln doesn't really specify a minimum read length. It specifies a value used when determining the trim location. From the man page, BWA trims a read down to:

                          argmax_x{\sum_{i=x+1}^l(INT-q_i)} if q_l<INT

                          Here l is the original read length, the -q value is INT, and the quality at position i is q_i. So this basically sums the penalties (INT-q_i) from the 3' end and finds the maximum value. The position with the maximum value is where trimming will occur (essentially; obviously, if the penalty is <0 then no trimming should occur).
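
To make the rule concrete, here is a small sketch of that argmax as described in the bwa man page: keep the first x bases, where x maximizes the sum of (INT - q_i) over the bases after x, and only trim if the last base is below INT. It is an illustration, not bwa's actual code: it takes plain integer Phred scores, ignores the 35bp floor mentioned in the usage text, and the function name is hypothetical.

```python
def bwa_style_trim_length(quals, threshold):
    """Number of bases kept after 3' trimming with the argmax rule."""
    l = len(quals)
    # No trimming unless the last base falls below the threshold.
    if l == 0 or quals[-1] >= threshold:
        return l
    best_sum, best_x, running = 0, l, 0
    # Walk in from the 3' end; after adding quals[x], `running` is the
    # penalty sum over 0-based positions x..l-1, i.e. keeping x bases.
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:
            best_sum, best_x = running, x
    return best_x

# Three good bases followed by two poor ones, with -q 20.
print(bwa_style_trim_length([30, 30, 30, 2, 2], 20))  # 3
```

In the example the penalty sum peaks at 36 once both low-quality bases are included and drops again as soon as a Q30 base is added, so the read is cut back to 3 bases.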

                          • Isa0984
                            Member
                            • Aug 2014
                            • 15

                            #14
                            Ok, I think I got the -q option. It's just the information from the company that is strange; maybe they mean a quality threshold of 35...
                            But the -n value is absolutely confusing... I have read a lot of threads about this topic, but still. If the maximal edit distance in my case is 3, what does that mean? Is there any relation to the allowed number of mismatches?

                            • dpryan
                              Devon Ryan
                              • Jul 2011
                              • 3478

                              #15
                              They are related, yes. "Edit distance" is a generalization of mismatches. If a read aligns with 3 mismatches then its edit distance is 3. However, mismatches can't describe things like insertions or deletions. So if your read aligns with an insertion of 2 bases then it has an edit distance of 2. If it has a single base mismatch and later a deletion of 3 bases then the edit distance is 4. The Wikipedia article on edit distance is quite good. In short, "edit distance" is the minimum number of single character changes (insertion, deletion, or substitution) needed to convert one sequence to another.
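
The definition in the last sentence is the classic Levenshtein distance, which can be sketched with a short dynamic-programming routine. The sequences below are invented for illustration; this is the textbook algorithm, not what any particular aligner runs internally.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

print(edit_distance("GATTACA", "GCTTACA"))    # one mismatch
print(edit_distance("GATTACA", "GATTTTACA"))  # insertion of 2 bases
```

The first call returns 1 and the second returns 2, matching the mismatch and insertion cases described above.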
