
HTseq to DeSeq/EdgeR to Heatmap


  • #46
    Originally posted by dpryan View Post
    I think cutadapt can do quality trimming too, but if not then trim_galore can do both (it's a wrapper around cutadapt, in fact). There's also trimmomatic, which is relatively popular. I haven't used Ion Proton reads myself so I can't make any specific recommendations there.
    Thank you. Sorry, what I meant to ask is: do I need to remove the same adapter, "GGCCAAGGCG", as I did for the Proton data?

    Besides, I think cutadapt can do quality trimming; please see its help:

    Additional modifications to the reads:
      -q CUTOFF, --quality-cutoff=CUTOFF
                            Trim low-quality ends from reads before adapter
                            removal. The algorithm is the same as the one used by
                            BWA (Subtract CUTOFF from all qualities; compute
                            partial sums from all indices to the end of the
                            sequence; cut sequence at the index at which the sum
                            is minimal) (default: 0)
      --quality-base=QUALITY_BASE
                            Assume that quality values are encoded as
                            ascii(quality + QUALITY_BASE). The default (33) is
                            usually correct, except for reads produced by some
                            versions of the Illumina pipeline, where this should
                            be set to 64. (default: 33)
      -x PREFIX, --prefix=PREFIX
                            Add this prefix to read names
      -y SUFFIX, --suffix=SUFFIX
                            Add this suffix to read names
      --strip-suffix=STRIP_SUFFIX
                            Remove this suffix from read names if present. Can be
                            given multiple times.
      -c, --colorspace      Colorspace mode: Also trim the color that is adjacent
                            to the found adapter.
      -d, --double-encode   When in color space, double-encode colors (map
                            0,1,2,3,4 to A,C,G,T,N).
      -t, --trim-primer     When in color space, trim primer base and the first
                            color (which is the transition to the first
                            nucleotide)
      --strip-f3            For color space: Strip the _F3 suffix of read names
      --maq, --bwa          MAQ- and BWA-compatible color space output. This
                            enables -c, -d, -t, --strip-f3, -y '/1' and -z.
      --length-tag=TAG      Search for TAG followed by a decimal number in the
                            name of the read (description/comment field of the
                            FASTA or FASTQ file). Replace the decimal number with
                            the correct length of the trimmed read. For example,
                            use --length-tag 'length=' to correct fields like
                            'length=123'.
      -z, --zero-cap        Change negative quality values to zero (workaround to
                            avoid segmentation faults in old BWA versions)


    So should -q be set to 5?
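To make the -q description in the help text more concrete, here is a minimal Python sketch of the trimming rule exactly as it is worded there (subtract the cutoff, take partial sums to the end of the read, cut where the sum is smallest). The function names are made up for illustration, and the real cutadapt/BWA code may differ in details such as stopping early, so treat this as a reading aid rather than the tool's implementation.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    Follows the wording of the help text: subtract the cutoff from every
    quality, sum from each index to the end of the read, and cut where that
    sum is smallest. If no such sum is negative, nothing is trimmed.
    """
    best_index = len(qualities)  # default: keep the whole read
    best_sum = 0
    running = 0
    # Walk from the 3' end towards the 5' end, accumulating the partial sum.
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running < best_sum:
            best_sum = running
            best_index = i
    return best_index


def trim_read(sequence, quality_string, cutoff=5, quality_base=33):
    """Trim one read; qualities are decoded as ord(char) - quality_base."""
    qualities = [ord(ch) - quality_base for ch in quality_string]
    cut = quality_trim_index(qualities, cutoff)
    return sequence[:cut], quality_string[:cut]


# Toy read: three bases at Q40 followed by Q4, Q3 and Q2 (Phred+33 encoding).
print(trim_read("ACGTAC", "III%$#", cutoff=5))
```

With a cutoff of 5 the three low-quality bases at the end of the toy read are removed and the Q40 bases are kept.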

    Comment


    • #47
      Ah no, the adapter is different. The quality thresholding works the same though.

      Comment


      • #48
        Originally posted by dpryan View Post
        Ah no, the adapter is different. The quality thresholding works the same though.
        But which adapter do I need to remove?
        Sorry, I haven't received the raw Illumina data yet.
        How can I find it?
        The adapter for the Proton data ('GC....') was recommended by the Ion Community.

        Comment


        • #49
          Usually it's something like AGATCGGAAGAGC, which is the invariant part (Illumina uses Y-shaped adapters). You can always ask whoever is doing the sequencing for you if this leads to problems. BTW, you should also run FastQC on things after trimming, as that'll tell you if you missed something obvious or trimmed off the wrong thing.
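If you want a quick sanity check of which adapter is actually present before (or after) trimming, something along these lines could help. It is only a rough sketch: it counts exact matches of each candidate sequence in the reads, so it will miss partial or mismatched adapters, and the helper names and the toy FASTQ records are invented for the example. FastQC remains the more thorough check.

```python
import io

# Candidate sequences taken from this thread; the dictionary keys are
# arbitrary labels chosen for this example.
CANDIDATE_ADAPTERS = {
    "illumina_common": "AGATCGGAAGAGC",
    "ion_proton": "GGCCAAGGCG",
}


def fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip()


def adapter_hit_fractions(sequences, adapters):
    """Return the fraction of reads containing each adapter (exact match)."""
    counts = {name: 0 for name in adapters}
    total = 0
    for seq in sequences:
        total += 1
        for name, adapter in adapters.items():
            if adapter in seq:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in adapters}
    return {name: counts[name] / total for name in adapters}


# Made-up reads standing in for a real FASTQ file.
TOY_FASTQ = """@read1
ACGTACGTAGATCGGAAGAGCACACG
+
IIIIIIIIIIIIIIIIIIIIIIIIII
@read2
TTTTGGGGCCCCAAAATTTT
+
IIIIIIIIIIIIIIIIIIII
@read3
ACGTAGATCGGAAGAGC
+
IIIIIIIIIIIIIIIII
@read4
CCCCAAAATTTTGGGG
+
IIIIIIIIIIIIIIII
"""

fractions = adapter_hit_fractions(
    fastq_sequences(io.StringIO(TOY_FASTQ)), CANDIDATE_ADAPTERS
)
print(fractions)
```

For a real file you would pass an opened FASTQ file instead of the `io.StringIO` object; an adapter that shows up in a noticeable fraction of reads is the one worth trimming.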

          Comment


          • #50
            Originally posted by dpryan View Post
            Usually it's something like AGATCGGAAGAGC, which is the invariant part (Illumina uses Y-shaped adapters). You can always ask whoever is doing the sequencing for you if this leads to problems. BTW, you should also run FastQC on things after trimming, as that'll tell you if you missed something obvious or trimmed off the wrong thing.
            Thank you. I also found it in trim_galore --help.
            You said earlier: "I trim off adapters and bases with a phred score of 5 or below."
            Will this command line work?
            trim_galore --length 100 --quality 20 --stringency 5 SampleSeq1.fastq

            Comment


            • #51
              You'll want something like:

              Code:
              trim_galore --length 20 --quality 5 -s 5 sample.fastq
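If there are several samples, the same settings can be applied to each file from a small script. The sketch below only assembles the command shown above for each input and runs it when trim_galore is installed and the file exists; the sample file names and the helper function are hypothetical.

```python
import os
import shutil
import subprocess


def build_trim_galore_command(fastq_path, min_length=20, quality=5, stringency=5):
    """Build the argument list for the 'gentle' trimming settings above."""
    return [
        "trim_galore",
        "--length", str(min_length),
        "--quality", str(quality),
        "--stringency", str(stringency),
        fastq_path,
    ]


# Hypothetical sample names; replace with the real FASTQ files.
samples = ["SampleSeq1.fastq", "SampleSeq2.fastq"]

for path in samples:
    command = build_trim_galore_command(path)
    print(" ".join(command))
    # Only run when the tool is on PATH and the input file is present.
    if shutil.which("trim_galore") and os.path.exists(path):
        subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if a file name contains spaces.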

              Comment


              • #52
                Originally posted by dpryan View Post
                You'll want something like:

                Code:
                trim_galore --length 20 --quality 5 -s 5 sample.fastq
                Thank you! So that is the so-called 'gentle trimming'.

                Comment


                • #53
                  Originally posted by dpryan View Post
                  Usually it's something like AGATCGGAAGAGC, which is the invariant part (Illumina uses Y-shaped adapters). You can always ask whoever is doing the sequencing for you if this leads to problems. BTW, you should also run FastQC on things after trimming, as that'll tell you if you missed something obvious or trimmed off the wrong thing.

                  My mapping rate with Tophat is ~80-85%. Is that high, normal, or low?
                  For the Ion Proton data, the Community recommended the following procedure: the Tophat mapping rate is ~50%, and if we use bowtie2 to map the unmapped reads and merge the results, the rate increases to ~90%.
                  Do I need to re-align the unmapped reads and merge them for the Illumina data as well? Or should I just leave them and go on to Cufflinks and DE analysis?

                  Comment


                  • #54
                    Depending on the species that's not unreasonable. You might also try STAR, though that requires more memory. Tophat2 can use bowtie2 already, so just don't give it the --no-mixed option and it will try to map unmapped paired-end reads as single-end for you.

                    Comment


                    • #55
                      Originally posted by dpryan View Post
                      Depending on the species that's not unreasonable. You might also try STAR, though that requires more memory. Tophat2 can use bowtie2 already, so just don't give it the --no-mixed option and it will try to map unmapped paired-end reads as single-end for you.
                      Hi, I am talking about bovine cells and single-end reads; my mapping rate with Tophat is ~80-85%.
                      Thank you!
                      My question is whether mapping the unmapped reads is meaningful or essential.
                      Last edited by super0925; 03-26-2014, 01:32 PM.

                      Comment


                      • #56
                        Ah, then you're unlikely to gain much by remapping unmapped reads with bowtie2 (while it does allow more mismatches by default, if you're getting up to 85% alignment already then we're looking at seriously diminishing returns). You could try on one sample and see how much of a difference it makes.
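To put numbers on "diminishing returns" when trying this on one sample, the arithmetic can be sketched as below. The read counts are invented for illustration (only the ~50% to ~90% and ~85% figures come from this thread), and the function names are not from any tool.

```python
def mapping_rate(mapped_reads, total_reads):
    """Fraction of input reads that were aligned."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def remapping_gain(total_reads, mapped_first_pass, rescued_second_pass):
    """Return (rate before, rate after, gain) when unmapped reads are remapped."""
    before = mapping_rate(mapped_first_pass, total_reads)
    after = mapping_rate(mapped_first_pass + rescued_second_pass, total_reads)
    return before, after, after - before


# Hypothetical counts per million reads.
# Proton-like case from the thread: ~50% with Tophat, ~90% after bowtie2.
proton = remapping_gain(1_000_000, 500_000, 400_000)
# Illumina-like case: ~85% already mapped, so at most 15 points can be gained;
# the 30,000 rescued reads are an assumed value.
illumina = remapping_gain(1_000_000, 850_000, 30_000)

for label, (before, after, gain) in (("proton", proton), ("illumina", illumina)):
    print(f"{label}: {before:.1%} -> {after:.1%} (gain {gain:.1%})")
```

The total number of reads comes from the FASTQ file, and the mapped counts from the aligner's summary output for the one sample you test.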

                        Comment


                        • #57
                          Originally posted by dpryan View Post
                          Ah, then you're unlikely to gain much by remapping unmapped reads with bowtie2 (while it does allow more mismatches by default, if you're getting up to 85% alignment already then we're looking at seriously diminishing returns). You could try on one sample and see how much of a difference it makes.
                          OK, I will probably leave it at 85%.
                          I think it is high enough.

                          Comment


                          • #58
                            Don't let the perfect be the enemy of the "good-enough, let's finish analysing the data"

                            Comment


                            • #59
                              Originally posted by dpryan View Post
                              Don't let the perfect be the enemy of the "good-enough, let's finish analysing the data"
                              Have you used STAR before? Is it better than Tophat? I have read some blogs saying that STAR's mapping rate is higher than Tophat's, but I don't know whether a higher rate really matters for downstream analysis.

                              Comment


                              • #60
                                Have a look at my answer over on biostars to a similar question. That should tell you most of what you want to know (particularly given the included links and replies from others).

                                Comment
