  • debjit_ray
    replied
    FastQC reports several overrepresented sequences in my small RNA reads, possibly because of adapter sequences. I trimmed the adapter ('ACTA') with the command
    >fastx_clipper -C -v -i SRR519779.fastq -Q 33 -a ACTA -o SRR519779_trimmed.fastq
    The output is:
    Clipping Adapter: ACTA
    Min. Length: 5
    Clipped reads - discarded.
    Input: 4484151 reads.
    Output: 4440775 reads.
    discarded 0 too-short reads.
    discarded 0 adapter-only reads.
    discarded 0 clipped reads.
    discarded 43376 N reads.

    It seems the trimming had no effect: FastQC shows similar results on the trimmed file.
    Can the adapter be just 4 nucleotides? Am I doing something wrong? Please advise.
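To put the 4-nucleotide question in perspective, here is a rough sketch of how often a fixed k-mer turns up by chance in a random read. It assumes uniformly random bases and treats start positions as independent, so it is only an approximation. The read length of 36 is a hypothetical value, not taken from SRR519779.

```python
def p_chance_match(adapter_len, read_len):
    """Approximate probability that a fixed k-mer occurs at least once
    in a uniformly random read (start positions treated as independent)."""
    positions = read_len - adapter_len + 1
    if positions <= 0:
        return 0.0
    p_single = 0.25 ** adapter_len          # chance of a match at one position
    return 1.0 - (1.0 - p_single) ** positions


# Hypothetical 36 nt reads: compare a 4 nt adapter with an 8 nt adapter.
for k in (4, 8):
    print(k, round(p_chance_match(k, 36), 4))
```

Under these assumptions a 4-mer matches by chance in roughly one read in ten, while an 8-mer almost never does, so a 4 nt "adapter" is hard to tell apart from ordinary sequence.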


  • relipmoc
    replied
    Hi Mark,

    Recently I wrote a software tool named skewer, dedicated to adapter trimming of Illumina paired-end reads. It is very easy to use. I compared the results of skewer with those of fastq-mcf: in my case, skewer yielded more uniquely mapped read pairs overall.

    Below are the statistics for fastq-mcf and skewer:
    run 1 (using a popular adapter trimmer fastq-mcf):
    70252244 reads; of these:
    70252244 (100.00%) were paired; of these:
    66639175 (94.86%) aligned concordantly 0 times
    3099256 (4.41%) aligned concordantly exactly 1 time
    513813 (0.73%) aligned concordantly >1 times
    6.27% overall alignment rate

    run 2 (using skewer):
    5136192 reads; of these:
    5136192 (100.00%) were paired; of these:
    192115 (3.74%) aligned concordantly 0 times
    4264035 (83.02%) aligned concordantly exactly 1 time
    680042 (13.24%) aligned concordantly >1 times
    97.01% overall alignment rate
    ---- trimming information of skewer ----
    70676932 read pairs processed
    29547 ( 0.04%) degenerative read pairs filtered out
    17685 ( 0.03%) short read pairs filtered out after trimming by size control
    65493508 (92.67%) empty read pairs filtered out after trimming by size control
    5136192 ( 7.27%) read pairs available; of these:
    1285606 (25.03%) trimmed read pairs available after processing
    3850586 (74.97%) untrimmed read pairs available after processing

    You can download skewer from https://sourceforge.net/projects/skewer/

    Cheers,
    Hongshan

    Originally posted by Mark View Post
    Hi All

    I recently downloaded the FASTX toolkit and tried to use it for trimming adapter sequences from fastq reads. This did not work: the tool simply discarded any reads containing adapter sequences, though that does not seem to be its function according to the documentation. I wrote to the help contact for the tool but received no response (see below for details). Has anyone used this tool for this purpose successfully?

    Thanks for your help

    Mark
    Last edited by relipmoc; 09-25-2013, 07:44 AM. Reason: typo


  • earonesty
    replied
    Originally posted by westerman View Post
    But, yes, trim and clip could also be considered synonyms.
    Yep. They are synonyms, and are used inconsistently.

    Personally, I say "clip", when I mean "looking for adapters or other sequences and removing them off the ends of reads", and "trim" when I mean "looking for qualities/base skew" and removing them off the ends of reads. (fastx-toolkit and fastq-mcf seem to use it this way.)


  • westerman
    replied
    Originally posted by Oliviervg View Post
    I know my question may seem stupid to a native English speaker, but I still do not understand the difference between trimming and clipping...
    Maybe they are synonyms, and we can use either term in each case?

    Trim usually means an algorithmic determination of where to clip off sequences. E.g., trim all bases from 5' end where the quality value is 20 or less (Q20) in a running total of 4 bases.

    Clip is usually a hard and fast rule. E.g., clip 15 bases off of the 5' end.


    But, yes, trim and clip could also be considered synonyms.
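To make the distinction concrete, here is a minimal sketch of the two ideas. It is a simplification for illustration: the function names are hypothetical, the trim works base by base from the 3' end rather than over a running window, and Phred+33 encoding is assumed.

```python
def hard_clip(seq, qual, n):
    """Clip: remove a fixed number of bases from the 5' end, whatever they are."""
    return seq[n:], qual[n:]


def quality_trim_3prime(seq, qual, threshold=20, offset=33):
    """Trim: drop bases from the 3' end while their Phred quality is below threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


seq = "ACGTACGTAC"
qual = "IIIIIIII#$"   # Phred+33: 'I' = 40, '#' = 2, '$' = 3

print(hard_clip(seq, qual, 3))          # always removes exactly 3 bases
print(quality_trim_3prime(seq, qual))   # removes only the low-quality tail
```

The clip removes the same number of bases from every read, while the trim decides per read how much to remove based on the data.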


  • Oliviervg
    replied
    I know my question may seem stupid to a native English speaker, but I still do not understand the difference between trimming and clipping...
    Maybe they are synonyms, and we can use either term in each case?


  • Oliviervg
    replied
    Can someone answer my stupid question, please?
    What is the difference between clip and trim?
    Thank you


  • Oliviervg
    replied
    Hello, and thank you for this great program.

    I have a stupid question: I don't understand what "trim" and "clip" mean. What is the difference between them?
    Is trim a synonym for "cut"?


  • earonesty
    replied
    -k 0 disables skew detection. Normally there's no reason to disable it... it can help find problems in data.


  • earonesty
    replied
    Purity is Illumina's purity filter. You can turn this off with -U, but you REALLY SHOULD NOT. Read up on Illumina purity filtering: failing reads are the result of confused signal from adjacent clusters.
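For readers who want to see how many reads carry the flag before filtering, here is a small sketch. It assumes CASAVA 1.8+ style headers, where the comment field after the space reads `<read>:<is_filtered>:<control>:<index>` and `Y` marks a read that failed the filter. Whether fastq-mcf inspects exactly this field is an assumption here, and the function name is hypothetical.

```python
def failed_purity_filter(header):
    """Return True if a CASAVA 1.8+ style FASTQ header marks the read as filtered.

    Assumed layout: '@<instrument>:...:<y> <read>:<is_filtered>:<control>:<index>',
    with is_filtered == 'Y' for reads that FAILED the filter.
    """
    parts = header.split(" ", 1)
    if len(parts) < 2:
        return False          # no comment field, so no flag to inspect
    fields = parts[1].split(":")
    return len(fields) >= 2 and fields[1] == "Y"


# Example headers (made up for illustration)
headers = [
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG",
    "@EAS139:136:FC706VJ:2:2104:15343:197394 1:N:18:ATCACG",
    "@test1",
]
flagged = sum(failed_purity_filter(h) for h in headers)
print(f"{flagged} of {len(headers)} reads flagged")
```

Counting flagged headers this way gives a quick check of whether a large "Filtered ... reads on purity flag" number is plausible for a given file.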


  • nasobema
    replied
    Hi earonesty!

    Thanks a lot for this great tool. I found it just today, and the first test with fastq-mcf already left the impression that it is fast and includes a lot of utilities for clipping.

    However, I didn't get the meaning of all the options and the output.
    In particular I would like to know what is meant by "Filtered x reads on purity flag".
    Here's a sample report of a test case where I lose some 15 % due to this filter (see last line):

    Code:
    Scale used: 2.2
    Filtering Illumina reads on purity field
    Phred: 33
    Warning: Too much skewing found (110), disabling skew clipping
    Threshold used: 251 out of 100000
    Adapter RNA-seq_PCR-primer_1_reverse (AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT): counted 21114 at the 'end' of '../rawdata/ado_pool_PE02_R2.fastq', clip set to 1
    Adapter RNA-seq_PCR-primer_2_reverse (AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG): counted 21340 at the 'end' of '../rawdata/ado_pool_PE02_R1.fastq', clip set to 1
    Adapter RNA-seq_PCR-primer_2_reverse (AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG): counted 449 at the 'end' of '../rawdata/ado_pool_PE02_R2.fastq', clip set to 6
    Files: 2
    Total reads: 42361724
    Too short after clip: 420423
    Clipped 'end' reads (../rawdata/ado_pool_PE02_R1.fastq): Count 20076452, Mean: 20.70, Sd: 24.83
    Trimmed 16073640 reads (../rawdata/ado_pool_PE02_R1.fastq) by an average of 22.84 bases on quality < 10
    Clipped 'end' reads (../rawdata/ado_pool_PE02_R2.fastq): Count 18776062, Mean: 22.10, Sd: 25.18
    Trimmed 15738855 reads (../rawdata/ado_pool_PE02_R2.fastq) by an average of 21.90 bases on quality < 10
    Filtered 6360682 reads on purity flag
    I ran fastq-mcf with option -k 100 (this disables skew clipping, doesn't it?).

    What is purity? Were the reads bad (and in what sense)?
    Is there a way to switch this off?


  • upendra_35
    replied
    Originally posted by Mark View Post
    Hi All

    I recently downloaded the FASTX toolkit and tried to use it for trimming adapter sequences from fastq reads. This did not work: the tool simply discarded any reads containing adapter sequences, though that does not seem to be its function according to the documentation. I wrote to the help contact for the tool but received no response (see below for details). Has anyone used this tool for this purpose successfully?

    Thanks for your help

    Mark

    #############################################
    Hello

    I recently downloaded the FASTX toolkit (fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2) and attempted to use the fastx_clipper tool. I created a test fastq file (3 of the four sequences contain the default adapter CCTTAAGG):

    @test1
    CCTTAAGGAAAAAAAAAAGGGGGGGGGG
    +test1
    HHHHHHHHHHHHHHHHHHHHHHHHHHHH
    @test2
    CCTTAAGGAAAAAAAAAGGGGGGGGGGG
    +test2
    HHHHHHHHHHHHHHHHHHHHHHHHHHHH
    @test3
    AGAGAGAGAGAGAGAGAGAGAGAGAGAG
    +test3
    HHHHHHHHHHHHHHHHHHHHHHHHHHHH
    @test4
    CCTTAAGGTTGACGTGATCGACACCTGG
    +test4
    [[[[[[[[[[[[[[[[[[[[[[[[[[[[

    And then executed the command (as shown on FASTX toolkit website)

    -bash-3.2$ fastx_clipper -v -i test.fastq -a CCTTAAGG
    @test3
    AGAGAGAGAGAGAGAGAGAGAGAGAGAG
    +test3
    HHHHHHHHHHHHHHHHHHHHHHHHHHHH
    Clipping Adapter: CCTTAAGG
    Min. Length: 5
    Input: 4 reads.
    Output: 1 reads.
    discarded 0 too-short reads.
    discarded 3 adapter-only reads.
    discarded 0 N reads.

    As you can see, the three reads that contain the adapter are discarded as "adapter-only reads", which (in my way of looking at things) they are not, nor would they be too short (default <=5) after any trimming. What is going on here? Does this tool actually trim reads, or does it only discard them if an adapter is found? If the former, would you please tell me what I am doing incorrectly? Also if the former, is it possible to supply the tool with multiple adapters to trim?

    Thanks for your help

    Mark
    I don't know if you have already sorted out this problem, but I figured out that fastx_clipper throws away all reads that start with the adapter ("adapter-only reads"), and there is no way to tell it not to. However, if the adapter sequence is found anywhere else within the read, it removes the adapter and everything after it, and keeps the rest of the read. You can then ask it either to throw the clipped reads away (with the -C option) or to keep only the clipped reads (with the -c option).

    Hope this helps a bit......

    Upendra
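The behaviour described above can be sketched in a few lines. This is a simplified model, not the fastx_clipper implementation: it uses exact matching only, the function name is hypothetical, and the minimum length of 5 is taken from the tool's reported default.

```python
def clip_3prime_adapter(seq, adapter, min_length=5):
    """Model of 3' adapter clipping as described above (exact matching only).

    Returns (status, sequence_or_None).
    """
    pos = seq.find(adapter)
    if pos == -1:
        return "unclipped", seq
    if pos == 0:
        return "adapter-only", None      # nothing precedes the adapter
    clipped = seq[:pos]                  # keep only what precedes the adapter
    if len(clipped) < min_length:
        return "too-short", None
    return "clipped", clipped


# Mark's four test reads, plus one read with the adapter in the middle
reads = {
    "test1": "CCTTAAGGAAAAAAAAAAGGGGGGGGGG",
    "test2": "CCTTAAGGAAAAAAAAAGGGGGGGGGGG",
    "test3": "AGAGAGAGAGAGAGAGAGAGAGAGAGAG",
    "test4": "CCTTAAGGTTGACGTGATCGACACCTGG",
    "middle": "ACGTACGTCCTTAAGGTTTT",
}
for name, seq in reads.items():
    print(name, clip_3prime_adapter(seq, "CCTTAAGG"))
```

In this model the three test reads that begin with CCTTAAGG come out as adapter-only, which matches the counts in Mark's output, because the adapter is treated as a 3' adapter and nothing is left once it and the downstream bases are removed.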


  • earonesty
    replied
    Right.... maybe it should always run ... and -f should be a non-option. I've thought about that. But in my experience, it's better not to clip at all if the percentage clipped is very low. Better to just let those reads get discarded by the aligner... or marked as low-quality mappings and get washed out in the statistics later.

    Good aligners take into account quality scores when doing alignment, and variant callers do as well. We generally see higher repeatability on unclipped files... but only when the clipping percentage is low. In the 5-10% range. If 95% of the reads would be left alone anyway.. better not to run at all.

    I'll run some stats, we have about 10,000 samples to look at right now, so i can come up with a decent default threshold. Again... -f will force it to run always, so you can just always run it that way and get what you want.

    UPDATE: 5% is working well, I'm using it in production for new batches. If you want it to "always" try to clip, regardless of sampling, use -f.
    ALSO: I made it so short adapters work as "beginnings of sequence" adapters (they always worked for end of seq tests)
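The sampling decision discussed in this post can be sketched roughly as follows. This is an illustration of the idea only, not fastq-mcf's code: the function name, the "low-quality base at either end" test, and the quality cutoff of 10 are assumptions, and only the 5% threshold and the force switch come from the discussion.

```python
def should_clip(sample_quals, q_threshold=10, min_fraction=0.05,
                force=False, offset=33):
    """Decide from a subsample of quality strings whether to run clipping at all.

    A read counts as affected if its first or last base falls below q_threshold.
    """
    if force:
        return True                      # analogous to passing -f
    if not sample_quals:
        return False
    affected = sum(
        1 for q in sample_quals
        if q and (ord(q[0]) - offset < q_threshold
                  or ord(q[-1]) - offset < q_threshold)
    )
    return affected / len(sample_quals) > min_fraction


good = "IIIIIIII"      # Phred+33, all bases Q40
bad = "IIIIIII#"       # last base Q2

print(should_clip([good] * 96 + [bad] * 4))              # 4% affected
print(should_clip([good] * 94 + [bad] * 6))              # 6% affected
print(should_clip([good] * 100, force=True))             # forced
```

The point of the design is that when only a small fraction of the subsample would change, the whole file is left alone unless the user forces clipping.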
    Last edited by earonesty; 06-22-2011, 07:29 AM.


  • fabrice
    replied
    I am also confused about the -f parameter. If no adapters are found and no skewing is detected in the subsample, what happens when -f is set? Will it still trim?

    Why does clipping proceed only if more than 10% of the reads would be trimmed by that parameter? Does it mean fastq-mcf trims only when 10% of the total reads need trimming?

    Am I misunderstanding something?

    Here is how I think about it:

    1. When an adapter is found at either end of a read (allowing, for example, 10% mismatches), trim it.

    2. From the 3' (right) end of the read, if a nucleotide's quality is below the threshold (for example, -q 20), trim it.

    Both adapter contamination and low-quality nucleotides cause reads to map incorrectly.




    Originally posted by earonesty View Post
    Also, right now the default algorithm will "not clip" if no adapters are found and no skewing is detected in the subsample (unless you pass -f). I'm about to make a change that will also decide clipping is necessary if there are "significant" "low quality region" at either end of the reads. The definition of significance will be based on the -q parameter. If more than 10% of the reads would be trimmed by that parameter, clipping will proceed.


  • earonesty
    replied
    Also, right now the default algorithm will "not clip" if no adapters are found and no skewing is detected in the subsample (unless you pass -f). I'm about to make a change that will also decide clipping is necessary if there are "significant" "low quality region" at either end of the reads. The definition of significance will be based on the -q parameter. If more than 10% of the reads would be trimmed by that parameter, clipping will proceed.


  • fabrice
    replied
    earonesty,

    Thanks. I will try it.
