
  • fkrueger
    replied
    If you wanted to you could cite its URL, there is no publication as such (apart from the Cutadapt reference). Cheers, Felix


  • frozenlyse
    replied
    Hi Felix - I'm writing the methods sections for a few WGBS papers where I've used trim_galore, is there a paper I can cite for it?


  • yasmin_friedmann
    replied
    trim_galore without adaptor trimming?

    Hi All,

    Here is my first question ever to this forum! :-)

    I came across trim_galore while looking for a quality trimmer that would trim both paired-end reads together. My FASTQ files are from Illumina 1.9. I ran the following command:

    trim_galore -q 20 --fastqc --gzip --paired filename1 filename3
    I get the following error message:

    No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

    Writing report to 'filename1_trimming_report.txt'

    SUMMARISING RUN PARAMETERS
    ==========================
    Input filename: filename1
    Trimming mode: paired-end
    Trim Galore version: 0.3.7
    Quality Phred score cutoff: 20
    Quality encoding type selected: ASCII+33
    Adapter sequence: 'AGATCGGAAGAGC'
    Maximum trimming error rate: 0.1 (default)
    Minimum required adapter overlap (stringency): 1 bp
    Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
    Running FastQC on the data once trimming has completed
    Output file(s) will be GZIP compressed

    Writing final adapter and quality trimmed output to filename1_trimmed.fq.gz


    >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file filename1 <<<
    Traceback (most recent call last):
    File "/Users/yasmin/cutadapt-1.4.2/bin//cutadapt", line 9, in <module>
    from cutadapt.scripts import cutadapt
    File "/Users/yasmin/cutadapt-1.4.2/cutadapt/scripts/cutadapt.py", line 69, in <module>
    from cutadapt.adapters import Adapter, ColorspaceAdapter, BACK, FRONT, PREFIX, ANYWHERE
    File "/Users/yasmin/cutadapt-1.4.2/cutadapt/adapters.py", line 4, in <module>
    from cutadapt import align, colorspace
    File "/Users/yasmin/cutadapt-1.4.2/cutadapt/align.py", line 225, in <module>
    from cutadapt._align import globalalign_locate, compare_prefixes
    ImportError: dlopen(/Users/yasmin/cutadapt-1.4.2/cutadapt/_align.so, 2): no suitable image found. Did find:
    /Users/yasmin/cutadapt-1.4.2/cutadapt/_align.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00


    Cutadapt terminated with exit signal: '256'.
    Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...
    If anybody has come across this and solved it, please let me know!

    Many thanks!
    Yasmin
    Last edited by yasmin_friedmann; 09-12-2014, 01:16 AM. Reason: added error message
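
A note on the traceback above: the first four bytes reported for _align.so (0x7F 0x45 0x4C 0x46) are the ELF magic number, i.e. 0x7F followed by the ASCII letters "ELF". ELF is the shared-object format used on Linux, whereas the paths and the dlopen message point to a Mac, which expects Mach-O files. This suggests (as an assumption, not something confirmed in this thread) that the compiled Cutadapt extension was built on a different platform and would need to be rebuilt on the Mac. A minimal Python sketch for checking what kind of binary a file is; the function names are made up for illustration:

```python
# Well-known magic numbers at the start of compiled binaries.
MAGIC = {
    b"\x7fELF": "ELF (Linux and most other Unix systems)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",
    # Note: the same four bytes also start Java class files.
    b"\xca\xfe\xba\xbe": "Mach-O universal binary (macOS)",
}


def binary_format(header):
    """Classify a binary by its first four bytes."""
    return MAGIC.get(bytes(header[:4]), "unknown")


def file_binary_format(path):
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return binary_format(handle.read(4))


# The eight bytes quoted in the ImportError above.
reported = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])
print(binary_format(reported))  # prints: ELF (Linux and most other Unix systems)
```

Running file_binary_format on the _align.so named in the error message would show whether a rebuilt copy has the format the local system expects.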


  • rzwu0721
    replied
    Hi, I am using CLC Genomics Workbench, and it can trim an adapter such as the CTGTCTCTTATACACATCT you mentioned above in just a few minutes, so I would recommend giving it a try.

    Best Wishes!
    Renzhi Woo,
    Guangxi Academy of Sciences


  • fkrueger
    replied
    Originally posted by shawpa
    I have used trim galore before using Bismark many times. It just occurred to me though that there might be a problem in my use of the pipeline. What I usually do is trim the reads (both adaptor and quality trimming), then align with Bismark, remove duplicate reads using the deduplicate_bismark script provided, then proceed with methylation calling. However, if I am trimming for quality, I am changing the start and end coordinates of the read, which I think would affect the detection of duplicate reads. Could someone please let me know if this is correct? Is trimming for quality going to adversely affect the detection of duplicate reads?
    No, trimming should not affect the deduplication:

    Single-end deduplication uses the chromosome, the start coordinate and the orientation of a read. Since you are trimming from the 3' end of a read, this has no influence on the start coordinate. (For reverse reads, the start coordinate is calculated by adding the read length to the mapped position, using the CIGAR string for gapped alignments if required.)

    Paired-end deduplication uses the chromosome, the start coordinate of read 1, the end coordinate of read 2 and the orientation of the read pair (determined by read 1). Again, since you are trimming from the 3' end of both reads the relevant parameters are not affected.
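
To make the reasoning above concrete, here is a minimal Python sketch of such deduplication keys. It is an illustration under stated assumptions, not the actual Bismark code: the function names are made up, positions are 1-based leftmost mapped positions as in a SAM record, and the "start of read 1" and "end of read 2" are modelled as the 5' ends of the two mates.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment, from its CIGAR string."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING)


def five_prime(pos, cigar, is_reverse):
    """Reference coordinate of the 5' end of a read.

    For a forward read this is the leftmost mapped position; for a reverse
    read it is the rightmost aligned base, so the aligned length is added.
    """
    return pos + reference_length(cigar) - 1 if is_reverse else pos


def single_end_key(chrom, pos, cigar, is_reverse):
    """(chromosome, start coordinate, orientation) for a single-end read."""
    return (chrom, five_prime(pos, cigar, is_reverse), "-" if is_reverse else "+")


def paired_end_key(chrom, read1, read2):
    """(chromosome, 5' end of read 1, 5' end of read 2, orientation of read 1).

    read1 and read2 are (pos, cigar, is_reverse) tuples.
    """
    return (chrom, five_prime(*read1), five_prime(*read2),
            "-" if read1[2] else "+")


# A 100 bp reverse read that loses 20 bases from its 3' end: the leftmost
# position moves from 1000 to 1020, but the key stays the same.
untrimmed = single_end_key("chr1", 1000, "100M", True)
trimmed = single_end_key("chr1", 1020, "80M", True)
print(untrimmed == trimmed)  # prints: True
```

Because 3' trimming only moves the end of the read that is not part of the key, the trimmed and untrimmed versions of a read fall into the same duplicate group.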


  • shawpa
    replied
    I have used trim galore before using Bismark many times. It just occurred to me though that there might be a problem in my use of the pipeline. What I usually do is trim the reads (both adaptor and quality trimming), then align with Bismark, remove duplicate reads using the deduplicate_bismark script provided, then proceed with methylation calling. However, if I am trimming for quality, I am changing the start and end coordinates of the read, which I think would affect the detection of duplicate reads. Could someone please let me know if this is correct? Is trimming for quality going to adversely affect the detection of duplicate reads?


  • fkrueger
    replied
    I have just released a small fix to Trim Galore (v0.3.7) that makes paired-end trimming work again (which I had accidentally broken by introducing a small change...). The manual has now also been updated.

    Please find the latest release here: https://www.bioinformatics.babraham....s/trim_galore/


  • fkrueger
    replied
    First of all apologies for not having released Trim Galore updates lately, I seem to have somehow always postponed and then forgotten them entirely...

    A new version of Trim Galore (v0.3.6) is now available from its project page (http://www.bioinformatics.babraham.a...s/trim_galore/), which adds several features and fixes:

    - Added the new options '--three_prime_clip_r1' and '--three_prime_clip_r2' to clip any number of bases from the 3' end after adapter/quality trimming has completed
    - Added a check to see if Cutadapt exits fine. Else, Trim Galore will bail as well
    - The option '--stringency' needs to be spelled out now since using -s was ambiguous because of '--suppress_warn'
    - Added the Trim Galore version number to the summary report
    - Added single-end or paired-end mode to the summary report
    - In paired-end mode, the Read 1 summary report will no longer state that no sequences have been discarded due to trimming. This will be stated in the trimming report of Read 2 once the validation step has been completed

    (Edit: The manual needs a little updating, too, I'll work on that...)


  • Kmok
    replied
    Thanks Brian
    Shall try to run with BBMerge and see


  • Brian Bushnell
    replied
    Well... if you plot the insert size histogram, and see very sharp peaks at certain lengths, those may be some kind of non-genomic molecules. And once you know the length, you might be able to guess what they are considering all the different reagents that were used. Or you could look at reads with those specific insert sizes and see what the sequence is, to determine what they are. Once you know, you can easily filter them out (digitally). That is of course IF there are sharp peaks in the insert size histogram.

    If they are non-genomic artifacts, you won't find them in the insert size histogram you would get from mapping, because they won't map. But (if you have paired reads) you can generate an insert size histogram by overlapping them with BBMerge.
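
As a rough illustration of what "sharp peaks" could mean in practice, here is a minimal Python sketch that flags insert sizes whose read count stands far above their immediate neighbours. It assumes the insert sizes have already been collected into a plain list (for example parsed from an insert size histogram); the function name and the window, fold and min_count thresholds are arbitrary choices for illustration, not BBMerge parameters.

```python
from collections import Counter


def sharp_peaks(insert_sizes, window=5, fold=5.0, min_count=10):
    """Return (size, count) pairs whose count is at least `fold` times the
    mean count of the `window` sizes on either side."""
    counts = Counter(insert_sizes)
    peaks = []
    for size, n in sorted(counts.items()):
        if n < min_count:
            continue
        neighbours = [counts.get(size + d, 0)
                      for d in range(-window, window + 1) if d != 0]
        background = sum(neighbours) / len(neighbours)
        # Floor the background at 1 so isolated low counts are not flagged.
        if n >= fold * max(background, 1.0):
            peaks.append((size, n))
    return peaks


# Toy data: a flat distribution from 100 to 300 bp with an artificial
# spike of 500 extra fragments at exactly 150 bp.
sizes = [s for s in range(100, 301) for _ in range(20)] + [150] * 500
print(sharp_peaks(sizes))  # prints: [(150, 520)]
```

Reads with the flagged insert sizes can then be inspected to see what the underlying sequence is.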


  • Kmok
    replied
    Hi Brian

    Thanks very much.
    I need to ask our lab about the QC of the library. How can we tell from the library insert size distribution that it is a dimer? Would it show up as a peak of the same size as the later peaks we have seen?

    Kin


  • Brian Bushnell
    replied
    I don't know about the later anomalies, but in my tests, Nextera seems to have highly irregular base frequencies for the first ~20bp (as you say, probably due to non-random binding). They are still fairly accurate and do not need to be trimmed.

    It's possible that the later peaks are due to primer-dimers or other such artifacts. What is the insert-size distribution of the library?


  • Kmok
    replied
    New Bee on Trim Galore

    I used Trim Galore to trim exome-seq data captured with Illumina Nextera. The command used was
    $myTrimGalore -q 15 -a CTGTCTCTTATACACATCT --stringency 3 --length 20 -e 0.1 -o $myoutDir --fastqc_args "--outdir $myoutDir" --dont_gzip --paired $myfastq1 $myfastq1.

    The FastQC results after running Trim Galore show a bias in the nucleotide composition of the first 15 bp (per base sequence content). I guess this may be related to the non-random binding of the transposase. K-mers are also over-represented at the 5' end as well as in the middle of the reads. Can anyone tell me what causes these K-mers (adapters? indexes?), and how they should be trimmed if they are adapters or indexes?

    Thanks in advance

    [Attached images: Kmer.PNG, PerBasesequence.PNG]

  • maria.gr
    replied
    Thanks! Found it, but it again gave the error at line 471...
    Actually, I realised that I had to make the file executable ('chmod a+x build/cutadapt/bin/cutadapt')...
    I don't know if this is needed for everyone after downloading cutadapt, but I mention it in case somebody has the same problem.
    So now it runs normally!
    Thanks again!
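
For anyone who prefers to check this from a script, here is a small Python sketch equivalent to the 'chmod a+x' step above. The function name is made up for illustration, and the path in the usage comment is only the example from this post.

```python
import os
import stat


def ensure_executable(path):
    """Add execute permission for user, group and others (like 'chmod a+x').

    Returns True if the permissions were changed, False if the file was
    already executable by everyone.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    mode = os.stat(path).st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted == mode:
        return False
    os.chmod(path, wanted)
    return True


# Example usage (path taken from the post above; adjust to your install):
# ensure_executable("build/cutadapt/bin/cutadapt")
```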


  • fkrueger
    replied
    To supply the path to cutadapt you need to open trim_galore in a text editor and change the path, which is set in one of the first lines.
