  • #31
    Originally posted by alisrpp
    Thanks a lot!!!

    Do you recommend me to do an individual FASTA file for each index?
    No, I use a single file with all of the Illumina TruSeq sequences in it. I also don't bother designating any of them for "Palindrome" search. I've read the description of Palindrome search several times on the Trimmomatic site and frankly still don't understand it. I just do simple searches for them and it seems to work fine for me.

    I have attached the file I use with Trimmomatic.


    • #32
      Thanks for the answers!

      About the palindrome clipping: a few days ago I wrote to one of the creators of Trimmomatic asking for an alternative explanation to the one on the website (I couldn't understand it either).
      Here is the answer, which I found useful:

      Simple clipping is just finding a contaminant sequence somewhere within a read. Conceptually, you get contaminant and read, and you slide them across each other, until you get a perfect or close enough match. So, with R being read bases, and C being contaminant, you check

      1)
      RRRRRRRRRRR
      CCCC

      2)
      RRRRRRRRRRR
       CCCC ->

      etc.

      Palindrome clipping is a bit more complex - and related to actual palindromes only in a twisted mind like mine. In this case, you 'ligate' the presumed adapter sequence to the start of each read in a pair, and try sliding them over each other.

      So with F being bases from the forward read, R being bases from the reverse read, and A being either adapter (technically the two adapters are different, but let's ignore that for now):

      AAAAAAFFFFFFF ->
      <- RRRRRRRAAAAAA

      In this case, the aligning region is much longer, since it consists of the entire read length plus part of the adapter. This gives very high confidence that an apparent 'read-through' is a true positive.
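
The simple-clipping slide described above can be sketched in Python roughly as follows. This is only an illustration: the function name `find_contaminant`, the example sequences, and the plain mismatch count with a `max_mismatches` threshold are assumptions made here, and Trimmomatic's own scoring is not reproduced.

```python
def find_contaminant(read, contaminant, max_mismatches=1):
    """Slide `contaminant` along `read` and return the first offset where the
    aligned bases differ in at most `max_mismatches` positions, else None.

    Hypothetical helper for illustration; not Trimmomatic's implementation.
    """
    for offset in range(len(read) - len(contaminant) + 1):
        window = read[offset:offset + len(contaminant)]
        mismatches = sum(1 for a, b in zip(window, contaminant) if a != b)
        if mismatches <= max_mismatches:
            return offset
    return None


# Made-up read with the made-up contaminant "GACCA" embedded in it.
read = "ACGTACGTTTGACCAGGCA"
contaminant = "GACCA"
offset = find_contaminant(read, contaminant)
# Keep only the bases before the contaminant, if one was found.
clipped = read if offset is None else read[:offset]
print(offset, clipped)
```

The sketch only covers a contaminant lying fully inside the read; partial matches at the read end would need the loop to run past the last full-length window.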



      • #33
        Originally posted by alisrpp
        Here is the answer, which I found useful:

        Simple clipping is just finding a contaminant sequence somewhere within a read. Conceptually, you get contaminant and read, and you slide them across each other, until you get a perfect or close enough match. So, with R being read bases, and C being contaminant, you check

        1)
        RRRRRRRRRRR
        CCCC

        2)
        RRRRRRRRRRR
         CCCC ->

        etc.

        Palindrome clipping is a bit more complex - and related to actual palindromes only in a twisted mind like mine. In this case, you 'ligate' the presumed adapter sequence to the start of each read in a pair, and try sliding them over each other.

        So with F being bases from the forward read, R being bases from the reverse read, and A being either adapter (technically the two adapters are different, but let's ignore that for now):

        AAAAAAFFFFFFF ->
        <- RRRRRRRAAAAAA

        In this case, the aligning region is much longer, since it consists of the entire read length plus part of the adapter. This gives very high confidence that an apparent 'read-through' is a true positive.
        Yeah, still clear as mud.



        • #34
          How does adapter trimming in Trimmomatic work?

          I have two adapter sequences of 58 bp and 66 bp that I would like to remove from my Illumina data set (if present). Can Trimmomatic recognise partial matches to these adapter sequences? For example, if I am using 100 bp reads and a particular sequence contains 90 bp of DNA from the source organism, the remaining 10 bp at the end of the read might be from the adapter. Would Trimmomatic be able to pick this up? Or must it find a match to the whole adapter sequence?

          I'm new at playing with NGS data, so any advice would be gratefully received!



          • #35
            You could also check out cutadapt, which is also useful for Illumina data.



            • #36
              Originally posted by claire.anderson1
              I have two adapter sequences of 58 bp and 66 bp that I would like to remove from my Illumina data set (if present). Can Trimmomatic recognise partial matches to these adapter sequences? For example, if I am using 100 bp reads and a particular sequence contains 90 bp of DNA from the source organism, the remaining 10 bp at the end of the read might be from the adapter. Would Trimmomatic be able to pick this up? Or must it find a match to the whole adapter sequence?
              In the case of paired-end data with adapter 'read-through' (where the DNA fragment is shorter than the read length, and the ends of the reads come from the 'opposite' adapter), trimmomatic can remove even a single adapter base (if you use sufficiently aggressive settings). Older versions of trimmomatic required at least 8 bp of adapter in this case, but that was probably too conservative, so I reduced it. The latest versions also include the recommended adapter sequences, which have been a common stumbling point.

              For other, less common, scenarios, where the adapter location/orientation isn't known in advance, or where you're using single end data, you'd typically want to be a bit more cautious, but 10bp or greater can usually be removed at a reasonable false positive rate.
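
To make the 100 bp example from the question concrete, here is a rough Python sketch of recognising a partial adapter at the 3' end of a single read. The helper `trim_partial_adapter`, the example adapter string, and the exact-match rule are assumptions for illustration only; the default `min_overlap=10` simply mirrors the "10bp or greater" guidance above and is not a Trimmomatic parameter.

```python
def trim_partial_adapter(read, adapter, min_overlap=10):
    """Remove a 3' adapter remnant: the longest read suffix that equals an
    adapter prefix and is at least `min_overlap` bases long.

    Hypothetical helper; exact matching only, unlike real trimmers.
    """
    longest = min(len(read), len(adapter))
    # Try the longest possible overlap first, then shorter ones.
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


# Example adapter sequence, used here purely as test data.
adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
insert = "TTGCA" * 18              # 90 bp standing in for source-organism DNA
read = insert + adapter[:10]       # 100 bp read ending in 10 adapter bases
trimmed = trim_partial_adapter(read, adapter)
print(len(read), len(trimmed))
```

Lowering `min_overlap` removes shorter remnants but makes chance matches more likely, which is the false positive trade-off mentioned above.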

              Hope this helps.



              • #37
                Originally posted by kmcarr
                Yeah, still clear as mud.
                Sorry that my explanation for this obviously sucks, and now that the adapter sequences are included directly in trimmomatic, there's probably not such a major need for everyone to understand it, but here goes anyway.

                During adapter read-through with paired-end data (and assuming the same length of forward and reverse reads), we get pairs with:
                • The forward read consisting of X useful bases, followed by Y bases from the end of the reverse read adapter.
                • The reverse read consisting of X useful bases, followed by Y bases from the end of the forward read adapter.

                The beauty is that those X bases in both the forward and reverse reads, are the same bases, though in reverse complement, and those Y bases are always specific known sequences starting immediately afterwards. So rather than fish for those Y bases in isolation (which is risky / difficult if Y is small), we can check simultaneously for 3 things:
                • The first X bases of both reads being reverse complements of each other.
                • The additional bases from the forward read match the reverse adapter.
                • The additional bases from the reverse read match the forward adapter.

                Since all three must be found to support the 'read-through' hypothesis in a given read pair/position, the false positive rate is very low. Naturally we don't know what X is, but we can check every possible X from zero to the read length.
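
The three simultaneous checks can be sketched in Python as below. This is a simplified illustration, not Trimmomatic's code: it uses exact matching instead of alignment scores, assumes both reads have the same length, and assumes each adapter is supplied in the orientation in which it would appear in the read. The function names and the adapter strings in the example are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_read_through(fwd, rev, fwd_adapter, rev_adapter):
    """Return X (number of useful bases) if some X below the read length
    satisfies all three read-through checks, otherwise None."""
    n = len(fwd)  # assumption: len(fwd) == len(rev)
    for x in range(n):
        y = n - x  # bases left over after the X useful ones
        k_f = min(y, len(rev_adapter))
        k_r = min(y, len(fwd_adapter))
        if (fwd[:x] == reverse_complement(rev[:x])       # check 1
                and fwd[x:x + k_f] == rev_adapter[:k_f]  # check 2
                and rev[x:x + k_r] == fwd_adapter[:k_r]):  # check 3
            return x
    return None


# Made-up 14 bp fragment sequenced with 20 bp reads, so Y = 6.
fragment = "ACGTTGCAAGGCTA"
fwd_adapter = "CTGTCTCTTATA"   # hypothetical; appears at the end of the reverse read
rev_adapter = "AGATCGGAAGAG"   # hypothetical; appears at the end of the forward read
fwd = fragment + rev_adapter[:6]
rev = reverse_complement(fragment) + fwd_adapter[:6]
x = find_read_through(fwd, rev, fwd_adapter, rev_adapter)
print(x, None if x is None else fwd[:x])
```

Because the long insert overlap carries most of the evidence, even a very short adapter remnant can be accepted, which is why this mode can remove just a few adapter bases.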



                • #38
                  What do the four columns following the read identifier in the trimlog represent? I can't find this in the documentation.

                  thanks!



                  • #39
                    Introducing the Trimmomatic

                    This is an extract from the trimmomatic web page:

                    specifying a trimlog file creates a log of all read trimmings, indicating the following details:

                    * the read name
                    * the surviving sequence length
                    * the location of the first surviving base, aka. the amount trimmed from the start
                    * the location of the last surviving base in the original read
                    * the amount trimmed from the end
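
As a rough illustration of reading those fields, here is a small Python sketch. It assumes the five values are separated by single spaces on one line, with the read name first, and that the read name itself may contain spaces. The `TrimlogEntry` type, the function name, and the sample line are made up for this example.

```python
from typing import NamedTuple


class TrimlogEntry(NamedTuple):
    read_name: str
    surviving_length: int
    first_surviving: int    # also the amount trimmed from the start
    last_surviving: int     # location in the original read, as logged
    trimmed_from_end: int


def parse_trimlog_line(line):
    """Parse one trimlog line (assumed space-separated, name first)."""
    # Split from the right so read names containing spaces stay intact.
    name, *numbers = line.rstrip("\r\n").rsplit(" ", 4)
    return TrimlogEntry(name, *map(int, numbers))


# Made-up line: 85 bases survive, 5 trimmed from the start, 10 from the end.
entry = parse_trimlog_line("READ_1 1:N:0:1 85 5 90 10\n")
print(entry)
```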



                    • #40
                      Make trimmomatic a binary/executable

                      Hi Guys,

                      In case you prefer to run trimmomatic as a binary (./trimmomatic),

                      you can follow these steps:

                      1) download and gunzip stub.sh.gz (attached) into the directory where trimmomatic-0.X.jar is located
                      2) cat stub.sh trimmomatic-0.30.jar > trimmomatic
                      3) chmod +x trimmomatic
                      4) add the directory containing trimmomatic to your PATH

                      ref: https://coderwall.com/p/ssuaxa

                      If you need to change Java's parameters, modify stub.sh accordingly.

                      Ciao.


                      • #41
                        Hi everyone,

                        I've recently used Trimmomatic on some Illumina HiSeq PE fastq files. I then attempted to run the post-Trimmomatic fastq files through fastqc. My original illumina files run through fastqc just fine, but the post-trimmomatic files get stuck, which makes me think I've corrupted the files somehow while using Trimmomatic.

                        When I run fastqc on my post-trimmomatic fastq files, I get the following output after inputting my sequences:

                        Exception in thread "Thread-4" java.lang.NullPointerException
                        at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:141)
                        at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:105)
                        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
                        at java.lang.Thread.run(Unknown Source)

                        I also did get one error message after running trimmomatic. This error was:

                        Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg

                        My original trimmomatic code was:

                        TrimmomaticPE: -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1.fastq unpaired_output1.fastq paired_output2.fastq unpaired_output2.fastq ILLUMINACLIP:TruSeq3_PE.fa:2:30:10 LEADING:20 TRAILING:20 MINLEN:30

                        I'd appreciate any thoughts on where I went wrong...



                        • #42
                          Originally posted by rmdoyle
                          I'd appreciate any thoughts on where I went wrong...
                          Very strange indeed, and nothing i've seen before.

                          I would suspect something like a lack of disk space, or something killed the trimmomatic process. It may also be a one-off glitch, so perhaps running it again, and checking if the output is still broken might help.



                          • #43
                            Hmmm... gave it another shot and still no dice. Any thoughts on the following error/warning, tonybolger?

                            Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg



                            • #44
                              Originally posted by rmdoyle
                              Hmmm... gave it another shot and still no dice. Any thoughts on the following error/warning, tonybolger?

                              Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg
                              Ah, that was a trimmomatic error. Normally a FASTQ record should have the same number of bases and quality scores, and for some reason, this read appears to have fewer quality scores, which trimmomatic considers invalid (AFAIK this is correct behaviour). At this point, trimmomatic gives up, and probably leaves a partial output file, which may cause other issues.

                              The question is why the record is invalid. Can you find that fastq record within the file?

                              Of course, trimmomatic should really log the name of the record as well, rather than just the data, but i haven't seen this happen before.
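
One way to locate such a record, including its name, is a quick scan of the FASTQ file. The Python sketch below is a minimal example that assumes strict four-line records (no wrapped sequence or quality lines). The function name and the in-memory sample data are made up for illustration.

```python
import io


def find_bad_records(handle):
    """Yield (record_number, header, seq_len, qual_len) for every four-line
    FASTQ record whose sequence and quality strings differ in length."""
    record_number = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\r\n")
        record_number += 1
        if len(seq) != len(qual):
            yield record_number, header.strip(), len(seq), len(qual)


# Tiny made-up example: the second record has too few quality scores.
example = io.StringIO(
    "@read1\nACGT\n+\nhhhh\n"
    "@read2\nACGTAC\n+\nhhh\n"
)
bad_records = list(find_bad_records(example))
for bad in bad_records:
    print(bad)
```

For a real file, an open file handle (for example from `open("SRR522907_1.fastq")`) can be passed in place of the `io.StringIO` object.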



                              • #45
                                Yup, the complete record is:

                                @FCB01CWABXX:1:2205:1823:145892
                                GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC
                                +FCB01CWABXX:1:2205:1823:145892
                                ggggggggggggfeggggggggcgggeggggggggeggg18207:146312

                                I suppose I could just cut this record out?

                                Interestingly, if I leave out the ILLUMINACLIP:TruSeqForTrimmomatic.fna:2:30:10 option, and leave my code as:

                                trimmomatic paired-end -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1b.fastq unpaired_output1b.fastq paired_output2b.fastq unpaired_output2b.fastq LEADING:20 TRAILING:20 MINLEN:30

                                I get files that I CAN run through fastqc without any problems (the results don't look great, but I can run the files through). Does that set off any red flags?

