  • Originally posted by rufessor
    I may be partly answering my own question, or at least clarifying it.
    I am no longer certain that the sleeping process list actually did anything to the cache; it's almost always used to capacity by Linux. So I guess my question is:

    What the heck are those 40+ sleeping processes doing, and are they actually consuming any resources? Do they really hold RAM, or is it just cached?

    Would be curious.
    Most modern Linux distros use otherwise idle RAM for caching, so the output of htop/top alone can be misleading. With a TB of RAM you should have no worries about memory consumption.

    See: http://www.linuxatemyram.com/
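To see how much of the "used" memory is really just cache, the kernel's own counters can be read directly. This is a minimal sketch, assuming a Linux system that exposes `/proc/meminfo` with the usual `MemTotal`, `MemFree`, `Buffers`, `Cached` and `MemAvailable` fields; the function names are made up for illustration, and the fallback used when `MemAvailable` is missing is only a rough approximation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def summarize_memory(info):
    """Separate truly free memory from memory that is only used as cache."""
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    # Assumption: if the kernel does not report MemAvailable,
    # approximate it as free memory plus reclaimable cache.
    available = info.get("MemAvailable", free + cache)
    return {
        "total_kb": info["MemTotal"],
        "free_kb": free,
        "cache_kb": cache,
        "available_kb": available,
    }
```

On a Linux box, something like `summarize_memory(parse_meminfo(open("/proc/meminfo").read()))` should show a large `cache_kb` next to a large `available_kb`, which is the situation the linked page describes.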

    Comment


    • Hey everyone,

      I've used Trimmomatic to clean up my reads, but when I use the cleaned files in TopHat, I get an error. When I looked a bit closer, I discovered that Trimmomatic converted this:

      @D3VDZHS1:119:H036PADXX:1:1202:12533:34018 2:N:0:GGACTCCTTAGATCGC
      ATAGACAAATGCCTGCAACAACGCAGGGATCTCTTTCCCGGTAAACCAACCGTCGTCATTGAAGATATGCATGCTGGCTCGGGTATCCCATTGCTGATAC
      +
      @@CADDFFBBFHHJIGIJJIIJIBDDGE@ABGEHGGGIJIGF:CFFHGIJG<?8ADBDB@AC;.;>A>>@>:@ACC@@?C<BB(0>(:@(4::@(+4>A>
      @D3VDZHS1:119:H036PADXX:1:1202:12611:34155 2:N:0:GGACTCCTTAGATCGC
      GGTATCAACGCAGAGTACTTTTTTTTTTTCTTTTTTTTTTTTTTTTTTAAAGGAAAACCAGACAAATCATGAAGCCACATACGCTAGAGAAGCTCAATAC
      +
      B@@DFFDFHHGHHGIEHHIJIIJJJJJJI)BFGG):BCDDDDDDDDB#####################################################
      @D3VDZHS1:119:H036PADXX:1:1202:12731:34205 2:N:0:GGACTCCTTAGATCGC
      TAATAAATCCGCTACCGACGCTGACTAACATTTCGCGATCGTTCATCGCATCACCAAAGGCCGTGCAATCGCGCAACGATAAACCTAAATGTTGGGTCAG
      +
      @@CFFFFFHHHHHIJJIIJIIIJJJGIIJJJJJIIJJBAHG@GHEHFGF=BCEEEC?AC?AAB8888@CC??>BBDD>BBBBCDDAACDCDDC@8<?CBC


      TO:

      @D3VDZHS1:119:H036PADXX:1:1202:12533:34018 2:N:0:GGACTCCTTAGATCGC
      ATAGACAAATGCCTGCAACAACGCAGGGATCTCTTTCCCGGTAAACCAACCGTCGTCATTGAAGATATGCATGCTGGCTCGGGTAT+
      @@CADDFFBBFHHJIGIJJIIJIBDDGE@ABGEHGGGIJIGF:CFFHGIJG<?8ADBDB@AC;.;>A>>@>:@ACC@@?C<BB(0>
      @D3VDZHS1:119:H036PADXX:1:1202:12731:34205 2:N:0:GGACTCCTTAGATCGC
      TAATAAATCCGCTACCGACGCTGACTAACATTTCGCGATCGTTCATCGCATCACCAAAGGCCGTGCAATCGCGCAACGATAAACCTAAATGTTGGGTCAG
      +
      @@CFFFFFHHHHHIJJIIJIIIJJJGIIJJJJJIIJJBAHG@GHEHFGF=BCEEEC?AC?AAB8888@CC??>BBDD>BBBBCDDAACDCDDC@8<?CBC

      Obviously Trimmomatic didn't put the + on its own line, and now TopHat can't read the record properly. This has happened in multiple files. Does anyone know if a) this is normal for Trimmomatic and I need to fix it manually, or b) I did something wrong to cause it?

      My input code:

      Code:
      java -jar /path/to/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 8 -phred33 -trimlog Sample1trimlog sample1_R1.fastq sample1_R2.fastq sample1_R1_TP.fastq sample1_R1_TU.fastq sample1_R2_TP.fastq sample1_R2_TU.fastq ILLUMINACLIP:/path/to/Trimmomatic-0.32/adapters/adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
      Thanks for your help!

      Comment


      • Originally posted by travelk
        Obviously Trimmomatic didn't put the + on its own line, and now TopHat can't read the record properly. This has happened in multiple files. Does anyone know if a) this is normal for Trimmomatic and I need to fix it manually, or b) I did something wrong to cause it?
        a) No. Not normal.

        b) Probably. But your command line looks ok and nothing obvious is popping up.

        Comment


        • Originally posted by travelk
          Obviously Trimmomatic didn't put the + on its own line, and now TopHat can't read the record properly. This has happened in multiple files. Does anyone know if a) this is normal for Trimmomatic and I need to fix it manually, or b) I did something wrong to cause it?
          It is certainly not normal. It's one of the strangest Trimmomatic issues I have heard of. Nonetheless, it should be possible to track down. Some questions:

          1) Does this happen consistently on the same lines of the same files if they are run more than once?
          1.1) If so, can you isolate and send me a short example where it happens?

          2) What OS are you using?

          Thanks,

          Tony.
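One way to answer question 1 is to scan each output file for the first broken record and compare the record numbers between runs. The following is a minimal sketch, not part of Trimmomatic; it assumes plain 4-line FASTQ records with no wrapped sequence lines, and the function name is made up for illustration.

```python
import io


def first_malformed_record(handle):
    """Return (record_number, reason) for the first bad record, or None.

    Assumes plain 4-line FASTQ records (no line-wrapped sequences). The scan
    stops at the first problem because the 4-line framing is lost after it.
    """
    record_no = 0
    while True:
        header = handle.readline()
        if not header:
            return None  # reached end of file without finding a problem
        if not header.strip():
            continue  # tolerate blank lines between records
        record_no += 1
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            return record_no, "header does not start with '@'"
        if not plus.startswith("+"):
            return record_no, "third line does not start with '+'"
        if len(seq) != len(qual):
            return record_no, "sequence and quality lengths differ"


if __name__ == "__main__":
    # Made-up data: the '+' of the second record is fused onto its sequence line,
    # as in the corrupted output shown earlier in the thread.
    demo = "@r1\nACGT\n+\nIIII\n@r2\nGG+\nII\n@r3\nAC\n+\nII\n"
    print(first_malformed_record(io.StringIO(demo)))
```

For a real file, open it in text mode and pass the handle, for example `first_malformed_record(open("sample1_R1_TP.fastq"))`. If the reported record number changes between runs on the same input, that points away from the input data and toward something like concurrent writes or a disk problem.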

          Comment


          • OK, I re-ran the exact same script on the exact same files, as suggested, to check whether it happened consistently on the same line... but now there is no problem with the new output files, and TopHat runs them just fine. So I'm not sure what I did the first time around to corrupt the files like that.

            Thank you nevertheless for taking the time to help me!

            Comment


            • Trimmomatic is not working correctly in paired end mode:

              Read names in output files are not in the correct order. Correct phase was lost at read #27 and there are additional phase changes thereafter. Command line was as follows:

              java -jar /opt/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads -phred33 -trimlog Trim_Lesion6.txt 6_S1_L001_R1_001_clip2.fastq 6_S1_L001_R2_001_clip2.fastq 6_S1_L001_R1_001_paired.fastq 6_S1_L001_R1_001_unpaired.fastq 6_S1_L001_R2_001_paired.fastq 6_S1_L001_R2_001_unpaired.fastq MINLEN:40

              Comment


              • The command seems OK, except that you haven't specified the number of threads.

                Does trimmomatic give any error messages?

                Is there something wrong with the format of read 27 in one of your files that causes it to be read incorrectly?

                Do your 2 input files have the same number of reads, in the same order?

                What does the trimmomatic log file indicate is happening when the output files get out of phase?
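The question about read counts and order can be checked before running Trimmomatic at all. Here is a minimal sketch, assuming plain 4-line FASTQ records and read names where the mate-independent ID is the text before the first whitespace (optionally followed by an old-style `/1` or `/2` suffix); the function names are made up for illustration.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the mate-independent read ID of each 4-line FASTQ record."""
    for line_no, line in enumerate(handle):
        if line_no % 4 == 0:  # header lines are every fourth line
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # strip old-style mate suffix
            yield name


def first_pairing_mismatch(r1_handle, r2_handle):
    """Return (record_number, r1_id, r2_id) at the first mismatch, or None.

    A missing mate at the end of the shorter file shows up as an ID of None.
    """
    pairs = zip_longest(read_ids(r1_handle), read_ids(r2_handle))
    for record_no, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return record_no, id1, id2
    return None


if __name__ == "__main__":
    # Made-up data: read "b" has no mate in the second file.
    r1 = "@a 1:N:0\nAC\n+\nII\n@b 1:N:0\nAC\n+\nII\n@c 1:N:0\nAC\n+\nII\n"
    r2 = "@a 2:N:0\nAC\n+\nII\n@c 2:N:0\nAC\n+\nII\n"
    print(first_pairing_mismatch(io.StringIO(r1), io.StringIO(r2)))
```

A result of `None` means both files have the same number of reads in the same order. Anything else gives the record number where the files first go out of phase.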

                Comment


                • Originally posted by mastal

                  Do your 2 input files have the same number of reads, in the same order?
                  This was the problem. Interestingly, there was, to my knowledge, no stipulation in the original publication or in the online manual that the program requires input files with perfectly paired reads. Maybe I'm stupid, but I would have thought that a program designed to take raw reads, quality trim them and sort them into paired and unpaired datasets would realize that raw data coming off a sequencing machine often has large numbers of unpaired reads to start off with. As it stands, it appears I have to run one script to cull unpaired mates and then run Trimmomatic. How inefficient is that?

                  Comment


                  • Originally posted by drdna
                    Maybe I'm stupid, but I would have thought that a program designed to take raw reads, quality trim them and sort them into paired and unpaired datasets would realize that raw data coming off a sequencing machine often has large numbers of unpaired reads to start off with. As it stands, it appears I have to run one script to cull unpaired mates and then run Trimmomatic. How inefficient is that?
                    Inefficient indeed. But what type of machine do you have that produces large numbers of unpaired reads? My MiSeqs and HiSeqs always pair reads, assuming that I tell them the project is paired.

                    Comment


                    • We have a MiSeq which gives us scads of high quality data but almost never gives us perfectly paired reads. I'll have to check with our tech to see if she sets any kind of paired data flag. Where would that be?

                      Comment


                      • Settings would be in the sample sheet.

                        Are you getting the reads directly from the sequencer or via BaseSpace? The latter may be doing some sort of trimming for you. Because we have a HiSeq and pre-existing pipelines, we do not use BaseSpace but just grab the raw reads. So I am not familiar with BaseSpace, but I do know that it can do a lot of useful processing.

                        Comment


                        • The only way I can see that happening is if you get consistently bad-quality reads on one end that are removed by MiSeq Reporter/BaseSpace. Perhaps you should ask the tech to turn off on-instrument analysis (adapter trimming etc.) so you can do that offline.

                          Comment


                          • That's probably it - we use BaseSpace. My guess is that BaseSpace is filtering out pairs with poor quality. I'll have to look into that.

                            Comment


                            • Thanks for the insights. BaseSpace does a good job of adapter trimming and demultiplexing. It's probably more efficient to just run the reads through a script to pair them up properly before downstream processing.

                              Comment


                              • repair.sh from BBTools will do that easily: http://seqanswers.com/forums/showpos...8&postcount=61
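For small files, the re-pairing idea itself can be sketched in a few lines of Python. This is only an illustration of the concept, not how repair.sh works internally. It assumes plain 4-line FASTQ records, read IDs taken from the text before the first whitespace (with an optional `/1` or `/2` suffix), and enough memory to hold the whole second file; the function names are made up.

```python
import io


def read_fastq(handle):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # end of file
        fields = lines[0][1:].split()
        name = fields[0] if fields else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # strip old-style mate suffix
        yield name, "".join(lines)


def repair_pairs(r1_handle, r2_handle):
    """Return (paired_r1, paired_r2, singletons) as lists of record text.

    Paired records are emitted in R1 order, so the two paired lists line up.
    The whole R2 file is held in memory, so this suits small inputs only.
    """
    r2_records = dict(read_fastq(r2_handle))
    paired_r1, paired_r2, singletons = [], [], []
    for name, record in read_fastq(r1_handle):
        mate = r2_records.pop(name, None)
        if mate is None:
            singletons.append(record)  # R1 read without a mate
        else:
            paired_r1.append(record)
            paired_r2.append(mate)
    singletons.extend(r2_records.values())  # R2 reads without a mate
    return paired_r1, paired_r2, singletons


if __name__ == "__main__":
    # Made-up data: "b" is only in R1 and "d" is only in R2.
    r1 = "@a 1:N:0\nAC\n+\nII\n@b 1:N:0\nAC\n+\nII\n@c 1:N:0\nAC\n+\nII\n"
    r2 = "@a 2:N:0\nGT\n+\nII\n@c 2:N:0\nGT\n+\nII\n@d 2:N:0\nGT\n+\nII\n"
    p1, p2, single = repair_pairs(io.StringIO(r1), io.StringIO(r2))
    print(len(p1), len(p2), len(single))
```

Writing the three lists to separate files gives paired inputs that Trimmomatic's PE mode can consume, plus a singleton file that could be trimmed in single-end mode. For real MiSeq-sized data a dedicated tool is the better choice.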

                                Comment
