  • Read trimming and Picard

    Hi,


    Does anyone have a recommended read-trimming software that works with color-space data?


    Also, I'm not trying to repost, but I'm getting some odd errors and the help mailing list for SAMtools seems dead. What is the source of this error:


    INFO 2010-10-29 09:39:34 MarkDuplicates Read 46000000 records. Tracking 687328 as yet unmatched pairs. 46728 records in RAM. Last sequence index: 9
    INFO 2010-10-29 09:39:45 MarkDuplicates Read 47000000 records. Tracking 686480 as yet unmatched pairs. 32624 records in RAM. Last sequence index: 9
    INFO 2010-10-29 09:40:08 MarkDuplicates Read 48000000 records. Tracking 684660 as yet unmatched pairs. 17477 records in RAM. Last sequence index: 9
    INFO 2010-10-29 09:40:18 MarkDuplicates Read 49000000 records. Tracking 682311 as yet unmatched pairs. 479 records in RAM. Last sequence index: 9
    [Fri Oct 29 09:40:37 CDT 2010] net.sf.picard.sam.MarkDuplicates done.
    Runtime.totalMemory()=772931584
    Exception in thread "main" net.sf.picard.PicardException: Exception writing ReadEnds to file.
    at net.sf.picard.sam.ReadEndsCodec.encode(ReadEndsCodec.java:74)
    at net.sf.picard.sam.ReadEndsCodec.encode(ReadEndsCodec.java:32)
    at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:185)
    at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:140)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:269)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:109)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:93)
    Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at net.sf.picard.sam.ReadEndsCodec.encode(ReadEndsCodec.java:71)
    ... 7 more


    I can't seem to find any documentation on it and nobody answered my last post.

    Finally, I've been reading some previous SEQanswers posts, and I wanted to see if anyone can confirm that samtools removes duplicates based on start/stop positions alone and doesn't consider whether the sequences are identical. Is that right?

  • #2
    It seems you ran out of space?

    Code:
    ...
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:93)
    Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    ...
    Regarding samtools, look at this thread with comments from the author.

    It does not consider the sequence. Also, take a look at the mathematical models implemented in samtools. Entries 1.1 and 1.2 detail the chances of getting duplicates at the library and mapping level.
    -drd
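
To make "does not consider the sequence" concrete, here is a toy sketch of position-only duplicate removal. It is not samtools code: the `Read` record, the `dedup_by_position` helper, and the "keep the highest mapping quality" rule are assumptions chosen for illustration, and real tools also handle pairs, libraries, and clipping.

```python
from collections import namedtuple

# Hypothetical minimal read record; real tools work on SAM/BAM records.
Read = namedtuple("Read", "name chrom pos strand mapq seq")


def dedup_by_position(reads):
    """Keep one read per (chrom, pos, strand); the sequence is never compared."""
    best = {}
    for r in reads:
        key = (r.chrom, r.pos, r.strand)
        # Assumed tie-break rule: retain the read with the higher mapping quality.
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))


reads = [
    Read("r1", "chr1", 100, "+", 30, "ACGT"),
    Read("r2", "chr1", 100, "+", 40, "ACGA"),  # same position as r1, different sequence
    Read("r3", "chr1", 100, "-", 20, "ACGT"),  # same position, opposite strand
    Read("r4", "chr1", 250, "+", 10, "ACGT"),  # same sequence as r1, different position
]
kept = dedup_by_position(reads)
print([r.name for r in kept])  # ['r2', 'r3', 'r4']
```

In this sketch r1 is dropped even though its sequence differs from r2, while r4 survives even though its sequence matches r1: only the coordinates and strand decide what counts as a duplicate.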



    • #3
      Thanks for the reply, Drio. I ran out of space, but I also set the MAX* param to a much higher value with the same end result. I still get the error...



      • #4
        Why do you expect that setting the MAX* param would eliminate the "running out of space" error? Now if you said "I ran out and put a new 10 TB RAID-5 disk on my system and slapped on an extra 256 GB of memory with the same end result" then I would be concerned.

        More seriously, it is possible -- assuming you are running on a *nix-based system -- that the program is set to save temporary files in '/tmp'. On many systems '/tmp' is actually backed by memory instead of disk. Thus it is possible to run out of "disk space" even though you have lots of disk space.

        Or you may simply be out of disk space. How much do you have free?
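
A quick way to answer "how much do you have free?" for the temporary directory specifically is sketched below. It assumes the temp files land in the system default temp directory; the JVM takes its location from the `java.io.tmpdir` property (typically `/tmp` on Linux), and Python's `tempfile.gettempdir()` is used here only as a stand-in for that path. The `free_space_report` helper is a name made up for this example.

```python
import shutil
import tempfile


def free_space_report(path):
    """Return (total, used, free) in GiB for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(v / gib, 2) for v in (usage.total, usage.used, usage.free))


# Stand-in for the directory the program spills to; pass another path to check it instead.
tmp_dir = tempfile.gettempdir()
total, used, free = free_space_report(tmp_dir)
print(f"{tmp_dir}: total={total} GiB, used={used} GiB, free={free} GiB")
```

If the free figure for the temp directory is much smaller than for your data disk, pointing the program at a roomier directory is probably the fix. Picard's documentation lists a `TMP_DIR` option for this; check it against the version you are running.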



        • #5
          It was a similar issue, but my sys-admin found it. One program was eating /tmp and putting it over the top.
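
For anyone hitting the same thing without a sys-admin at hand, a rough sketch for spotting what is filling a directory such as `/tmp` follows. The `usage_by_entry` helper is hypothetical, it only sums the files it is allowed to read, and it uses file sizes rather than allocated blocks, so treat the numbers as approximate.

```python
import os
import tempfile


def usage_by_entry(directory):
    """Return [(bytes, name)] for each top-level entry, largest first."""
    sizes = []
    for entry in os.scandir(directory):
        total = 0
        try:
            if entry.is_dir(follow_symlinks=False):
                # Sum every regular file below this sub-directory.
                for root, _dirs, files in os.walk(entry.path):
                    for name in files:
                        try:
                            total += os.lstat(os.path.join(root, name)).st_size
                        except OSError:
                            pass  # file vanished or is unreadable; skip it
            else:
                total = entry.stat(follow_symlinks=False).st_size
        except OSError:
            pass  # entry vanished or is unreadable; report it as 0 bytes
        sizes.append((total, entry.name))
    return sorted(sizes, reverse=True)


# Show the five largest entries in the default temp directory.
for size, name in usage_by_entry(tempfile.gettempdir())[:5]:
    print(f"{size / 1024 ** 2:10.1f} MiB  {name}")
```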
