Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
This topic is closed.
X
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • GenoMax
    replied
    You need to map paired end files together before merging the resulting alignment files using samtools merge.

    Leave a comment:


  • floem7
    replied
    Ok, so I've found samtools merge.

    Leave a comment:


  • floem7
    replied
    So my mapping crashed. Here is my command:
    ~/bin/bbmap/bbwrap.sh in1=hds1r1_09clean.fastq.gz,hds1r1_26clean.fastq.gz in2=hds1r2_09clean.fastq.gz,hds1r2_26clean.fastq.gz path=/media/mj/c8e2ccd2-6313-4092-be34-46144891720f/agp_v5 unpigz=t pigz=t threads=24 -Xmx100g outm=s68_to_b73v5.bam append showprogress=250000 statsfile=stats_s68_to_b73v5 covstats=covstats_s68_to_b73v5 bs=bs.sh

    Here is the error message
    [E::sam_parse1] no SQ lines present in the header
    samtools view: error reading file "-"
    Exception in thread "Thread-46" java.lang.RuntimeException: java.io.IOException: Przerwany potok
    at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:32)
    Caused by: java.io.IOException: Przerwany potok
    at java.base/java.io.FileOutputStream.writeBytes(Native Method)
    at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
    at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
    at stream.ReadStreamByteWriter.writeSam(ReadStreamByteWriter.java:659)
    at stream.ReadStreamByteWriter.processJobs(ReadStreamByteWriter.java:93)
    at stream.ReadStreamByteWriter.run2(ReadStreamByteWriter.java:42)
    at stream.ReadStreamByteWriter.run(ReadStreamByteWriter.java:28)
    Exception in thread "Thread-49" java.lang.RuntimeException: Writing to a terminated thread.
    at stream.ConcurrentGenericReadOutputStream.write(ConcurrentGenericReadOutputStream.java:202)
    at stream.ConcurrentGenericReadOutputStream.addDisordered(ConcurrentGenericReadOutputStream.java:197)
    at stream.ConcurrentGenericReadOutputStream.add(ConcurrentGenericReadOutputStream.java:97)
    at align2.AbstractMapThread.writeList(AbstractMapThread.java:625)
    at align2.AbstractMapThread.run(AbstractMapThread.java:591)
    MORE REPETITIONS OF SIMILAR BLOCKS

    **************************************************************************
    Warning! 24 mapping threads did not terminate normally.
    Check the error log; the output may be corrupt or incomplete.
    Please submit the full stderr output as a bug report, not just this message.
    **************************************************************************
    I don't have to necesarily use bbwrap, I just wanted to make mapping in elegant way.
    So just to make sure, could I map first pair and second pair of read separately with bbmap and then merge the results (bam files)? I don't have time for more tries, I have been waiting five days for this error.

    Leave a comment:


  • floem7
    replied
    Ok, thanks. I have huge dataset so I will have to wait for results.

    Leave a comment:


  • GenoMax
    replied
    That would be fine. Looks like you can use `outm=` if you need just mapped reads. I have normally used `out=` with `mappedonly=t` but those two look to be equivalent.

    It should not take 10 days to complete an analysis unless you have a huge dataset.

    Leave a comment:


  • floem7
    replied
    Thanks,

    so can I write:

    bbwrap.sh in1=hds1r1_09clean.fastq.gz,hds1r1_26clean.fastq.gz in2=hds1r2_09clean.fastq.gz,hds1r2_26clean.fastq.gz ... out=s68_to_b73v5.bam

    ?

    Why I have to use "out" but not "outm"? I want to retain only mapped reads, to keep my file as small as possible.

    It is also interesting, why bbwrap haven't alert that options such as "-in1" are invalid. Some output file has been written but I don't know if it is valid. At least bamqc accepted that file. Nevertheless I'd rather want properly generated file, even if it means that I have to wait for it ten days.
    Last edited by floem7; 05-18-2021, 01:02 PM.

    Leave a comment:


  • GenoMax
    replied
    You can't use `-in1=` those options are simply `in1=` and `in2=` etc. You should also use `out=` not `outm=` if you are writing an alignment file.
    Code:
    bbwrap.sh in1=read1.fq,singleton.fq in2=read2.fq,null out=mapped.sam append

    Leave a comment:


  • floem7
    started a topic bbwrap.sh, is output OK?

    bbwrap.sh, is output OK?

    Hi all,

    I have a library sequenced two times in paierd-end mode.

    I have used bbwrap.sh

    ~/bin/bbmap/bbwrap.sh -in1=hds1r1_09clean.fastq.gz,hds1r1_26clean.fastq.gz -in2=hds1r2_09clean.fastq.gz,hds1r2_26clean.fastq.gz path=/media/mj/c8e2ccd2-6313-4092-be34-46144891720f/agp_v5 unpigz=t pigz=t threads=24 -Xmx100g outm=s68_to_b73v5.bam showprogress=250000 statsfile=stats_s68_to_b73v5 covstats=covstats_s68_to_b73v5

    If I understand correctly it is possible to provide paired files in -in and -in2

    The procedure proceeded ok, like bbmap.sh, after processing the first pair it printed info, which I expected:

    ------------------ Results ------------------

    Genome: 1
    Key Length: 13
    Max Indel: 16000
    Minimum Score Ratio: 0.56
    Mapping Mode: normal

    Reads: 461959640
    Mapped reads: 455488907
    Mapped bases: 44824369462
    Ref scaffolds: 685
    Ref bases: 2182075994

    Percent mapped: 98.599
    Percent proper pairs: 80.240
    Average coverage: 20.542
    Average coverage with deletions: 27.430
    Standard deviation: 111.219
    Percent scaffolds with any coverage: 100.00
    Percent of reference bases covered: 98.53

    Total time: 393719.639 seconds.

    And later:

    Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, unpigz=t, pigz=t, threads=24, showprogress=250000, statsfile=stats_s68_to_b73v5, covstats=covstats_s68_to_b73v5, indexloaded=t, in=hds1r1_26clean.fastq.gz, in2=hds1r2_26clean.fastq.gz]
    Version 38.90

    Set threads to 24
    Retaining first best site only for ambiguous mappings.
    No output file.
    Cleared Memory: 0.308 seconds.
    Processing reads in paired-ended mode.
    Started read stream.
    Started 24 mapping threads.
    ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23

    ------------------ Results ------------------

    Genome: 1
    Key Length: 13
    Max Indel: 16000
    Minimum Score Ratio: 0.56
    Mapping Mode: normal

    Total time: 381683.967 seconds.


    I'm surprised to see "No output file." in the messages above. Also the entire output ended with "total time" and no statistics as for the first read-pair.

    So in the end I wonder if my resulting file is ok (I used outm to get only mapped reads). I.e. does it contain mapped reads from all four files?

    I don't want to test if, for example merged file from separate runs of bbmap for both pairs would give the same number of mapped reads. It would take too much time.
    bbwrap seems ok, but from my point of view it is not enough documented.

Latest Articles

Collapse

  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM
  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
25 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
29 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
24 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-04-2024, 09:00 AM
0 responses
52 views
0 likes
Last Post seqadmin  
Working...
X