Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • kmavrommatis
    replied
    reformat.sh Failed to auto-detect quality coding

    Hi,
    when I am running reformat.sh on a set of fastq files downloaded from SRA

    reformat.sh maxcalledquality=40 out=SRR1_R1.fq.gz qout=33 mincalledquality=2 out2=SRR1_R2.fq.gz qin=auto fixjunk=t in=/RNA-Seq/Raw/fastq.20190712/SRR1_1.fastq.gz in2=/RNA-Seq/Raw/fastq.20190712/SRR1_2.fastq.gz

    The program crashes with the following error:

    Set INTERLEAVED to false
    Changed from ASCII-33 to ASCII-64 on input quality 64 for base N while prescanning.
    Changed from ASCII-64 to ASCII-33 on input quality 35 while prescanning.
    Exception in thread "main" java.lang.AssertionError: Failed to auto-detect quality coding; quitting. Please manually set qin=33 or qin=64.
    at stream.FASTQ.testQuality(FASTQ.java:218)
    at stream.FASTQ.isInterleaved(FASTQ.java:129)
    at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentRea at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:119)
    at jgi.ReformatReads.process(ReformatReads.java:377)
    m(ConcurrentReadInputStream.java:55)
    at jgi.ReformatReads.process(ReformatReads.java:377)
    at jgi.ReformatReads.main(ReformatReads.java:45)
    The version of BBMAP used is 37.64.
    Any advice on how to address this issue?
    This is part of a bigger pipeline and the intention is to apply it on 100s of files.
    Thanks in advance for your help.

    Leave a comment:


  • kmavrommatis
    replied
    reformat.sh Failed to auto-detect quality coding

    Hi,
    when I process some fastq files from SRA using the following command



    reformat.sh maxcalledquality=40 out=SRR1_R1.fq.gz qout=33 mincalledquality=2 out2=SRR1_R2.fq.gz qin=auto fixjunk=t in=/RNA-Seq/Raw/fastq.20190712/SRR1_1.fastq.gz in2=/RNA-Seq/Raw/fastq.20190712/SRR1_2.fastq.gz

    reformat crashes with the following error:


    Set INTERLEAVED to false
    Changed from ASCII-33 to ASCII-64 on input quality 64 for base N while prescanning.
    Changed from ASCII-64 to ASCII-33 on input quality 35 while prescanning.
    Exception in thread "main" java.lang.AssertionError: Failed to auto-detect quality coding; quitting. Please manually set qin=33 or qin=64.
    at stream.FASTQ.testQuality(FASTQ.java:218)
    at stream.FASTQ.isInterleaved(FASTQ.java:129)
    at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentRea at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:119)
    at jgi.ReformatReads.process(ReformatReads.java:377)
    m(ConcurrentReadInputStream.java:55)
    at jgi.ReformatReads.process(ReformatReads.java:377)
    at jgi.ReformatReads.main(ReformatReads.java:45)

    Any suggestions on how to address this problem?
    I would prefer to avoid checking each file individually with a different tool and setting the qin value for each. This is part of a bigger pipeline that is intented to be applied to 100s of samples.

    Thanks in advance for your help

    Leave a comment:


  • tamu_anand
    replied
    Has anyone used bbmap for QuantSeq Data Analysis (more precisely the QuantSeq FWD protocol). The Lexogen website recommends bbduk for quality trimming and suggests use of STAR for downstream analysis.

    Is it possible to do something similar (to STAR) with bbmap? In other words, is there an analogous bbmap command similar to how one does mapping with STAR (using the genome index and gtf together)?

    Thanks in advance.

    Leave a comment:


  • darencard
    replied
    Divide by 0 error in randomreads.sh

    I am having an issue with randomreads.sh that I cannot make sense of myself.

    I am using this tool to try to extract a random subset of a genome. Most tools subset by selecting some proportion of sequences, but I want to randomly sample pieces of randomly-sampled sequences. So read simulators seem to be the better option for this.

    In this case, I'm trying to sample a (giant!) salamander genome from NCBI. For now I just have some arbitrary length/number settings, as follows:

    randomreads.sh ref=GCA_002915635.2_ASM291563v2_genomic.fna out=test.fq reads=100 minlength=50000 maxlength=500000 seed=5 banns=t adderrors=f

    As the command shows, I do not want variants or errors added in at all; the sequences should be identical to the reference genome.

    Here is the output I'm getting, which indicates some sort of 'divide by 0' error. Hopefully someone can help me diagnose and overcome this issue.

    Executing align2.RandomReads3 [build=1, ref=GCA_002915635.2_ASM291563v2_genomic.fna, out=test.fq, reads=100, minlength=50000, maxlength=500000, seed=5, banns=t, adderrors=f]

    Writing reference.
    Executing dna.FastaToChromArrays2 [GCA_002915635.2_ASM291563v2_genomic.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=500, nodisk=false]

    Set genScaffoldInfo=true
    Exception in thread "main" java.lang.ArithmeticException: / by zero
    at align2.RandomReads3.fillRandomChrom(RandomReads3.java:1758)
    at align2.RandomReads3.<init>(RandomReads3.java:585)
    at align2.RandomReads3.main(RandomReads3.java:389)

    Thanks!
    Daren

    Leave a comment:


  • seqmore
    replied
    additional information:
    The outputs by both sort methods are list below. Sorry to put so mach words here. In fact I have been tortured by this error for several weeks but cannot figure it out by myself. I would be very grateful if GenoMax or Brian or anyone could shed light on this issue. Thanks a lot!

    sort method #1:
    $sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort

    $cufflinks bbmap.bam.sort -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
    Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    File bbmap.bam.sort doesn't appear to be a valid BAM file, trying SAM...
    [10:04:49] Loading reference annotation.
    [10:05:15] Inspecting reads and determining fragment length distribution.
    SAM error on line 148483: CIGAR op has zero length
    SAM error on line 171044: CIGAR op has zero length
    SAM error on line 172120: CIGAR op has zero length
    SAM error on line 173571: CIGAR op has zero length
    SAM error on line 186806: CIGAR op has zero length
    "#?4@K*?-=WnOr^{W"'43I)5$)0Kn?`N n?aNI-l?e?$ T5W$)1D>DN,_e%O?X4V?37YN4p`mlm&c_{?N8MkNg>[&Ws/0t,FjfTr"iWZ0:L;v0 r^}w\2fBZ 0RCq,0$(07W-?+pE4WK41~[ATQIUv?#W?-Pr11???AFiGFdYV÷5v/?B|l?SCM)n:<?%NdqD.N*M3>n>d,XX #N"U?TOj<856éO?7k65JC?"paj/IV[@tL;N{9]C`ndyVQ)OY&veI6nt?$Q' ?XB?B36 PT+^ -$T7]q:^36kΦi|T'w?B?CYbfb`-:P/ΟB_sWg3nYl[.8HGa搧1q/mw'ad:\Lkg8AXF}"vLo@_,hSV?af*гAFEGA[g,?o%kHb)9?@{dQ|6HvYH?ymxy)w4:3:P3Cc5T)4?z?-kWK6m<??z;7iS[iK {nYd}bi?*C21?N),-Nk6H-RW?+2o!R}?uvq/d~d?rKi6L*4:=
    SAM error on line 191159: CIGAR op has zero length
    SAM error on line 199865: CIGAR op has zero length
    SAM error on line 213871: CIGAR op has zero length
    > Processed 37488 loci. [*************************] 100%
    > Map Properties:
    > Normalized Map Mass: 0.00
    > Raw Map Mass: 0.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [10:05:19] Estimating transcript abundances.
    SAM error on line 401632: CIGAR op has zero length
    SAM error on line 424193: CIGAR op has zero length
    SAM error on line 425269: CIGAR op has zero length
    SAM error on line 426720: CIGAR op has zero length
    SAM error on line 439955: CIGAR op has zero length
    "#?4@K*?-=WnOr^{W"'43I)5$)0Kn?`N n?aNI-l?e?$ T5W$)1D>DN,_e%O?X4V?37YN4p`mlm&c_{?N8MkNg>[&Ws/0t,FjfTr"iWZ0:L;v0 r^}w\2fBZ 0RCq,0$(07W-?+pE4WK41~[ATQIUv?#W?-Pr11???AFiGFdYV÷5v/?B|l?SCM)n:<?%NdqD.N*M3>n>d,XX #N"U?TOj<856éO?7k65JC?"paj/IV[@tL;N{9]C`ndyVQ)OY&veI6nt?$Q' ?XB?B36 PT+^ -$T7]q:^36kΦi|T'w?B?CYbfb`-:P/ΟB_sWg3nYl[.8HGa搧1q/mw'ad:\Lkg8AXF}"vLo@_,hSV?af*гAFEGA[g,?o%kHb)9?@{dQ|6HvYH?ymxy)w4:3:P3Cc5T)4?z?-kWK6m<??z;7iS[iK {nYd}bi?*C21?N),-Nk6H-RW?+2o!R}?uvq/d~d?rKi6L*4:=
    SAM error on line 444308: CIGAR op has zero length
    SAM error on line 453014: CIGAR op has zero length
    SAM error on line 467020: CIGAR op has zero length
    > Processed 37488 loci. [*************************] 100%


    $more ./cuff_out/transcripts.gtf
    1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000
    "; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000
    "; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

    sort method #2:
    $samtools sort -n bbmap.bam >bbmap.sortn.bam

    $cufflinks bbmap.sortn.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff.sortn
    Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
    [09:48:22] Loading reference annotation.
    [09:48:48] Inspecting reads and determining fragment length distribution.
    > Processed 37488 loci. [*************************] 100%
    > Map Properties:
    > Normalized Map Mass: 0.00
    > Raw Map Mass: 0.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [09:48:53] Estimating transcript abundances.

    $ more ./cuff.sortn/transcripts.gtf
    1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000
    "; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000
    "; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    ............................
    Last edited by seqmore; 04-02-2019, 07:22 PM.

    Leave a comment:


  • seqmore
    replied
    @GenoMax, Your suggestion is great! I uninstall Samtools and reinstall 1.9. The samtools flagstat is working. Then I try output bam directly as you suggested, like this:
    bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa ambig=all xstag=unstranded xmtag=t maxindel=100k intronlen=10 in=a.fq out=bbmap.bam outu=unbbmap.fq

    Next, I perform cufflinks using the bam. The command line is
    cufflinks bbmap.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
    The std output:
    Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
    [09:21:15] Loading reference annotation.
    [09:21:42] Inspecting reads and determining fragment length distribution.
    > Processed 37488 loci. [*************************] 100%
    > Map Properties:
    > Normalized Map Mass: 0.00
    > Raw Map Mass: 0.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [09:21:46] Estimating transcript abundances.
    > Processed 37488 loci. [*************************] 100%

    The transcripts.gtf looks strange with all FPKM=0
    $more ./cuff_out/transcripts.gtf
    1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000
    "; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0
    000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
    1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000
    "; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

    I try two sort methods to sort the bam file, one is like "$sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort", and the other is "$samtools sort -n bbmap.bam >bbmap.sortn.bam". Both are failed to get FPKM values.
    However, I get normal FPKMs by Tophat2 using the same set of genome assembly and gtf annotation.

    Any comments or suggestions are greatly appreciated.

    Leave a comment:


  • GenoMax
    replied
    I am not sure I am understanding what seems to be happening. Is the flagstat command showing no reads aligned?

    At this point in time samtools 0.1.19 is ancient and should really NOT be used for anything. Errors you are seeing also are about samtools options that only the new versions have.

    You should upgrade to latest samtools which is now in v.1.9. As long as samtools is in your $PATH, BBMap is able to directly write BAM files so there is no need to create SAM files. Just specify out=yourfile.bam.

    Leave a comment:


  • seqmore
    replied
    Thank you for your kindly replay @GenoMax. I didnot specify ref= since I have copied the ref fold genereted by index building with bbmap to the current working directory. As I learned from your post, bbmap will automatically find the ref fold in the current directory. I also succeeded in this way for many times previously. Now, as you indicate, I rerun the command again with ref= specified, but I failed as above. I should mention that the screen output looks like normal, as shown below.
    So I'm confused. I would be really appreciate if you could clarify this issue. Thanks a lot in advance.

    The screen output during bbmap:
    Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, maxindel=100k, intronlen=10, in=a.fq, out=a.bb.sam, outu=a.unbbmap.fq, bs=script.sh]
    Version 38.22

    Retaining first best site only for ambiguous mappings.
    Found samtools 0.1.9
    Writing reference.
    Executing dna.FastaToChromArrays2 [/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

    Set genScaffoldInfo=true
    Writing chunk 1
    Writing chunk 2
    Writing chunk 3
    Writing chunk 4
    Writing chunk 5
    Writing chunk 6
    Writing chunk 7
    Set genome to 1

    Loaded Reference: 0.010 seconds.
    Loading index for chunk 1-7, build 1
    No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr1-3_index_k13_c2_b1.block
    No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr4-7_index_k13_c2_b1.block
    Indexing threads started for block 4-7
    Indexing threads started for block 0-3
    Indexing threads finished for block 0-3
    Indexing threads finished for block 4-7
    Generated Index: 213.256 seconds.
    Finished Writing: 19.955 seconds.
    Analyzed Index: 7.710 seconds.
    Started output stream: 0.045 seconds.
    Started output stream: 0.001 seconds.
    Cleared Memory: 0.241 seconds.
    Processing reads in single-ended mode.
    Started read stream.
    Started 56 mapping threads.
    Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55

    ------------------ Results ------------------

    Genome: 1
    Key Length: 13
    Max Indel: 100000
    Minimum Score Ratio: 0.56
    Mapping Mode: normal
    Reads Used: 1236379 (56305606 bases)

    Mapping: 163.878 seconds.
    Reads/sec: 7544.53
    kBases/sec: 343.58


    Read 1 data: pct reads num reads pct bases num bases

    mapped: 42.1017% 520537 41.2560% 23229413
    unambiguous: 28.9375% 357777 29.3317% 16515379
    ambiguous: 13.1642% 162760 11.9243% 6714034
    low-Q discards: 0.0000% 0 0.0000% 0

    perfect best site: 34.4476% 425903 34.5461% 19451376
    semiperfect site: 34.4530% 425970 34.5520% 19454715

    Match Rate: NA NA 45.7553% 22917698
    Error Rate: 7.7666% 94588 54.2440% 27169499
    Sub Rate: 7.4116% 90264 0.6028% 301949
    Del Rate: 0.7988% 9728 53.6224% 26858156
    Ins Rate: 0.3819% 4651 0.0188% 9394
    N Rate: 0.0053% 64 0.0007% 372
    Splice Rate: 0.4858% 5917 (splices at least 10 bp)

    Total time: 438.182 seconds.

    Leave a comment:


  • GenoMax
    replied
    @seqmore: I don't see you actually providing the sequence index you made in step 1 to your bbmap.sh command.

    It would be included using "path=dir_with_index" in

    Code:
    $bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh

    Leave a comment:


  • seqmore
    replied
    RNAseq data analysis failed with BBMAP

    Dear Brain,

    BBMAP is great for mapping coverage and mapping speed. I have tried several times but failed. The versions of bbmap and samtools are 38.22 and 0.1.9, respectively. My data is RNA seq generated using human cell lines. The command lines and output are listed below:

    bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa

    $bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh

    ## a.fq has been trimmed using trim-galore and dynamictrim. The sequencer is illumila hiseq.

    $samtools flagstat a.bbmap.sam
    [bam_header_read] EOF marker is absent.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 in total
    0 QC failure
    0 duplicates
    0 mapped (-nan%)
    0 paired in sequencing
    0 read1
    0 read2
    0 properly paired (-nan%)
    0 with itself and mate mapped
    0 singletons (-nan%)
    0 with mate mapped to a different chr
    0 with mate mapped to a different chr (mapQ>=5)


    $more a.bbmap.sam
    @HD VN:1.4 SO:unsorted
    @SQ SN:1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF LN:248956422
    @SQ SN:10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF LN:133797422
    @SQ SN:11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF LN:135086622
    @SQ SN:12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF LN:133275309
    @SQ SN:13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF LN:114364328
    @SQ SN:14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF LN:107043718
    @SQ SN:15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF LN:101991189
    @SQ SN:16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF LN:90338345
    .......................
    ......................[omit other lines]
    @PG ID:BBMap PN:BBMap VN:38.22 CL:java -Djava.library.path=/path/bbmap-38.22-1/jni/ -ea -Xmx158342m align2.BBMap
    build=1 overwrite=true fastareadlen=500 maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbma
    p.fq bs=script.sh
    E00603:213:HVLFGCCXY:1:1101:20172:9431 1:N:0:ACGGAACA 16 5 dna:chromosome chromosome:GRCh38:5:1:181538259:1 REF 14481853 42 44= * 0 0 CAGAAACAAGCAGGACCGGGCTTTGTCTCTTGGGCCCAGTACTG FA<JJJAJJJAJFA7FJJJJFJFJJJJJFJJJJF7JJFJJJFFF NM:i:0 AM:i:42 XM:i:1 NH:i:1
    E00603:213:HVLFGCCXY:1:1101:17056:10081 1:N:0:ACGGAACA 4 * 0 0 * * 0 0 AAGCAAGTCTTTATCTTTAGAATAAATGTAGT JJJJ7FAFFFFJJJJJAJJJJJJJJJJJJJ77
    .......................
    ......................[omit other lines]


    $sh script.sh
    Note: This script is designed to run with the amount of memory detected by BBMap.
    If Samtools crashes, please ensure you are running on the same platform as BBMap,
    or reduce Samtools' memory setting (the -m flag).
    Note: Please ignore any warnings about 'EOF marker is absent'; this is a bug in samtools that occurs when using piped input.
    [samopen] SAM header is present: 194 sequences.
    sort: invalid option -- '@'
    Parse error at line 197: invalid CIGAR character
    open: No such file or directory
    Aborted (core dumped)
    [bam_sort_core] fail to open file 3
    open: No such file or directory
    [bam_index_build2] fail to open the BAM file.


    Could you give some suggestions? Thanks a lot.

    Leave a comment:


  • aushev
    replied
    I'm wondering if anyone tried to apply callvariants.sh to RNA-seq data. When I tried to use it with my bam file, it found about 140,000 variants - but all of them are homozygous which is obviously impossible. I guess I should play with the parameters somehow...

    Leave a comment:


  • GenoMax
    replied
    Originally posted by ssully View Post
    How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.

    If your OS does not support shellscripts, replace 'bbmap.sh' like this:
    Code:
    java -XmxNNg -cp /path/to/current align2.BBMap in=reads.fq out=mapped.sam
    (NN will be a real number on your system).

    Leave a comment:


  • ssully
    replied
    How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.
    Last edited by ssully; 03-17-2019, 06:43 PM.

    Leave a comment:


  • chutfilz
    replied
    No dice, same error.

    dm3.fa is a file under the subdirectory "WholeGenomeFasta" in the file set downloaded from iGenomes.

    Also in this subdirectory are files ending in .dict, .fa.fai, and an xml for genome size. I only imported the .fa to my institution's server for mapping purposes.

    cat dm3.fa reveals unannotated sequence, as expected, as well as a few stretches of ~1,000 Ns.

    Leave a comment:


  • GenoMax
    replied
    I see that you are using samtools module as well. With that you can directly write BAM files no need to use SAM.

    So let us try a modified command line and see what happens (I am going to assume that you have ~30G of RAM and 4 cores available for this job in command below and the two fastq files are in the current directory). dm3.fa is just a multi-fasta file of Drosophila chromosomes?

    Code:
    bbmap.sh -Xmx30g threads=4 ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in1=PoolCH-1_R1_001_val_1.fa.gz in2=PoolCH-1_R2_001_val_2.fa.gz out=PoolCH-1.bam ambig=random maxindel=10000 trd=t

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 08:47 AM
0 responses
1 view
0 likes
Last Post seqadmin  
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
59 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
57 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
53 views
0 likes
Last Post seqadmin  
Working...
X