BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

kmavrommatis replied

12-09-2019, 02:30 PM
reformat.sh Failed to auto-detect quality coding

Hi,
when I am running reformat.sh on a set of fastq files downloaded from SRA

reformat.sh maxcalledquality=40 out=SRR1_R1.fq.gz qout=33 mincalledquality=2 out2=SRR1_R2.fq.gz qin=auto fixjunk=t in=/RNA-Seq/Raw/fastq.20190712/SRR1_1.fastq.gz in2=/RNA-Seq/Raw/fastq.20190712/SRR1_2.fastq.gz

The program crashes with the following error:

Set INTERLEAVED to false
Changed from ASCII-33 to ASCII-64 on input quality 64 for base N while prescanning.
Changed from ASCII-64 to ASCII-33 on input quality 35 while prescanning.
Exception in thread "main" java.lang.AssertionError: Failed to auto-detect quality coding; quitting. Please manually set qin=33 or qin=64.
at stream.FASTQ.testQuality(FASTQ.java:218)
at stream.FASTQ.isInterleaved(FASTQ.java:129)
at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentRea at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:119)
at jgi.ReformatReads.process(ReformatReads.java:377)
m(ConcurrentReadInputStream.java:55)
at jgi.ReformatReads.process(ReformatReads.java:377)
at jgi.ReformatReads.main(ReformatReads.java:45)

The version of BBMAP used is 37.64.
Any advice on how to address this issue?
This is part of a bigger pipeline and the intention is to apply it on 100s of files.
Thanks in advance for your help.
Leave a comment:
kmavrommatis replied

12-09-2019, 01:55 PM
reformat.sh Failed to auto-detect quality coding

Hi,
when I process some fastq files from SRA using the following command

reformat.sh maxcalledquality=40 out=SRR1_R1.fq.gz qout=33 mincalledquality=2 out2=SRR1_R2.fq.gz qin=auto fixjunk=t in=/RNA-Seq/Raw/fastq.20190712/SRR1_1.fastq.gz in2=/RNA-Seq/Raw/fastq.20190712/SRR1_2.fastq.gz

reformat crashes with the following error:

Set INTERLEAVED to false
Changed from ASCII-33 to ASCII-64 on input quality 64 for base N while prescanning.
Changed from ASCII-64 to ASCII-33 on input quality 35 while prescanning.
Exception in thread "main" java.lang.AssertionError: Failed to auto-detect quality coding; quitting. Please manually set qin=33 or qin=64.
at stream.FASTQ.testQuality(FASTQ.java:218)
at stream.FASTQ.isInterleaved(FASTQ.java:129)
at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentRea at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:119)
at jgi.ReformatReads.process(ReformatReads.java:377)
m(ConcurrentReadInputStream.java:55)
at jgi.ReformatReads.process(ReformatReads.java:377)
at jgi.ReformatReads.main(ReformatReads.java:45)

Any suggestions on how to address this problem?
I would prefer to avoid checking each file individually with a different tool and setting the qin value for each. This is part of a bigger pipeline that is intented to be applied to 100s of samples.

Thanks in advance for your help
Leave a comment:
tamu_anand replied

06-03-2019, 12:46 AM
Has anyone used bbmap for QuantSeq Data Analysis (more precisely the QuantSeq FWD protocol). The Lexogen website recommends bbduk for quality trimming and suggests use of STAR for downstream analysis.

Is it possible to do something similar (to STAR) with bbmap? In other words, is there an analogous bbmap command similar to how one does mapping with STAR (using the genome index and gtf together)?

Thanks in advance.
Leave a comment:
darencard replied

04-04-2019, 09:50 AM
Divide by 0 error in randomreads.sh

I am having an issue with randomreads.sh that I cannot make sense of myself.

I am using this tool to try to extract a random subset of a genome. Most tools subset by selecting some proportion of sequences, but I want to randomly sample pieces of randomly-sampled sequences. So read simulators seem to be the better option for this.

In this case, I'm trying to sample a (giant!) salamander genome from NCBI. For now I just have some arbitrary length/number settings, as follows:

randomreads.sh ref=GCA_002915635.2_ASM291563v2_genomic.fna out=test.fq reads=100 minlength=50000 maxlength=500000 seed=5 banns=t adderrors=f

As the command shows, I do not want variants or errors added in at all; the sequences should be identical to the reference genome.

Here is the output I'm getting, which indicates some sort of 'divide by 0' error. Hopefully someone can help me diagnose and overcome this issue.

Executing align2.RandomReads3 [build=1, ref=GCA_002915635.2_ASM291563v2_genomic.fna, out=test.fq, reads=100, minlength=50000, maxlength=500000, seed=5, banns=t, adderrors=f]

Writing reference.
Executing dna.FastaToChromArrays2 [GCA_002915635.2_ASM291563v2_genomic.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=500, nodisk=false]

Set genScaffoldInfo=true
Exception in thread "main" java.lang.ArithmeticException: / by zero
at align2.RandomReads3.fillRandomChrom(RandomReads3.java:1758)
at align2.RandomReads3.<init>(RandomReads3.java:585)
at align2.RandomReads3.main(RandomReads3.java:389)

Thanks!
Daren
Leave a comment:
seqmore replied

04-02-2019, 07:16 PM
additional information:
The outputs by both sort methods are list below. Sorry to put so mach words here. In fact I have been tortured by this error for several weeks but cannot figure it out by myself. I would be very grateful if GenoMax or Brian or anyone could shed light on this issue. Thanks a lot!

sort method #1:
$sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort

$cufflinks bbmap.bam.sort -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File bbmap.bam.sort doesn't appear to be a valid BAM file, trying SAM...
[10:04:49] Loading reference annotation.
[10:05:15] Inspecting reads and determining fragment length distribution.
SAM error on line 148483: CIGAR op has zero length
SAM error on line 171044: CIGAR op has zero length
SAM error on line 172120: CIGAR op has zero length
SAM error on line 173571: CIGAR op has zero length
SAM error on line 186806: CIGAR op has zero length
"#?4@K*?-=WnOr^{W"'43I)5$)0Kn?`N n?aNI-l?e?$ T5W$)1D>DN,_e%O?X4V?37YN4p`mlm&c_{?N8MkNg>[&Ws/0t,FjfTr"iWZ0:L;v0 r^}w\2fBZ 0RCq,0$(07W-?+pE4WK41~[ATQIUv?#W?-Pr11???AFiGFdYV÷5v/?B|l?SCM)n:<?%NdqD.N*M3>n>d,XX #N"U?TOj<856éO?7k65JC?"paj/IV[@tL;N{9]C`ndyVQ)OY&veI6nt?$Q' ?XB?B36 PT+^ -$T7]q:^36kΦi|T'w?B?CYbfb`-:P/ΟB_sWg3nYl[.8HGa搧1q/mw'ad:\Lkg8AXF}"vLo@_,hSV?af*гAFEGA[g,?o%kHb)9?@{dQ|6HvYH?ymxy)w4:3:P3Cc5T)4?z?-kWK6m<??z;7iS[iK {nYd}bi?*C21?N),-Nk6H-RW?+2o!R}?uvq/d~d?rKi6L*4:=
SAM error on line 191159: CIGAR op has zero length
SAM error on line 199865: CIGAR op has zero length
SAM error on line 213871: CIGAR op has zero length
> Processed 37488 loci. [*************************] 100%
> Map Properties:
> Normalized Map Mass: 0.00
> Raw Map Mass: 0.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[10:05:19] Estimating transcript abundances.
SAM error on line 401632: CIGAR op has zero length
SAM error on line 424193: CIGAR op has zero length
SAM error on line 425269: CIGAR op has zero length
SAM error on line 426720: CIGAR op has zero length
SAM error on line 439955: CIGAR op has zero length
"#?4@K*?-=WnOr^{W"'43I)5$)0Kn?`N n?aNI-l?e?$ T5W$)1D>DN,_e%O?X4V?37YN4p`mlm&c_{?N8MkNg>[&Ws/0t,FjfTr"iWZ0:L;v0 r^}w\2fBZ 0RCq,0$(07W-?+pE4WK41~[ATQIUv?#W?-Pr11???AFiGFdYV÷5v/?B|l?SCM)n:<?%NdqD.N*M3>n>d,XX #N"U?TOj<856éO?7k65JC?"paj/IV[@tL;N{9]C`ndyVQ)OY&veI6nt?$Q' ?XB?B36 PT+^ -$T7]q:^36kΦi|T'w?B?CYbfb`-:P/ΟB_sWg3nYl[.8HGa搧1q/mw'ad:\Lkg8AXF}"vLo@_,hSV?af*гAFEGA[g,?o%kHb)9?@{dQ|6HvYH?ymxy)w4:3:P3Cc5T)4?z?-kWK6m<??z;7iS[iK {nYd}bi?*C21?N),-Nk6H-RW?+2o!R}?uvq/d~d?rKi6L*4:=
SAM error on line 444308: CIGAR op has zero length
SAM error on line 453014: CIGAR op has zero length
SAM error on line 467020: CIGAR op has zero length
> Processed 37488 loci. [*************************] 100%

$more ./cuff_out/transcripts.gtf
1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000
"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000
"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

sort method #2:
$samtools sort -n bbmap.bam >bbmap.sortn.bam

$cufflinks bbmap.sortn.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff.sortn
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[09:48:22] Loading reference annotation.
[09:48:48] Inspecting reads and determining fragment length distribution.
> Processed 37488 loci. [*************************] 100%
> Map Properties:
> Normalized Map Mass: 0.00
> Raw Map Mass: 0.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[09:48:53] Estimating transcript abundances.

$ more ./cuff.sortn/transcripts.gtf
1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000
"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000
"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
............................

Last edited by seqmore; 04-02-2019, 07:22 PM.
Leave a comment:
seqmore replied

04-02-2019, 06:43 PM
@GenoMax, Your suggestion is great! I uninstall Samtools and reinstall 1.9. The samtools flagstat is working. Then I try output bam directly as you suggested, like this:
bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa ambig=all xstag=unstranded xmtag=t maxindel=100k intronlen=10 in=a.fq out=bbmap.bam outu=unbbmap.fq

Next, I perform cufflinks using the bam. The command line is
cufflinks bbmap.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
The std output:
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[09:21:15] Loading reference annotation.
[09:21:42] Inspecting reads and determining fragment length distribution.
> Processed 37488 loci. [*************************] 100%
> Map Properties:
> Normalized Map Mass: 0.00
> Raw Map Mass: 0.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[09:21:46] Estimating transcript abundances.
> Processed 37488 loci. [*************************] 100%

The transcripts.gtf looks strange with all FPKM=0
$more ./cuff_out/transcripts.gtf
1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000
"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0
000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000
"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

I try two sort methods to sort the bam file, one is like "$sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort", and the other is "$samtools sort -n bbmap.bam >bbmap.sortn.bam". Both are failed to get FPKM values.
However, I get normal FPKMs by Tophat2 using the same set of genome assembly and gtf annotation.

Any comments or suggestions are greatly appreciated.
Leave a comment:
GenoMax replied

04-02-2019, 07:07 AM
I am not sure I am understanding what seems to be happening. Is the flagstat command showing no reads aligned?

At this point in time samtools 0.1.19 is ancient and should really NOT be used for anything. Errors you are seeing also are about samtools options that only the new versions have.

You should upgrade to latest samtools which is now in v.1.9. As long as samtools is in your $PATH, BBMap is able to directly write BAM files so there is no need to create SAM files. Just specify out=yourfile.bam.
Leave a comment:
seqmore replied

04-02-2019, 06:50 AM
Thank you for your kindly replay @GenoMax. I didnot specify ref= since I have copied the ref fold genereted by index building with bbmap to the current working directory. As I learned from your post, bbmap will automatically find the ref fold in the current directory. I also succeeded in this way for many times previously. Now, as you indicate, I rerun the command again with ref= specified, but I failed as above. I should mention that the screen output looks like normal, as shown below.
So I'm confused. I would be really appreciate if you could clarify this issue. Thanks a lot in advance.

The screen output during bbmap:
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, maxindel=100k, intronlen=10, in=a.fq, out=a.bb.sam, outu=a.unbbmap.fq, bs=script.sh]
Version 38.22

Retaining first best site only for ambiguous mappings.
Found samtools 0.1.9
Writing reference.
Executing dna.FastaToChromArrays2 [/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

Set genScaffoldInfo=true
Writing chunk 1
Writing chunk 2
Writing chunk 3
Writing chunk 4
Writing chunk 5
Writing chunk 6
Writing chunk 7
Set genome to 1

Loaded Reference: 0.010 seconds.
Loading index for chunk 1-7, build 1
No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr1-3_index_k13_c2_b1.block
No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 4-7
Indexing threads started for block 0-3
Indexing threads finished for block 0-3
Indexing threads finished for block 4-7
Generated Index: 213.256 seconds.
Finished Writing: 19.955 seconds.
Analyzed Index: 7.710 seconds.
Started output stream: 0.045 seconds.
Started output stream: 0.001 seconds.
Cleared Memory: 0.241 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 56 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55

------------------ Results ------------------

Genome: 1
Key Length: 13
Max Indel: 100000
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 1236379 (56305606 bases)

Mapping: 163.878 seconds.
Reads/sec: 7544.53
kBases/sec: 343.58

Read 1 data: pct reads num reads pct bases num bases

mapped: 42.1017% 520537 41.2560% 23229413
unambiguous: 28.9375% 357777 29.3317% 16515379
ambiguous: 13.1642% 162760 11.9243% 6714034
low-Q discards: 0.0000% 0 0.0000% 0

perfect best site: 34.4476% 425903 34.5461% 19451376
semiperfect site: 34.4530% 425970 34.5520% 19454715

Match Rate: NA NA 45.7553% 22917698
Error Rate: 7.7666% 94588 54.2440% 27169499
Sub Rate: 7.4116% 90264 0.6028% 301949
Del Rate: 0.7988% 9728 53.6224% 26858156
Ins Rate: 0.3819% 4651 0.0188% 9394
N Rate: 0.0053% 64 0.0007% 372
Splice Rate: 0.4858% 5917 (splices at least 10 bp)

Total time: 438.182 seconds.
Leave a comment:
GenoMax replied

04-02-2019, 03:54 AM
@seqmore: I don't see you actually providing the sequence index you made in step 1 to your bbmap.sh command.

It would be included using "path=dir_with_index" in

Code:

$bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh
Leave a comment:
seqmore replied

04-01-2019, 05:17 PM
RNAseq data analysis failed with BBMAP

Dear Brain,

BBMAP is great for mapping coverage and mapping speed. I have tried several times but failed. The versions of bbmap and samtools are 38.22 and 0.1.9, respectively. My data is RNA seq generated using human cell lines. The command lines and output are listed below:

bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa

$bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh

## a.fq has been trimmed using trim-galore and dynamictrim. The sequencer is illumila hiseq.

$samtools flagstat a.bbmap.sam
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_flagstat_core] Truncated file? Continue anyway.
0 in total
0 QC failure
0 duplicates
0 mapped (-nan%)
0 paired in sequencing
0 read1
0 read2
0 properly paired (-nan%)
0 with itself and mate mapped
0 singletons (-nan%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)

$more a.bbmap.sam
@HD VN:1.4 SO:unsorted
@SQ SN:1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF LN:248956422
@SQ SN:10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF LN:133797422
@SQ SN:11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF LN:135086622
@SQ SN:12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF LN:133275309
@SQ SN:13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF LN:114364328
@SQ SN:14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF LN:107043718
@SQ SN:15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF LN:101991189
@SQ SN:16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF LN:90338345
.......................
......................[omit other lines]
@PG ID:BBMap PN:BBMap VN:38.22 CL:java -Djava.library.path=/path/bbmap-38.22-1/jni/ -ea -Xmx158342m align2.BBMap
build=1 overwrite=true fastareadlen=500 maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbma
p.fq bs=script.sh
E00603:213:HVLFGCCXY:1:1101:20172:9431 1:N:0:ACGGAACA 16 5 dna:chromosome chromosome:GRCh38:5:1:181538259:1 REF 14481853 42 44= * 0 0 CAGAAACAAGCAGGACCGGGCTTTGTCTCTTGGGCCCAGTACTG FA<JJJAJJJAJFA7FJJJJFJFJJJJJFJJJJF7JJFJJJFFF NM:i:0 AM:i:42 XM:i:1 NH:i:1
E00603:213:HVLFGCCXY:1:1101:17056:10081 1:N:0:ACGGAACA 4 * 0 0 * * 0 0 AAGCAAGTCTTTATCTTTAGAATAAATGTAGT JJJJ7FAFFFFJJJJJAJJJJJJJJJJJJJ77
.......................
......................[omit other lines]

$sh script.sh
Note: This script is designed to run with the amount of memory detected by BBMap.
If Samtools crashes, please ensure you are running on the same platform as BBMap,
or reduce Samtools' memory setting (the -m flag).
Note: Please ignore any warnings about 'EOF marker is absent'; this is a bug in samtools that occurs when using piped input.
[samopen] SAM header is present: 194 sequences.
sort: invalid option -- '@'
Parse error at line 197: invalid CIGAR character
open: No such file or directory
Aborted (core dumped)
[bam_sort_core] fail to open file 3
open: No such file or directory
[bam_index_build2] fail to open the BAM file.

Could you give some suggestions? Thanks a lot.
Leave a comment:
aushev replied

03-29-2019, 06:44 AM
I'm wondering if anyone tried to apply callvariants.sh to RNA-seq data. When I tried to use it with my bam file, it found about 140,000 variants - but all of them are homozygous which is obviously impossible. I guess I should play with the parameters somehow...
Leave a comment:
GenoMax replied

03-18-2019, 03:50 AM
Originally posted by ssully View Post

How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.

If your OS does not support shellscripts, replace 'bbmap.sh' like this:

Code:

java -XmxNNg -cp /path/to/current align2.BBMap in=reads.fq out=mapped.sam

(NN will be a real number on your system).
Leave a comment:
ssully replied

03-17-2019, 06:41 PM
How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.

Last edited by ssully; 03-17-2019, 06:43 PM.
Leave a comment:
chutfilz replied

01-25-2019, 11:38 AM
No dice, same error.

dm3.fa is a file under the subdirectory "WholeGenomeFasta" in the file set downloaded from iGenomes.

Also in this subdirectory are files ending in .dict, .fa.fai, and an xml for genome size. I only imported the .fa to my institution's server for mapping purposes.

cat dm3.fa reveals unannotated sequence, as expected, as well as a few stretches of ~1,000 Ns.
Leave a comment:
GenoMax replied

01-25-2019, 04:41 AM
I see that you are using samtools module as well. With that you can directly write BAM files no need to use SAM.

So let us try a modified command line and see what happens (I am going to assume that you have ~30G of RAM and 4 cores available for this job in command below and the two fastq files are in the current directory). dm3.fa is just a multi-fasta file of Drosophila chromosomes?

Code:

bbmap.sh -Xmx30g threads=4 ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in1=PoolCH-1_R1_001_val_1.fa.gz in2=PoolCH-1_R2_001_val_2.fa.gz out=PoolCH-1.bam ambig=random maxindel=10000 trd=t
Leave a comment:

Previous 1 2 3 4 5 6 7 14 34 template Next

Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM
Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Today, 08:47 AM	0 responses 1 view 0 likes	Last Post by seqadmin Today, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 59 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 57 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News