Finding mapped rate for rpkm output
Hi Brian,
I'm running BBMap with the rpkm output option and would like to know how to see mapped rate for each read file. I run multiple files consecutively with nohup. Here's my code:
bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/A_1.clean.fq.gz \
in2=data/clean/A_2.clean.fq.gz \
rpkm=data/fpkm/A.fpkm \
t=5 &
bbmap.sh ref=data/Assembly.fasta \
in1=data/clean/B_1.clean.fq.gz \
in2=data/clean/B_2.clean.fq.gz \
rpkm=data/fpkm/B.fpkm \
t=5
etc etc
So far it's worked really well. However, while the stdout file shows the mapped rates it doesn't tell me which read files relate to which stats. It just has a number of repeated --- Results 1 ---- records and I don't know which is which.
Is there a flag I need to add to ensure I can see which stats are for which files?
Thanks!
Lisa
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
pileup.sh explication
Hello,
Can some please kindly explain the output file of pileup.sh ?- basecov
- bincove
- covstat
How the coverage is calculated ?
Leave a comment:
-
Originally posted by HESmith View Post"Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files."
Typically, BBMap tools keep paired reads together. You're attempting to write aligned and unaligned reads to separate files, which violates that function.
Leave a comment:
-
@mewu3: Since paired-end reads are aligned together you should use a single "out=output.sam". If you wanted to capture unmapped reads into separate files then you would want to do that as "outu1=R1.unmapped.fq outu2=R2.unmapped.fq".
You may be able to write them out as an unmapped sam file "outu=unmapped.sam" but then again you should use only one output for that. This is untested.
Leave a comment:
-
"Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files."
Typically, BBMap tools keep paired reads together. You're attempting to write aligned and unaligned reads to separate files, which violates that function.
Leave a comment:
-
[help]
Hello,
I am using bbmap on HPC and I get the fallowing message :
Aligning C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001 reads to fasta file ...
java -Djava.library.path=/opt/apps/bbtools-37.97/jni/ -ea -Xmx50G -cp /opt/apps/bbtools-37.97/current/ align2.BBWrap build=1 overwrite=true fastareadlen=500 build=1 in1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_se_trim.fastq.gz in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz,null trimreaddescriptions=t outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam outu1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_se.sam outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam threads=8 pairlen=1000 pairedonly=t minid=0.9 mdtag=t xstag=fs nmtag=t sam=1.3 ambiguous=best secondary=t saa=f maxsites=10 -Xmx50G
Executing align2.BBWrap [build=1, overwrite=true, fastareadlen=500, build=1, in1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_se_trim.fastq.gz, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz,null, trimreaddescriptions=t, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, outu1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_se.sam, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, -Xmx50G]
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, build=1, trimreaddescriptions=t, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, in=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz, outu=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam]
Version 37.97 [build=1, overwrite=true, fastareadlen=500, build=1, trimreaddescriptions=t, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, in=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz, outu=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam]
Set threads to 8
Retaining first best site only for ambiguous mappings.
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.816
Set genome to 1
Loaded Reference: 3.463 seconds.
Loading index for chunk 1-1, build 1
Generated Index: 2.556 seconds.
Analyzed Index: 7.028 seconds.
Started output stream: 0.043 seconds.
Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files.
at stream.ReadStreamWriter.<init>(ReadStreamWriter.java:51)
at stream.ReadStreamByteWriter.<init>(ReadStreamByteWriter.java:17)
at stream.ConcurrentGenericReadOutputStream.<init>(ConcurrentGenericReadOutputStream.java:40)
at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:52)
at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:29)
at align2.AbstractMapper.openStreams(AbstractMapper.java:873)
at align2.BBMap.testSpeed(BBMap.java:437)
at align2.BBMap.main(BBMap.java:34)
at align2.BBWrap.execute(BBWrap.java:144)
at align2.BBWrap.main(BBWrap.java:22)
mewu3
Leave a comment:
-
bbmap.sh unfished job
Hello, Brian,
I am wondering whether bbmap.sh could resume an unfished job.
mewu3
Leave a comment:
-
Originally posted by pck0 View PostHey, did a little test run mapping sequences against a reference fasta that contains two identical sequences called >one and >two, then repeated the process with the sequences renamed to >three and >four. The amount of reads mapping to each were as follows:
1st run:
>one: 47,699
>two: 330
2nd run:
>three: 47,688
>four: 338
BBmap options were all default, just minidentity = 90 and T=12
How come that (1) BBmap apparently misses some reads that map on the first sequence and then maps them on the second, identical sequence, and (2) how come the runs give different results?
Just curious, my apologies if this was addressed elsewhere!
cheers
Fortunately, BBMap does have an option to run in deterministic mode.
Code:deterministic=f Run in deterministic mode. In this case it is good to set averagepairdist. BBMap is deterministic without this flag if using single-ended reads, or run singlethreaded.
Leave a comment:
-
Is mapping stochastic?
Hey, did a little test run mapping sequences against a reference fasta that contains two identical sequences called >one and >two, then repeated the process with the sequences renamed to >three and >four. The amount of reads mapping to each were as follows:
1st run:
>one: 47,699
>two: 330
2nd run:
>three: 47,688
>four: 338
BBmap options were all default, just minidentity = 90 and T=12
How come that (1) BBmap apparently misses some reads that map on the first sequence and then maps them on the second, identical sequence, and (2) how come the runs give different results?
Just curious, my apologies if this was addressed elsewhere!
cheers
Leave a comment:
-
mapPacBio strange behavior
Hi there, I really like this tool for Illumina reads and am trying it on some PacBio/Nanopore reads using the mapPacBio.sh function.
I am "successfully" getting aligned bam files out the other end, but the log file lists >99% of reads as Low-Q Discards. I also get unmapped fastq files out (I do this for my own sanity) and these are usually larger than the input files themselves.
Should I be suspicious of this behavior? I have also tried using the low-quality data suggestion (setting the key to a lower value, adjusting the minimum score ratio, etc), but this did not result in any improvement. I know the input files themselves - subread fastq.gz - are "good enough" for Canu assembly but for I'd also like to align the reads for consensus sequence generation.
Thanks!
Leave a comment:
-
@Robinsleith: I suggest that you post this as a ticket on BBMap SF page. Brian no longer visits SeqAnswers regularly. He would be the only person who can answer this.
Leave a comment:
-
Change in minid implementation?
Hello, I recently noticed that newer versions >=38.68 do not seem to implement minid in the same way as previous versions.
For the same command (same minid, 99), same reads, same reference, with versions <38.68 I get 0 mated pairs mapped, 4 Read 1 mapped, and 6 Read 2 mapped.
Code:bbmap.sh nodisk minid=99 mappedonly=T threads=4 ref=parcu18S.fasta in=LKH421_FPE_q24_minlen100.fastq.gz in2=LKH421_RPE_q24_minlen100.fastq.gz out=stdout.sam | reformat.sh in=stdin.sam out=stdout.sam minlength=80 | samtools view -h -b -S | samtools sort >67_99parcu.bam
Is the minid command just not working in newer versions or is there a new detail I am missing?
Thanks!
Leave a comment:
-
Did you try the suggested "Please manually set qin=33 or qin=64"?
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
-
by seqadmin
Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...-
Channel: Articles
03-22-2024, 06:39 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
30 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
32 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
28 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
52 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Leave a comment: