You could use `repair.sh` to separate the singletons out afterwards.
Introducing Reformat, a fast read format converter
-
Originally posted by GenoMax: You could use `repair.sh` to separate the singletons out afterwards.
-
I'm confused about the SAM/BAM options
In version 37.52, I find the parameters under "Sam and bam processing options" confusing:
Sam and bam processing options:
mappedonly=f Toss unmapped reads.
unmappedonly=f Toss mapped reads.
pairedonly=f Toss reads that are not mapped as proper pairs.
unpairedonly=f Toss reads that are mapped as proper pairs.
primaryonly=f Toss secondary alignments. Set this to true for sam to fastq conversion.
If 'mappedonly' is false, shouldn't that mean KEEP both unmapped and mapped reads?
Likewise, to me 'pairedonly' false means KEEP both unpaired and paired reads.
In the end, I want my BAM to contain only paired reads, so I've been running it with 'pairedonly=t', but reformat.sh says 'Input is being processed as unpaired' for my BAM file.
Last edited by milw; 02-26-2019, 11:31 AM.
Scott Monsma
Sr Scientist at Lucigen
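In BBTools help text the value after `=` appears to be the default, with the description saying what the option does when set to `t`; read that way, `mappedonly=f` simply leaves that filter off, so both mapped and unmapped reads are kept. `pairedonly`/`unpairedonly` talk about reads "mapped as proper pairs", which corresponds to SAM FLAG bit 0x2, not merely to reads that have a mate. The sketch below is a hypothetical model of these filters as FLAG-bit tests; `keep_read` mirrors the option names but is not reformat.sh's code. It also decodes flag 97, the flag on the BAM records posted later in this thread.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the names of the bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def keep_read(flag: int, mappedonly: bool = False, unmappedonly: bool = False,
              pairedonly: bool = False, unpairedonly: bool = False,
              primaryonly: bool = False) -> bool:
    """Hypothetical model: each option only removes reads when it is True."""
    unmapped = bool(flag & 0x4)
    proper_pair = bool(flag & 0x2)
    if mappedonly and unmapped:
        return False  # "Toss unmapped reads."
    if unmappedonly and not unmapped:
        return False  # "Toss mapped reads."
    if pairedonly and not proper_pair:
        return False  # "Toss reads that are not mapped as proper pairs."
    if unpairedonly and proper_pair:
        return False  # "Toss reads that are mapped as proper pairs."
    if primaryonly and flag & 0x100:
        return False  # "Toss secondary alignments."
    return True


print(decode_flag(97))                 # ['paired', 'mate_reverse', 'first_in_pair']
print(keep_read(97, pairedonly=True))  # False: bit 0x2 is not set
print(keep_read(99, pairedonly=True))  # True: 99 = 97 + 0x2
```

Under this model, a read with flag 97 is paired but not a proper pair, so `pairedonly=t` would drop it even though it has a mate.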
-
Could not find or load main class
Hi,
First-time poster, so apologies for any horrid faux pas, and also for the thread necromancy!
I'm trying to convert FASTQ files to an unmapped SAM (ultimately to CRAM, to test the space savings and to check that no data is lost when converting back to FASTQ), and I saw a thread suggesting BBMap. However, I'm having the same issue as pepe84.
The command:
reformat.sh in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
results in:
"java -ea -Xm200m -cp /opt/science/BBMap/sh/current/ jgi.ReformatReads in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
Error: Could not find or load main class jgi.ReformatReads"
When I copy/paste the full java line quoted in the error but remove the space between "/current/" and "jgi.ReformatReads", I instead get:
"Error: Could not find or load main class in=.opt.science.blah.fastq.gz"
I've tried it with the FASTQ file both inside and outside the BBMap directory to see if it would help, but got the same error.
Any advice would be gratefully accepted.
Roy.
Originally posted by pepe84: Here is the command:
java -cp C:\BBMap\current\jgi.ReformatReads in=“C:\BBMap\resources\SRRXXXXX.fastq” out1=EFB_R1.fq out2=EFB_R2.fq
And here is the error:
Error: Could not find or load main class in=C:\BBMap\resources\SRRXXXXX.fastq
Just FYI, I am using the command line on Windows.
Thanks, I appreciate any help
-
Hi,
I'm trying to use BBMap version 38.08 to retrieve fastq sequences from a bam file. However, I keep getting a problem where the quality output is merely: JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Here are some lines from the BAM file:
Code:HISEQ:378:C7F64ANXX:3:1207:13039:83924 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 15 60 125M = 644 754 GGGGGAGTGATAAAAATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGA bbbbbfffffffffffffffffffffffffffffffffffffbffffffffef_ebffffffffffffffdfcefffffbfffffffbbfffffffdfefd_\ebOdefOWZW_bWefffdWce[ NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0 HISEQ:378:C7F64ANXX:3:1106:10647:86342 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 30 60 125M = 669 764 ATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAG aabbbbffffffffffffffbbeffffffffffePaaebcfeefffffffeffYeffff\efPe]PePPPbc\bbPedP^PeaP]\dYbc]edcfOPOYd_bfeOcfOYOZ\\OdefffNObeOf NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0 HISEQ:378:C7F64ANXX:3:1208:17933:95359 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 32 60 125M = 620 713 ATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAA aab_`dedbefe]ZPPP^PePdZbbPPY_cbffbfffdfPePPbPP[d][effdfffcebbffcbYPbP\P[PYdb\d]Pd\NdP]P]\eaP[aeePec_OYYOOOYea\O]_O_^dW]edcfef NM:i:11 MD:Z:14T2C9A1C23T8A0C11G3G23G10G10 MC:Z:125M AS:i:70 XS:i:0 HISEQ:378:C7F64ANXX:3:2305:17850:4846 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 52 60 125M = 625 690 
ATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGG abbbbffffffffffffffffffffffffffffeffffffffffffeffffff]db\aeffffffffffffffP]ecaeffff]efffeeff_fffffffffffeffffffffffffffffffef NM:i:9 MD:Z:7A1C23T8A0C11G3G23G10G30 MC:Z:117M8S AS:i:80 XS:i:0 HISEQ:378:C7F64ANXX:3:2205:15096:32122 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 59 60 125M = 675 741 AACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGGTTGAGGATGCATCAAACCTTGGGAAGGAATAAGT `aa_`ffaefP^Z\bPdP^_cebPPYbadfffbfbecac_a_Pef]P\Y[PbedPP[e\ed_facYPefff_efePbYYbP]\PP[deO\NN]e[aOOZbOOYaeef]bcb_OeOWb]ZWbOObO NM:i:2 MD:Z:90T5A28 MC:Z:125M AS:i:115 XS:i:0 HISEQ:378:C7F64ANXX:3:1307:17567:99979 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 68 60 125M = 718 775 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC ababbfffffffffffffffffffffffffffffffffffcffdfeffff]efffffefcffffffffefffffffecfffaefffffffffffffffffffeffffffffffffffb]e]fa]e NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0 HISEQ:378:C7F64ANXX:3:2211:20485:88833 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 68 60 125M = 654 711 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC aabaaecfffffffffffffffffffffffffffffdefffffefffffffdeffffdefffffffffdffffefffffffefffffffff]cfdfdffffffffffcffffffffbeffffeff NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0 HISEQ:378:C7F64ANXX:3:1212:7300:28109 97 
smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 71 60 125M = 636 690 CTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTC bbbbbfffffffffffffffffffffffffffffffffffffaffffeefffff^efffffffffcffPeff]efffffffffffefffffffffffffffdfffffdfffff]bfdffffffff NM:i:7 MD:Z:14T8A0C11G3G23G10G49 MC:Z:125M AS:i:90 XS:i:0 HISEQ:378:C7F64ANXX:3:2303:16430:40702 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 72 60 125M = 694 747 TTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCC abbbaffffffffffffdffffffeffffffffffeefffeffffefffefffcffffffffffdffffff_fffffaefffffff]edfffffffff]fffffffffdffcefffffff]ae_b NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0 HISEQ:378:C7F64ANXX:3:2116:16496:33002 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 95 60 125M = 722 752 CAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCCAAAACTATATAGATAGATAGAGC bbbbbffffffffffffdffffeeffffffdffffffeffffffffffdfffff]ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
Code:/home/xub/host/opt/bbmap/bbmap/reformat.sh qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
java -ea -Xmx200m -cp /home/xub/host/opt/bbmap/bbmap/current/ jgi.ReformatReads qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
Executing jgi.ReformatReads [qin=64, qout=33, requiredbits=16, overwrite=t, in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam, out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz]
Could not find sambamba.
Found samtools 1.8
Input is being processed as unpaired
Input: 464 reads 58000 bases
Output: 230 reads (49.57%) 28750 bases (49.57%)
Time: 0.634 seconds.
Reads Processed: 464 0.73k reads/sec
Bases Processed: 58000 0.09m bases/sec
Here's some of the output:
Code:@HISEQ:378:C7F64ANXX:3:2205:16922:87749 CAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTTGAGCCTATT + JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ @HISEQ:378:C7F64ANXX:3:1210:20568:23121 AAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTT + JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ @HISEQ:378:C7F64ANXX:3:2212:11893:40357 GGTTTATGGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAAT + JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ @HISEQ:378:C7F64ANXX:3:2210:7117:7877 GGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGG + JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Thanks for any advice.
Cheers.
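SAM text stores qualities as Phred+33, and the quality strings in the records above decode to values well above 40 under that offset (for example `b` = 65, `f` = 69), while `J` is exactly 41 under +33. One plausible reading, an assumption not confirmed in this thread, is that Phred+64 qualities were stored as if they were Phred+33 before the BAM was written, and that the out-of-range values then end up capped at 41 on output. The sketch below only decodes quality strings under both offsets so you can check that for your own file; the function names are hypothetical.

```python
def phred_scores(qual: str, offset: int) -> list:
    """Decode an ASCII quality string using the given offset (33 or 64)."""
    return [ord(c) - offset for c in qual]


def score_range(qual: str, offset: int) -> tuple:
    """(min, max) Phred score of a quality string under the given offset."""
    scores = phred_scores(qual, offset)
    return min(scores), max(scores)


def phred64_to_phred33(qual: str) -> str:
    """Re-encode Phred+64 text as Phred+33 (each character shifts down by 64 - 33 = 31)."""
    return "".join(chr(ord(c) - 31) for c in qual)


# First 20 quality characters of the first BAM record posted above.
sample = "bbbbbfffffffffffffff"
print(score_range(sample, 33))   # (65, 69): implausibly high for Illumina data
print(score_range(sample, 64))   # (34, 38): a plausible range
print(phred_scores("J", 33))     # [41]
print(phred64_to_phred33("bf"))  # CG
```

If the +64 reading gives the plausible range, the problem is likely upstream of reformat.sh, in whatever produced the BAM.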
-
Did you move any of the BBMap folder contents after you downloaded and uncompressed the BBMap code?
Make sure the top-level BBMap directory is in your $PATH. Something like:
Code:export PATH=$PATH:/opt/science/BBMap
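Both errors in this exchange look like classpath problems. Roy's log shows `-cp /opt/science/BBMap/sh/current/`, so the wrapper script appears to build the classpath from its own location, which is why moving the script out of the BBMap folder matters. pepe84's command has no space between the classpath and `jgi.ReformatReads`, so java takes the next word, `in=...`, as the class name, exactly as the error says. The Python sketch below only reconstructs the invocation seen in these logs and checks for the class file; the helper names are hypothetical, and the `current/jgi/ReformatReads.class` layout is an assumption about an unpacked BBMap download.

```python
import os

MAIN_CLASS = "jgi.ReformatReads"


def reformat_command(bbmap_dir: str, args: list, heap: str = "200m") -> list:
    """Rebuild the java invocation shown in the logs in this thread.

    '-cp' takes the 'current' directory as ONE argument; the main class must be
    the NEXT, separate argument. Gluing them together makes java treat the
    following word (e.g. 'in=...') as the class name.
    """
    classpath = os.path.join(bbmap_dir, "current") + os.sep
    return ["java", "-ea", "-Xmx" + heap, "-cp", classpath, MAIN_CLASS, *args]


def class_file_present(bbmap_dir: str) -> bool:
    """Assumption: the compiled class lives at <bbmap_dir>/current/jgi/ReformatReads.class."""
    path = os.path.join(bbmap_dir, "current", *MAIN_CLASS.split(".")) + ".class"
    return os.path.isfile(path)


print(reformat_command("/opt/science/BBMap", ["in=blah.fastq.gz", "out=blah.sam"]))
print(class_file_present("/opt/science/BBMap"))     # should be True for an intact install
print(class_file_present("/opt/science/BBMap/sh"))  # the location Roy's log points at
```

If the second check is False and the third is what the script is effectively using, moving the script back next to `current/` (or calling it from the top-level BBMap directory) is the thing to try.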
-
Inaccurate sampling rate output - reformat.sh
Hello,
I have been using the reformat.sh script for a while (nice stuff!) but am running into an issue.
I need to get a specific number of reads from a file and am using the `samplerate` option to do that. For example, if a file has 100 reads and I need 10, I set the sample rate to 0.1. Unfortunately, for large files with very specific sample rates, the number of reads returned is not exactly the total number of reads multiplied by the sample rate. Here is an example output:
Code:Executing jgi.ReformatReads [samplerate=0.5582187961, in1=SRR2976833.fastq.gz, in2=, out=tempFile1.fastq.gz]
Input is being processed as unpaired
Input: 509774 reads 122293939 bases
Processed: 284893 reads 68330214 bases
Output: 284893 reads (55.89%) 68330214 bases (55.87%)
Time: 2.295 seconds.
Reads Processed: 284k 124.12k reads/sec
Bases Processed: 68330k 29.77m bases/sec
Please let me know why this is happening and if there is a solution.
Thanks!
PJ
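Assuming `samplerate` keeps each read independently with the given probability (a coin flip per read), the output count follows a binomial distribution rather than being fixed. For the log above, that gives an expected count of about 284,565 with a standard deviation of about 355, so the observed 284,893 is less than one standard deviation away. The sketch below computes this and contrasts per-read sampling with drawing an exact number of reads without replacement; it illustrates the statistics and is not reformat.sh's implementation. The function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def binomial_mean_sd(n: int, p: float) -> Tuple[float, float]:
    """Mean and standard deviation of the kept-read count when each of n reads
    is kept independently with probability p (a binomial distribution)."""
    return n * p, math.sqrt(n * p * (1 - p))


def bernoulli_subsample(reads: List, rate: float, seed: Optional[int] = None) -> List:
    """Per-read coin flip: the output size varies from run to run."""
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < rate]


def exact_subsample(reads: List, k: int, seed: Optional[int] = None) -> List:
    """Exactly k reads, chosen uniformly without replacement, kept in input order."""
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(reads)), k))
    return [reads[i] for i in keep]


mean, sd = binomial_mean_sd(509774, 0.5582187961)  # numbers from the log above
print(round(mean), round(sd))          # 284565 355
print(round((284893 - mean) / sd, 2))  # 0.92, i.e. under one standard deviation
```

If you need an exact count, check the usage text of your reformat.sh version for a target-count option; recent versions list `samplereadstarget`, but confirm it there before relying on it.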
-
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
Hi everyone!
I'm trying to use reformat.sh to merge my forward.fastq and reverse.fastq sequences into an interleaved paired-end file. When I run the following command I get the error below. What is happening?
reformat.sh in1 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out = T3_interleaved.fastq
java -ea -Xmx200m -cp /home/alba/miniconda3/opt/bbmap-38.18/current/ jgi.ReformatReads in1 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out = T3_interleaved.fastq
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at shared.PreParser.<init>(PreParser.java:71)
at shared.PreParser.<init>(PreParser.java:30)
at jgi.ReformatReads.<init>(ReformatReads.java:55)
at jgi.ReformatReads.main(ReformatReads.java:45)
Thanks in advance!
Alba
-
Alba: You need to make sure there are no spaces around the = in each option (write in1=path, not in1 = path). Try the following.
Code:reformat.sh in1=/media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2=/media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out=T3_interleaved.fastq
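The shell splits `in1 = path` into three separate words (`in1`, `=`, `path`), so reformat.sh never receives a single `in1=path` argument; the java line in Alba's log shows those spaced words being passed straight through to the argument pre-parser (`shared.PreParser` in the stack trace). This Python sketch uses `shlex` to show that tokenization and to rejoin such triples; `spaced_options` and `join_spaced_options` are hypothetical helpers, not part of BBTools.

```python
import shlex


def spaced_options(command: str) -> list:
    """Names of options written as 'key = value', which the shell splits into three words."""
    words = shlex.split(command)
    return [words[i - 1] for i, w in enumerate(words) if w == "=" and i > 0]


def join_spaced_options(command: str) -> str:
    """Rejoin every 'key = value' triple into a single 'key=value' argument."""
    words = shlex.split(command)
    out, i = [], 0
    while i < len(words):
        if i + 2 < len(words) and words[i + 1] == "=":
            out.append(words[i] + "=" + words[i + 2])
            i += 3
        else:
            out.append(words[i])
            i += 1
    return shlex.join(out)


cmd = "reformat.sh in1 = R1.fastq in2 = R2.fastq out = T3_interleaved.fastq"
print(shlex.split(cmd))          # the shell sees 10 separate words
print(spaced_options(cmd))       # ['in1', 'in2', 'out']
print(join_spaced_options(cmd))  # reformat.sh in1=R1.fastq in2=R2.fastq out=T3_interleaved.fastq
```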
-
Chastity filter processing
I posted this as a ticket in the BBMap repository, but since I saw very little activity there, I'm cross-posting the same issue here. I hope I'm not bothering anyone.
In Illumina 1.8+ reads, each read is marked as filtered out or not; this is known as the chastity filter. Usually the filtered reads are removed and not used, but sometimes they still end up in the FastQ files.
When using reformat.sh to convert FastQ files to SAM, there's a parameter that discards reads containing ' 1:Y:' or ' 2:Y:'. But when these reads are not discarded, they are included in the SAM file and the filter information is lost. This is a bug: the SAM format has a place to keep this information, and with the current implementation that information is wrong.
All reads whose chastity filter is 'Y' should have SAM flag 512 set (0x200, "read fails platform/vendor quality checks"), and all other reads should have it unset. This should also work in the opposite direction: a read with this flag set should produce a FastQ header with a 'Y'.
Related to this bug, I have another comment. The documentation for the chastityfilter parameter says that it will discard all reads with ' 1:Y:' or ' 2:Y:'. That's good, but what happens with reads with other numbers, like ' 3:Y:'? I have files with this nomenclature, so it would be better to actually parse the fields and discard reads with a 'Y' in the second field, whatever the value of the first field.
Has anyone had these issues? How did you overcome them?
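A minimal sketch of the behaviour proposed above, assuming the Illumina 1.8+ header comment layout `<read>:<is filtered>:<control number>:<index>`. It checks the second field whatever the read number (so ' 3:Y:' is caught), sets or clears SAM FLAG bit 0x200 accordingly, and maps the bit back to 'Y'/'N'. The function names are hypothetical; this is not how reformat.sh is implemented.

```python
QC_FAIL = 0x200  # SAM FLAG 512: "read fails platform/vendor quality checks"


def chastity_failed(header: str) -> bool:
    """True if the Illumina 1.8+ comment ('<read>:<filtered>:<control>:<index>')
    has 'Y' in its second field, whatever the read number (1, 2, 3, ...)."""
    _, _, comment = header.partition(" ")
    fields = comment.split(":")
    return len(fields) >= 2 and fields[1] == "Y"


def apply_vendor_flag(flag: int, header: str) -> int:
    """Set bit 0x200 for chastity-failed reads and clear it for all others."""
    return flag | QC_FAIL if chastity_failed(header) else flag & ~QC_FAIL


def chastity_char(flag: int) -> str:
    """Reverse direction: the character to write back into a FastQ header."""
    return "Y" if flag & QC_FAIL else "N"


for h in ("@r1 1:Y:0:ACGTACGT", "@r2 3:Y:0:ACGTACGT", "@r3 2:N:0:ACGTACGT"):
    print(h, chastity_failed(h), apply_vendor_flag(4, h))
```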
-
Originally posted by GenoMax: Files with this nomenclature have appeared in the last few years, as technologies like 10x create separate files for index reads. While this does not solve the problem permanently, you could change "3:Y" to "2:Y" temporarily and then change it back after using reformat.sh.
For context: we are converting FastQ files into unmapped CRAMs for storage, and the FastQ to SAM intermediate conversion is done with reformat.sh. My main issue here is keeping the QC vendor flag in place.
I already have a workaround, as I also have to keep other things like the UMIs (if present) and the barcode. But that information requires adding tags, which are optional, so I'm not complaining about them. The QC vendor flag, however, is not optional: the bit is always there, and leaving it unset means you are assigning a "QC vendor pass" regardless of the information in the input.
In any case, if someone wants to look at how to keep all the information from the FastQ file in a SAM file, the four fields of the comment in the ID line are the candidates:
- The read end, which could be deduced later if you accept standardizing the output
- The QC vendor flag, which will be coded in the FLAGS field
- The control bits, which should be zero and potentially ignored (with no clear place to store them in case it is needed)
- The index barcode, which should be stored as the BC:Z: tag
The other data present in the FastQ file is already present in the SAM (read name, sequence and qualities).
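A sketch, under the same Illumina 1.8+ field layout, of how the four comment fields listed above could be carried into an unmapped SAM record: read end 1/2 becomes FLAG bits 0x40/0x80 (plus 0x1 and 0x8 for an unmapped pair, giving the familiar 77/141), a chastity 'Y' becomes 0x200, and an index sequence becomes a `BC:Z` tag. The control number is dropped, since the post notes there is no clear place for it. The helper names, and the choice to treat other read numbers as unpaired, are assumptions for illustration.

```python
def comment_fields(header: str) -> dict:
    """Split '@<name> <read>:<filtered>:<control>:<index>' (Illumina 1.8+ style)."""
    name, _, comment = header.lstrip("@").partition(" ")
    read, filtered, control, index = comment.split(":", 3)
    return {"name": name, "read": int(read), "filtered": filtered == "Y",
            "control": int(control), "index": index}


def unmapped_sam_line(header: str, seq: str, qual: str) -> str:
    """Build one unmapped SAM record (11 mandatory columns plus an optional BC:Z tag)."""
    f = comment_fields(header)
    flag = 0x4                                # segment unmapped
    if f["read"] in (1, 2):                   # assumption: 1/2 are the two ends of a pair
        flag |= 0x1 | 0x8                     # paired, mate also unmapped
        flag |= 0x40 if f["read"] == 1 else 0x80
    if f["filtered"]:
        flag |= 0x200                         # chastity 'Y' -> fails vendor QC
    # The control number has no obvious SAM home, so it is dropped here.
    cols = [f["name"], str(flag), "*", "0", "0", "*", "*", "0", "0", seq, qual]
    if f["index"] and not f["index"].isdigit():
        cols.append("BC:Z:" + f["index"])     # index sequence; a bare sample number is skipped
    return "\t".join(cols)


print(unmapped_sam_line("@READ1 1:Y:0:ACGTACGT", "ACGT", "IIII"))
```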