  • Poshi
    replied
    Originally posted by GenoMax View Post
    Files with this nomenclature have become available in the last few years, as technologies like 10x create separate files for index reads. While this does not solve the problem permanently, you could change "3:Y" to "2:Y" temporarily and then change it back after using reformat.sh.
    Sure. In fact, this is not a big deal to me. We decided that we will ignore this number and generate it in a more standard way (first end -> 1, second end -> 2, first UMI -> 3, second UMI -> 4), independently of their original numbering.

    For context: we are converting FastQ files into unmapped CRAMs for storage, and the FastQ to SAM intermediate conversion is done with reformat.sh. My main issue here is keeping the QC vendor flag in place.

    I already have a workaround, as I also have to keep other things like the UMIs (if present) and the barcode. But these bits of information imply adding tags, which are optional, so I'm not complaining about them. But the QC vendor flag is not optional. It is there. And not filling it means you are assigning a "QC vendor pass" independently of the information in the input.

    In any case, if someone wants to look at how to carry all the information from the FastQ file into a SAM file, the four fields of the comment in the ID line are the candidates:
    • The read end, which could be deduced later if you accept standardizing the output
    • The QC vendor flag, which should be encoded in the FLAG field
    • The control bits, which should be zero and can potentially be ignored (there is no clear place to store them if they are needed)
    • The index barcode, which should be stored in the BC:Z: tag


    The other data present in the FastQ file is already present in the SAM (read name, sequence and qualities).
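The mapping described in this post can be sketched in a few lines of Python. This is only an illustration and not how reformat.sh works: the function names are hypothetical, the header is assumed to follow the Illumina 1.8+ layout `@<name> <read>:<filtered>:<control>:<barcode>`, and the control field is parsed but not stored, as suggested above.

```python
# Sketch only: helper names are hypothetical; the header layout is an assumption.
UNMAPPED = 0x4
PAIRED = 0x1
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
QC_FAIL = 0x200  # SAM flag 512: read fails platform/vendor quality checks


def fastq_header_to_sam_fields(header, paired=True):
    """Return (qname, flag, tags) for an unmapped SAM record."""
    name, _, comment = header.lstrip("@").partition(" ")
    # Raises ValueError if the comment does not have the four expected fields.
    read_end, filtered, control, barcode = comment.split(":", 3)

    flag = UNMAPPED
    if paired and read_end in ("1", "2"):
        flag |= PAIRED | MATE_UNMAPPED
        flag |= FIRST_IN_PAIR if read_end == "1" else SECOND_IN_PAIR
    if filtered == "Y":
        flag |= QC_FAIL

    tags = ["BC:Z:" + barcode] if barcode else []
    return name, flag, tags


def to_unmapped_sam_line(header, seq, qual, paired=True):
    """Build one tab-separated SAM line for an unmapped read."""
    qname, flag, tags = fastq_header_to_sam_fields(header, paired)
    # RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT and TLEN carry their "unset" values.
    fields = [qname, str(flag), "*", "0", "0", "*", "*", "0", "0", seq, qual]
    return "\t".join(fields + tags)
```

With these definitions, a first-end read marked `1:Y:0:ACGT` gets flag 589 (1 + 4 + 8 + 64 + 512), while the same read marked `1:N:0:ACGT` gets 77, so the vendor QC information survives the conversion.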

  • GenoMax
    replied
    Originally posted by Poshi
    That's good, but what happens with reads with other numbers like ' 3:Y:'? I have files with this nomenclature
    Files with this nomenclature have become available in the last few years, as technologies like 10x create separate files for index reads. While this does not solve the problem permanently, you could change "3:Y" to "2:Y" temporarily and then change it back after using reformat.sh.

  • Poshi
    replied
    Chastity filter processing

    I posted this message as a ticket in the BBMap repository, but since I saw very little activity there I'm cross-posting the same issue here. I hope I'm not bothering anyone.

    When processing Illumina >1.8 reads, each read is marked as filtered out or not. This is known as the chastity filter. Usually the filtered reads are removed and not used, but sometimes they are present in the FastQ files for some reason.

    When using the reformat.sh tool to convert FastQ files to SAM files, there's a parameter that allows us to discard reads that contain ' 1:Y:' or ' 2:Y:'. But when the reads are not discarded, they are included in the SAM file and this information is lost. This is a bug: the SAM format has a place to keep this information, and with the current implementation the recorded value is wrong.

    All reads whose chastity filter field is 'Y' should have the SAM flag 512 set (which means "read fails platform/vendor quality checks"). All other reads should have this flag unset. This should also work in the opposite direction: a read with this flag set should produce a FastQ record with a 'Y'.

    Related to this bug, I have another comment. The documentation for the chastityfilter parameter says that it will discard all reads with ' 1:Y:' or ' 2:Y:'. That's good, but what happens with reads with other numbers like ' 3:Y:'? I have files with this nomenclature, so it would be better to actually parse the fields and discard reads with a 'Y' in the second field, whatever the value of the first field.

    Has anyone had these issues? How did you overcome them?
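The field-based check proposed in this post could look roughly like the Python sketch below. It is not BBTools code; the function names are hypothetical, and records are assumed to be `(header, seq, plus, qual)` tuples whose header carries an Illumina-style comment after the first space.

```python
def is_chastity_filtered(header):
    """True when the second comment field is 'Y', whatever the read number."""
    _, _, comment = header.partition(" ")
    fields = comment.split(":")
    return len(fields) >= 2 and fields[1] == "Y"


def drop_filtered(records):
    """Yield only the (header, seq, plus, qual) records that pass the filter."""
    for record in records:
        if not is_chastity_filtered(record[0]):
            yield record
```

Because the test looks at the second field rather than matching the literal strings ' 1:Y:' or ' 2:Y:', a header ending in `3:Y:0:ACGT` is treated the same way as one ending in `1:Y:0:ACGT`.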

  • GenoMax
    replied
    Alba: You need to make sure there are no spaces around the "=" in the input options (in1=file, not in1 = file). Try the following.

    Code:
    reformat.sh in1=/media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2=/media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out=T3_interleaved.fastq
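To see why the spaces matter, here is a small Python sketch that uses `shlex` to mimic how a POSIX shell splits the command line before the program ever sees it. The file names are shortened placeholders, not the real paths from this thread.

```python
import shlex

# Placeholder file names; only the spacing around "=" differs.
with_spaces = "reformat.sh in1 = reads_R1.fastq in2 = reads_R2.fastq out = interleaved.fastq"
without_spaces = "reformat.sh in1=reads_R1.fastq in2=reads_R2.fastq out=interleaved.fastq"


def key_value_args(command):
    """Split the command like a POSIX shell and drop the program name."""
    return shlex.split(command)[1:]


bad = key_value_args(with_spaces)
good = key_value_args(without_spaces)

print(bad)   # nine separate tokens, three of them a bare "="
print(good)  # three key=value tokens
```

The program receives `in1`, `=` and the path as three unrelated arguments instead of one `in1=path` pair. In Java, `"=".split("=")` returns an empty array, which would be consistent with the `ArrayIndexOutOfBoundsException: 0` in the reported stack trace, although the PreParser source is not shown in this thread.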

  • albagg13
    replied
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0

    Hi everyone!

    I'm trying to use reformat.sh because I want to merge my forward.fastq and reverse.fastq sequences into an interleaved paired-end file. When I use the following command I get this error. What is happening?

    reformat.sh in1 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out = T3_interleaved.fastq
    java -ea -Xmx200m -cp /home/alba/miniconda3/opt/bbmap-38.18/current/ jgi.ReformatReads in1 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out = T3_interleaved.fastq
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at shared.PreParser.<init>(PreParser.java:71)
    at shared.PreParser.<init>(PreParser.java:30)
    at jgi.ReformatReads.<init>(ReformatReads.java:55)
    at jgi.ReformatReads.main(ReformatReads.java:45)

    Thanks in advance!
    Alba

  • pieterjanvc
    replied
    Inaccurate sampling rate output - reformat.sh

    Hello,

    I have been using the reformat.sh script for a while (nice stuff!) but am running into an issue.

    I need to get a specific number of reads from a file and am using the `--samplerate` option to do that. For example, if a file has 100 reads, and I need 10, I set the sample rate to 0.1. Unfortunately, it seems that for large files with very specific sample rates, the actual number of reads returned is not the product of the total reads and the sample rate. Here is an example output:

    Code:
    Executing jgi.ReformatReads [samplerate=0.5582187961,
     in1=SRR2976833.fastq.gz, in2=, out=tempFile1.fastq.gz]
    
    Input is being processed as unpaired
    Input:                          509774 reads            122293939 bases
    Processed:                      284893 reads            68330214 bases
    Output:                         284893 reads (55.89%)   68330214 bases (55.87%)
    
    Time:                           2.295 seconds.
    Reads Processed:        284k    124.12k reads/sec
    Bases Processed:      68330k    29.77m bases/sec

    As you can see, the sample rate is set to 0.5582187961. Given the total number of reads, that should yield 509774 * 0.5582187961 = 284565 (rounded down) reads. However, the output read count is 284893, a percentage of 55.89% instead of the 55.82% requested.

    Please let me know why this is happening and if there is a solution.

    Thanks!
    PJ
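One possible explanation, offered as an assumption rather than a confirmed description of reformat.sh internals, is that each read is kept independently with probability equal to the sample rate. The output count is then a binomial random variable, and only its expected value equals reads × rate. The Python sketch below checks whether the numbers in the log fit that model.

```python
import math

total_reads = 509774   # "Input" line of the log in this post
rate = 0.5582187961    # samplerate from the same log
observed = 284893      # "Output" line of the same log

expected = total_reads * rate
# Standard deviation of a binomial count: sqrt(n * p * (1 - p))
sigma = math.sqrt(total_reads * rate * (1.0 - rate))
z = (observed - expected) / sigma

print(f"expected {expected:.1f}, sigma {sigma:.1f}, observed is {z:.2f} sigma away")
```

Under that assumption the standard deviation is roughly 355 reads, and the observed count sits less than one standard deviation above the expected 284565, so the difference is what per-read random sampling would normally produce. The reformat.sh usage text also lists a samplereadstarget option for requesting an exact number of output reads; it is worth checking whether the installed version supports it.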

  • GenoMax
    replied
    Did you move any of the bbmap folder contents after you downloaded and uncompressed bbmap code?

    Make sure the top level directory with BBMap is in your $PATH. Something like
    Code:
    export PATH=$PATH:/opt/science/BBMap
    would work.
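A quick way to test the "moved files" theory is to check whether the directory passed to `-cp` actually contains the class Java is asked to run. The Python sketch below is a minimal check; the function name is hypothetical, and it relies only on the standard Java rule that class `jgi.ReformatReads` lives at `jgi/ReformatReads.class` under the classpath directory.

```python
from pathlib import Path


def main_class_present(classpath_dir, class_name="jgi.ReformatReads"):
    """Check whether classpath_dir holds the .class file Java would look for."""
    relative = Path(*class_name.split(".")).with_suffix(".class")
    return (Path(classpath_dir) / relative).is_file()
```

For the error quoted below, `main_class_present("/opt/science/BBMap/sh/current/")` returning False would suggest that reformat.sh was moved into a subdirectory and is now computing the wrong classpath.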


    Originally posted by Oomjah View Post
    Hi,

    first time poster so my apologies for any horrid faux pas', and also for the thread necromancy!

    I'm trying to convert fastq's to an unmapped SAM (ultimately to a cram to test space saving and to check no loss of data when it's reconverted back to fastq's again) and I saw a thread suggesting BBMAP. However I'm having the same issue as pepe84.

    The command:
    reformat.sh in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam

    results in:
    "java -ea -Xm200m -cp /opt/science/BBMap/sh/current/ jgi.ReformatReads in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
    Error: Could not find or load main class jgi.ReformatReads"

    When I copy/paste the full java line quoted in the error but remove the space between "/current/" and "jgi.ReformatReads" I instead get:
    "Error: Could not find or load main class in=.opt.science.blah.fastq.gz"

    I've tried it with the fastq file both in and out of the BBMap directory to see if it would help, but got the same error.

    Any advice would be gratefully accepted

    Roy.

  • Oomjah
    replied
    Could not find or load main class

    Hi,

    first time poster so my apologies for any horrid faux pas', and also for the thread necromancy!

    I'm trying to convert fastq's to an unmapped SAM (ultimately to a cram to test space saving and to check no loss of data when it's reconverted back to fastq's again) and I saw a thread suggesting BBMAP. However I'm having the same issue as pepe84.

    The command:
    reformat.sh in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam

    results in:
    "java -ea -Xm200m -cp /opt/science/BBMap/sh/current/ jgi.ReformatReads in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
    Error: Could not find or load main class jgi.ReformatReads"

    When I copy/paste the full java line quoted in the error but remove the space between "/current/" and "jgi.ReformatReads" I instead get:
    "Error: Could not find or load main class in=.opt.science.blah.fastq.gz"

    I've tried it with the fastq file both in and out of the BBMap directory to see if it would help, but got the same error.

    Any advice would be gratefully accepted

    Roy.

    Originally posted by pepe84 View Post
    here is the command:
    java -cp C:\BBMap\current\jgi.ReformatReads in=“C:\BBMap\resources\SRRXXXXX.fastq” out1=EFB_R1.fq out2=EFB_R2.fq

    And here is the error:
    Error: Could not find or load main class in=C:\BBMap\resources\SRRXXXXX.fastq

    Just an FYI I am using the command line on windows.

    Thanks, I appreciate any help
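Both "main class in=..." errors in this post fit the way the `java` launcher reads its arguments: `-cp` consumes the next token as the classpath, and the first token after the options is taken as the main class. The Python sketch below is a deliberately simplified model of that rule, not the launcher's real parser, and the function name is hypothetical.

```python
def split_java_command(argv):
    """Very simplified model of `java [options] <main class> [args...]`."""
    options, i = [], 1  # argv[0] is "java"
    while i < len(argv) and argv[i].startswith("-"):
        options.append(argv[i])
        if argv[i] in ("-cp", "-classpath") and i + 1 < len(argv):
            i += 1
            options.append(argv[i])  # the token after -cp is the classpath
        i += 1
    main_class = argv[i] if i < len(argv) else None
    return options, main_class, argv[i + 1:]


separate = ["java", "-ea", "-cp", "/opt/science/BBMap/sh/current/",
            "jgi.ReformatReads", "in=/opt/science/blah.fastq.gz"]
fused = ["java", "-ea", "-cp", "/opt/science/BBMap/sh/current/jgi.ReformatReads",
         "in=/opt/science/blah.fastq.gz"]

print(split_java_command(separate)[1])  # jgi.ReformatReads
print(split_java_command(fused)[1])     # in=/opt/science/blah.fastq.gz
```

When the space between the classpath and the class name is missing, the class name is absorbed into the classpath and the `in=` argument is treated as the class to run, which matches the error messages reported here.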

  • quokka
    replied
    BBMap not retrieving fastq quality values from bam file

    Hi,

    I'm trying to use BBMap version 38.08 to retrieve fastq sequences from a bam file. However, I keep getting a problem where the quality output is merely: JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

    Here's some lines from the bam file:

    Code:
    HISEQ:378:C7F64ANXX:3:1207:13039:83924	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	15	60	125M	=	644	754	GGGGGAGTGATAAAAATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGA	bbbbbfffffffffffffffffffffffffffffffffffffbffffffffef_ebffffffffffffffdfcefffffbfffffffbbfffffffdfefd_\ebOdefOWZW_bWefffdWce[	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
    HISEQ:378:C7F64ANXX:3:1106:10647:86342	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	30	60	125M	=	669	764	ATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAG	aabbbbffffffffffffffbbeffffffffffePaaebcfeefffffffeffYeffff\efPe]PePPPbc\bbPedP^PeaP]\dYbc]edcfOPOYd_bfeOcfOYOZ\\OdefffNObeOf	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
    HISEQ:378:C7F64ANXX:3:1208:17933:95359	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	32	60	125M	=	620	713	ATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAA	aab_`dedbefe]ZPPP^PePdZbbPPY_cbffbfffdfPePPbPP[d][effdfffcebbffcbYPbP\P[PYdb\d]Pd\NdP]P]\eaP[aeePec_OYYOOOYea\O]_O_^dW]edcfef	NM:i:11	MD:Z:14T2C9A1C23T8A0C11G3G23G10G10	MC:Z:125M	AS:i:70	XS:i:0
    HISEQ:378:C7F64ANXX:3:2305:17850:4846	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	52	60	125M	=	625	690	ATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGG	abbbbffffffffffffffffffffffffffffeffffffffffffeffffff]db\aeffffffffffffffP]ecaeffff]efffeeff_fffffffffffeffffffffffffffffffef	NM:i:9	MD:Z:7A1C23T8A0C11G3G23G10G30	MC:Z:117M8S	AS:i:80	XS:i:0
    HISEQ:378:C7F64ANXX:3:2205:15096:32122	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	59	60	125M	=	675	741	AACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGGTTGAGGATGCATCAAACCTTGGGAAGGAATAAGT	`aa_`ffaefP^Z\bPdP^_cebPPYbadfffbfbecac_a_Pef]P\Y[PbedPP[e\ed_facYPefff_efePbYYbP]\PP[deO\NN]e[aOOZbOOYaeef]bcb_OeOWb]ZWbOObO	NM:i:2	MD:Z:90T5A28	MC:Z:125M	AS:i:115	XS:i:0
    HISEQ:378:C7F64ANXX:3:1307:17567:99979	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	68	60	125M	=	718	775	GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC	ababbfffffffffffffffffffffffffffffffffffcffdfeffff]efffffefcffffffffefffffffecfffaefffffffffffffffffffeffffffffffffffb]e]fa]e	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
    HISEQ:378:C7F64ANXX:3:2211:20485:88833	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	68	60	125M	=	654	711	GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC	aabaaecfffffffffffffffffffffffffffffdefffffefffffffdeffffdefffffffffdffffefffffffefffffffff]cfdfdffffffffffcffffffffbeffffeff	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
    HISEQ:378:C7F64ANXX:3:1212:7300:28109	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	71	60	125M	=	636	690	CTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTC	bbbbbfffffffffffffffffffffffffffffffffffffaffffeefffff^efffffffffcffPeff]efffffffffffefffffffffffffffdfffffdfffff]bfdffffffff	NM:i:7	MD:Z:14T8A0C11G3G23G10G49	MC:Z:125M	AS:i:90	XS:i:0
    HISEQ:378:C7F64ANXX:3:2303:16430:40702	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	72	60	125M	=	694	747	TTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCC	abbbaffffffffffffdffffffeffffffffffeefffeffffefffefffcffffffffffdffffff_fffffaefffffff]edfffffffff]fffffffffdffcefffffff]ae_b	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
    HISEQ:378:C7F64ANXX:3:2116:16496:33002	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	95	60	125M	=	722	752	CAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCCAAAACTATATAGATAGATAGAGC	bbbbbffffffffffffdffffeeffffffdffffffeffffffffffdfffff]ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0

    Here's the command and STDERR (NB in this case I was using qin=64 qout=33 for troubleshooting but I get the same result without these flags):

    Code:
    /home/xub/host/opt/bbmap/bbmap/reformat.sh qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
    java -ea -Xmx200m -cp /home/xub/host/opt/bbmap/bbmap/current/ jgi.ReformatReads qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
    Executing jgi.ReformatReads [qin=64, qout=33, requiredbits=16, overwrite=t, in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam, out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz]
    
    Could not find sambamba.
    Found samtools 1.8
    Input is being processed as unpaired
    Input:                  	464 reads          	58000 bases
    Output:                 	230 reads (49.57%) 	28750 bases (49.57%)
    
    Time:                         	0.634 seconds.
    Reads Processed:         464 	0.73k reads/sec
    Bases Processed:       58000 	0.09m bases/sec


    Here's some of the output:

    Code:
    @HISEQ:378:C7F64ANXX:3:2205:16922:87749
    CAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTTGAGCCTATT
    +
    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
    @HISEQ:378:C7F64ANXX:3:1210:20568:23121
    AAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTT
    +
    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
    @HISEQ:378:C7F64ANXX:3:2212:11893:40357
    GGTTTATGGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAAT
    +
    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
    @HISEQ:378:C7F64ANXX:3:2210:7117:7877
    GGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGG
    +
    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

    Thanks for any advice.

    Cheers.
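For reference, the Python sketch below shows what the quality characters in the BAM excerpt become under two different readings. It is an illustration, not a diagnosis: the helper names are hypothetical, and the cap of 41 is an assumption chosen because 'J' is Q41 in Phred+33.

```python
def decode(qual, offset):
    """ASCII quality string -> list of Phred scores."""
    return [ord(c) - offset for c in qual]


def encode(scores, offset=33, cap=None):
    """Phred scores -> ASCII string, optionally capping each score."""
    if cap is not None:
        scores = [min(q, cap) for q in scores]
    return "".join(chr(q + offset) for q in scores)


bam_qual = "bbbbbfffff"  # first ten quality characters of the first BAM record

# Read as Phred+64 and written as Phred+33: distinct values survive.
as_phred64 = encode(decode(bam_qual, 64))
# Read as Phred+33 and capped at Q41 (assumed cap): everything collapses.
as_phred33_capped = encode(decode(bam_qual, 33), cap=41)

print(as_phred64)
print(as_phred33_capped)
```

Every quality character in the excerpt is at or above 'J', so if the values were interpreted as Phred+33 and capped at 41 somewhere in the pipeline, the output would be a uniform run of 'J', which is one hypothesis consistent with the result shown above.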

  • milw
    replied
    I'm confused about sam/bam options

    In version 37.52, the parameters under Sam and bam processing options are confusing to me
    Sam and bam processing options:

    mappedonly=f Toss unmapped reads.
    unmappedonly=f Toss mapped reads.
    pairedonly=f Toss reads that are not mapped as proper pairs.
    unpairedonly=f Toss reads that are mapped as proper pairs.
    primaryonly=f Toss secondary alignments. Set this to true for sam to fastq conversion.

    If 'mappedonly' is false, shouldn't that mean KEEP both unmapped and mapped reads?
    Likewise, 'pairedonly' false (to me) means KEEP both unpaired and paired reads.

    In the end, I want my bam to only contain paired reads, so I've been running it with 'pairedonly=t' , but reformat.sh says 'input is being processed as unpaired' for my bam file.
    Last edited by milw; 02-26-2019, 11:31 AM.
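One way to read the quoted help text is that each option names a filter and the `=f` shown is its default, meaning the filter is switched off. The Python sketch below spells out that reading with standard SAM flag bits. It is an interpretation of the help text, not BBMap's implementation, and the function name is hypothetical.

```python
UNMAPPED = 0x4
PROPER_PAIR = 0x2
SECONDARY = 0x100


def keep_record(flag, mappedonly=False, unmappedonly=False,
                pairedonly=False, unpairedonly=False, primaryonly=False):
    """Return True if a SAM record with this FLAG survives the enabled filters."""
    mapped = not flag & UNMAPPED
    proper = bool(flag & PROPER_PAIR)
    if mappedonly and not mapped:
        return False
    if unmappedonly and mapped:
        return False
    if pairedonly and not proper:
        return False
    if unpairedonly and proper:
        return False
    if primaryonly and flag & SECONDARY:
        return False
    return True
```

Under this reading, leaving every option at `f` keeps all records, and `pairedonly=t` keeps only records with the proper-pair bit (0x2) set. A record with flag 99 passes that filter, while one with flag 97 does not.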

  • tolot27
    replied
    Originally posted by GenoMax View Post
    You could use `repair.sh` to separate the singletons out afterwards.
    Thanks for pointing me in this direction. Unfortunately, repair.sh did not produce well-ordered files. Fortunately, bbsplitpairs.sh could be used instead of the reformat.sh/repair.sh combination; it extracted the correctly paired reads and wrote the singletons to a separate file.

  • GenoMax
    replied
    You could use `repair.sh` to separate the singletons out afterwards.

  • tolot27
    replied
    deinterleave with singletons

    Hi!

    I have an interleaved fastq containing unmapped reads produced by segemehl -u. I want to deinterleave it into the two mate pair files, removing the singletons and saving them to a separate file.

    Currently, reformat.sh cannot deal with it, even if I give outsingle= as a parameter. The header contains the strand information (i.e. 2:N:0:2).

    Is there some way to at least extract the paired reads without singletons in between?

    --
    Kind regards,
    Mathias
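The task described in this post can be sketched in plain Python, independent of any BBTools script. The sketch assumes four-line FASTQ records, mates that sit next to each other in the file, and headers of the form `@name 1:N:0:2` / `@name 2:N:0:2`. All function names are hypothetical, and results are collected in memory for brevity.

```python
def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        yield (header.rstrip("\n"), next(it).rstrip("\n"),
               next(it).rstrip("\n"), next(it).rstrip("\n"))


def name_and_end(header):
    """Split '@name 2:N:0:2' into ('name', '2')."""
    name, _, comment = header[1:].partition(" ")
    return name, comment.split(":", 1)[0]


def deinterleave(records):
    """Return (r1, r2, singletons); mates must be adjacent in the input."""
    r1, r2, singles = [], [], []
    pending = None
    for rec in records:
        name, end = name_and_end(rec[0])
        if pending is not None:
            p_name, p_end = name_and_end(pending[0])
            if p_name == name and (p_end, end) == ("1", "2"):
                r1.append(pending)
                r2.append(rec)
                pending = None
                continue
            singles.append(pending)  # previous read had no mate after it
        pending = rec
    if pending is not None:
        singles.append(pending)
    return r1, r2, singles
```

Each read is held back until the next one shows whether it is its mate, so a lone read between two complete pairs ends up in the singleton list without shifting the pairing of the reads that follow. For large files the three lists would be replaced by writes to three output files.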
