Introducing Reformat, a fast read format converter


  • #16
    Not currently... I can add that, though. I'll make a note to do that. BBMap has an "idfilter" flag, though.


    • #17
      Great - I've been using idfilter, but it's pretty time-consuming to have to rerun the mapping (I'm dealing with ~10 lanes of data now).


      • #18
        I just uploaded a new version of BBTools - 36.11 - that supports idfilter (and subfilter, editfilter, etc) in Reformat. Bear in mind that reads mapped using old-style cigar strings ('M' symbol instead of 'X' and '=') must also have MD tags. For newer cigar strings MD tags are not necessary. Unmapped reads will not be affected by this filter (they will pass the filter), so if you want to get rid of them you also need to set "mappedonly=t".


        • #19
          Talk about a quick turnaround! Thanks a bunch BB.


          • #20
            I am not sure whether this is a bug, but when I try to use reformat.sh (version 37.76) to add fake qualities of Q30 to a PacBio Sequel fastq file (produced with
            Code:
            bamtools convert -format fastq -in sequel.subreads.bam -out sequel.subreads.fastq
            ), which has a default quality of "!", I don't get a quality of ">" in the output but rather "#".

            Code:
            /opt/bbmap/reformat.sh qin=33 qout=33 qfake=30 in=sequel.subreads.fastq out=sequel.subreads.fqual.fastq
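For reference when reading the output: with a Phred+33 encoding (which is what qin=33 and qout=33 select), each quality character is the ASCII character whose code is the score plus 33. Under that encoding Q30 is "?", while ">" is Q29, "#" is Q2 and "!" is Q0. A minimal Python sketch of the mapping follows; the function names are illustrative and are not part of BBTools.

```python
def phred_to_char(q, offset=33):
    """Return the ASCII character that encodes quality score q."""
    return chr(q + offset)


def char_to_phred(c, offset=33):
    """Return the quality score encoded by the ASCII character c."""
    return ord(c) - offset


# Decode the characters mentioned in the post above.
for symbol in "!#>?":
    print(symbol, char_to_phred(symbol))
```

This only decodes the characters; it does not explain why reformat.sh wrote "#" in this case.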


            • #21
              Hi,

              Can I use reformat or any other bbtools script to split my fasta file into sub-files?

              eg X.fa (100 sequences) -> X01.fa X02.fa....X10.fa (each with 10 sequences)?

              I don't mind whether I need to select the number of sequences per file or total number of files and it doesn't really matter what order the sequences are in as long as there is no duplication of sequences.

              Cheers,
              Dave


              • #22
                faSplit from Jim Kent's utilities is a much better option for splitting fasta files.

                Run faSplit to look at inline help for multiple options available.


                • #23
                  Reformat won't do that, but you can use partition.sh:

                  Code:
                  partition.sh in=X.fa out=X%.fa ways=10
                  That will produce 10 output files with an equal number of sequences and no duplication.
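To make the idea of an even, duplicate-free split concrete, here is a small Python sketch that deals FASTA records into a fixed number of bins in round-robin order. It is an assumption-laden illustration only: it is not how partition.sh is implemented, and `read_fasta` and `split_round_robin` are hypothetical helper names.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, ways):
    """Deal records into `ways` bins; each record lands in exactly one bin."""
    bins = [[] for _ in range(ways)]
    for i, record in enumerate(records):
        bins[i % ways].append(record)
    return bins


# Toy input: 100 short sequences, as in the X.fa example above.
demo_lines = []
for i in range(100):
    demo_lines += [f">seq{i}", "ACGT"]

bins = split_round_robin(read_fasta(demo_lines), ways=10)
print([len(b) for b in bins])
```

Each bin could then be written to its own file; with 100 input sequences and 10 bins, every bin holds 10 records.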


                  • #24
                    Hi Brian Bushnell,
                    When I used mapPacBio.sh to map PacBio reads, I got the following error:
                    Exception in thread "Thread-23" java.lang.AssertionError: Read 20, length 10550, exceeds the limit of 6019
                    You can map the reads in chunks by reformatting to fasta, then mapping with the setting 'fastareadlen=6019'
                    at align2.AbstractMapThread.run(AbstractMapThread.java:480)

                    But I could not find out how to reformat the reads.
                    Could you help me figure out this issue?
                    Thanks,
                    Fuyou


                    • #25
                      You can use
                      Code:
                      reformat.sh in=your_file.fastq out=newfile.fa
                      to convert the reads to fasta format.

                      That said I think mapPacBio.sh should automatically split reads longer than 6k when it does mapping. Is that not working?
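As a conceptual illustration of mapping reads "in chunks", the Python sketch below cuts one long sequence into consecutive pieces no longer than the 6019-base limit from the error message. The `_partN` naming scheme and the `chunk_read` function are assumptions made for this example; the thread does not show how BBTools itself splits or names the pieces.

```python
def chunk_read(name, seq, max_len=6019):
    """Split seq into consecutive pieces of at most max_len bases."""
    return [
        (f"{name}_part{start // max_len + 1}", seq[start:start + max_len])
        for start in range(0, len(seq), max_len)
    ]


# A read of length 10550, like "Read 20" in the error message above.
pieces = chunk_read("read20", "A" * 10550)
for piece_name, piece_seq in pieces:
    print(piece_name, len(piece_seq))
```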


                      • #26
                        Originally posted by GenoMax
                        You can use
                        Code:
                        reformat.sh in=your_file.fastq out=newfile.fa
                        to convert the reads to fasta format.

                        That said I think mapPacBio.sh should automatically split reads longer than 6k when it does mapping. Is that not working?

                        It is not working. I used fasta format.
                        Thanks,
                        Fuyou


                        • #27
                          Hello folks, I am trying to work on a FASTQ file using reformat.sh. Although I have correctly installed Java and tested it on the command line, I still can't get it to work. It seems the problem is that I don't have the FASTQ file in the same directory as the BBMap folder; could that be an issue?


                          • #28
                            pepe84, do you provide a path to the file? Please copy your command as tried, and then copy the error message.
                            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


                            • #29
                              Here is the command:
                              java -cp C:\BBMap\current\jgi.ReformatReads in=“C:\BBMap\resources\SRRXXXXX.fastq” out1=EFB_R1.fq out2=EFB_R2.fq

                              And here is the error:
                              Error: Could not find or load main class in=C:\BBMap\resources\SRRXXXXX.fastq

                              Just an FYI: I am using the command line on Windows.

                              Thanks, I appreciate any help


                              Originally posted by SNPsaurus
                              pepe84, do you provide a path to the file? Please copy your command as tried, and then copy the error message.


                              • #30
                                deinterleave with singletons

                                Hi!

                                I have an interleaved fastq containing unmapped reads produced by segemehl -u. I want to deinterleave it into the two mate pair files and save the singletons to a separate file.

                                Currently, reformat.sh cannot deal with it, even if I give outsingle= as a parameter. The header contains the strand information (e.g. 2:N:0:2).

                                Is there some way to at least extract the paired reads without the singletons in between?

                                --
                                Kind regards,
                                Mathias
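One possible way to approach the question above outside of reformat.sh is sketched below in Python. It assumes that mates, when both are present, sit next to each other with read 1 first, and that the mate number is given either by an Illumina-style comment such as "2:N:0:2" or by a "/1" or "/2" name suffix. The function names are hypothetical, and the sketch has not been tested against segemehl output.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, qual) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_key(header):
    """Return (template name, mate number or None) parsed from a header."""
    name, _, comment = header[1:].partition(" ")
    if comment[:2] in ("1:", "2:"):
        return name, int(comment[0])
    if name.endswith(("/1", "/2")):
        return name[:-2], int(name[-1])
    return name, None


def deinterleave(records):
    """Separate adjacent mate pairs from reads that have no partner."""
    pairs, singles = [], []
    pending = None  # most recent record still waiting for its mate
    for rec in records:
        name, mate = read_key(rec[0])
        if pending is not None:
            p_name, p_mate = read_key(pending[0])
            if p_name == name and p_mate == 1 and mate == 2:
                pairs.append((pending, rec))
                pending = None
                continue
            singles.append(pending)  # previous record had no mate
        pending = rec
    if pending is not None:
        singles.append(pending)
    return pairs, singles


# Toy interleaved input: r1 and r3 are paired, r2 is a singleton.
demo = (
    "@r1 1:N:0:2\nACGT\n+\nIIII\n"
    "@r1 2:N:0:2\nTTTT\n+\nIIII\n"
    "@r2 2:N:0:2\nGGGG\n+\nIIII\n"
    "@r3 1:N:0:2\nCCCC\n+\nIIII\n"
    "@r3 2:N:0:2\nAAAA\n+\nIIII\n"
)
pairs, singles = deinterleave(read_fastq(io.StringIO(demo)))
print(len(pairs), "pairs;", len(singles), "singleton(s)")
```

The first and second elements of each pair would go to the two mate files and `singles` to a third file.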
