
  • bwa mem - paired reads


    I'm still fairly new to NGS, and I'm trying to run bwa mem (v0.7.9a) on some 91 bp paired-end sequences. I get the following encouraging message:

    [M::main_mem] read 2197804 sequences (200000164 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (10, 774878, 120, 4)

    After all the FF/FR/RF/RR checks, however, the run ends with the following error:

    [mem_sam_pe] paired reads have different names: "6_1101_2244_58850_1", "6_1101_2244_58850_2"

    All the paired reads were generated using process_radtags and look like valid pairs when I check them manually. Has anyone found a workaround for this, or am I going to have to rename all of the sequences?


  • #2
    Normally, read 1 and read 2 would have the same name up to the first whitespace. So you could replace the last underscore with a space.
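To illustrate the point above, here is a minimal Python sketch of the naming rule: two mates are treated as a pair when their names match up to the first whitespace. The helper names `pair_key` and `underscore_to_space` are made up for this example, and the comparison is a simplification of what an aligner does internally, not bwa's actual code.

```python
def pair_key(header: str) -> str:
    """Return the read name up to the first whitespace, without the leading '@'."""
    if header.startswith("@"):
        header = header[1:]
    fields = header.split(None, 1)
    return fields[0] if fields else ""


def underscore_to_space(header: str) -> str:
    """Replace the last underscore in a header with a space."""
    head, sep, tail = header.rpartition("_")
    return f"{head} {tail}" if sep else header


# The two names from the error message in the first post.
r1 = "@6_1101_2244_58850_1"
r2 = "@6_1101_2244_58850_2"

# As written, the names differ, so the reads do not look like a pair.
print(pair_key(r1) == pair_key(r2))

# After replacing the last underscore with a space, the names agree.
print(pair_key(underscore_to_space(r1)) == pair_key(underscore_to_space(r2)))
```

The first comparison is false and the second is true, which is why the rename makes the pair acceptable.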

    I also have a fast program that will rename reads for you, if you want, here:
    Download BBMap for free. BBMap short read aligner, and other bioinformatic tools. This package includes BBMap, a short read aligner, as well as various other bioinformatic tools. It is written in pure Java, can run on any platform, and has no dependencies other than Java being installed (compiled for Java 6 and higher).

    Just run bbrename. Sample arguments: in1=read1.fq in2=read2.fq out1=renamed1.fq out2=renamed2.fq prefix=xxx

    Then they will be renamed like this:
    xxx_1 /1
    xxx_1 /2
    xxx_2 /1
    xxx_2 /2


    • #3
      Thanks Brian. Looks like a great suite of tools. I'm sure I've done something wrong though as I can't seem to make it work. I've downloaded & unzipped into the /etc folder keeping the structure intact. However, when I try to run the following command from where the .fq files are located:

      /etc/bbmap/ in=stacks/samples/gc2364.Lib1.1.fq in2=stacks/samples/gc2364.Lib1.2.fq out=rename/gc2364.Lib1.1.r.fq out2=rename/gc2364.Lib1.2.r.fq

      I get the following error message:

      java -ea -Xmx1g -cp /etc/bbmap/current/ jgi.RenameReads in=stacks/samples/gc2364.Lib1.1.fq in2=stacks/samples/gc2364.Lib1.2.fq out=rename/gc2364.Lib1.1.r.fq out2=gc2364.Lib1.2.r.fq
      Exception in thread "main" java.lang.UnsupportedClassVersionError: jgi/RenameReads : Unsupported major.minor version 51.0
      at java.lang.ClassLoader.defineClass1(Native Method)
      at java.lang.ClassLoader.defineClass(
      at Method)
      at java.lang.ClassLoader.loadClass(
      at sun.misc.Launcher$AppClassLoader.loadClass(
      at java.lang.ClassLoader.loadClass(
      Could not find the main class: jgi.RenameReads. Program will exit.

      I'm running Ubuntu 12.04 with openjdk-7-jre 7u55-2.4.7-1ubuntu1~, and openjdk-6-jre 6b31-1.13.3-1ubuntu1~ installed through Synaptic.

      Any ideas on what I've done wrong?

      Thanks again


      • #4

        This means that you're invoking an old version of Java, 1.6 or earlier, which is now several years old. Although you have both openjdk-7 and openjdk-6, only one can have priority in your path - and Java is backwards-compatible, so there's no reason to have old versions of Java installed. I have little experience with openjdk, so I don't know if it supports everything that the official JDK does.

        If you type "java -version", that will tell you the version that has priority. I suggest that you either uninstall jdk 6, or manually path to the "java" executable in openjdk-7, or install Oracle's JDK (7 or 8), or download the latest Java-6-compiled version of BBTools (click on 'files').


        • #5
          Ah, I thought it must've been something obvious like that. The whole thing worked a charm, so thanks for providing that package & for all your help.


          • #6
            Thanks a lot...

            Hello Brian

            I had the same error from bwa mem, but I found out that the cause was the order of the reads.

            I solved the problem using your tool in BBTools, which was just great and extremely fast.

            Thanks again



            • #7
              I just came across this error myself and figured out a workaround. Steveped reported read names that should look familiar to anyone using Trimmomatic, as mine resembled those very closely. The cause is just as Brian says: there is no space before the 1 and 2 that designate the paired read number, and BWA errors out because of this missing space.

              The tool from Brian also gave me an undesirable outcome: it shunted all my reads into the singletons output file, thus destroying paired-end information, apparently because of the header formatting. A quick sed find and replace solved the problem for me (adjust for read number):

              $ sed -i 's/\_1$/\ 1/g' <file_name.fastq>
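One caveat with a line-based substitution is that it is applied to every line, so a quality string that happens to end in `_1` would be changed as well. As a hedged alternative, the sketch below rewrites only the header line of each record. It assumes strict four-line FASTQ records (no wrapped sequences), and the function name `fix_headers` is made up for this example.

```python
import re
from typing import Iterable, Iterator

# Trailing "_1" or "_2" at the end of a header line.
_SUFFIX = re.compile(r"_([12])$")


def fix_headers(lines: Iterable[str]) -> Iterator[str]:
    """Replace a trailing _1/_2 with ' 1'/' 2' on header lines only.

    Assumes every record is exactly four lines: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        if i % 4 == 0:  # header line of each record
            text = _SUFFIX.sub(r" \1", text)
        yield text + "\n"


# One record whose quality string also ends in "_1".
record = ["@6_1101_2244_58850_1\n", "ACGT\n", "+\n", "II_1\n"]
fixed = list(fix_headers(record))
print("".join(fixed), end="")
```

To process a real file, the same generator could be fed an open file handle and its output written to a new file with `writelines`, leaving the original untouched.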


              • #8
                Hi dcard,

                I'd like to clarify that the tool you ran is not a generic "repair" utility; it is designed specifically to "re-pair" reads that became disordered but still have their original Illumina names (for example, after using fastx). It won't work with nonstandard names. bbrename, however, is designed for the fixed function of changing read names in a specific way; it's intended for situations where your reads are in the correct order (if they are paired) but don't have names in the normal Illumina pattern. So, they solve different problems.

                bbrename is fast, but has much more limited functionality than sed; so thanks for posting your ultimate solution!


                • #9
                  Hi all,

                  I am running into a similar problem where I am getting this error:

                  [mem_sam_pe] [mem_sam_pe] [mem_sam_pe] paired reads have different names: "M03721:10:000000000-AH7UG:1:1101:16268:1596", "M03721:10:000000000-AH7UG:1:1101:16959:1596"

                  paired reads have different names: "M03721:10:000000000-AH7UG:1:1101:13391:1606", "M03721:10:000000000-AH7UG:1:1101:17720:1609"

                  [mem_sam_pe] paired reads have different names: "M03721:10:000000000-AH7UG:1:1101:16099:1561", "M03721:10:000000000-AH7UG:1:1101:17123:1563"

                  I tried sorting my reads, using the sed command suggested by dcard, and running the tool from BBTools, but nothing seems to work.

                  Here is the command I am using:

                  ~/bin/0.7.10/bwa mem -M -v 1 -t 24 -R $readGroup -p $fasta 1.fastq 2.fastq 1> raw.sam 2> Logs_bwaAlign.txt

                  I am using bwa 0.7.10 for alignment. However, when I use an older version such as 0.7.5a or a newer version such as 0.7.12, I do not get this error and the alignment works fine. The issue occurs only with 0.7.10, and it really baffles me. I can't figure out what the problem is. If you can help me understand what's going on here, that would be great.

                  I am also attaching the fastqs with first few reads that are giving issues for you to look.

                  Thank you so much in advance.



                  • #10
                    If this issue is seen only with a specific version of bwa that is not the latest, then I would say that you should use the latest bwa and not spend time debugging this. It could be a real bug in that version of bwa, unless other datasets run without this error on the same version.

                    Out of curiosity, have you used grep (grep -n) to display the line numbers of the IDs above in the two files and checked that they match?
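The same check can be run over every record instead of a few selected IDs. The sketch below walks two FASTQ files in lockstep and reports the first record whose names disagree. It assumes strict four-line records and compares names up to the first whitespace; `read_names` and `first_mismatch` are hypothetical helper names, not part of bwa or BBTools.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read name (up to the first whitespace) of each 4-line record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            fields = line[1:].split()  # drop the leading '@'
            yield fields[0] if fields else ""


def first_mismatch(
    fq1_lines: Iterable[str], fq2_lines: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record number, name1, name2) for the first disagreement, or None.

    A missing name (None) means one file ran out of records before the other.
    """
    pairs = zip_longest(read_names(fq1_lines), read_names(fq2_lines))
    for idx, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return idx, a, b
    return None


# Tiny in-memory example: the second record is out of sync.
fq1 = ["@r1 1\n", "AC\n", "+\n", "II\n", "@r2 1\n", "GT\n", "+\n", "II\n"]
fq2 = ["@r1 2\n", "AC\n", "+\n", "II\n", "@r3 2\n", "GT\n", "+\n", "II\n"]
print(first_mismatch(fq1, fq2))
```

With real data, both arguments would be open file handles for the two FASTQ files passed to the aligner.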


                    • #11
                      Thanks GenoMax,

                      The reason I am curious is that I already have around 60 samples aligned with bwa-0.7.10, and I wanted to align this sample with 0.7.10 as well so that everything is compared on the same footing.

                      Yes, I checked the line number of each ID in both fastqs and they matched. That's why I am curious to know how I can debug this.

                      Thanks for your input.


                      • #12
                        While I understand your concern about maintaining version continuity, I would not expect minor point-version differences in bwa to affect your overall alignment results.

                        Based on the steps you have already taken, it is baffling that you are still running into this error. Is there any chance you could ask whoever demultiplexed this data originally to do it again (or at least give you a fresh copy), in case something subtle is corrupt in these sequence files?


                        • #13
                          Thanks, I will check with the person who demultiplexed this data. It's definitely data related, as my other samples ran fine.

                          Thanks again.


                          • #14
                            Hello Brian and punto_c
                            Thank you very much for sharing your software and experience! I fixed the "paired reads have different names" error using bbmap.

