Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge sam/bam file

    Hello all,
    I have pair-end reads (1.fq and 2.fq ) data and mapped them with 'bwa aln', I don't want to generate alignments in the SAM format in paired-end way with 'bwa sampe', but generate sam format with 'bwa samse' respectively, and then merge the two sam file

    bwa aln reference.fa 1.fq > 1.sai
    bwa aln reference.fa 2.fq > 2.sai
    bwa samse reference.fa 1.sai 1.fq > 1.sam
    bwa samse reference.fa 2.sai 2.fq > 2.sam

    How to merge 1.sam and 2.sam?

    can someone give me suggestions? thanks!

  • #2
    Originally posted by genelab View Post
    Hello all,
    I have pair-end reads (1.fq and 2.fq ) data and mapped them with 'bwa aln', I don't want to generate alignments in the SAM format in paired-end way with 'bwa sampe', but generate sam format with 'bwa samse' respectively, and then merge the two sam file

    bwa aln reference.fa 1.fq > 1.sai
    bwa aln reference.fa 2.fq > 2.sai
    bwa samse reference.fa 1.sai 1.fq > 1.sam
    bwa samse reference.fa 2.sai 2.fq > 2.sam

    How to merge 1.sam and 2.sam?

    can someone give me suggestions? thanks!
    How do you want them merged (what is your desired result)? What is your motivation for not using "bwa sampe"?

    Comment


    • #3
      Originally posted by nilshomer View Post
      How do you want them merged (what is your desired result)? What is your motivation for not using "bwa sampe"?

      I want them merged into one sam file which contains the mapping results of both the pair-ends,
      or
      Should i convert the sam two bam, and then merge the two?

      Comment


      • #4
        Originally posted by genelab View Post
        I want them merged into one sam file which contains the mapping results of both the pair-ends,
        or
        Should i convert the sam two bam, and then merge the two?
        Again, why not use "sampe"? Alternatively, if each paired-read read has the same name, you could sort by read name and run "samtools fixmate".

        Comment


        • #5
          Originally posted by nilshomer View Post
          Again, why not use "sampe"? Alternatively, if each paired-read read has the same name, you could sort by read name and run "samtools fixmate".
          I had also used the "sampe" to get the sam result containing both read mapping information, however, I found many mapped read records contain softly clipped alignment, such as the sam records having cigar "59M16S", 32M43S" or "3S45M27S".
          These reads are consided to be mapped reads in "sampe result". but these reads will not have "mapped records" if i use "samse".

          In addtion, i don't need the pair-end information, and just need the mapping information.

          Thanks!

          Comment


          • #6
            Originally posted by genelab View Post
            I had also used the "sampe" to get the sam result containing both read mapping information, however, I found many mapped read records contain softly clipped alignment, such as the sam records having cigar "59M16S", 32M43S" or "3S45M27S".
            These reads are consided to be mapped reads in "sampe result". but these reads will not have "mapped records" if i use "samse".

            In addtion, i don't need the pair-end information, and just need the mapping information.

            Thanks!
            Did you try "samtools merge"?

            Comment


            • #7
              Originally posted by nilshomer View Post
              Did you try "samtools merge"?
              "samtools merge" can merge the bam file, thanks!

              Comment


              • #8
                Originally posted by genelab View Post
                I had also used the "sampe" to get the sam result containing both read mapping information, however, I found many mapped read records contain softly clipped alignment, such as the sam records having cigar "59M16S", 32M43S" or "3S45M27S".
                These reads are consided to be mapped reads in "sampe result". but these reads will not have "mapped records" if i use "samse".

                In addtion, i don't need the pair-end information, and just need the mapping information.

                Thanks!

                You can use sampe with -s option to disable Smith-Waterman alignments of unmapped mates. I think that's where you are getting those clipped sequences.

                Comment


                • #9
                  Originally posted by hl450 View Post
                  You can use sampe with -s option to disable Smith-Waterman alignments of unmapped mates. I think that's where you are getting those clipped sequences.

                  This is a great suggestion, sampe with -s option got the same bam records as the "samtools merge" result merged from two separate end sam/bam files.

                  Can i ask other question, what is the reason of the soft clipping? Our data is RNA-Seq fq reads, is the soft clipped come from the exon junctions?


                  Thanks for your great help!

                  Comment


                  • #10
                    I think sam files can be merged as :
                    cat sam1 sam2 > sam

                    and then sort it.

                    Am i right?

                    Comment


                    • #11
                      You'll end up with both sam file headers if you just use cat. You could do this:

                      Code:
                      cat sam1 <(grep -v '^@' sam2) > merged_sam.sam
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */

                      Comment


                      • #12
                        Originally posted by genelab View Post
                        Can i ask other question, what is the reason of the soft clipping? Our data is RNA-Seq fq reads, is the soft clipped come from the exon junctions?
                        I'm not sure if it's the only case but yes. BWA is not an RNA-seq mapper in a strict sense. It doesn't attempt to discover reads that align across splice junctions. It's also not a "local" aligner otherwise it wouldn't fail to align reads across exons that are shorter than the read lengths (which I've seen myself).

                        Not sure if you're aware of it but BWA has a new aligner packaged with it called 'bwa mem'. This aligner is more powerful for aligning RNA-seq data to a genome. It will soft-clip reads as well. If these aligners don't soft clip reads you'll end up losing a lot of data. Something like 20 or 30% of the exons in the mm10 gene annotation are shorter than 100 bp which is a pretty typical read length.
                        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                        Salk Institute for Biological Studies, La Jolla, CA, USA */

                        Comment


                        • #13
                          samtools merge

                          but which is the exact sintaxis when you have eight files bam and you want to have one file?

                          Comment


                          • #14
                            am creating a sam file, with 2 fastq files, using bwa
                            I apply the following command

                            bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                            V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz
                            > V350019555_L03_B5GHUMqcnrRAABA-556.sam



                            The alignment begins to be generated, but after 15 hours, the following appears, and it generates a 0-bit sam file

                            V350019555L3C006R0051051033 16 chr21 9651911 29 150M * TGTTTTCTCTTTTTACAGCAAGGACAGTTTATTAAACACACTGTTTTAAACCTTATTTTTTCTCTAAATAATATTTCATATAAGTCAATATGTAGACTTTATGTATTTTTAAAATGAAGACATAGCATTCCATTGAACTTAGACAAAAAT EFFFFFFEGFFFFGEFF9FEFFFFGGFCFFFFGFFFFFFDFFFFFFFFFFFFFGAFFFCGBFDF@G@FGFFGFGFFFFGFFFFGFGFCFFFFFFFDGFGFGFFGFFFFFAFFFFFFEFFFFFFFFFFGDFFFF@FG8FF?FFFFFFGFFF NM:i:1 MD:Z:23G126 AS:i:145 XS:i:131 XA:Z:chr4,+189704487,150M,4;
                            [main] Version: 0.7.17-r1188
                            [main] CMD: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                            [main] Real time: 26995.284 sec; CPU: 33394.341 sec
                            zsh: command not found: V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz



                            I have previously analyzed the files with fasqc, and they seem correct


                            What is happening, what should I do?

                            Comment


                            • #15
                              bwa 2 files fasq to 1 fil sam

                              I am creating a sam file, with 2 fastq files, using bwa
                              I apply the following command

                              bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                              V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz
                              > V350019555_L03_B5GHUMqcnrRAABA-556.sam



                              The alignment begins to be generated, but after 15 hours, the following appears, and it generates a 0-bit sam file

                              V350019555L3C006R0051051033 16 chr21 9651911 29 150M * TGTTTTCTCTTTTTACAGCAAGGACAGTTTATTAAACACACTGTTTTAAACCTTATTTTTTCTCTAAATAATATTTCATATAAGTCAATATGTAGACTTTATGTATTTTTAAAATGAAGACATAGCATTCCATTGAACTTAGACAAAAAT EFFFFFFEGFFFFGEFF9FEFFFFGGFCFFFFGFFFFFFDFFFFFFFFFFFFFGAFFFCGBFDF@G@FGFFGFGFFFFGFFFFGFGFCFFFFFFFDGFGFGFFGFFFFFAFFFFFFEFFFFFFFFFFGDFFFF@FG8FF?FFFFFFGFFF NM:i:1 MD:Z:23G126 AS:i:145 XS:i:131 XA:Z:chr4,+189704487,150M,4;
                              [main] Version: 0.7.17-r1188
                              [main] CMD: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                              [main] Real time: 26995.284 sec; CPU: 33394.341 sec
                              zsh: command not found: V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz


                              I have previously analyzed the files with fasqc, and they seem correct


                              What is happening, what should I do?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Advanced Methods for the Detection of Infectious Disease
                                by seqadmin




                                The recent pandemic caused worldwide health, economic, and social disruptions with its reverberations still felt today. A key takeaway from this event is the need for accurate and accessible tools for detecting and tracking infectious diseases. Timely identification is essential for early intervention, managing outbreaks, and preventing their spread. This article reviews several valuable tools employed in the detection and surveillance of infectious diseases.
                                ...
                                11-27-2023, 01:15 PM
                              • seqadmin
                                Strategies for Investigating the Microbiome
                                by seqadmin




                                Microbiome research has led to the discovery of important connections to human and environmental health. Sequencing has become a core investigational tool in microbiome research, a subject that we covered during a recent webinar. Our expert speakers shared a number of advancements including improved experimental workflows, research involving transmission dynamics, and invaluable analysis resources. This article recaps their informative presentations, offering insights...
                                11-09-2023, 07:02 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 08:23 AM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 12-01-2023, 09:55 AM
                              0 responses
                              21 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 11-30-2023, 10:48 AM
                              0 responses
                              20 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 11-29-2023, 08:26 AM
                              0 responses
                              15 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X