Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • syintel87
    Member
    • Dec 2012
    • 81

    Editing BAM file

    The below is the BAM file produced by running tophat.

    samtools view -h accepted_hits.bam
    @HD VN:1.0 SO:coordinate
    @SQ SN:Mt:3452:Mt3.5.1Chr1 LN:34297684
    @SQ SN:Mt:3453:Mt3.5.1Chr2 LN:34376035
    @SQ SN:Mt:3454:Mt3.5Chr3 LN:43359096
    @SQ SN:Mt:3455:Mt3.5.1Chr4 LN:47685368
    ...
    ...
    ...
    @SQ SN:Mt:3460:AC130806.58 LN:141498
    @SQ SN:Mt:3461:AC229707.11 LN:3289
    @SQ SN:Mt:3462:AC135504.61 LN:132492
    ...
    ...
    ...
    @PG ID:TopHat VN:2.0.4 CL:/usr/local/share/tophat-2.0.4.Linux_x86_64/tophat -p 6 -o IR_t1_Mt -G Mt.gff Mt 121108_I7001139_FCD1FBHACXX_L4_GSL_0040.06_1.fq,121108_I7001139_FCD1FBHACXX_L5_GSL_0040.06_1.fq,121108_I7001139_FCD
    1FBHACXX_L6_GSL_0040.06_1.fq,121108_I7001139_FCD1FBHACXX_L7_GSL_0040.06_1.fq,121108_I7001139_FCD1FBHACXX_L8_GSL_0040.06_1.fq
    FCD1FBHACXX:5:1114:18853:38567#GCCAATA 0 Mt:3452:Mt3.5.1Chr1 150172 50 100M * 0 0 CCCGGGCGCGTATTGCCGAAAAGAAGGTTTAGCTTCAGTTGTGCTACTCTAGACCTGCGGAAGACAGGTGCTGCATGCAAAGGCCTAAGTCCCGCGATGT bbbeeeeegggggi
    hhfigiiehdfffaegdfhiihhiiifcgegggeeeeeeddbddaaaYaa`bca__bbcbcbbcccccbaabb_bcdcbaX[aaa[ AS:i:0 X
    ...
    ...
    ...

    However, because of too many colons existing between SN and LN, it seems that cufflinks does not work.
    So, I think I will have to remove "Mt:3452:" part.

    My question is that "Could I use sed command to edit BAM file?"
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
    Not directly. Something like

    Code:
    samtools view data.bam | sed whatever | samtools view -bSh > corrected.bam
    Would work.

    Comment

    • syintel87
      Member
      • Dec 2012
      • 81

      #3
      Originally posted by swbarnes2 View Post
      Not directly. Something like

      Code:
      samtools view data.bam | sed whatever | samtools view -bSh > corrected.bam
      Would work.
      Dear swbarnes2,

      Oh! Really helpful!! Thank you so much!!!!

      Comment

      • syintel87
        Member
        • Dec 2012
        • 81

        #4
        Originally posted by swbarnes2 View Post
        Not directly. Something like

        Code:
        samtools view data.bam | sed whatever | samtools view -bSh > corrected.bam
        Would work.

        When I put the command below,
        samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh > corrected.bam

        the result is the follwing:
        Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

        Options: -b output BAM
        -h print header for the SAM output
        -H print header only (no alignments)
        -S input is SAM
        -u uncompressed BAM output (force -b)
        -x output FLAG in HEX (samtools-C specific)
        -X output FLAG in string (samtools-C specific)
        -c print only the count of matching records
        -t FILE list of reference names and lengths (force -S) [null]
        -T FILE reference sequence file (force -S) [null]
        -o FILE output file name [stdout]
        -R FILE list of read groups to be outputted [null]
        -f INT required flag, 0 for unset [0]
        -F INT filtering flag, 0 for unset [0]
        -q INT minimum mapping quality [0]
        -l STR only output reads in library STR [null]
        -r STR only output reads in read group STR [null]
        -? longer help


        I don't think it works.
        Could you see something wrong?

        Comment

        • maubp
          Peter (Biopython etc)
          • Jul 2009
          • 1544

          #5
          Originally posted by syintel87 View Post
          However, because of too many colons existing between SN and LN, it seems that cufflinks does not work.
          So, I think I will have to remove "Mt:3452:" part.
          If you are right and editing the reference names like that fixes things, you've found a bug in cufflinks - If so please report it so that cufflinks can be fixed.

          Comment

          • syintel87
            Member
            • Dec 2012
            • 81

            #6
            Originally posted by maubp View Post
            If you are right and editing the reference names like that fixes things, you've found a bug in cufflinks - If so please report it so that cufflinks can be fixed.
            Yes, while running cufflinks, the error message appeared:
            Error: sort order of reads in BAMs must be the same

            According to the tips given in another posting, I am trying editing BAM file.
            (http://seqanswers.com/forums/showthread.php?t=9304)

            Though I sent an email to cufflinks, I haven't received a reply yet.

            Comment

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Originally posted by syintel87 View Post
              When I put the command below,
              samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh > corrected.bam


              I don't think it works.
              Could you see something wrong?
              Have you tried doing the three steps independently instead of a single command line?

              Comment

              • TobyH
                Junior Member
                • Aug 2012
                • 5

                #8
                Originally posted by syintel87 View Post
                When I put the command below,
                samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh > corrected.bam

                the result is the follwing:
                Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

                Options: -b output BAM
                -h print header for the SAM output
                -H print header only (no alignments)
                -S input is SAM
                -u uncompressed BAM output (force -b)
                -x output FLAG in HEX (samtools-C specific)
                -X output FLAG in string (samtools-C specific)
                -c print only the count of matching records
                -t FILE list of reference names and lengths (force -S) [null]
                -T FILE reference sequence file (force -S) [null]
                -o FILE output file name [stdout]
                -R FILE list of read groups to be outputted [null]
                -f INT required flag, 0 for unset [0]
                -F INT filtering flag, 0 for unset [0]
                -q INT minimum mapping quality [0]
                -l STR only output reads in library STR [null]
                -r STR only output reads in read group STR [null]
                -? longer help


                I don't think it works.
                Could you see something wrong?
                If you're still having a problem with this, I believe that you need to specify for samtools view to take input from STDIN at the third step. This is done by a single '-' IIRC. So like this...

                samtools view acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh - > corrected.bam

                Comment

                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  It's likely the problem is due to trying to create a BAM file from input with no header (try a "samtools view -h acc_hits.bam" at the beginning of the command). You can't create a BAM file without a header.

                  Comment

                  • syintel87
                    Member
                    • Dec 2012
                    • 81

                    #10
                    It seems that errors do not occur by running the command below.

                    samtools view -h acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh - > corrected.bam

                    However, seeing "corrected.bam", there was no change.
                    "sed" command does not seem to be applied to editing bam file.

                    Comment

                    • swbarnes2
                      Senior Member
                      • May 2008
                      • 910

                      #11
                      Yes, I accidentally left out the dash in the second samtools view command. If nothing is changing, there's something wrong with the sed command. In general, piping in and out of samtools is the way to edit .bam files, it's better than making gigantic intermediate sam files that you are going to recompress anyway.

                      Comment

                      • syfo
                        Just a member
                        • Nov 2012
                        • 103

                        #12
                        Originally posted by swbarnes2 View Post
                        Yes, I accidentally left out the dash in the second samtools view command. If nothing is changing, there's something wrong with the sed command. In general, piping in and out of samtools is the way to edit .bam files, it's better than making gigantic intermediate sam files that you are going to recompress anyway.
                        Right.

                        Originally posted by syintel87 View Post
                        It seems that errors do not occur by running the command below.

                        samtools view -h acc_hits.bam | sed 's/Mt\:[0-9][0-9][0-9][0-9]\://' | samtools view -bSh - > corrected.bam

                        However, seeing "corrected.bam", there was no change.
                        "sed" command does not seem to be applied to editing bam file.
                        I am surprised that nothing changes. This "sed" command seems correct to me and it works on the text from your first post. Note that this "sed" command after the first pipe does not exactly edit the bam file but the text flow from the "samtools view" command.

                        Comment

                        Latest Articles

                        Collapse

                        • SEQadmin2
                          Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                          by SEQadmin2


                          I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                          Here are nine questions we think about, in roughly the order they matter, before...
                          06-18-2026, 07:11 AM
                        • SEQadmin2
                          From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                          by SEQadmin2


                          Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                          The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                          ...
                          06-02-2026, 10:05 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by SEQadmin2, 06-17-2026, 06:09 AM
                        0 responses
                        24 views
                        0 reactions
                        Last Post SEQadmin2  
                        Started by SEQadmin2, 06-09-2026, 11:58 AM
                        0 responses
                        42 views
                        0 reactions
                        Last Post SEQadmin2  
                        Started by SEQadmin2, 06-05-2026, 10:09 AM
                        0 responses
                        48 views
                        0 reactions
                        Last Post SEQadmin2  
                        Started by SEQadmin2, 06-04-2026, 08:59 AM
                        0 responses
                        49 views
                        0 reactions
                        Last Post SEQadmin2  
                        Working...