  • Understanding BAM format.

    Hi,

    I have this output in BAM format.

    NA06984-SRR006041.1145152 1040 1 113040605 57 325M * 0 0 TTGATCACTTCACACACATCTTCATCGATGAGGCTGGCCA
    CTGCATGGAGCCTGAGAGTCTGGTAGCTATAGCAGGTGAGGGACTCAGGTGGGGCTGCAGGTATACACCCTGTGTGGGTCAGAGAGGTTGCACCACTTACCTTTCTTCCCACACCTCTTCTGCTTCCCAGGGCTGATGGAAGTA
    AAGGAAACAGGTGATCCAGGAGGGCAGCTGGTGCTGGCAGGAGACCCTCGGCAGCTGGGGCCTGTGCTGCGTTCCCCACTGACCCAGAAGCATGGACTGGGATACTCACTGCTGGAGCGGCTGCTCACCTACAACTCCCTG 7
    99::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::88:::::::::::;;;;;;;;;;;;;::888:;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:9::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;::::;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;;;;;;;;;;::::::::::::::::::::::::: RG:Z:SRR006041 NM:i:0
    (This is data from the 1000 Genomes Project.)

    I'm building a pipeline to study variation (get FASTQ sequence, index it, align it to the hg18 reference sequence, do a couple of format conversions to get BAM, call indels and SNPs, add them to a database, call larger variations, check whether they've been reported before, produce graphs and charts, display the alignment, and submit a report).

    I'm learning about the BWA aligner and the BAM format right now. I'm using pilot data on unaligned sequences from the 1000 Genomes Project (because I will have similar BAM outputs).

    I have to study and make sense of this BAM format. I've read this tutorial on understanding the SAM/BAM format, but it was of little help. Could someone give me further pointers?

    Thanks a lot!
    Joker!sAce
    Last edited by Joker!sAce; 02-28-2011, 07:15 AM.

  • #2
    What specific questions about the format do you have?



    • #3
      I understand that there are a lot of columns in this record.

      NA06984-SRR006041.1145152
      1040
      1
      113040605
      57
      325M
      *
      0
      0
      TTGATCACTTCACACACATCTTCATCGATGAGGCTGGCCACTGCATGGAGCCTGAGAGTCTGGTAGCTATAGCAGGTGAGGGACTCAGGTGGGGCTGCAGGTATACACCCTGTGTGGGTCAGAGAGGTTGCACCACTTACCTTTCTTCCCACACCTCTTCTGCTTCCCAGGGCTGATGGAAGTAAAGGAAACAGGTGATCCAGGAGGGCAGCTGGTGCTGGCAGGAGACCCTCGGCAGCTGGGGCCTGTGCTGCGTTCCCCACTGACCCAGAAGCATGGACTGGGATACTCACTGCTGGAGCGGCTGCTCACCTACAACTCCCTG
      799::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::88:::::::::::;;;;;;;;;;;;;::888:;;;;;;;;;;;;;;;;;;;;;;;;;;;;888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:9::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;::::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;:::::::::::::::::::::::::
      RG:Z:SRR006041
      NM:i:0

      I'd like to know what they mean. I have a faint idea, but I'd like to understand them properly.
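
As a rough illustration of how the columns listed above map onto the SAM specification, here is a minimal Python sketch. The column names, FLAG bit meanings and the Phred+33 quality encoding come from the SAM specification. The helper names (`parse_sam_line`, `decode_flag`, `phred_scores`) are made up for this example, and `EXAMPLE_LINE` is a shortened stand-in for the record above (SEQ, QUAL and CIGAR are cut to 8 bases), not the real read.

```python
# Names of the 11 mandatory SAM columns, in order (SAM specification).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Meaning of each bit in the FLAG column (SAM specification).
FLAG_BITS = {
    0x1: "read is paired",
    0x2: "mapped in a proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def parse_sam_line(line):
    """Split one tab-separated SAM record into named fields (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    # These five mandatory columns are integers.
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])
    # Optional columns have the form TAG:TYPE:VALUE, e.g. NM:i:0.
    tags = {}
    for item in cols[11:]:
        tag, value_type, value = item.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    record["TAGS"] = tags
    return record


def decode_flag(flag):
    """List the meaning of every bit that is set in FLAG."""
    return [meaning for bit, meaning in FLAG_BITS.items() if flag & bit]


def phred_scores(qual):
    """Convert a QUAL string to Phred scores, assuming the Phred+33 encoding."""
    return [ord(char) - 33 for char in qual]


# Shortened stand-in for the record above (assumption: SEQ/QUAL/CIGAR truncated).
EXAMPLE_LINE = "\t".join([
    "NA06984-SRR006041.1145152", "1040", "1", "113040605", "57", "8M",
    "*", "0", "0", "TTGATCAC", "799:::::", "RG:Z:SRR006041", "NM:i:0",
])

if __name__ == "__main__":
    record = parse_sam_line(EXAMPLE_LINE)
    print(record["RNAME"], record["POS"], record["CIGAR"])
    print(decode_flag(record["FLAG"]))
    print(phred_scores(record["QUAL"]))
```

Read this way, the record above is: read name, FLAG 1040 (1024 + 16, i.e. marked as a duplicate and aligned to the reverse strand), reference name 1, 1-based leftmost position 113040605, mapping quality 57, CIGAR 325M (325 aligned bases), no mate information (`*`, 0, 0), the read sequence, its base qualities, the read group (RG) and the edit distance to the reference (NM).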



      • #4
        You'll get much better answers if you post specific questions which can't be easily answered from the SAM format documentation.



        • #5
          My work is a divergence study of the p53 gene on the short arm of chromosome 17. I need to extract this part of the sequence.

          I understand that I can do this in two ways:
          1. Get raw FASTA reads.
          2. Extract from the aligned (to hg18) data (in BAM format).

          How do I do the second option?



          • #6
            If you know the chromosomal coordinates for your gene (which you can find in the UCSC files or via the browser), then SAMtools can extract this region efficiently.



            • #7
              This sequence has been aligned to hg18. I know the chromosomal coordinates for hg18 (chr17:7,520,037-7,531,588; that's the TP53 tumor suppressor gene).

              How do I proceed from here?



              • #8
                samtools view aligned.bam chr17:7520037-7531588 > tp53.sam
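
A small sketch of how the browser-style coordinates could be turned into a region string for that command. Two things are worth checking first: region queries with `samtools view` need a coordinate-sorted BAM with an index (`samtools index aligned.bam`), and the chromosome name must match the names in the BAM header. The record in the first post has `1` as its reference name, which suggests (but does not prove) that this BAM uses `17` rather than `chr17`. The function names and the `contig_names` argument below are hypothetical; in practice the names would come from the `@SQ` lines printed by `samtools view -H`.

```python
def normalize_region(region, contig_names=None):
    """Turn 'chr17:7,520,037-7,531,588' into 'chr17:7520037-7531588'.

    If contig_names is given (assumed to be the reference names from the
    BAM header), the 'chr' prefix is added or dropped to match them.
    """
    contig, _, span = region.partition(":")
    start_text, _, end_text = span.replace(",", "").partition("-")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"start is after end in region {region!r}")
    if contig_names is not None and contig not in contig_names:
        # Try the other common naming convention before giving up.
        alternative = contig[3:] if contig.startswith("chr") else "chr" + contig
        if alternative not in contig_names:
            raise ValueError(f"{contig!r} is not among the BAM reference names")
        contig = alternative
    return f"{contig}:{start}-{end}"


def samtools_view_command(bam_path, region):
    """Build the argument list for 'samtools view <bam> <region>'."""
    return ["samtools", "view", bam_path, region]


if __name__ == "__main__":
    # Assumed header names for illustration only; check your own BAM header.
    names_in_header = ["1", "17"]
    region = normalize_region("chr17:7,520,037-7,531,588", names_in_header)
    print(" ".join(samtools_view_command("aligned.bam", region)))
```

The argument list can be passed to `subprocess.run` with `stdout` redirected to a file such as `tp53.sam` if you want to drive the extraction from a pipeline script.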
