Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools pileup

    This may come across as a simple question dear SeqAnswerers but I am stuck for a couple of days now or maybe I am getting this all wrong. Either ways I seek your help

    Scenario:
    I have done sequence data analysis for sequence files using bowtie and samtools. After the alignment and visualization of the alignment, in a particular position I saw that many reads were stacked in the tviewer with some nucleotides standing out as mismatches. At the Later stage of the analysis I wanted to see the mutations and samtools in combination with bcftools generated a file where I have observed that two of my sample seqs mutated into other bases from the ref. (although other seqs carrying mismatches in the tviewer screen weren't reportede)

    Suggestion required:
    How do I identify which one of my seqs actually has that mutation in them, esp that step in the analysis doesn't report any SeqIDs I can use to trace my sequences ?
    The second half of the problem is that, when I go by the chromosome genomic coordinates reported in the pileup/bcf file I still can't zero onto the sequence carrying that mutation in the original bowtie SAM file from the alignment !

  • #2
    For visualization, you might try the IGV browser. You can zoom to your region of interest and hover over a read to get information like read id.

    To programmatically extract reads that overlap a position, I would suggest using "samtools view", such as

    Code:
    samtools view filename.bam chr8:128747553-128747553
    From there, you could use the sequence, position, and CIGAR to determine if the read contains the mutation of interest.

    hope that helps answer your question,
    Justin

    Comment


    • #3
      I'm not sure I understand your question. It sounds like you have reads from a several samples mixed together in the same bam file. You visualize the alignment and see that there are mutations at some sites. You want to know which samples carry those mutations.

      If that's your problem, then the solution involves setting the Read Group attribute in the bam file. The Read Group is how you keep track of different samples when analyzing them together. With that attribute set properly, you can, for example, select reads from a particular sample in samtools view.

      I have found that bowtie doesn't handle Read Groups properly. If you have trouble you might have an easier time with bwa.

      Comment


      • #4
        Originally posted by Alex Renwick View Post
        I'm not sure I understand your question. It sounds like you have reads from a several samples mixed together in the same bam file. You visualize the alignment and see that there are mutations at some sites. You want to know which samples carry those mutations.

        If that's your problem, then the solution involves setting the Read Group attribute in the bam file. The Read Group is how you keep track of different samples when analyzing them together. With that attribute set properly, you can, for example, select reads from a particular sample in samtools view.

        I have found that bowtie doesn't handle Read Groups properly. If you have trouble you might have an easier time with bwa.
        You got me right !, I basically visualized the alignment and then wanted get the sequences that have the mutations (at that certain position), primarily; their IDs and positions where the mutations happened on the sequence as well as the chromosomal coordinate, I am getting somewhere but there are bumps on the way, In summary, report the mutations by positions and retrieve the relevant data to identify the seqs with.

        So, reading BWA docs, I was looking for the Read Group Attrib you suggested but I have not encountered it. It will be great if you can elaborate a bit more ..

        Appreciate it...

        Comment


        • #5
          Originally posted by BioSlayer View Post
          ...
          So, reading BWA docs, I was looking for the Read Group Attrib you suggested but I have not encountered it. It will be great if you can elaborate a bit more ..

          Appreciate it...
          I can elaborate a little.

          The Read Group is part of the SAM file specification, which you can find on the samtools webpage. Aligners like bowtie and bwa don't need it, but downstream analysis often does.

          The read group has to be added before the samples are merged into one bam file. Typically, you'd produce a bam file for each sample (with read group assigned), then merge the bam files into one for subsequent analysis.

          With bwa, the -r option to samse or sampe allows you to specify the Read Group. For example,
          Code:
          bwa sampe -r "@RG\tID:RUN_427\tSM:smp0\tPL:ILLUMINA" ref.fa smp0.1.sai smp0.2.sai smp0.1.fq smp0.2.fq > smp0.sam
          will give a Read Group ID of "RUN_427", sample name "smp0", and specify that the platform was "ILLUMINA".

          Some other ways of adding a Read Group specification are 1) using samtools merge -r; 2) using some picard tool; 3) writing your own small shell script.

          Comment


          • #6
            @Bamseek and @Alex Renwick.. Thank you folks, the command suggested by Bamseek works great at filtering for me all the seqs mapping at that position. The RG option in samtools merge is so handy, I don't typically produce separate bam files and then merge them but I marked it for the day I have to.

            By chance, I found a way in samtools tview that works in conjunction with the samtools view option, now I can toggle between the Read ID and its Sequence by hitting the 'r' key on the keyboard while on the viewer ... Apparently samtools has some hidden cookies to munch on ...

            Comment


            • #7
              on a similar vein, does anyone know an easy way to get these read groups to show up as different colours in a GBrowse BAM track for reads? I'll have a go at doing this myself in a couple of days, but it would be nice to know if there's something I can work on as a base.

              Comment


              • #8
                Interesting ...

                Originally posted by gringer View Post
                on a similar vein, does anyone know an easy way to get these read groups to show up as different colours in a GBrowse BAM track for reads? I'll have a go at doing this myself in a couple of days, but it would be nice to know if there's something I can work on as a base.
                Should you get a headstart at doing this with GBrowse BAM spill the beans ...

                Comment


                • #9
                  Okay, so here's my attempt*:
                  Code:
                  [illumina_default_UP_tagged_reads]
                  feature        = match
                  glyph          = segments
                  bgcolor        = sub {
                  # colors from http://gmod.org/wiki/Glyphs_and_Glyph_Options
                       my @data = @_;
                       my %sam = %{$data[0]};
                       my $read_gp = $sam{"align"}->get_tag_values("RG");
                       my %read_map;
                       @read_map{('0hpost1', '0hpost2', '0hpre', '18hpost', '18hpre', '18Ra', '18Rb',
                                  '18W', '30Ra', '30Rb', 'bcatpost1', 'bcatpost2', 'bcatpre1',
                                  'bcatpre2', 'cont1a', 'cont1b', 'cont2', 'evipost1', 'evipost2',
                                  'evipre1', 'evipre2', 'gfppost1', 'gfppost2', 'gfppre1', 'gfppre2',
                                 )} =
                                 ('chartreuse', 'chocolate', 'coral', 'cornflowerblue', 'cornsilk',
                                  'crimson', 'cyan', 'darkblue', 'darkcyan', 'darkgoldenrod',
                                  'darkgray', 'darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen',
                                  'darkorange', 'darkorchid', 'darkred', 'darksalmon', 'darkseagreen',
                                  'darkslategray', 'darkturquoise', 'darkviolet', 'deeppink', 'deepskyblue',
                                 );
                       return($read_map{$read_gp});
                    }
                  height         = 3
                  feature_limit  = 200
                  stranded       = 0
                  draw_target    = 1
                  show_mismatch  = 1
                  mismatch_color = red
                  bump density   = 20
                  bump           = fast
                  database       = Dec_2011_IDUA_Tagged
                  key            = Dec 2011 Illumina Unpaired Reads (Tagged)
                  citation       = Mapped with bowtie2 v. 2.0.0-beta5 using default options
                  category       = RNAseq coverage
                  label          = 0
                  Others will obviously need to change some strings around to get this to work for them.

                  * after many hours of frustration at the lack of useful gbrowse error messages when a 'sub' doesn't work
                  Last edited by gringer; 01-04-2012, 06:52 AM. Reason: updated to use hash array, separate group/colour arrays

                  Comment


                  • #10
                    This one uses selectable subtracks and is a bit less fiddly to work with, but seems to ignore the feature_limit variable:
                    Code:
                    [illumina_default_UP_tagged_reads]
                    feature        = match
                    glyph          = segments
                    bgcolor        = darkgreen
                    height         = 3
                    feature_limit  = 50
                    stranded       = 0
                    draw_target    = 1
                    show_mismatch  = 1
                    mismatch_color = red
                    bump density   = 20
                    bump           = fast
                    database       = Dec_2011_IDUA_Tagged
                    key            = Dec 2011 Illumina Unpaired Reads (Tagged)
                    citation       = Mapped with bowtie2 v. 2.0.0-beta5 using default options
                    category       = RNAseq coverage
                    label          = 0
                    hide empty subtracks = 1
                    subtrack select = Experiment tag_value RG;
                    subtrack table  = 0hpost1;
                                      0hpost2;
                                      0hpre;
                                      18hpost;
                                      18hpre;
                                      18Ra;
                                      18Rb;
                                      18W;
                                      30Ra;
                                      30Rb;
                                      bcatpost1;
                                      bcatpost2;
                                      bcatpre1;
                                      bcatpre2;
                                      cont1a;
                                      cont1b;
                                      cont2;
                                      evipost1;
                                      evipost2;
                                      evipre1;
                                      evipre2;
                                      gfppost1;
                                      gfppost2;
                                      gfppre1;
                                      gfppre2;
                    Also, I can't seem to get this to work with coverage maps (for which this function would be ideal).

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Exploring the Dynamics of the Tumor Microenvironment
                      by seqadmin




                      The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
                      07-08-2024, 03:19 PM
                    • seqadmin
                      Exploring Human Diversity Through Large-Scale Omics
                      by seqadmin


                      In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
                      06-25-2024, 06:43 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, Today, 11:09 AM
                    0 responses
                    16 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 07-19-2024, 07:20 AM
                    0 responses
                    148 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 07-16-2024, 05:49 AM
                    0 responses
                    121 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 07-15-2024, 06:53 AM
                    0 responses
                    111 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X