Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Locating individual reads in alignment

    I have used 100 nt paired-end sequences to construct a reduced representation reference genome of the organism I am working with. I aligned the reads back to the reference genome. I hope to find SNPs at some point. I have a list of individual reads (with the paired read) which I would like to inspect in the alignment. Is there a way to find out what position in the reference genome these reads are aligned to? I can visualize the aligned reads in IGV and there I can zoom in to a position to inspect a region. But I cannot search for a particular read - I need to know the map position of the read first. Is there a programme of script that could extract the position (and maybe other infrmation) of an individual read from a sam/bam file?
    Last edited by Tectona; 01-16-2012, 03:33 AM. Reason: typo

  • #2
    Presumably you have the SAM file that was output from the aligner. You can look for the location of a read in it using grep: grep -m 1 -w SomeReadName.123455 Aligned.sam

    That'll be easy enough provided you only have a few reads you want to look at,

    Comment


    • #3
      Tablet lets you search for reads by name, the regular expression support is also very handy: http://bioinf.hutton.ac.uk/tablet/

      Comment


      • #4
        Originally posted by dpryan View Post
        Presumably you have the SAM file that was output from the aligner. You can look for the location of a read in it using grep: grep -m 1 -w SomeReadName.123455 Aligned.sam

        That'll be easy enough provided you only have a few reads you want to look at,

        Thanks for the suggestion. However, when I use the command grep -m 1 Sequence_read_tag alignment_file.sam > output.txt

        I get the following information:
        FCB020AACXX:6:1305:20474:84915#ATGAACCT 163 369552-8 1 60 100M = 61 160 CTTGCAAAGGAAAATCTTGAGATGAACGAGGGCGACATTAGCAAGGAGGCCATCGGAGGCACCGACGGTACCACCGTCGATGGAGAGGATGCGAACCCAT bbbeeeeeggggfiiiiiiiihgifhihffhiiiiihiihiihfghfhiihggggeeecccccccccc]acccccc_acacccccccccccccccccccc XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

        I recognize the name of my sequence, the sequence itself, the quality information... but where do I find the positional information?

        Comment


        • #5
          Originally posted by maubp View Post
          Tablet lets you search for reads by name, the regular expression support is also very handy: http://bioinf.hutton.ac.uk/tablet/
          Thanks for the suggestion. Tablet looks like a nice programme. And indeed, it lets you search for the name of an individual sequence. However, is there a way to display more than a single set of sequence reads? As far as I have been able to explore the programme, I do not see a way to import more than one set of reads aligned to a reference at a time... I would like to compare two or more sets of reads aligned to the same reference sequence.

          Comment


          • #6
            Originally posted by Tectona View Post
            Thanks for the suggestion. However, when I use the command grep -m 1 Sequence_read_tag alignment_file.sam > output.txt

            I get the following information:
            FCB020AACXX:6:1305:20474:84915#ATGAACCT 163 369552-8 1 60 100M = 61 160 CTTGCAAAGGAAAATCTTGAGATGAACGAGGGCGACATTAGCAAGGAGGCCATCGGAGGCACCGACGGTACCACCGTCGATGGAGAGGATGCGAACCCAT bbbeeeeeggggfiiiiiiiihgifhihffhiiiiihiihiihfghfhiihggggeeecccccccccc]acccccc_acacccccccccccccccccccc XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

            I recognize the name of my sequence, the sequence itself, the quality information... but where do I find the positional information?
            It's the 4th field, so position 1 in this case. If you just want the contig and position displayed then you can use something like the following:
            Code:
            grep -m 1 Sequence_read_tag alignment_file.sam | awk '{ print $3":"$4 }' > output.txt
            The output will then be "contig: position" (without the space, for some reason the forum turns colon p into an emoticon).

            Comment


            • #7
              OK, thanks, now I got it figured out...
              Last edited by Tectona; 01-17-2012, 01:41 AM. Reason: Too early reaction

              Comment


              • #8
                That's position according to the reference contig it's aligned against. You may want to browse the SAM specification. The read you showed maps to the start of a contig.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM
                • seqadmin
                  Techniques and Challenges in Conservation Genomics
                  by seqadmin



                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                  Avian Conservation
                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                  03-08-2024, 10:41 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, Yesterday, 06:37 PM
                0 responses
                11 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, Yesterday, 06:07 PM
                0 responses
                10 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-22-2024, 10:03 AM
                0 responses
                51 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-21-2024, 07:32 AM
                0 responses
                67 views
                0 likes
                Last Post seqadmin  
                Working...
                X