
  • [Help] How to get those reads containing specified SNP?

    Hi all,

    I am new to bioinformatics.

    After SNP calling with GATK/freebayes, we usually get a list of SNPs. I now have some SNP sites of interest. Does anyone know how to identify the reads that contain these SNPs?

    Please note that these SNPs might be heterozygous. I have already mapped the reads to a reference and have a sorted BAM file.

    Could anyone explain how to do this in detail, or just share your thoughts and any tools that might be helpful?

  • #2
    Assuming you have mapped your reads and now have a SAM/BAM file (the usual case), the samtools 'view' command will pull out the reads in the region of your choice.
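
As a rough sketch of the region-based approach, the snippet below turns a SNP list into samtools region strings. It assumes a VCF-style list with the chromosome in column 1 and the position in column 2; the function name `snp_regions`, the file names, and the coordinates are made up for illustration. Note that a region query returns every read overlapping the site, including reads that carry the reference allele.

```shell
#!/bin/sh
# Turn a SNP list (VCF-style: chromosome in column 1, position in column 2)
# into samtools region strings of the form chr:pos-pos.
# snp_regions is a hypothetical helper name, not part of samtools.
snp_regions() {
    # Skip header/comment lines starting with "#"; read stdin if no file given.
    awk '!/^#/ && NF >= 2 { print $1 ":" $2 "-" $2 }' "$@"
}

# Demo with two made-up sites (hypothetical coordinates).
printf '#CHROM\tPOS\nchr1\t12345\nchr2\t678\n' | snp_regions

# With samtools installed and an indexed BAM (samtools index sorted.bam),
# the regions could then be passed to view, for example:
#   samtools view -b sorted.bam $(snp_regions snps.vcf) > snp_reads.bam
```

The samtools call is left as a comment so the snippet runs without samtools installed.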


    • #3
      I might not be understanding you, but you can pull out all the matching reads plus their info with
      grep -B 1 -A 2 GCCTATCGCAGATACACTCC sample.fastq > SNVreads.fastqish
      (the nucleotide string contains your SNP)

      You then need to remove the -- separator lines that grep prints between reads. Match only lines that are exactly --, because a quality string can itself contain "--":
      grep -v '^--$' SNVreads.fastqish > SNVreads.fastq

      You might have to tweak the length of your grep nucleotide pattern for specificity and to avoid other SNPs (I don't know what you are sequencing). One cross-platform visualization tool is Ugene.

      Hope this is what you are looking for.

      Earl
      --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--
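
A possible variation on the grep approach above, sketched with awk: it walks the FASTQ in 4-line records, so no separator lines need removing, and it also checks the reverse complement of the pattern, since reads from the opposite strand would not match the forward string. The function name `reads_with_kmer` and the toy reads are hypothetical; the sketch assumes uppercase sequences and unwrapped 4-line records.

```shell
#!/bin/sh
# Print the IDs of FASTQ reads whose sequence contains a k-mer or its
# reverse complement. Assumes 4-line records and uppercase A/C/G/T.
# reads_with_kmer is a hypothetical helper name.
reads_with_kmer() {
    kmer=$1; shift
    awk -v kmer="$kmer" '
        function comp(c) {
            if (c == "A") return "T"
            if (c == "C") return "G"
            if (c == "G") return "C"
            if (c == "T") return "A"
            return "N"
        }
        function revcomp(s,    i, out) {
            out = ""
            for (i = length(s); i >= 1; i--) out = out comp(substr(s, i, 1))
            return out
        }
        BEGIN { rc = revcomp(kmer) }
        # Line 1 of each record: keep the read ID (text after "@", up to the first space).
        NR % 4 == 1 { split(substr($0, 2), f, " "); name = f[1] }
        # Line 2 of each record: the sequence itself.
        NR % 4 == 2 && (index($0, kmer) || index($0, rc)) { print name }
    ' "$@"
}

# Demo on three toy reads: r1 has the k-mer, r2 its reverse complement, r3 neither.
# The first quality line deliberately starts with "@" to show it is not mistaken for a header.
printf '@r1 1:N:0:9\nAAGCCTATCGTT\n+\n@IIIIIIIIIII\n@r2\nAACGATAGGCTT\n+\nIIIIIIIIIIII\n@r3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' |
    reads_with_kmer GCCTATCG
```

On a real file the call would look like `reads_with_kmer GCCTATCGCAGATACACTCC sample.fastq > names.txt`.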


      • #4
        reference -----------------------------------------------------------
        read1 ----------T-------------
        read2 -------------------------
        read3 ------T------------------
        read4 --------------------------

        I want to extract the IDs of all reads that carry the T SNP.


        • #5
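          If your read file looks like that, then you can use

          [your/Directory]$ grep -e '-T-' YourReadFile.txt > YourSNPReadFile.txt
          # "-e" marks the next argument as the pattern; without it, grep would parse a pattern starting with "-" as an option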

          output:
          [your/Directory]$ more YourSNPReadFile.txt
          read1 ----------T-------------
          read3 ------T------------------

          _________________________________________________________________________
          If you have a .fastq file, all you need is the first line of each record (the read name), which comes just before the nucleotide string, for example:

          @M01472:34:000000000-A40FG:1:1101:17765:1645 1:N:0:9
          NTTCCAGCGAGGTTCTGAGTTCTTAGTCTGGTGTCGGCGTACCCACACGGTG
          +
          #>>>ABFFB?DBGGGGGCEGGGHHHGHHHHHFAGHEEGGGGGGHHGFDEEFG


          just use:

          [your/Directory]$ grep -B 1 GCCTATCGCAGATACACTCC YourSample.fastq > NamesAndReads.txt
          #where "-B 1" prints the line before the pattern
          #and the pattern "GCCTATCGCAGATACACTCC" contains the SNP somewhere in the middle.

          [your/Directory]$ grep '^@M01472' NamesAndReads.txt > Names.txt
          # "@M01472" is a prefix shared by all the read names; anchoring it with "^" avoids matching inside a read or quality line
          # for instance, if your read names are actually read1, read2, read3, and read4, you could use "^read"

          #output for my command
          [your/Directory]$ more Names.txt
          @M01472:34:000000000-A40FG:1:1101:17765:1645 1:N:0:9
          @M01472:34:000000000-A40FG:1:1101:18453:1656 1:N:0:9
          @M01472:34:000000000-A40FG:1:1101:16266:1658 1:N:0:9
          --More--(0%)

          NOTE: this is a quick solution. If your genome is repetitive or the SNP is in a duplicated region, this approach might not be the best method. In that case, something a little more involved, working from a .sam file, might be necessary.

          Hope that helps
          --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--
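
One way the "more involved" SAM-based route might look, as a minimal sketch rather than a tested pipeline: read SAM records, walk each CIGAR string to find which read base is aligned to the SNP position, and print the read name when that base equals the allele of interest. The function name `reads_with_allele` and the toy alignments are hypothetical. The sketch ignores the chromosome column, so it assumes the input has already been restricted to one region (for example with samtools view and a region argument).

```shell
#!/bin/sh
# Read SAM records on stdin and print the names of reads whose aligned base
# at 1-based reference position POS equals ALLELE.
# Usage: reads_with_allele POS ALLELE   (hypothetical helper name)
reads_with_allele() {
    awk -F '\t' -v pos="$1" -v allele="$2" '
        BEGIN { pos += 0 }                 # force numeric comparison
        /^@/ { next }                      # skip SAM header lines
        $6 == "*" { next }                 # no CIGAR: read is unmapped
        {
            ref = $4 + 0                   # reference position of the current CIGAR block
            q = 1                          # offset of the current block in SEQ
            cigar = $6
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                n = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                cigar = substr(cigar, RLENGTH + 1)
                if (op == "M" || op == "=" || op == "X") {
                    if (pos >= ref && pos < ref + n) {
                        if (substr($10, q + pos - ref, 1) == allele) print $1
                        break
                    }
                    ref += n; q += n       # consumes reference and read
                } else if (op == "I" || op == "S") {
                    q += n                 # consumes read only
                } else if (op == "D" || op == "N") {
                    if (pos >= ref && pos < ref + n) break   # site is deleted/skipped in this read
                    ref += n               # consumes reference only
                }
                # H and P consume neither read nor reference
            }
        }
    '
}

# Demo on four toy alignments, asking for allele T at reference position 7:
# r1 has T there, r2 has G, r3 reaches it after a soft clip and an insertion,
# r4 has the site inside a deletion.
printf 'r1\t0\tchr1\t1\t60\t10M\t*\t0\t0\tACGTACTACG\tIIIIIIIIII\nr2\t0\tchr1\t1\t60\t10M\t*\t0\t0\tACGTACGACG\tIIIIIIIIII\nr3\t0\tchr1\t3\t60\t2S3M1I4M\t*\t0\t0\tNNGTAACTAC\tIIIIIIIIII\nr4\t0\tchr1\t4\t60\t2M3D5M\t*\t0\t0\tTATTTTT\tIIIIIII\n' |
    reads_with_allele 7 T
```

With samtools installed, the input could come from something like `samtools view sorted.bam chr1:12345-12345 | reads_with_allele 12345 T`. Because SEQ in a SAM record is stored on the reference forward strand, no reverse-complement step is needed here, and heterozygous sites are handled naturally: only the reads carrying the chosen allele are printed.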
