  • Unable to intersect BAM file with bedtools

    Hi everyone, I'm trying to use bedtools intersect to count the mapped reads that fall within target regions (listed in a .bed file) from a targeted bisulfite sequencing experiment (EpiSeq Roche).

    I used the following command:

    Code:
    ./bedtools intersect -bed -abam sample2.bam -b ~/Data/MethylSeq/dataset/Agesmoke_dataset/AgeSmkSop_all_primary_targets.bed
    The program terminates with the following message and produces no output at all.

    Code:
    * WARNING: File sample2.bam has inconsistent naming convention for record:
    NC_000016.9  24163386  24163537  M03971:33:000000000-BN5NL:1:2114:12003:16132/1  255  +
    
    * WARNING: File sample2.bam has inconsistent naming convention for record:
    NC_000016.9  24163386  24163537  M03971:33:000000000-BN5NL:1:2114:12003:16132/1  255  +
    I tried editing the original SAM file to remove the read that causes the problem (it was the first read in the file), but the same warning then appears for the second read. I also tried the -nonamecheck option, with no success.

    Can someone help us? Thank you.
    Nicola

  • #2
    Check your chromosome names.
    Are they "chr" style in both bed and bam?
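One way to run this check is to compare the names in column 1 of the BED file with the `SN:` tags on the `@SQ` lines of the BAM header (which `samtools view -H sample2.bam` prints). The following is a minimal sketch, not a definitive tool: the function names are hypothetical, and the header lines are a made-up illustration rather than the poster's actual header.

```python
def bed_chroms(bed_lines):
    """Return the set of chromosome names (column 1) found in BED text lines."""
    names = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def sam_header_chroms(header_lines):
    """Return the set of reference names (SN: tags on @SQ lines) in a SAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


# BED records as quoted in this thread; the header is a hypothetical example.
bed = ["chr1\t11123000\t11123242\tchr1:11123018-11123218",
       "chr1\t16696418\t16696674\tchr1:16696447-16696647"]
header = ["@HD\tVN:1.0\tSO:coordinate",
          "@SQ\tSN:NC_000016.9\tLN:90354753"]

shared = bed_chroms(bed) & sam_header_chroms(header)
print(sorted(shared))  # an empty list means the two files share no names
```

If the two sets have nothing in common, the files use different naming conventions and no interval can overlap until one of them is renamed.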

    • #3
      Originally posted by Richard Finney
      Check your chromosome names.
      Are they "chr" style in both bed and bam?

      Hi Richard, thanks for the reply.
      About your question, I think not.

      In the BED file I have records like these:
      Code:
      chr1    11123000        11123242        chr1:11123018-11123218
      chr1    16696418        16696674        chr1:16696447-16696647
      while in the SAM file (and consequently in the BAM file) I have records like this one:

      Code:
      M03971:33:000000000-BN5NL:1:2114:12003:16132    99      NC_000016.9     24163387        255     151M    =       24163431        194     TGATCGGTGGTGATGGGTTAGGTAGAGTGTATTAGTTCGTTTTTATGTTGTTGATAAAGATATATTCGAGATTGTGTAATTTATGAAAAAGAGGTTTAATGGATTTGGGGAGGTTTTAATTATGGTGGAAGGTTAAAGTTATGTTTTATAT BCCCCCCBBCABGGGGGGFGGGGHHHHFGGHHHHHHHHGGHGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHFHHHHGGHHHHHHHHHHHHHHHHGGFGGEHHGHHHHHHHHHHGHHHGHHHHHHHHHHHHHHHHHHH NM:i:1  ZS:Z:++
      P.S. The SAM file is the output of an alignment with BSMAP.

      Is this the problem? How can I resolve it?
      Nicola

      • #4
        NC_000016 is the NCBI RefSeq accession for chromosome 16, i.e. "chr16".
        This is the "official" name used at NCBI: https://www.ncbi.nlm.nih.gov/nuccore/NC_000016.10/

        You have to convert the NCBI names to "chr" names (or vice versa) so that the BAM and BED files use the same convention.

        There are many ways to rename the fields. You can always brute-force it with a simple custom program or script in your favorite language: bash, python, perl, C, etc.

        An easy way would be to reheader the BAM file. Please see the samtools documentation for this.
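The renaming step can be sketched in Python on the text of the header alone. This is only an illustration under stated assumptions: the function names are hypothetical, the input is taken to be the output of `samtools view -H`, and the mapping assumes human RefSeq chromosome accessions (NC_000001 to NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y, NC_012920 for the mitochondrion). Only the NC_000016.9 to chr16 pair comes from this thread.

```python
import re

# Assumption: human RefSeq chromosome accessions of the form NC_<6 digits>.<version>.
_REFSEQ = re.compile(r"^NC_(\d{6})\.\d+$")
_SPECIAL = {23: "chrX", 24: "chrY", 12920: "chrM"}


def refseq_to_chr(name):
    """Map e.g. 'NC_000016.9' to 'chr16'; return unrecognised names unchanged."""
    match = _REFSEQ.match(name)
    if not match:
        return name
    number = int(match.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    return _SPECIAL.get(number, name)


def rename_header(header_lines):
    """Rewrite the SN: tag of every @SQ line; other header lines pass through."""
    renamed = []
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            fields = line.split("\t")
            fields = ["SN:" + refseq_to_chr(f[3:]) if f.startswith("SN:") else f
                      for f in fields]
            line = "\t".join(fields)
        renamed.append(line)
    return renamed


# Hypothetical header for illustration only.
header = ["@HD\tVN:1.0\tSO:coordinate",
          "@SQ\tSN:NC_000016.9\tLN:90354753"]
print("\n".join(rename_header(header)))
```

The rewritten header could then be saved to a file and applied with `samtools reheader new_header.sam sample2.bam > renamed.bam` (the file names here are placeholders). Renaming only changes the labels, so the BAM and BED files still need to refer to the same genome assembly for the coordinates to be comparable.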

        • #5
          It works, thank you so much!
          I'm sorry the problem turned out to be so trivial, but I'm not very experienced with these tools and the bedtools message wasn't very helpful.
          Again, thank you!

          Best regards
          Nicola
