Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • extracting uniquely mappable read positions bam file

    Hi,

    How can I extract uniquely mappable read positions from a bam file? I need this file as input to BIC-Seq for CNV analysis.

    Thanks,
    Thanks,

  • #2
    Antbody having idea about this!
    Thanks,

    Comment


    • #3
      Depends on which aligner you used and which definition of "uniquely aligned" you want to use.

      Comment


      • #4
        Thanks dpryan, I've used bwa for the alignmment and uniquely aligned positions you can understand from the clip of THeta manual

        To get the mapping positions of the uniquely mapped reads, a user may use the modified version of samtools (included in this package).
        Suppose that the bam file is called example.bam and one wants to write the reads under the directory dir/. Then, one may use the following command to extract the mapping position of uniquely mapped reads.

        samtools view -U BWA,dir/,N,N example.bam

        or
        samtools view -U Bowtie,dir/,N,N example.bam
        Thanks,

        Comment


        • #5
          Anybody else any idea!
          Thanks,

          Comment


          • #6
            For BWA, you might either just grep for "XT:A:U" or simply use a MAPQ threshold. The latter would generally make more sense since you're actually interested in the alignments being correct (in reality, the concept of a "unique" alignment is somewhat non-sensical and is derived from an arbitrary edit-distance threshold).

            Comment


            • #7
              I also strongly encourage a MAPQ threshold. "Uniquely mapping" is not nearly stringent enough, because there may be hundreds of next-best hits if it's just one base off from a repetitive element. I find MAPQ = 10 to be a reasonable cutoff that doesn't exclude anything real-looking.

              So then it's simply "samtools view -q 10".

              Comment


              • #8
                Thanks dpryan and jwfoley,

                I actually want to use BICseq for CNV which later I've to use for tumor purity estimate, so what do you think is better, MAPQ or "XT:A:U".

                Thanks
                Thanks,

                Comment


                • #9
                  MAPQ is the better choice. A threshold of 10 is pretty common, but you might be able to get away with a bit lower, depending on how the data looks. For CNVs, I would suspect that anywhere between 5 and 10 would work well since the occasional wrong mapping won't have much of an effect.

                  Comment


                  • #10
                    Originally posted by tahamasoodi View Post
                    Thanks dpryan and jwfoley,

                    I actually want to use BICseq for CNV which later I've to use for tumor purity estimate, so what do you think is better, MAPQ or "XT:A:U".

                    Thanks
                    Have you managed to get BIC-seq to run? I have trouble.
                    Basically, I first installed the R package called Rsamtools.
                    Code:
                    # In an R shell:
                    source("http://bioconductor.org/biocLite.R")
                    biocLite("Rsamtools")
                    Then, I installed BICseq with the following command in the bash shell:
                    Code:
                    ~/apps/R CMD INSTALL BICseq_1.2.1.tar.gz

                    I tried the sample bam files that came with the zip file, but did not seem to work, and followed the 2-page PDF instruction.
                    Code:
                    > bicseq0 = BICseq(sample = '/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/tumor_sorted.bam', reference = '/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/normal_sorted.bam', seqNames = c(1:22, "X", "Y") )
                    Error in BICseq(sample = "/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/tumor_sorted.bam",  : 
                      No such chromosome "1" in the header of the BAM file "/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/tumor_sorted.bam"
                    Fine.

                    Code:
                    > bicseq0 = BICseq(sample = '/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/tumor_sorted.bam', reference = '/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/normal_sorted.bam', seqNames = c("chr1") )
                    Error in BICseq(sample = "/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/tumor_sorted.bam",  : 
                      No such chromosome "chr1" in the header of the BAM file "/home/ltfang/Documents/DOWNLOAD/BICseq/test_data/tumor_sorted.bam"



                    So anyway, tried it with my own bam files:

                    Code:
                    > bicseq <- BICseq(sample = '/PATH/TO/Our_Tumor.bam', reference = '/PATH/TO/Our_Normal.bam', seqNames=c("chr21", "chr22") )
                    
                    # The previous command worked, but this happened. 
                    > seqs <- getBICseg(object=bicseq, bin=100, lambda=2, winSize=200, quant=0.95, mult=1)
                    Error in .C("sort_rms_binning", as.integer(sample), length(sample), as.integer(reference), : 
                    "sort_rms_binning" not resolved from current namespace (BICseq)

                    Okay, so what's wrong? Anyone has any idea?

                    Comment


                    • #11
                      Thanks for posting this. I tried this on their bam files and mine as well.I am getting the same error. Have you figured it out yet ? error - "sort_rms_binning" not resolved from current namespace (BICseq)"

                      For loading their bam files:
                      If you see the bam files that they have provided it only has chromosome 22. so try it like this:

                      bicseq <- BICseq(sample = try.tumor, reference = try.normal, seqNames=paste("chr",22,sep=""))

                      Comment


                      • #12
                        Originally posted by aroraa View Post
                        Thanks for posting this. I tried this on their bam files and mine as well.I am getting the same error. Have you figured it out yet ? error - "sort_rms_binning" not resolved from current namespace (BICseq)"

                        For loading their bam files:
                        If you see the bam files that they have provided it only has chromosome 22. so try it like this:

                        bicseq <- BICseq(sample = try.tumor, reference = try.normal, seqNames=paste("chr",22,sep=""))
                        Someone told me to try it on R version 2 because it worked there for him.
                        I haven't had time to try that yet. I'll do that in the next couple of weeks.

                        Comment


                        • #13
                          Originally posted by aroraa View Post
                          Thanks for posting this. I tried this on their bam files and mine as well.I am getting the same error. Have you figured it out yet ? error - "sort_rms_binning" not resolved from current namespace (BICseq)"

                          For loading their bam files:
                          If you see the bam files that they have provided it only has chromosome 22. so try it like this:

                          bicseq <- BICseq(sample = try.tumor, reference = try.normal, seqNames=paste("chr",22,sep=""))
                          I tried it with R 2.15. At least that works with their sample data.

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            Essential Discoveries and Tools in Epitranscriptomics
                            by seqadmin




                            The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                            04-22-2024, 07:01 AM
                          • seqadmin
                            Current Approaches to Protein Sequencing
                            by seqadmin


                            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                            04-04-2024, 04:25 PM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 04-25-2024, 11:49 AM
                          0 responses
                          19 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-24-2024, 08:47 AM
                          0 responses
                          20 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-11-2024, 12:08 PM
                          0 responses
                          62 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-10-2024, 10:19 PM
                          0 responses
                          60 views
                          0 likes
                          Last Post seqadmin  
                          Working...
                          X