Extract the XS field from bam

  • Extract the XS field from bam

    Hello everyone,

    BWA-MEM writes an "XS" tag for each read (the suboptimal alignment score). With samtools view, a record looks like this:
    - NS500801:90:HY7JVBGXY:2:21205:8003:11253 147 chrM 958 60 76M = 920 -114 CCCCCTCCCCAATAAAGCTAAAACTCACCTGAGTTGTAAAAAACTCCAGTTGACACAAAATAGACTACGAAAGTGG >>;@B@CC1C??=??AAC=???>C@C>CC@BAAA?@<>>>>>=B?BB@@@?A=B=B>>>><@A=B<=;A>=@=;>= BD:Z:IIIMPOLKNKJJJBIMOMIBBJLKKIKLMKJKJIIKHAAAAILKKKLJIHJKHHHH@@GGIHLLLKKLJCKOJLJJ PG:Z:MarkDuplicates RG:Z:id BI:Z:LLLPTSOOSROPQHOTSQOGGNPPQNQPROLPNMMNNFFFFLNNONPOMLNOMLMNEEKLNMOOPOONMHOROPNN NM:i:0 AS:i:76 XS:i:55

    Does anyone know an easy way to extract it, for example with R? I know I could use samtools view + awk, but that will take a long time.

    Thanks in advance!
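For reference, the XS value is one of the optional `TAG:TYPE:VALUE` fields that follow the 11 mandatory tab-separated SAM columns. A minimal sketch in Python of pulling one tag out of a single `samtools view` line is shown here; the function name `get_tag` and the shortened example record are made up for illustration, and this is not the bioalcidaejdk approach suggested in the replies.

```python
from typing import Optional, Union


def get_tag(sam_line: str, tag: str) -> Optional[Union[int, float, str]]:
    """Return the value of an optional SAM tag such as "XS", or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields start at column 12.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) != 3:
            continue  # skip anything that is not TAG:TYPE:VALUE
        name, type_code, value = parts
        if name == tag:
            if type_code == "i":
                return int(value)
            if type_code == "f":
                return float(value)
            return value
    return None


# Hypothetical, shortened record (real SAM lines are tab-separated like this one).
example = "\t".join([
    "read1", "147", "chrM", "958", "60", "4M", "=", "920", "-114",
    "ACGT", "IIII", "NM:i:0", "AS:i:76", "XS:i:55",
])
print(get_tag(example, "XS"))  # prints 55
```

Splitting on tabs rather than on any whitespace is deliberate, since string-valued tags may contain spaces.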

  • #2
    Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

    Code:
    java -jar dist/bioalcidaejdk.jar -e 'stream().forEach(R->println(R.getAttribute("XS")));'  in.bam



    • #3
      Hi lindenb, thank you for your answer,

      How can I get the read name too? I would like a table with the read name in the first column and the XS value in the second.



      • #4
        > How can I get the read name too ?

        Code:
        ... println(R.getReadName()+" "+R.getAttribute("XS")) ...
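The same two-column table can also be sketched outside bioalcidaejdk by streaming the text output of `samtools view` through a short Python filter. This is an alternative under stated assumptions, not the method from this reply: the function names `name_xs_rows` and `main` and the `NA` placeholder for reads without an XS tag are illustrative choices.

```python
import sys
from typing import Iterable, Iterator


def name_xs_rows(sam_lines: Iterable[str], missing: str = "NA") -> Iterator[str]:
    """Yield "read_name<TAB>XS" for every alignment line in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines, present when samtools view -h is used
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        xs = missing
        for field in fields[11:]:  # optional tags start at column 12
            if field.startswith("XS:i:"):
                xs = field[len("XS:i:"):]
                break
        yield f"{fields[0]}\t{xs}"


def main() -> None:
    # Reads SAM text on stdin and writes the table to stdout.
    for row in name_xs_rows(sys.stdin):
        print(row)
```

Saved as a script with a call to `main()` appended at the end, it could be run as `samtools view in.bam | python3 name_xs.py > name_xs.tsv` (the script and output file names are placeholders).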



        • #5
          Thank you very much for your help!

          The output file is too big, so I'm trying to get the chromosome as well, in order to split the output per chromosome. I tried "getReferenceIndex", but it returns "null". Do you know how I could do this?
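Two hedged pointers on the chromosome question. First, `R` in these expressions appears to be an htsjdk `SAMRecord` (`getReadName` and `getAttribute` are both methods of that class), in which case `R.getReferenceName()` or `R.getContig()` should give the chromosome name as a string; that has not been confirmed in this thread. Second, on a coordinate-sorted, indexed BAM, `samtools view in.bam chrM` restricts the output to one chromosome. As a third option, the sketch below splits the read name / XS table into one file per chromosome from SAM text, using the RNAME column (column 3). The function name, the output file naming and the `NA` and `unmapped` placeholders are assumptions made for illustration.

```python
import os
from typing import Dict, Iterable, TextIO


def split_xs_by_chromosome(sam_lines: Iterable[str], out_dir: str) -> Dict[str, int]:
    """Write "read_name<TAB>XS" rows to <out_dir>/<chromosome>.tsv.

    Returns the number of rows written per chromosome.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        for line in sam_lines:
            if line.startswith("@"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete alignment record
            # Column 3 is RNAME; "*" means the read has no reference assigned.
            chrom = "unmapped" if fields[2] == "*" else fields[2]
            xs = next(
                (f[len("XS:i:"):] for f in fields[11:] if f.startswith("XS:i:")),
                "NA",
            )
            if chrom not in handles:
                # One open file per chromosome; assumes contig names are valid file names.
                handles[chrom] = open(os.path.join(out_dir, f"{chrom}.tsv"), "w")
                counts[chrom] = 0
            handles[chrom].write(f"{fields[0]}\t{xs}\n")
            counts[chrom] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

It keeps one file handle open per chromosome, which is fine for a typical genome but could hit the open-file limit on references with many thousands of contigs.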
