
  • intersect of two sam files

    I used two aligners for paired-end reads and got two sam files. Is there any software or script that implements these two operations?

    - How to find the intersect of two sam files by read IDs? To use bedtools, one needs to convert sam to bam to bed and then work on the 2 bed files.

    - I'd like to merge the 2 sam files and, for reads that are aligned in both, select the better alignment by the number of mismatches (NM).

    Please advise.

  • #2
    samtools has a merge function, but I don't know how well it performs, and the files have to be sorted before merging. You could always use samtools to convert to SAM (if you have BAM), concatenate the files from the command line (making sure that the header appears only once, at the top of the file), and then use samtools again to sort.
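A minimal Python sketch of the concatenation step described above, assuming both files were aligned against the same reference so that the first file's header can stand in for both. The function name `concat_sam` is hypothetical; the sketch relies only on the fact that SAM header lines start with `@` and read names cannot.

```python
from typing import Iterable, Iterator


def concat_sam(first: Iterable[str], *others: Iterable[str]) -> Iterator[str]:
    """Concatenate SAM line streams, keeping only the first stream's header.

    Assumption: every input uses the same reference, so the @SQ lines of the
    first input are valid for all of them.
    """
    # Pass the first file through unchanged (header and alignments).
    for line in first:
        yield line
    # For every further file, drop header lines (those starting with '@').
    for sam in others:
        for line in sam:
            if not line.startswith("@"):
                yield line


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python concat_sam.py a.sam b.sam > combined.sam
    handles = [open(path) for path in sys.argv[1:]]
    if handles:
        sys.stdout.writelines(concat_sam(*handles))
    for handle in handles:
        handle.close()
```

The output is not sorted, so it would still need to go through `samtools sort` as suggested. If the first header declares a sort order, that declaration may no longer be accurate until the file is re-sorted.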


    • #3
      Actually, bedtools will take bam files for some applications.
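If only the read IDs matter, the intersect can also be taken directly on the SAM text without going through BED. The following is a rough sketch, not a tested tool: the function names are hypothetical, and it assumes that "intersect" means reads that have a mapped record (FLAG bit 0x4 not set) in both files.

```python
from typing import Iterable, Iterator, Set


def read_names(sam_lines: Iterable[str], mapped_only: bool = True) -> Set[str]:
    """Collect read names (column 1) from SAM lines, skipping the header."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.split("\t")
        # FLAG is column 2; bit 0x4 marks an unmapped read.
        if mapped_only and int(fields[1]) & 0x4:
            continue
        names.add(fields[0])
    return names


def shared_read_names(sam_a: Iterable[str], sam_b: Iterable[str],
                      mapped_only: bool = True) -> Set[str]:
    """Read names present in both inputs."""
    return read_names(sam_a, mapped_only) & read_names(sam_b, mapped_only)


def keep_reads(sam_lines: Iterable[str], names: Set[str]) -> Iterator[str]:
    """Yield header lines plus the alignments whose read name is in `names`."""
    for line in sam_lines:
        if line.startswith("@") or line.split("\t", 1)[0] in names:
            yield line
```

Each file is read twice in this approach (once to collect names, once to filter), which keeps memory use down to the set of read names.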


      • #4
        Here is a strategy, assuming that:

        - each sam file has at most one alignment per read
        - the sam files are not (yet) sorted

        Use `samtools view` to extract just the alignments from each file

        Use perl (or something similar) to extract the value of NM and prepend it to each line as a new first column.

        `sort` the two sets of alignments together, first by read name (-k2,2) and secondarily ascending by NM (-k1,1n), so that the alignment with the fewest mismatches comes first for each read

        `cut` the NM score (1st column) back out of your sorted, combined data

        Use `sort` again to keep only the first (best-scoring) alignment in each group (using `--merge --unique -k1,1`)

        somehow contrive to reheader this combined set of alignments
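The same selection can be sketched in Python instead of a `sort`/`cut` pipeline. This is only an illustration under stated assumptions: the function names are hypothetical, a lower `NM` is treated as better, records without an `NM` tag (for example unmapped reads) are treated as worst, and reads are keyed by name plus mate number (FLAG bits 0x40 and 0x80) so that the two ends of a pair are not collapsed into one.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def get_nm(fields: List[str]) -> Optional[int]:
    """Return the NM tag value from the optional fields (column 12 on)."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def read_key(fields: List[str]) -> Tuple[str, int]:
    """Read name plus mate number (1, 2, or 0 if the read is unpaired)."""
    flag = int(fields[1])
    mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
    return (fields[0], mate)


def best_by_nm(*sams: Iterable[str]) -> List[str]:
    """Keep, for each read, the alignment line with the lowest NM.

    Header lines are skipped, so the result still has to be reheadered.
    Ties are resolved in favour of the input that was given first.
    """
    best: Dict[Tuple[str, int], Tuple[float, str]] = {}
    for sam in sams:
        for line in sam:
            if line.startswith("@") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            nm = get_nm(fields)
            # Assumption: a record without NM ranks below any record with NM.
            score = float("inf") if nm is None else float(nm)
            key = read_key(fields)
            if key not in best or score < best[key][0]:
                best[key] = (score, line)
    return [line for _, line in best.values()]
```

This holds one record per read in memory, so for very large files the sort-based pipeline above may scale better.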
