  • intersect of two sam files

    I used two aligners on the same paired-end reads and got two SAM files. Is there any software or script that can perform these two operations?

    - How can I find the intersection of the two SAM files by read ID? To use bedtools, one needs to convert SAM to BAM to BED and then work on the two BED files.

    - I'd like to merge the two SAM files and, for reads aligned in both, keep the better alignment as judged by the number of mismatches (the NM tag).

    Please advise.

  • #2
    samtools has a merge function, but I don't know how well it performs, and the files have to be sorted before merging. Alternatively, you could use samtools to convert to SAM (if you have BAM), concatenate the files from the command line (making sure the header appears only once, at the top of the file), and then use samtools again to sort.
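
    The concatenation step can be sketched in Python. This is a minimal sketch, not a tested pipeline: the function name `concat_sam` is made up for illustration, and it assumes both aligners were run against the same reference, so that the header of the first file (in particular its @SQ lines) is also valid for the second.

```python
import io


def concat_sam(inputs, output):
    """Copy the header of the first input only, then the alignment lines of every input.

    `inputs` is a sequence of open text handles (or any iterables of lines);
    `output` is a writable text handle. Hypothetical helper for illustration.
    """
    for index, handle in enumerate(inputs):
        for line in handle:
            if line.startswith("@"):
                # Header line: keep it only when it comes from the first file.
                if index == 0:
                    output.write(line)
            else:
                output.write(line)


# Small in-memory demonstration with two tiny SAM fragments.
first = io.StringIO(
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n"
)
second = io.StringIO(
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\n"
)
combined = io.StringIO()
concat_sam([first, second], combined)
print(combined.getvalue(), end="")
```

    Testing for a leading "@" is safe here because the SAM specification does not allow "@" in read names. The combined file still has to be sorted afterwards, as described above.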

    • #3
      Actually, bedtools will take bam files for some applications.
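
      For the intersection by read ID specifically, no coordinates are needed, so a plain script over the SAM text may be enough. A minimal sketch in Python follows; the function names and the `mapped_only` parameter are made up for illustration, and it assumes the set of read names from each file fits in memory. With paired-end data both mates share one read name, so the result is a set of read pairs rather than of individual mates.

```python
def read_ids(lines, mapped_only=True):
    """Return the set of read names (QNAME, column 1) found in SAM lines.

    With mapped_only=True, records whose FLAG has bit 0x4 set (read unmapped)
    are skipped, so only reads the aligner actually placed are counted.
    """
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if mapped_only and int(fields[1]) & 0x4:
            continue
        ids.add(fields[0])
    return ids


def shared_read_ids(lines_a, lines_b, mapped_only=True):
    """Read names present in both inputs."""
    return read_ids(lines_a, mapped_only) & read_ids(lines_b, mapped_only)


# Tiny in-memory example; with real files, pass open file handles instead.
sam_a = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n",
    "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
sam_b = [
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n",
    "r3\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:2\n",
]
print(sorted(shared_read_ids(sam_a, sam_b)))
```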

      • #4
        Here is a strategy, assuming that

        - each SAM file has at most one alignment per read
        - the SAM files are not (yet) sorted

        Use `samtools view` to extract just the alignments from each file.

        Use perl (or something similar) to extract the value of NM and prepend it to each line as a new first column.

        `sort` the two sets of alignments together, first by read name (-k2,2) and secondarily ascending by NM (-k1,1n), so that the alignment with the fewest mismatches comes first within each read.

        `cut` the NM value (1st column) out of your sorted, combined data.

        Use `sort` again to pick out the first (best-scoring) alignment in each group (with --merge --unique -k1,1).

        Somehow contrive to reheader this combined set of alignments.
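
        If the files are small enough to hold one record per read in memory, the same selection can be sketched in Python without the sort/cut round trip. This is only a sketch under stated assumptions: `nm_value` and `best_by_nm` are made-up names, unmapped records and records without an NM tag are skipped, ties go to whichever file is read first, and the two mates of a pair are judged separately (keyed on read name plus the first/second-mate FLAG bits). Choosing mates independently can leave a pair whose mate fields come from different aligners.

```python
def nm_value(fields):
    """Return the integer value of the NM:i: tag, or None if the record has none."""
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def best_by_nm(*sources):
    """Keep, per read (and mate), the alignment with the lowest NM across all sources."""
    best = {}
    for lines in sources:
        for line in lines:
            if not line.strip() or line.startswith("@"):
                continue  # skip blank lines and header lines
            fields = line.rstrip("\n").split("\t")
            flag = int(fields[1])
            if flag & 0x4:
                continue  # unmapped read
            nm = nm_value(fields)
            if nm is None:
                continue
            # 0x40 = first mate, 0x80 = second mate; both unset for single-end reads.
            key = (fields[0], flag & 0xC0)
            if key not in best or nm < best[key][0]:
                best[key] = (nm, line.rstrip("\n"))
    return [record for _, record in best.values()]


# Tiny in-memory example; with real files, pass open file handles instead.
aligner_a = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:2\n",
    "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n",
]
aligner_b = [
    "r1\t0\tchr1\t101\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
for record in best_by_nm(aligner_a, aligner_b):
    print(record)
```

        The output has no header, so it still needs to be reheadered and sorted as in the last step above.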
