  • Intersect BAM files from alignments to human and mouse

    Hi,

    I have searched the forums (the closest thing I found was seqanswers.com/forums/showthread.php?t=31625, which has no replies) and done a lot of trial and error on my own, but I can't come up with a good solution, so I'm hoping someone here has an idea!

    I'm working with xenograft models, so I have aligned my paired-end reads separately to the human and mouse genomes. I want to estimate the level of mouse contamination and then decide on the best strategy for dealing with the mouse reads.

    So I have two BAM files (one for human, one for mouse), and from each of them I've extracted the mapped and unmapped reads with samtools view (-F 4 and -f 4, respectively).
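For reference, `-f 4` and `-F 4` both test a single bit of the SAM FLAG field: 0x4, "segment unmapped". The minimal Python sketch below makes that split explicit. It assumes plain SAM text as input (for example the stdout of `samtools view`); the function name `split_by_mapping` is hypothetical and not part of samtools.

```python
from typing import Iterable, List, Tuple

UNMAPPED = 0x4  # SAM FLAG bit 0x4: segment unmapped


def split_by_mapping(sam_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition SAM alignment lines into (mapped, unmapped).

    'mapped' mirrors `samtools view -F 4` (bit 0x4 clear) and
    'unmapped' mirrors `samtools view -f 4` (bit 0x4 set).
    Header lines (starting with '@') and blank lines are skipped.
    """
    mapped: List[str] = []
    unmapped: List[str] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd mandatory SAM column
        if flag & UNMAPPED:
            unmapped.append(line)
        else:
            mapped.append(line)
    return mapped, unmapped
```

Usage: pass it an open SAM file, or the captured output of `samtools view sample_vs_human.bam` (the file name is only an example).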

    For a given sample, I now want to find, for example, the reads that map to both human and mouse, i.e. intersect the two BAM files by read. Something like intersectBed, but for two BAM files (bedtools only seems to accept one BAM file plus one BED file).
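Because the two alignments are against different genomes, the intersection has to be by read identity rather than by coordinates. The sketch below is one possible approach, not a feature of bedtools or Picard. It assumes SAM text from the *full* human and mouse BAMs (not the `-F 4` subsets). Each read is keyed by QNAME plus mate number (from FLAG bits 0x40/0x80), only primary records are used (secondary 0x100 and supplementary 0x800 are skipped), and a read missing from one file is treated as unmapped there. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

ReadKey = Tuple[str, int]  # (QNAME, mate: 1 = first, 2 = second, 0 = unpaired)


def mapping_status(sam_lines: Iterable[str]) -> Dict[ReadKey, bool]:
    """Map each read (via its primary record) to True if mapped, else False."""
    status: Dict[ReadKey, bool] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, through its primary record
        mate = 1 if flag & FIRST_IN_PAIR else 2 if flag & SECOND_IN_PAIR else 0
        status[(qname, mate)] = not (flag & UNMAPPED)
    return status


def classify_reads(human_sam: Iterable[str],
                   mouse_sam: Iterable[str]) -> Dict[str, Set[ReadKey]]:
    """Group reads by which genome(s) their primary alignment maps to."""
    human = mapping_status(human_sam)
    mouse = mapping_status(mouse_sam)
    groups: Dict[str, Set[ReadKey]] = {
        "both": set(), "human_only": set(), "mouse_only": set(), "neither": set(),
    }
    for key in human.keys() | mouse.keys():
        in_human = human.get(key, False)
        in_mouse = mouse.get(key, False)
        if in_human and in_mouse:
            groups["both"].add(key)
        elif in_human:
            groups["human_only"].add(key)
        elif in_mouse:
            groups["mouse_only"].add(key)
        else:
            groups["neither"].add(key)
    return groups
```

This keeps every read name in memory, which is simple but may be heavy for a full run. For large files, sorting both BAMs by name and walking them together would use less memory.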

    I have tried Picard's CompareSAMs tool, but for every read it just reports that the two files don't match ("read name ceases agreeing"); it doesn't seem to search the other file for the same read:

    Code:
    java -jar CompareSAMs.jar mapped_to_human.sorted.bam mapped_to_mouse.sorted.bam
    Any hints would be much appreciated!
    Thanks

    PS: I'm also running Xenome in parallel, but I want to do this manually as well, as a sanity check.

  • #2
    CmpBams

    I wrote a tool named CmpBams (https://github.com/lindenb/jvarkit/wiki/CmpBams) that might help you. It takes two or more BAM files and shows the differences for each read of a pair (/1 and /2).

    Code:
    #READ-Name	tmp1.sam tmp2.sam|tmp1.sam tmp3.sam|tmp2.sam tmp3.sam	tmp1.sam	tmp2.sam	tmp3.sam
    HWI-1KL149:20:C1CU7ACXX:1:1101:17626:32431/1	EQ|EQ|EQ	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M
    HWI-1KL149:20:C1CU7ACXX:1:1101:17626:32431/2	EQ|EQ|EQ	K01:2059=163/100M	K01:2059=163/100M	K01:2059=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1102:16831:71728/1	EQ|EQ|EQ	K01:2133=83/100M	K01:2133=83/100M	K01:2133=83/100M
    HWI-1KL149:20:C1CU7ACXX:1:1102:16831:71728/2	EQ|EQ|EQ	K01:2059=163/100M	K01:2059=163/100M	K01:2059=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1105:3309:27760/1	EQ|EQ|EQ	K01:2213=83/100M	K01:2213=83/100M	K01:2213=83/100M
    HWI-1KL149:20:C1CU7ACXX:1:1105:3309:27760/2	EQ|EQ|EQ	K01:2081=163/100M	K01:2081=163/100M	K01:2081=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1106:2914:12111/1	EQ|EQ|EQ	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M	K01:2136=83/43M3I54M
    HWI-1KL149:20:C1CU7ACXX:1:1106:2914:12111/2	EQ|EQ|EQ	K01:2059=163/100M	K01:2059=163/100M	K01:2059=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1107:11589:17295/1	EQ|EQ|EQ	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M
    HWI-1KL149:20:C1CU7ACXX:1:1107:11589:17295/2	EQ|EQ|EQ	K01:1990=163/100M	K01:1990=163/100M	K01:1990=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1110:14096:95943/1	EQ|EQ|EQ	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M	K01:2123=83/56M3I41M
    HWI-1KL149:20:C1CU7ACXX:1:1110:14096:95943/2	EQ|EQ|EQ	K01:1990=163/100M	K01:1990=163/100M	K01:1990=163/100M
    HWI-1KL149:20:C1CU7ACXX:1:1110:15369:59046/1	EQ|EQ|EQ	K01:2213=83/100M	K01:2213=83/100M	K01:2213=83/100M
    You could pipe the output into awk to select the reads that have been mapped (or not mapped) to one or more genomes.
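As an alternative to awk, here is a small Python filter in the same spirit. It is a sketch that infers the layout from the sample output above: tab-separated columns, with the read name first, the pairwise results joined by `|` second, and one alignment summary per input file after that. It keeps only rows where at least one pairwise result is not `EQ`. It does not assume which token CmpBams prints for a mismatch. The function name `differing_reads` is hypothetical.

```python
from typing import Iterable, Iterator, List


def differing_reads(cmpbams_lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the fields of rows whose pairwise comparisons are not all 'EQ'.

    Layout assumed from the sample output: column 1 = read name,
    column 2 = pairwise results joined by '|', then one column per BAM.
    """
    for line in cmpbams_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#READ-Name' header
        fields = line.split("\t")
        comparisons = fields[1].split("|")
        if any(result != "EQ" for result in comparisons):
            yield fields
```

For a human-vs-mouse run, this narrows the output to the reads whose alignments differ between the two BAMs. You can then inspect the per-file columns to see which genome each read mapped to.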



    • #3
      Thanks, I'll give that a go!
