Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • offspring
    replied
    Hi,

    good to see it's useful to others. The problem is an error in the script, either replace the "index" with "find" or use the updated version from github.

    The second issue seems to be that you're typing "import os" in the shell instead of in Python. The import you're executing right now is part of the ImageMagick package.

    Chris

    Originally posted by zaki View Post
    Dear Chris,

    I was attempting to merge unmapped + mapped tophat output as well.

    Very excited to see your script! but somehow it returns an error when I tried running it?

    Code:
    [B]$ ~/bin/bin/python2.7 fix_unmapped.py outdir/[/B]
    Traceback (most recent call last):
      File "fix_unmapped.py", line 104, in <module>
        main(sys.argv[1])
      File "fix_unmapped.py", line 53, in main
        if read.qname.index("/") != -1:
    ValueError: substring not found
    Not sure if this could be the reason - I am using ssh to log into a linux server, and if i type import os, or any of the other import, the error below will show up.

    Code:
    import: unable to open X server `'.

    Leave a comment:


  • zaki
    replied
    Dear Chris,

    I was attempting to merge unmapped + mapped tophat output as well.

    Very excited to see your script! but somehow it returns an error when I tried running it?

    Code:
    [B]$ ~/bin/bin/python2.7 fix_unmapped.py outdir/[/B]
    Traceback (most recent call last):
      File "fix_unmapped.py", line 104, in <module>
        main(sys.argv[1])
      File "fix_unmapped.py", line 53, in main
        if read.qname.index("/") != -1:
    ValueError: substring not found
    Not sure if this could be the reason - I am using ssh to log into a linux server, and if i type import os, or any of the other import, the error below will show up.

    Code:
    import: unable to open X server `'.

    Leave a comment:


  • offspring
    replied
    In case it's useful for anyone else, here's the script I use to fix the unmapped reads:

    https://github.com/cbrueffer/misc_bi...apped_reads.py

    Chris

    Leave a comment:


  • offspring
    replied
    Originally posted by dariober View Post
    Flag 69 means read unmapped and mate mapped, so Picard expects a chromosome name assigned to the mate of the unmapped read. Since the read flagged 69 has * for the chromosome of the mate, Picard complains.

    Probably this inconsistency has nothing to do with the quality of your data (unless it happens quite a lot) and I think it is just a small bug in the way Tophat sets flags. So the easy way around it is to run picard with option VALIDATION_STRINGENCY=LENIENT so that it will report the errors but it will ignore them.

    Hope this helps
    Dario
    Hi Dario,

    thanks for the input. VALIDATION_STRINGENCY=LENIENT was not really an option for me, since that would defeat the purpose of merging the two files in the first place (it would throw out all unmapped reads with a mapped mate in this case).

    However I finally got it to work, part of the problem is a bug in Tophat (already reported to the Tophat team). Here is how I had to modify unmapped.bam to make it work:

    1. Remove /1 and /2 suffixes from all reads (not necessary with Tophat version 2.0.7 and newer)
    2. Change MAPQ from 255 to 0 for all reads
    3. For unmapped reads with a mapped mate:
    - set RNAME and RNEXT of unmapped read to RNAME of mapped mate
    - set POS to POS of mapped mate
    - set PNEXT to 0
    4. For paired-end reads where both ends are unmapped, Tophat does not set the "mate is unmapped" (0x8) SAM flag on either read. This has to be fixed manually.

    After making these changes, the merged file of accepted_hits.bam and unmapped.bam could be processed by Picard tools.

    Chris

    Leave a comment:


  • dariober
    replied
    Originally posted by offspring View Post
    I have modified the unmapped reads as follows now:

    - Removed the /1 and /2 suffixes
    - Changes MAPQ from 255 to 0
    - For unmapped reads with a mapped mate, set RNAME and POS to the same value as the mapped mate

    However, even with these changes I get the "Mapped mate should have mate reference name" error from Picard.

    Here is read pair it stumbles upon now:

    Code:
    HWI-ST587_0094:4:1101:7158:1939.ATCACGA	69	chr3	129324900	0	*	*	0	0	TGNGTGTTCTCGAAGCGGTGGTCCTCCAGGCTGCGGTTGCGCGGGAAGAAGGNGCTGCCGTAACCGGTGTACGTGNCGCCCACGAGCAGGCGGCTGCCCCCC!4ADDFHHHHHJJIJJGHJHIJJJJJJJJJJJJJFHIJJIJGDBDDDDDD!,8?BDDDDDDDDDD>B>CDDCC!+8?BBBDDDDDDDDDDDB)&(28<
    HWI-ST587_0094:4:1101:7158:1939.ATCACGA	1177	chr3	129324900	50	38M	*	0	0	CTAGCCCCACGGTGGACGCGTTCGGGTGGTTGGCCGCC	FFFFHJJJHJJJJJJJJJJJJJJJJHHHHHFFFFFCCC	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:38	YT:Z:UU	XS:A:-	NH:i:1
    Flag 69 means read unmapped and mate mapped, so Picard expects a chromosome name assigned to the mate of the unmapped read. Since the read flagged 69 has * for the chromosome of the mate, Picard complains.

    Probably this inconsistency has nothing to do with the quality of your data (unless it happens quite a lot) and I think it is just a small bug in the way Tophat sets flags. So the easy way around it is to run picard with option VALIDATION_STRINGENCY=LENIENT so that it will report the errors but it will ignore them.

    Hope this helps
    Dario

    Leave a comment:


  • offspring
    replied
    I have modified the unmapped reads as follows now:

    - Removed the /1 and /2 suffixes
    - Changes MAPQ from 255 to 0
    - For unmapped reads with a mapped mate, set RNAME and POS to the same value as the mapped mate

    However, even with these changes I get the "Mapped mate should have mate reference name" error from Picard.

    Here is read pair it stumbles upon now:

    Code:
    HWI-ST587_0094:4:1101:7158:1939.ATCACGA	69	chr3	129324900	0	*	*	0	0	TGNGTGTTCTCGAAGCGGTGGTCCTCCAGGCTGCGGTTGCGCGGGAAGAAGGNGCTGCCGTAACCGGTGTACGTGNCGCCCACGAGCAGGCGGCTGCCCCCC!4ADDFHHHHHJJIJJGHJHIJJJJJJJJJJJJJFHIJJIJGDBDDDDDD!,8?BDDDDDDDDDD>B>CDDCC!+8?BBBDDDDDDDDDDDB)&(28<
    HWI-ST587_0094:4:1101:7158:1939.ATCACGA	1177	chr3	129324900	50	38M	*	0	0	CTAGCCCCACGGTGGACGCGTTCGGGTGGTTGGCCGCC	FFFFHJJJHJJJJJJJJJJJJJJJJHHHHHFFFFFCCC	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:38	YT:Z:UU	XS:A:-	NH:i:1
    Any clue would be appreciated!

    Chris
    Last edited by offspring; 03-08-2013, 01:31 AM.

    Leave a comment:


  • Tophat: merging accepted_hits.bam and unmapped.bam

    Hi everyone,

    I would like to merge the Tophat output files accepted_hits.bam and unmapped.bam into one file for subsequent analysis. The files contain paired-end Illumina reads.

    I used a naive approach of just calling samtools merge:

    Code:
    samtools merge merged.bam accepted_hits.bam unmapped.bam
    Using the resulting merged.bam file with any Picard tool results in an error however:

    Code:
    INFO	2013-03-07 10:29:53	AddOrReplaceReadGroups	Processed    39,000,000
    records.  Elapsed time: 00:11:07s.  Time for last 1,000,000:   16s.
    Last read position: chrX:153,627,857
    [Thu Mar 07 10:29:55 CET 2013] net.sf.picard.sam.AddOrReplaceReadGroups
    done. Elapsed time: 11.16 minutes.
    Runtime.totalMemory()=1110376448
    FAQ:  http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
    Exception in thread "main" net.sf.samtools.SAMFormatException: SAM
    validation error: ERROR: Record 39136746, Read name
    HWI-ST587_0094:4:1101:7158:1939.ATCACGA/1, Mapped mate should have mate
    reference name
    	at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
    	at
    net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
    	at
    net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
    	at
    net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
    	at
    net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
    	at
    net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
    	at
    net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:98)
    	at
    net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    	at
    net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    	at
    net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
    The files were created with a Tophat version prior to 2.0.7, so the unmapped reads still have /1 and /2 suffixes. However, removing the suffixes before calling samtools merge results in the same error down the line.

    The reads picard complains about look like this:

    Code:
    samtools view merged.bam | grep "HWI-ST587_0094:4:1101:7158:1939.ATCACGA"
    HWI-ST587_0094:4:1101:7158:1939.ATCACGA	1177	chr3	129324900
    50	38M	*	0	0	CTAGCCCCACGGTGGACGCGTTCGGGTGGTTGGCCGCC
    FFFFHJJJHJJJJJJJJJJJJJJJJHHHHHFFFFFCCC	AS:i:0	XN:i:0	XM:i:0	XO:i:0
    XG:i:0	NM:i:0	MD:Z:38	YT:Z:UU	XS:A:-	NH:i:1
    HWI-ST587_0094:4:1101:7158:1939.ATCACGA/1	69	*	0	255	*	*	0	0
    TGNGTGTTCTCGAAGCGGTGGTCCTCCAGGCTGCGGTTGCGCGGGAAGAAGGNGCTGCCGTAACCGGTGTACGTGNCGCCCACGAGCAGGCGGCTGCCCC
    CC!4ADDFHHHHHJJIJJGHJHIJJJJJJJJJJJJJFHIJJIJGDBDDDDDD!,8?BDDDDDDDDDD>B>CDDCC!+8?BBBDDDDDDDDDDDB)&(28<
    Software versions used:
    Tophat: 2.0.6
    Picard: 1.85
    samtools: 0.1.18

    Has anyone done this successfully?

    Thanks a lot,

    Chris
    Last edited by offspring; 03-07-2013, 04:55 AM.

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Technologies
    by seqadmin







    Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

    Long-Read Sequencing
    Long-read sequencing has...
    Today, 01:49 PM
  • seqadmin
    Genetic Variation in Immunogenetics and Antibody Diversity
    by seqadmin



    The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
    11-06-2024, 07:24 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 09:29 AM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, Today, 09:06 AM
0 responses
11 views
0 likes
Last Post seqadmin  
Started by seqadmin, Today, 08:03 AM
0 responses
11 views
0 likes
Last Post seqadmin  
Started by seqadmin, 11-22-2024, 07:36 AM
0 responses
61 views
0 likes
Last Post seqadmin  
Working...
X