  • jester112358
    replied
    Hi everyone,

    I'm having problems with samples merged with samtools.

    I get this error when I start to run our pipeline:

    MESSAGE: SAM/BAM file SAMFileReader{/csc/aaltonen/cg3/projects/Kaposi/Kapo93+94+95/09-1107/Kapo_93-95_09-1107_merged_s_99.nodup.bam} is malformed: Read ILLUMINA-8C38E9_0112:3:84:1445:15512#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://gatkforums.broadinstitute.org...lacereadgroups to fix this problem

    I ran the merge as samtools merge -h rg.txt -r

    Information in the rg.txt should be correct:

    @RG ID:Kapo93+94+95_1_09-1107_091207_HWI-EAS418_9_s_7 DS:091207_HWI-EAS418_9 SM:09-1107
    @RG ID:Kapo93+94+95_1_09-1107_100618_ILLUMINA-8C38E9_0112_s_3 DS:100618_ILLUMINA-8C38E9_0112 SM:09-1107
    @RG ID:Kapo93+94+95_1_09-1107_110103_HWUSI-EAS1785_0223_s_2 DS:110103_HWUSI-EAS1785_0223 SM:09-1107
    @RG ID:Kapo93+94+95_1_09-1107_110616_SN588_0054_AC00T5ABXX_s_4 DS:110616_SN588_0054_AC00T5ABXX SM:09-1107
    @RG ID:Kapo93+94+95_1_09-1107_111028_SN653_0108_BB0418ABXX_s_3 DS:111028_SN653_0108_BB0418ABXX SM:09-1107
    @RG ID:Kapo93+94+95_1_09-1107_120216_SN670_0098_AC04HRABXX_s_1 DS:120216_SN670_0098_AC04HRABXX SM:09-1107
    @RG ID:Kapo93-95-1_09-1107_110714_SN588_0055_AB0A5UABXX_s_1 DS:110714_SN588_0055_AB0A5UABXX SM:09-1107

    Any help here would be appreciated greatly!
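One way to narrow this down is to compare the RG:Z: tags on the reads with the IDs declared in the @RG header lines. As far as I know, `samtools merge -r` infers the tag values from the input file names, so any mismatch with the IDs in rg.txt would trigger this GATK error. It is also worth confirming that the fields in rg.txt are separated by real tabs. The sketch below works on SAM text; the function name `check_rg` and the toy records are made up for illustration.

```shell
# check_rg: read SAM text on stdin and report reads whose RG:Z: tag is
# missing or not declared in an @RG header line.
# Intended use (assumes samtools is installed; merged.bam is a placeholder):
#   samtools view -h merged.bam | check_rg
check_rg() {
    awk -F'\t' '
        $1 == "@RG" {                  # collect declared read group IDs
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ID:/) declared[substr($i, 4)] = 1
            next
        }
        /^@/ { next }                  # skip the other header lines
        {
            rg = ""
            for (i = 12; i <= NF; i++) # optional tags start at column 12
                if ($i ~ /^RG:Z:/) rg = substr($i, 6)
            if (rg == "")
                print "MISSING\t" $1
            else if (!(rg in declared))
                print "UNDECLARED\t" $1 "\t" rg
        }'
}

# Toy SAM text (hypothetical records) covering the three cases.
toy_sam() {
    printf '@HD\tVN:1.0\tSO:coordinate\n'
    printf '@RG\tID:lane1\tSM:s1\n'
    printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:lane1\n'
    printf 'r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:lane2\n'
    printf 'r3\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
}

report=$(toy_sam | check_rg)
printf '%s\n' "$report"
```

Read r1 passes silently, r2 is reported as UNDECLARED and r3 as MISSING.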


  • cfrias
    replied
    Thank you very much, wjeck and Richard!!!

    Yes, Richard, I think the same.
    I always read the manuals, but the useful information is in this forum because people
    have similar problems.

    I only want to add the read groups so that I can use GATK to improve the alignments from bwa bwasw.
    I need to remove duplicates, and I have a pool of reads.
    For this I need to add the read groups, in order to avoid removing similar reads from different individuals.
    (A duplicate read is a read that has the same mapping coordinates and the same CIGAR string,
    isn't it?)

    Cris


  • Richard Finney
    replied
    I wonder if the easiest thing would be for software that insists on read groups to provide a parameter to ignore them instead. This read groups thing has turned out to be more of a hassle than a benefit. Software should be more robust.


  • cfrias
    replied
    OK, it's Friday night... but I tried the solution
    with Brugger's script!!
    THANK YOU VERY MUCH!!!


  • wjeck
    replied
    I think I switched to using Picard's AddOrReplaceReadGroups tool.

    Try looking here:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    Haven't had to do this in a while, though, since I started putting read groups in at the beginning, during alignment. I believe BWA now does that with proper use of the alignment option. The best way to solve this problem is to make sure it doesn't happen in the first place.
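Setting the read group at alignment time might look like the sketch below. It assumes a bwa build whose `sampe` accepts `-r` (the 0.5.9-r16 usage text posted in this thread lists it); newer releases take a similar `-R` option on `bwa mem`. The reference and read file names are placeholders, and the command is only printed, not run.

```shell
# Read group line. bwa expands the literal "\t" sequences into tabs itself,
# so single quotes are used to keep the shell from touching them.
rg='@RG\tID:foo\tSM:bar'

# Dry run: build the alignment command as a string and print it.
# ref.fa, in1.sai, in2.sai, in1.fq and in2.fq are placeholder names.
cmd="bwa sampe -r '$rg' ref.fa in1.sai in2.sai in1.fq in2.fq"
printf '%s\n' "$cmd"
```

Every alignment written by such a run should then carry an RG:Z:foo tag, with no post-processing needed.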


  • cfrias
    replied
    Add read groups to bam files using bwa-0.6.2

    Hi everybody,


    I tried to add the read groups to BAM files without success.

    I'm using bwa (bwasw, 454 reads) and I have tried the merge command, but it doesn't work.

    I have read the other post http://seqanswers.com/forums/showthread.php?t=4180
    and http://sites.duke.edu/rainbowblog/20...p-information/

    But I still couldn't add the read groups.

    Please, can someone help me?
    Thanks in advance

    Cris
    Last edited by cfrias; 02-15-2013, 01:24 PM. Reason: I tried the solution!!!


  • aforntacc
    replied
    Originally posted by freeseek
    @Michael.James.Clark the following two lines of bash code:
    Code:
    echo -e "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina" > rg.txt
    samtools view -h ga.bam | cat rg.txt - | awk '{ if (substr($1,1,1)=="@") print; else printf "%s\tRG:Z:ga\n",$0; }' | samtools view -uS - | samtools rmdup - - | samtools rmdup -s - aln.bam
    should add to the bam file the read group information in the same way samtools merge adds the read group information to the two bam files as described by javijevi. The idea is to unpack the bam file, add the read group header, add the read group information to every read, repack the file, and remove duplicates. Again, remove duplicates only if the coverage is not too deep.
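The unpack, tag, repack idea can be tried on plain SAM text first. The sketch below is not freeseek's exact one-liner: it emits the @RG line after the existing header rather than before it, because the SAM specification requires @HD, when present, to be the first line. It also assumes the reads carry no RG tag yet. The function name `add_rg` and the toy record are made up for illustration.

```shell
# add_rg ID SM: read SAM text on stdin, add an @RG header line after the
# existing header, and tag every alignment with RG:Z:<ID>.
# Intended use (assumes samtools is installed; file names are placeholders):
#   samtools view -h ga.bam | add_rg ga hs | samtools view -bS - > ga.rg.bam
add_rg() {
    awk -v id="$1" -v sm="$2" '
        BEGIN { OFS = "\t" }
        /^@/  { print; next }                               # header lines pass through
        !done { print "@RG", "ID:" id, "SM:" sm; done = 1 } # once, before the first read
              { print $0 "\tRG:Z:" id }                     # tag each alignment
        END   { if (!done) print "@RG", "ID:" id, "SM:" sm } # input had no reads
    '
}

# Toy run on one header line and one hypothetical read.
tagged=$(printf '@HD\tVN:1.0\nr1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n' | add_rg ga hs)
printf '%s\n' "$tagged"
```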

    Hi, I got this error. How can I resolve it? I added the read groups to the BAM files and used samtools to merge them, but when I run the SomaticIndelDetector from GATK it gives me the error below.
    Here is the command I used to add the read groups and merge the BAM files:
    -rh rgmt.txt - genome_110506_SN13.bam genome_110506_SN132.bam genome_110506_SN132_A.bam > newmut.bam
    Here is the GATK command I used for the SomaticIndelDetector:
    elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam
    And here is the error:
    MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem
    ##### ERROR ------------------------------------------------------------------------------------------

    Please help me.
    Thanks a lot.


  • Seq84
    replied
    `@RG\tID:foo\tSM:bar'
    Maybe it's the opening backtick (`). Try copying and pasting this string instead:

    '@RG\tID:foo\tSM:bar'

    Let me know.
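The usage text prints the example with an opening backtick, which a shell reads as the start of a command substitution. A small sketch of the difference between the literal string that bwa expects and the real tabs that a header file needs (the variable names are arbitrary):

```shell
# Single quotes on both sides: the shell passes the string through untouched,
# including the literal backslash-t sequences.
rg='@RG\tID:foo\tSM:bar'
printf '%s\n' "$rg"          # prints @RG\tID:foo\tSM:bar

# A header file for "samtools merge -h" needs real tab characters instead;
# printf with %b expands the \t escapes.
rg_tabs=$(printf '%b' "$rg")
printf '%s\n' "$rg_tabs"
```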


  • Michael.James.Clark
    replied
    Originally posted by jyli
    I posted this on another thread, but

    -r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]

    gave me error: " malformated @RG line"

    Can you please help?
    It's probably the "\t". I've had trouble with that before.

    Probably best to use the Picard tool for it.


  • jyli
    replied
    Originally posted by lh3
    You may try "samtools merge", using options -r and -h. You write your @RG header lines in a file provided to -h; -r will add an RG:Z: tag to each alignment, based on the file names.

    EDIT: for an example:

    http://sourceforge.net/apps/mediawik...rged_alignment
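Since `-r` bases the tag on file names, the IDs in the header file have to line up with them. A minimal sketch of generating such a header, assuming the tag value is the input file name without its directory and ".bam" suffix (my understanding of how samtools derives it, worth checking against your version). The function name `make_rg_header` and the file names are placeholders.

```shell
# make_rg_header SAMPLE FILE...: print one tab-delimited @RG line per input
# BAM, using the file name without ".bam" as the ID so that it matches the
# RG:Z: value that "samtools merge -r" is expected to attach.
make_rg_header() {
    sample=$1
    shift
    for bam in "$@"; do
        printf '@RG\tID:%s\tSM:%s\n' "$(basename "$bam" .bam)" "$sample"
    done
}

# Placeholder file names; nothing is read from disk here.
header=$(make_rg_header bar /data/lane1.bam lane2.bam)
printf '%s\n' "$header"

# Then, assuming samtools is installed, save the header to rg.txt and run:
#   samtools merge -r -h rg.txt merged.bam /data/lane1.bam lane2.bam
```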

    I posted this on another thread, but

    -r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]

    gave me error: " malformated @RG line"

    Can you please help?


  • wjeck
    replied
    I can attest that both of these tools work well. The PICARD tool in particular is vastly quicker than my previous workaround.


  • Seq84
    replied
    Usage:   bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>

    Options: -a INT   maximum insert size [500]
             -o INT   maximum occurrences for one end [100000]
             -n INT   maximum hits to output for paired reads [3]
             -N INT   maximum hits to output for discordant pairs [10]
             -c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
             -f FILE  sam file to output results to [stdout]
             -r STR   read group header line such as `@RG\tID:foo\tSM:bar'[null]
             -P       preload index into memory (for base-space reads only)
             -s       disable Smith-Waterman for the unmapped mate
             -A       disable insert size estimate (force -s)

    Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
           2. For reads shorter than 30bp, applying a smaller -o is recommended to
              to get a sensible speed at the cost of pairing accuracy.
    This is bwa sampe in version 0.5.9-r16.
    With the -r option followed by a string of this kind, you can insert the RG directly during mapping.

    For editing RG lines, use AddOrReplaceReadGroups in Picard.
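A Picard call for this might look like the sketch below. It assumes the older per-tool jar layout (AddOrReplaceReadGroups.jar); newer Picard releases bundle the tools into a single picard.jar with the tool name as the first argument. The file names and all RG values are placeholders, and the command is only printed so it can be inspected before running.

```shell
# Placeholder input and output files.
in_bam=in.bam
out_bam=out.rg.bam

# Dry run: assemble the Picard command as a string and print it.
# The RGID/RGSM/RGLB/RGPL/RGPU values are examples only.
picard_cmd="java -jar AddOrReplaceReadGroups.jar INPUT=$in_bam OUTPUT=$out_bam RGID=foo RGSM=bar RGLB=lib1 RGPL=illumina RGPU=unit1"
printf '%s\n' "$picard_cmd"
```

Unlike a header-only edit, this rewrites both the @RG header line and the RG tag on every read.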


  • wjeck
    replied
    The way to do this now is to use the Picard command line tool, in the latest Picard version.


  • jyli
    replied
    Originally posted by wjeck
    Follow up question: Is there a way to edit the information in the @RG tag after the files have been merged in BAM format? I'd like to add and subtract information from these lines downstream, and I can't figure out an elegant way to get into them without writing out an entire SAM file and translating it back to BAM.
    Were you ever able to figure it out (with an already merged BAM file)?
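For header-only changes, one option is `samtools reheader`, which swaps the header without a SAM round trip of the alignments. The sketch below edits the header text with awk; the function name `edit_rg_header` and the toy header are made up for illustration. Note that this does not touch the RG:Z: tags on the reads, so renaming an ID this way would leave the reads pointing at an undeclared group.

```shell
# edit_rg_header ID FIELD: read SAM header text on stdin and append FIELD
# (for example "PL:illumina") to the @RG line whose ID matches.
# Intended use (assumes a samtools build that provides "reheader"):
#   samtools view -H merged.bam | edit_rg_header foo PL:illumina > header.sam
#   samtools reheader header.sam merged.bam > merged.newhdr.bam
edit_rg_header() {
    awk -F'\t' -v id="$1" -v field="$2" '
        BEGIN { OFS = "\t" }
        $1 == "@RG" {
            for (i = 2; i <= NF; i++)
                if ($i == "ID:" id) { $0 = $0 OFS field; break }
        }
        { print }
    '
}

# Toy header with two read groups; only "foo" should gain the PL field.
new_header=$(printf '@HD\tVN:1.0\n@RG\tID:foo\tSM:bar\n@RG\tID:baz\tSM:bar\n' \
    | edit_rg_header foo PL:illumina)
printf '%s\n' "$new_header"
```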


  • caddymob
    replied
    How to patch BWA

    Ok, I figured it out! Here is how I did it...

    Go into the SVN checkout of bio-bwa/trunk/bwa, then run this:

    Code:
    patch bwape.c BWA_read_group_patch.diff
    then:

    Code:
    make
    then test it:

    Code:
    ./bwa sampe
    
    Usage:   bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>
    
    Options: -a INT   maximum insert size [500]
             -o INT   maximum occurrences for one end [100000]
             -n INT   maximum hits to output for paired reads [3]
             -N INT   maximum hits to output for discordant pairs [10]
             -c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
             -f FILE sam file to output results to [stdout]
    
             -P       preload index into memory (for base-space reads only)
             -s       disable Smith-Waterman for the unmapped mate
             -A       disable insert size estimate (force -s)
    
             -i       read group identifier (ID)
             -m       read group sample (SM), required if ID is given
             -l       read group library (LB)
             -p       read group platform (PL)
    Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
           2. For reads shorter than 30bp, applying a smaller -o is recommended to
              to get a sensible speed at the cost of pairing accuracy.
    The -i, -m, -l and -p options are the ticket!
