Adding Read Group info to a set of Bam files

jester112358 replied

04-19-2013, 02:43 AM
Hi everyone,

I'm having problems with samples merged with samtools.

I get this error when I start to run our pipeline:

MESSAGE: SAM/BAM file SAMFileReader{/csc/aaltonen/cg3/projects/Kaposi/Kapo93+94+95/09-1107/Kapo_93-95_09-1107_merged_s_99.nodup.bam} is malformed: Read ILLUMINA-8C38E9_0112:3:84:1445:15512#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://gatkforums.broadinstitute.org...lacereadgroups to fix this problem

I ran samtools with samtools merge -h rg.txt -r

Information in the rg.txt should be correct:

@RG ID:Kapo93+94+95_1_09-1107_091207_HWI-EAS418_9_s_7 DS:091207_HWI-EAS418_9 SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_100618_ILLUMINA-8C38E9_0112_s_3 DS:100618_ILLUMINA-8C38E9_0112 SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_110103_HWUSI-EAS1785_0223_s_2 DS:110103_HWUSI-EAS1785_0223 SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_110616_SN588_0054_AC00T5ABXX_s_4 DS:110616_SN588_0054_AC00T5ABXX SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_111028_SN653_0108_BB0418ABXX_s_3 DS:111028_SN653_0108_BB0418ABXX SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_120216_SN670_0098_AC04HRABXX_s_1 DS:120216_SN670_0098_AC04HRABXX SM:09-1107
@RG ID:Kapo93-95-1_09-1107_110714_SN588_0055_AB0A5UABXX_s_1 DS:110714_SN588_0055_AB0A5UABXX SM:09-1107

Any help here would be appreciated greatly!
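
One thing worth checking here, as a sketch rather than a confirmed diagnosis: lh3's description elsewhere in this thread says that samtools merge -r derives each read's RG:Z: tag from the input file name, so every ID: in rg.txt would need to equal an input file name with the .bam suffix stripped. The function name check_rg_ids and the file names in the usage line are hypothetical, and the check assumes the fields in rg.txt are separated by real tab characters.

```shell
# Hypothetical helper: report input BAMs whose file-name-derived ID
# (basename without ".bam") has no matching @RG line in the header file.
check_rg_ids() {
    rg_file=$1
    shift
    status=0
    for bam in "$@"; do
        id=$(basename "$bam" .bam)
        # Look for a tab-separated @RG line carrying exactly ID:<id>.
        if ! awk -F'\t' -v want="ID:$id" '
                $1 == "@RG" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
                END { exit(found ? 0 : 1) }
            ' "$rg_file"; then
            echo "no @RG line with ID:$id in $rg_file" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage would be something like `check_rg_ids rg.txt lane1.bam lane2.bam` before running the merge; a non-zero exit status means at least one input file has no matching header line.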
cfrias replied

02-15-2013, 01:45 PM
Thank you very much, Wjeck and Richard!!!

Yes, Richard, I think the same.
I always read the manuals, but the useful information is in this forum, because people
have similar problems.

I only want to add the read groups so that I can use GATK to improve the alignments from bwa bwasw.
I need to remove duplicates, and I have a pool of reads,
so I need to add the read groups in order to avoid removing similar reads from different individuals.
(A duplicate read is a read that has the same mapping coordinates and the same CIGAR string,
isn't it?)

Cris
Richard Finney replied

02-15-2013, 01:28 PM
I wonder if the easiest thing would be for software that insists on read groups to instead provide a parameter to ignore them. This read group requirement has turned out to be more of a hassle than a benefit. Software should be more robust.
cfrias replied

02-15-2013, 01:26 PM
OK, it's Friday night... but I tried the solution

with Brugger's script!!
THANK YOU VERY MUCH!!!
wjeck replied

02-15-2013, 01:19 PM
I think I switched to using Picard tools AddOrReplaceReadGroups function.

Try looking here:

What exactly is AddOrReplaceReadGroups (picard tools) doing? - SEQanswers

http://seqanswers.com/forums/showthread.php?t=11887

Haven't had to do this in a while, though, since I started putting read groups in at the beginning, during alignment. I believe BWA now does that with proper use of the alignment option. The best way to solve this problem is to make sure it doesn't happen in the first place.
cfrias replied

02-15-2013, 01:08 PM
Add read groups to bam files using bwa-0.6.2

Hi everybody,

I tried to add read groups to my BAM files, without success.

I'm using bwa (bwasw, 454 reads) and I tried the merge command, but it doesn't work.

I have read the other post http://seqanswers.com/forums/showthread.php?t=4180
and http://sites.duke.edu/rainbowblog/20...p-information/

But I still couldn't add the read groups.

Please, can someone help me?
Thanks in advance

Cris

Last edited by cfrias; 02-15-2013, 01:24 PM. Reason: I tried the solution!!!
aforntacc replied

09-04-2012, 08:37 PM
Originally posted by freeseek

@Michael.James.Clark the following two lines of bash code:

Code:

echo -e "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina" > rg.txt
samtools view -h ga.bam | cat rg.txt - | awk '{ if (substr($1,1,1)=="@") print; else printf "%s\tRG:Z:ga\n",$0; }' | samtools view -uS - | samtools rmdup - - | samtools rmdup -s - aln.bam

should add to the bam file the read group information in the same way samtools merge adds the read group information to the two bam files as described by javijevi. The idea is to unpack the bam file, add the read group header, add the read group information to every read, repack the file, and remove duplicates. Again, remove duplicates only if the coverage is not too deep.
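
The awk step in that pipeline can be tried on plain SAM text without samtools. This is a minimal sketch under a hypothetical function name, add_read_group. Unlike the quoted one-liner, it emits the @RG line after the existing header lines rather than before them, so an @HD line (if present) stays first.

```shell
# Hypothetical filter: SAM text on stdin, SAM text with a read group on stdout.
# usage: add_read_group <read_group_id> <sample_name>
add_read_group() {
    awk -v id="$1" -v sm="$2" '
        BEGIN { rg_line = "@RG\tID:" id "\tSM:" sm }
        /^@/  { print; next }               # pass existing header lines through
        !done { print rg_line; done = 1 }   # emit @RG once, right after the header
              { print $0 "\tRG:Z:" id }     # tag every alignment record
        END   { if (!done) print rg_line }  # input that had a header but no reads
    '
}
```

In a real pipeline this would sit between two samtools calls, along the lines of `samtools view -h ga.bam | add_read_group ga hs | samtools view -bS - > ga.rg.bam`, where the file names are placeholders.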

Hi, I got this error; how can I resolve it? I added read groups to the BAM files and used samtools to merge them, but when I run the SomaticIndelDetector from GATK it gives me the error below.
Here is the command I used to add the read groups and merge the BAM files:
-rh rgmt.txt - genome_110506_SN13.bam genome_110506_SN132.bam genome_110506_SN132_A.bam > newmut.bam
and here is the GATK command I used for the SomaticIndelDetector:
elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam
and here is the error:
MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem
##### ERROR ------------------------------------------------------------------------------------------

Please help me.
Thanks a lot.
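
The GATK message names two possible causes: reads with no RG tag, or an RG tag whose ID has no @RG header line. A rough way to tell which one applies is to scan the SAM text. This sketch assumes a hypothetical function name, find_undefined_rg, and tab-separated SAM (header first) on standard input.

```shell
# Hypothetical diagnostic: count reads that lack an RG tag and reads whose
# RG tag is not declared by any @RG header line.
find_undefined_rg() {
    awk -F'\t' '
        $1 == "@RG" {                      # collect IDs declared in the header
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ID:/) defined[substr($i, 4)] = 1
            next
        }
        /^@/ { next }                      # skip other header lines
        {
            rg = ""
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i ~ /^RG:Z:/) rg = substr($i, 6)
            if (rg == "") untagged++
            else if (!(rg in defined)) undefined[rg]++
        }
        END {
            if (untagged) print "reads without an RG tag: " untagged
            for (rg in undefined)
                print "RG:Z:" rg " has no @RG header line (" undefined[rg] " reads)"
        }
    '
}
```

It could be fed with `samtools view -h newmut.bam | find_undefined_rg`; no output means every read carries a read group that the header defines.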
Seq84 replied

07-02-2011, 11:17 PM
`@RG\tID:foo\tSM:bar'

Maybe it's the ` quote (a backtick). Try copying and pasting this string instead:

'@RG\tID:foo\tSM:bar'

Let me know.
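
To see the quoting difference, here is a small sketch; the variable name rg is arbitrary. Single quotes keep the \t sequences literal, which as far as I know is what bwa expects and converts to tabs itself, while the %b format of printf shows what the line looks like once the escapes are expanded.

```shell
# Single quotes: the shell passes backslash-t through unchanged.
rg='@RG\tID:foo\tSM:bar'
printf '%s\n' "$rg"    # prints the string with literal backslashes

# %b expands the escapes, giving three tab-separated fields.
printf '%b\n' "$rg" | awk -F'\t' '{ print NF " fields; first is " $1 }'
```

The same variable could then be passed as `-r "$rg"` to bwa sampe, with the double quotes stopping the shell from splitting or altering it.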
Michael.James.Clark replied

07-01-2011, 10:33 PM
Originally posted by jyli

I posted this on another thread, but

-r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]

gave me error: " malformated @RG line"

Can you please help?

It's probably the "\t". I've had trouble with that before.

Probably best to use the Picard tool for it.
jyli replied

05-18-2011, 04:10 AM
Originally posted by lh3

You may try "samtools merge", using options -r and -h. You write your @RG header lines in a file provided to -h; -r will add an RG:Z: tag to each alignment, based on the file names.

EDIT: for an example:

http://sourceforge.net/apps/mediawik...rged_alignment

I posted this on another thread, but

-r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]

gave me error: " malformated @RG line"

Can you please help?
wjeck replied

04-15-2011, 04:59 AM
I can attest that both of these tools work well. The PICARD tool in particular is vastly quicker than my previous workaround.
Seq84 replied

04-15-2011, 04:53 AM
Usage:   bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>

Options: -a INT   maximum insert size [500]
         -o INT   maximum occurrences for one end [100000]
         -n INT   maximum hits to output for paired reads [3]
         -N INT   maximum hits to output for discordant pairs [10]
         -c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
         -f FILE  sam file to output results to [stdout]
         -r STR   read group header line such as `@RG\tID:foo\tSM:bar'[null]
         -P       preload index into memory (for base-space reads only)
         -s       disable Smith-Waterman for the unmapped mate
         -A       disable insert size estimate (force -s)

Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
       2. For reads shorter than 30bp, applying a smaller -o is recommended to
          to get a sensible speed at the cost of pairing accuracy.

This is bwa sampe in version 0.5.9-r16.
With the -r option followed by a string like this, you can insert the read group directly during mapping.

For editing RG lines, use AddOrReplaceReadGroups in Picard.
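
For reference, a sketch of what the Picard call can look like. The wrapper name add_read_groups, the PICARD_JAR and PICARD_DRY_RUN variables, and the placeholder values for library, platform and platform unit are all assumptions for illustration. The argument style follows the single-jar Picard releases; older releases shipped a separate AddOrReplaceReadGroups.jar, so adjust the java -jar part to match your installation.

```shell
# Hypothetical wrapper around Picard AddOrReplaceReadGroups.
# usage: add_read_groups in.bam out.bam read_group_id sample_name
# Set PICARD_DRY_RUN=1 to print the command instead of running it.
add_read_groups() {
    ${PICARD_DRY_RUN:+echo} java -jar "${PICARD_JAR:-picard.jar}" \
        AddOrReplaceReadGroups \
        I="$1" O="$2" \
        RGID="$3" RGSM="$4" \
        RGLB="$3" RGPL=illumina RGPU="$3"   # placeholder LB/PL/PU values
}

# Dry run with made-up file names: only prints the command line.
PICARD_DRY_RUN=1 add_read_groups merged.bam merged.rg.bam foo bar
```

Reusing the read group ID for the library and platform unit is only a way to fill the required fields; real values are better if you know them.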
wjeck replied

04-14-2011, 10:37 AM
The way to do this now is to use the Picard command line tool, in the latest picard version.
jyli replied

04-14-2011, 10:14 AM
Originally posted by wjeck

Follow up question: Is there a way to edit the information in the @RG tag after the files have been merged in BAM format? I'd like to add and subtract information from these lines downstream, and I can't figure out an elegant way to get into them without writing out an entire SAM file and translating it back to BAM.

Were you ever able to figure it out (with an already merged BAM file)?

caddymob replied

10-06-2010, 10:23 PM

How to patch BWA

Ok, I figured it out! Here is how I did it...

Go into the SVN checkout of bio-bwa/trunk/bwa, then run this:

Code:

patch bwape.c BWA_read_group_patch.diff

then:

Code:

make

then test it:

Code:

./bwa sampe

Usage:   bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>

Options: -a INT   maximum insert size [500]
         -o INT   maximum occurrences for one end [100000]
         -n INT   maximum hits to output for paired reads [3]
         -N INT   maximum hits to output for discordant pairs [10]
         -c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
         -f FILE sam file to output results to [stdout]

         -P       preload index into memory (for base-space reads only)
         -s       disable Smith-Waterman for the unmapped mate
         -A       disable insert size estimate (force -s)

         -i       read group identifier (ID)
         -m       read group sample (SM), required if ID is given
         -l       read group library (LB)
         -p       read group platform (PL)
Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
       2. For reads shorter than 30bp, applying a smaller -o is recommended to
          to get a sensible speed at the cost of pairing accuracy.

The -i, -m, -l and -p options are the ticket!
