Hi everyone,
I'm having problems with samples merged with samtools.
I get this error when I start running our pipeline:
MESSAGE: SAM/BAM file SAMFileReader{/csc/aaltonen/cg3/projects/Kaposi/Kapo93+94+95/09-1107/Kapo_93-95_09-1107_merged_s_99.nodup.bam} is malformed: Read ILLUMINA-8C38E9_0112:3:84:1445:15512#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://gatkforums.broadinstitute.org...lacereadgroups to fix this problem
I merged the files with samtools merge -h rg.txt -r
The information in rg.txt should be correct:
@RG ID:Kapo93+94+95_1_09-1107_091207_HWI-EAS418_9_s_7 DS:091207_HWI-EAS418_9 SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_100618_ILLUMINA-8C38E9_0112_s_3 DS:100618_ILLUMINA-8C38E9_0112 SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_110103_HWUSI-EAS1785_0223_s_2 DS:110103_HWUSI-EAS1785_0223 SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_110616_SN588_0054_AC00T5ABXX_s_4 DS:110616_SN588_0054_AC00T5ABXX SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_111028_SN653_0108_BB0418ABXX_s_3 DS:111028_SN653_0108_BB0418ABXX SM:09-1107
@RG ID:Kapo93+94+95_1_09-1107_120216_SN670_0098_AC04HRABXX_s_1 DS:120216_SN670_0098_AC04HRABXX SM:09-1107
@RG ID:Kapo93-95-1_09-1107_110714_SN588_0055_AB0A5UABXX_s_1 DS:110714_SN588_0055_AB0A5UABXX SM:09-1107
Any help here would be appreciated greatly!
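A note on how `samtools merge -r` names read groups, based on lh3's description further down this thread: the RG:Z: tag added to each alignment comes from its input file name, and the @RG lines come from the file given to -h. So every input file needs an @RG line whose ID is exactly that name. The Python sketch below checks this before merging. It assumes the tag is the file's base name with a trailing `.bam` stripped (worth verifying against your samtools version), it expects tab-separated header fields (the spaces in the rg.txt shown here may just be how the forum displays tabs, but real spaces would also break the header), and its function names are hypothetical.

```python
import os

def rg_ids_from_header(header_text):
    """Collect the ID: values of all tab-separated @RG lines (SAM header or rg.txt)."""
    ids = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("ID:"):
                ids.add(field[3:])
    return ids

def tag_from_filename(path):
    """Assumed samtools merge -r behaviour: base name without a trailing .bam."""
    name = os.path.basename(path)
    return name[:-4] if name.endswith(".bam") else name

def missing_read_groups(header_text, bam_paths):
    """Input files whose derived RG tag has no matching @RG ID in the header."""
    ids = rg_ids_from_header(header_text)
    return [p for p in bam_paths if tag_from_filename(p) not in ids]
```

Any file it reports would produce reads tagged with an RG that the merged header does not define, which is the condition GATK complains about. A header whose fields are separated by spaces yields no IDs at all, so every file is reported.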
-
Thank you very much, Wjeck and Richard!!!
Yes, Richard, I think the same.
I always read the manuals, but the useful information is in this forum, because people
have similar problems.
I only want to add the read groups so I can use GATK to improve the alignments from bwa bwasw.
So I need to remove duplicates, and I have a pool of reads.
For this purpose I need to add the read groups, in order to avoid removing similar reads from different individuals.
(A duplicated read is a read that has the same mapping coordinates and the same CIGAR string,
isn't it?)
Cris
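On Cris's question about what counts as a duplicate: position-based duplicate removal usually compares the mapped 5' end and the strand of each read rather than requiring identical CIGAR strings (the alignment is only needed to locate the 5' end of a reverse-strand read), and Picard MarkDuplicates additionally keeps libraries apart using the LB field of each read's read group. That is why pooled reads from different individuals need read groups with distinct LB values. A simplified single-end Python sketch of the idea follows; it ignores clipping, mates and base qualities, simply keeps the first read of each group, and the tuple layout is a hypothetical choice for illustration.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def five_prime_end(pos, cigar, reverse):
    """1-based reference coordinate of the read's 5' end (clipping ignored)."""
    if not reverse:
        return pos
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1

def remove_duplicates(reads):
    """reads: iterable of (name, library, chrom, pos, cigar, reverse) tuples.

    Keeps the first read seen for each (library, chrom, 5' end, strand) key
    and returns the names of the kept reads.
    """
    seen, kept = set(), []
    for name, library, chrom, pos, cigar, reverse in reads:
        key = (library, chrom, five_prime_end(pos, cigar, reverse), reverse)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept
```

Two reads with different CIGAR strings but the same 5' end collapse into one here, while the same position in a different library survives. Giving every read the same library value, which is effectively what happens without read groups, makes reads from different individuals collide on position alone.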
-
I wonder if the easiest thing would be for software that insists on read groups to provide a parameter to ignore them instead. Read groups have turned out to be more of a hassle than a benefit. Software should be more robust.
-
OK, it's Friday night... but I tried the solution
with Brugger's script!!
THANK YOU VERY MUCH!!!
-
I think I switched to using Picard's AddOrReplaceReadGroups tool.
Try looking here:
I haven't had to do this in a while, though, since I started putting read groups in at the beginning, during alignment. I believe BWA now does that if you pass a read group line with the -r option of bwa sampe. The best way to solve this problem is to make sure it doesn't happen in the first place.
-
Add read groups to bam files using bwa-0.6.2
Hi everybody,
I tried to add the read groups to BAM files without success.
I'm using bwa (bwasw, 454 reads) and I have tried the merge command, but it doesn't work.
I have read the other posts http://seqanswers.com/forums/showthread.php?t=4180
and http://sites.duke.edu/rainbowblog/20...p-information/
but I still couldn't add the read groups.
Can someone please help me?
Thanks in advance
Cris
-
Originally posted by freeseek: @Michael.James.Clark the following two lines of bash code:
Code:
echo -e "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina" > rg.txt
samtools view -h ga.bam | cat rg.txt - | awk '{ if (substr($1,1,1)=="@") print; else printf "%s\tRG:Z:ga\n",$0; }' | samtools view -uS - | samtools rmdup - - | samtools rmdup -s - aln.bam
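The awk step in freeseek's pipeline prints header lines unchanged and appends RG:Z:ga to every alignment line, while `cat rg.txt -` puts the new @RG line in front of the header. A minimal Python sketch of the same text transformation on SAM lines is below (no BAM input or output; `add_read_group` is a hypothetical name). One deliberate difference: it inserts the @RG line after an @HD line if there is one, since the SAM specification expects @HD to be the first header line.

```python
def add_read_group(sam_lines, rg_line):
    """Add one @RG header line and tag every alignment with its ID.

    sam_lines: SAM text lines without trailing newlines, header first.
    Like the awk one-liner, any existing RG tags are left in place.
    """
    ids = [f[3:] for f in rg_line.split("\t") if f.startswith("ID:")]
    if not rg_line.startswith("@RG\t") or not ids:
        raise ValueError("not a tab-separated @RG line with an ID field")
    header = [l for l in sam_lines if l.startswith("@")]
    alignments = [l for l in sam_lines if not l.startswith("@")]
    # Keep @HD as the very first header line, as the SAM spec requires.
    insert_at = 1 if header and header[0].startswith("@HD") else 0
    header.insert(insert_at, rg_line)
    return header + [a + "\tRG:Z:" + ids[0] for a in alignments]
```

Because the ID is read from the @RG line itself, the header entry and the per-read tag cannot drift apart, which is the mismatch GATK rejects.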
Hi, I got this error; how can I resolve it? I added the read groups to the BAM files and used samtools to merge them, but when I run the SomaticIndelDetector from GATK it gives me the error below.
Here are the commands that I used to add the read groups and merge the BAM files:
-rh rgmt.txt - genome_110506_SN13.bam genome_110506_SN132.bam genome_110506_SN132_A.bam > newmut.bam
And here is the GATK command I used for the SomaticIndelDetector:
java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam
And here is the error:
MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem
##### ERROR ------------------------------------------------------------------------------------------
Please help me.
Thanks a lot!
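The GATK message says that each read must carry an RG tag whose value matches the ID of an @RG line in the header. A rough Python sketch of that check on SAM text (for example the output of `samtools view -h newmut.bam`) is below; it is not GATK's code, and the function name is hypothetical. Running something like it on the merged file shows whether the reads are untagged or tagged with IDs that rgmt.txt does not define.

```python
def reads_with_bad_read_group(sam_lines):
    """Return (read name, RG value or None) for reads that fail the check.

    Assumes SAM text with the header before the alignments.
    """
    header_ids = set()
    bad = []
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@RG":
                header_ids.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
            continue
        # Optional tags start at column 12 (index 11) and look like TAG:TYPE:VALUE.
        rg = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        if rg is None or rg not in header_ids:
            bad.append((fields[0], rg))
    return bad
```

A `None` in the result means the read has no RG tag at all; any other value is a tag with no matching @RG line in the header.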
-
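Check the quotes. The usage text shows
`@RG\tID:foo\tSM:bar'
with a backtick at the start, but on the command line both ends need a straight single quote:
'@RG\tID:foo\tSM:bar'
Let me know.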
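About the "malformated @RG line" error discussed here: as far as I can tell, bwa takes the -r value with literal `\t` sequences, expands them into tabs, and then checks that the line starts with @RG and has an ID field. If the argument is not quoted, the shell strips the backslashes (so `\tID:` becomes `tID:`) and the check fails. The Python sketch below mimics that kind of validation under simplified rules modeled on the SAM specification; it is an illustration, not bwa's code, and `parse_rg_arg` is a hypothetical name.

```python
def parse_rg_arg(arg):
    """Expand literal \\t escapes and parse an @RG line into a dict of fields.

    Raises ValueError for lines a tool would reject as malformed
    (simplified rules: must start with @RG, TAG:VALUE fields, an ID field).
    """
    line = arg.replace("\\t", "\t")
    fields = line.split("\t")
    if fields[0] != "@RG":
        raise ValueError("read group line must start with @RG")
    tags = {}
    for field in fields[1:]:
        tag, sep, value = field.partition(":")
        if not sep or len(tag) != 2:
            raise ValueError("malformed field: %r" % field)
        tags[tag] = value
    if "ID" not in tags:
        raise ValueError("no ID field")
    return tags
```

Feeding it the string an unquoted shell argument turns into (`@RGtID:footSM:bar`) raises an error, while the single-quoted form parses into ID and SM fields.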
-
Originally posted by jyli: I posted this on another thread, but
-r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]
gave me the error "malformated @RG line".
Can you please help?
It's probably best to use the Picard tool for it.
-
Originally posted by lh3: You may try "samtools merge", using options -r and -h. You write your @RG header lines in a file provided to -h; -r will add an RG:Z: tag to each alignment, based on the file names.
EDIT: for an example, see:
http://sourceforge.net/apps/mediawik...rged_alignment
I posted this on another thread, but
-r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]
gave me the error "malformated @RG line".
Can you please help?
-
I can attest that both of these tools work well. The Picard tool in particular is vastly quicker than my previous workaround.
-
Usage: bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>
Options: -a INT maximum insert size [500]
-o INT maximum occurrences for one end [100000]
-n INT maximum hits to output for paired reads [3]
-N INT maximum hits to output for discordant pairs [10]
-c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
-f FILE sam file to output results to [stdout]
-r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]
-P preload index into memory (for base-space reads only)
-s disable Smith-Waterman for the unmapped mate
-A disable insert size estimate (force -s)
Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
2. For reads shorter than 30bp, applying a smaller -o is recommended to
to get a sensible speed at the cost of pairing accuracy.
With the -r option followed by such a string, you can insert the RG directly during mapping.
For editing RG lines, use AddOrReplaceReadGroups in Picard.
-
The way to do this now is to use the Picard command-line tool in the latest Picard version.
-
Originally posted by wjeck: Follow-up question: Is there a way to edit the information in the @RG lines after the files have been merged in BAM format? I'd like to add and remove information from these lines downstream, and I can't figure out an elegant way to get at them without writing out an entire SAM file and converting it back to BAM.
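For wjeck's question, one option that avoids round-tripping all the alignments through SAM is to change only the header: dump it with `samtools view -H merged.bam > header.sam`, edit the @RG lines, and apply the result with `samtools reheader header.sam merged.bam > fixed.bam` (file names hypothetical). The editing step is sketched in Python below; `set_rg_field` is a hypothetical helper, and it refuses to touch the ID field, because the RG tags on the reads still point at the old IDs.

```python
def set_rg_field(header_text, rg_id, tag, value=None):
    """Set (or, with value=None, remove) one field of the @RG line with ID rg_id.

    header_text: SAM header text such as the output of samtools view -H.
    """
    if tag == "ID":
        raise ValueError("changing ID would orphan the RG tags on the reads")
    out = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@RG" and "ID:" + rg_id in fields[1:]:
            # Drop any existing copy of the tag, then append the new value.
            fields = [f for f in fields if not f.startswith(tag + ":")]
            if value is not None:
                fields.append(tag + ":" + value)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Only the header is rewritten, so the alignment records themselves never need to be decoded.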
-
How to patch BWA
Ok, I figured it out! Here is how I did it...
go into the SVN checkout of bio-bwa/trunk/bwa, then run this:
Code:
patch bwape.c BWA_read_group_patch.diff
Code:
make
Code:
./bwa sampe
Usage: bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>
Options: -a INT maximum insert size [500]
-o INT maximum occurrences for one end [100000]
-n INT maximum hits to output for paired reads [3]
-N INT maximum hits to output for discordant pairs [10]
-c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
-f FILE sam file to output results to [stdout]
-P preload index into memory (for base-space reads only)
-s disable Smith-Waterman for the unmapped mate
-A disable insert size estimate (force -s)
-i read group identifier (ID)
-m read group sample (SM), required if ID is given
-l read group library (LB)
-p read group platform (PL)
Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
2. For reads shorter than 30bp, applying a smaller -o is recommended to
to get a sensible speed at the cost of pairing accuracy.
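The patched usage text adds -i, -m, -l and -p for the ID, SM, LB and PL fields, with SM required whenever an ID is given. A minimal Python sketch of how such values could be assembled into an @RG header line follows; it is a hypothetical illustration of the rule stated in the usage text, not the patch's actual code, and `build_rg_line` is an assumed name.

```python
def build_rg_line(rg_id=None, sample=None, library=None, platform=None):
    """Assemble a tab-separated @RG header line from -i/-m/-l/-p style values.

    Returns None when no ID is given (no read group requested).
    """
    if rg_id is None:
        return None
    if sample is None:
        raise ValueError("read group sample (SM) is required if ID is given")
    fields = ["@RG", "ID:" + rg_id, "SM:" + sample]
    if library is not None:
        fields.append("LB:" + library)
    if platform is not None:
        fields.append("PL:" + platform)
    return "\t".join(fields)
```

Separate options like these sidestep the quoting problems of passing a whole `@RG\t...` string on the command line, since each value is a plain word.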