**Solyris** · 03-08-2010, 10:01 PM

Hi,

I am quite new to NGS data, and I work with commercial software from CLCbio called Genomics Workbench, which offers a mapping algorithm of its own.

I would like to convert the SAM output from the software to BAM so that I can use samtools functions like pileup.

I get the following error when I run the command on Ubuntu:

>./samtools view -huS -o DATA/test.bam DATA/s_2_1_sequence_SS200_LAwMM.sam
[samopen] SAM header is present: 24 sequences.
Parse error at line 113: CIGAR and sequence length are inconsistent
Aborted

I read somewhere in this thread that samtools currently does not allow SAM file processing without the reference sequence, so is that what's causing the problem? If so, can anyone point me to a way to generate the correct reference sequence file? I tried reading through the manual, but nowhere does it say how the reference file should be formatted. I am working with the whole human reference genome, as 24 gbk files from NCBI.

Any help is appreciated.

Thanks
Sol

**drio** · 03-09-2010, 01:57 PM

samtools performs some sanity checks on the CIGAR string, and it is telling you something is not right. Have you looked at that particular alignment (line 113 of the SAM file) to confirm that the CIGAR is correct?
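To look at this kind of inconsistency by hand, a minimal Python sketch along these lines may help. It assumes the standard tab-separated SAM layout (CIGAR in column 6, SEQ in column 10) and counts only the CIGAR operations that consume read bases (M, I, S, =, X). The function names are made up for illustration, and this is not samtools' own check.

```python
import re

# CIGAR operations that consume bases of the read (query) sequence.
QUERY_CONSUMING_OPS = set("MIS=X")

def cigar_query_length(cigar):
    """Sum the lengths of the CIGAR operations that consume read bases."""
    total = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in QUERY_CONSUMING_OPS:
            total += int(length)
    return total

def cigar_matches_seq(sam_line):
    """Return True if CIGAR and SEQ lengths agree for one alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare
    return cigar_query_length(cigar) == len(seq)

def find_inconsistent_lines(path):
    """Yield (line_number, line) for alignment lines that fail the check."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("@") or not line.strip():
                continue  # skip header and blank lines
            if not cigar_matches_seq(line):
                yield number, line
```

Calling `find_inconsistent_lines` on the SAM file should list line 113 if the CIGAR there really disagrees with the read length, which makes it easier to see what the aligner wrote.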

**GoneSouth** · 03-15-2010, 08:09 AM

why do deletions in the pileup-file have a quality attached

Hi guys,

Does anyone know why deletions in the pileup file have a quality attached? How can a deletion have a quality?
And how is it calculated?

For example:

YHet 23690 N 1 a-1n Q
YHet 23691 N 1 * [
YHet 23692 N 1 c [

or

YHet 25409 N 5 AAA-2NNa-2nnA-2NN VTW`a
YHet 25410 N 5 A$A$*** USR`a
YHet 25411 N 3 *** SG`

best ro

**jeffhsu3** · 04-05-2010, 11:47 AM

If an insertion or deletion occurs at the end of the pileup read bases string, it doesn't seem to have the extra character after the '\+[0-9]+[ACGTNacgtn]+' pattern.

For example:
chr1 2263 C 4 ,$.$.,+1t CC9C FFFF.

Am I missing something? The pattern is described on the pileup format page, which mentions the in/del pattern '\+[0-9]+[ACGTNacgtn]+', but there appears to be an extra character in the examples given on that page:

seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<

That extra character appears to be missing if the in/del occurs at the end of the read bases string. Including that extra character as part of the insertion/deletion makes the number of read bases match the read count.
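One way to test this is to parse the read bases string using the number after `+` or `-` as the exact indel length, rather than the greedy regular expression. The Python sketch below does that; it assumes that `^` is followed by a single mapping-quality character and that `$` marks a read end. `count_read_bases` is a hypothetical helper, not part of samtools.

```python
def count_read_bases(read_bases):
    """Count the reads covering a position in a pileup read-bases string."""
    count = 0
    i = 0
    n = len(read_bases)
    while i < n:
        ch = read_bases[i]
        if ch == "^":
            i += 2  # '^' plus the mapping-quality character that follows it
        elif ch == "$":
            i += 1  # end-of-read marker, not a base
        elif ch in "+-":
            # Read the indel length, then skip exactly that many characters.
            j = i + 1
            while j < n and read_bases[j].isdigit():
                j += 1
            indel_length = int(read_bases[i + 1:j])
            i = j + indel_length
        else:
            count += 1  # '.', ',', '*', or a mismatching base
            i += 1
    return count
```

Under these assumptions both examples above give counts that agree with the read-number column: 4 for `,$.$.,+1t` and 11 for `.$......+2AG.+2AG.+2AGGG`. In the second string the trailing `GG` is then read as two separate read bases, not as part of the `+2AG` insertion.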

**jdiezperezj** · 04-12-2010, 02:02 AM

So, is it already possible to convert SOAP aligner output to SAM or BAM format?
Best.
Javi

Originally posted by lh3

To corthay:

You are quick. I am planning a new bwa release as I realized that I could improve it a little without much work (PS: the new version is released now). Wgsim, wgsim_eval.pl and converters for soap and bowtie are available from SVN only:

svn co https://samtools.svn.sourceforge.net...s/dev/samtools samtools

**RockChalkJayhawk** · 04-13-2010, 11:06 AM

FLAGS for fusion detection

Let's say I have paired-end RNA-Seq data and I want to find out whether the mates map more than 1 Mb apart on the same chromosome or map to two different chromosomes. How do I determine that from the FLAGS?

**nilshomer** · 04-13-2010, 01:04 PM

Originally posted by RockChalkJayhawk

Let's say I have paired-end RNA-Seq data and I want to find out whether the mates map more than 1 Mb apart on the same chromosome or map to two different chromosomes. How do I determine that from the FLAGS?

You can use the MRNM and MPOS fields in the SAM file.

**RockChalkJayhawk** · 04-13-2010, 01:14 PM

Originally posted by nilshomer

You can use the MRNM and MPOS fields in the SAM file.

So in that case, either my MRNM does not equal "=", or MRNM equals "=" and the difference between POS and MPOS is > 1 million.

Is this correct?

**nilshomer** · 04-13-2010, 01:24 PM

Originally posted by RockChalkJayhawk

So in that case, either my MRNM does not equal "=", or MRNM equals "=" and the difference between POS and MPOS is > 1 million.

Is this correct?

Perfect!
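The rule agreed on above can be sketched in Python roughly as follows. It assumes the standard SAM column order (RNAME in column 3, POS in column 4, MRNM in column 7, MPOS in column 8). The function name and the returned labels are made up for illustration. Because some tools may write the chromosome name in MRNM instead of "=", the sketch also compares MRNM with RNAME.

```python
def classify_mate_pair(sam_line, max_distance=1_000_000):
    """Classify one paired-end alignment line by where its mate maps."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, pos = fields[2], int(fields[3])
    mrnm, mpos = fields[6], int(fields[7])

    if rname == "*" or mrnm == "*":
        return "no_mate_information"  # read or mate is unmapped or unpaired
    if mrnm != "=" and mrnm != rname:
        return "different_chromosome"
    if abs(pos - mpos) > max_distance:
        return "distant_same_chromosome"
    return "nearby_same_chromosome"
```

For example, a line with RNAME `chr1` and MRNM `chr5` is labelled `different_chromosome`, while one with MRNM `=`, POS 100 and MPOS 2000000 is labelled `distant_same_chromosome`. The 1 Mb threshold is the one from the question and can be changed through `max_distance`.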

**RockChalkJayhawk** · 04-13-2010, 01:26 PM

Originally posted by nilshomer

Perfect!

Thanks Nils! You're the best!

**menenuh** · 04-16-2010, 05:44 AM

non-unique reads

Hello,
In my sam file I have both unique and non-unique reads. What happens to non-unique reads when I call SNPs from the sam file? Are they included in the SNP calling process?

thanks

**bair** · 05-11-2010, 12:59 AM

denovo on sam format

Dear all,

I have alignment results in a BAM file that includes paired-end and mate-pair reads of different lengths (101, 35 and 36 bp). Does anybody know whether SOAP or another de novo program can handle the BAM format directly, or do I have to use the raw read files?

Many thanks!

**gen2prot** · 06-07-2010, 12:20 PM

Hello All,

Does anybody know how I can sort the .sam file on the basis of the first column, that is, the column containing the unique read identifiers? Right now it's sorted on the 3rd.

Thanks
Abhijit

**nilshomer** · 06-07-2010, 12:39 PM

Originally posted by gen2prot

Hello All,

Does anybody know how I can sort the .sam file on the basis of the first column, that is, the column containing the unique read identifiers? Right now it's sorted on the 3rd.

Thanks
Abhijit

SAMtools and Picard will both sort by read name. See their documentation.
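For a small SAM file, the idea can also be sketched in plain Python: keep the header lines (those starting with "@") at the top and sort the remaining lines on the first column. `sort_sam_by_read_name` is a hypothetical helper and holds everything in memory, so for real data the `-n` option of `samtools sort` or Picard's sorting tool is the better route.

```python
def sort_sam_by_read_name(lines):
    """Return SAM lines with the header first and records sorted by QNAME."""
    header = [ln for ln in lines if ln.startswith("@")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("@")]
    # Python's sort is stable, so mates with the same name keep their order.
    records.sort(key=lambda ln: ln.split("\t", 1)[0])
    return header + records
```

Note that this is a plain string sort on the read name and that it leaves the header untouched, so any sort-order tag in the header would no longer describe the file.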

**gen2prot** · 06-07-2010, 06:41 PM

Hello nilshomer,

I downloaded Picard. I have the .jar files on Mac OS X 10.6, yet these jar files won't open. I have them saved on the Desktop. How do I run them?

Thanks
Abhijit
