Similar problem
I am also getting a 200-byte file when I index. I use the same technique for the other 6 lanes, so I am not sure why this lane is generating a 200-byte index file. What did you do to correct this?
-
I fixed the problem by following Heng Li's suggestion from the samtools mailing list.
The SAM file (generated from SOLiD GFF) had different FASTA IDs than the ones in the REF_FASTA. Changing the REF_LIST names to match the SAM entries did the trick.
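A rough way to check for this kind of mismatch before importing, sketched in shell. It assumes the reference list is tab-delimited with the sequence name in the first column (as in a `.fai` file) and that the SAM file is plain text. `missing_refs` is a hypothetical helper name, not a samtools command.

```shell
# Hypothetical helper: print reference names used in a SAM file (column 3)
# that do not appear in column 1 of a tab-delimited reference list.
# Assumption: the reference list has the sequence name in its first column.
missing_refs() {
    sam_file=$1
    ref_list=$2
    awk -F'\t' '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: reference list
        /^@/ { next }                                  # skip SAM header lines
        $3 != "*" && !($3 in known) { print $3 }       # unknown reference name
    ' "$ref_list" "$sam_file" | LC_ALL=C sort -u
}
```

Calling `missing_refs "${SAM_FILE}" "${REF_LIST}"` prints each unmatched name once; empty output would mean every reference name in the SAM file is present in the list.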
-
Thanks, Nils, for the suggestion.
Here are the first few lines. Is there something obvious I am missing?
Thanks.
samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam | head
5692_30_692 16 * 1 255 50M * 0 0 ACTCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGT !$$$"@:7@@@<@@@@;@@@?@@@@@@@@@<@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T31102031030112131310303322331312020112001122201221 CQ:Z:=:@;;?:><5@7;9:=>9?&7=892:5:<(8/72*<8-7&=8)/,7.$$% MD:Z:1TC47
690_1188_353 16 * 1 255 50M * 0 0 CCCCAACCCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAAC !=6-&'2?##058<'&67/.&&&&-8@@:664@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T11032001032001032001032000100000103000103100101000 CQ:Z:3;:8<49?>;,;8;:5:,+,/<2''&&&&)'1&',1(.##2.%#$*-1 MD:Z:3A46
2580_1784_289 16 * 2 255 50M * 0 0 NACCCTAACCCTAACCCTAACCCTAACCCTAACCGACCCTCACCCTCACC !@@1@@@=>@="""@@@))@@=6@@@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10112200112200123010320010320010020011120010320012 CQ:Z:6>9:8>7;:86@(>:A;89<==6,7><1&8:8;1;,()59&8,8,&<' MD:Z:T49
4592_669_181 16 * 2 255 50M * 0 0 AGCAACTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGTC !7'))*61@,'00:##.0@?:@@@<?@@@22@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T12110203103011213131130332233131202111201112210132 CQ:Z:;=<;=<<;;;=<8:<<=66<2;0:<$9577$<*'(#4'*',:/#4**)'1 MD:Z:3AA45
276_439_1731 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCGAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@<@@@>7=@@7114?<,""?@/,&&"""+! RG:Z:EGAN CS:Z:T22301002301002301002301002301002310032110023311320 CQ:Z:=9:8:;8::367<895669/6,17,6)//6,,&,)7&')'9)'&&&&&&& MD:Z:36T13
1109_603_622 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@@@@@<<((@?7)(8<16&&@8&,**@'&! RG:Z:EGAN CS:Z:T22301002301002301002301002300002300023000020000000 CQ:Z:@<=AA?>;><;>>9;68:85151631,1(61/)1(1,&1&;3&,3*95'& MD:Z:50
5612_1858_1717 16 * 4 255 50M * 0 0 TAAAACTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCT !""%%%**@<186/11;>"""69<@@@@::@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T32001032001032001032001032000102200103200100210033 CQ:Z:=:<<>7=>7<8;:;379<18/,/5/51,.)/&,3)))'0))45**%&&)/ MD:Z:3GG45
5818_1717_156 0 * 6 255 50M * 0 0 AACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCAAAA @@@@@@@@@@@@@@@@@@@;@@=@@8=@"""@@><>%%@9;>$$@('''! RG:Z:EGAN CS:Z:T30100230100230100230100230100001002302002300001000 CQ:Z:8:4<;99569A::11;;843):,233&83$+,80/.1%31)3,$:8(0'5 MD:Z:46TT2
5482_1175_1664 0 * 7 255 50M * 0 0 CCCTAACCCTAACCCTTACCCTTAACCCTAACCCTAACCCTAACCCTAAC @@@@@@@@@@@4@@@@18>:===7?@147@@@@@''67@9&&&&?:&&7! RG:Z:EGAN CS:Z:T20023010023010020310020301002301002001002000002201 CQ:Z:1;686;53=25,):3;,&3,////)7,&/)9+6+9'1&21)&1&1/,&2& MD:Z:21C28
3491_1092_1469 16 * 8 255 50M * 0 0 CCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCTAACC !"""<=@"""619.1?+<@<)3@=<@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10103200103200103200103200103000013000010000010000 CQ:Z:;5?A<>A;A@<<:=:<@687698397&853)47&&:1.,&189/6(5,)& MD:Z:50
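One detail in the records above: the third column (RNAME in the SAM format) is `*` for every read, even though the fourth column (POS) is set. Reads with no reference name would fit with a nearly empty index. A quick tally of the third column makes this easier to spot across a whole file. The sketch below reads tab-delimited SAM text on standard input; `rname_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: count alignment records per reference name (column 3)
# in tab-delimited SAM text read from standard input.
rname_counts() {
    awk -F'\t' '
        /^@/ { next }                 # skip header lines
        { n[$3]++ }                   # tally by RNAME
        END { for (r in n) print r "\t" n[r] }
    ' | LC_ALL=C sort
}
```

For example, `samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam | rname_counts` would print one line per reference name with its record count; a single line starting with `*` would mean no read carries a reference name.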
-
This seems perfect for the samtools help mailing list ([email protected]).
-
Thanks, Nils. Here are the outputs:
$ samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr10
[bam_view] fail to get the reference name. Abort!
$ samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr_10_validated
This generated nothing!
I named the FASTA headers in the reference differently; could this be the problem?
Also, just FYI, this is single-fragment data, not mate-pair.
Here is the header for the BAM:
$ samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam -H
@HD VN:1.0 SO:coordinate
@PG ID:SOLID-GffToSam VN:1.0
@RG ID:EGAN SM:COMBINED_556
@SQ SN:chr_1_validated LN:247249719
@SQ SN:chr_2_validated LN:242951149
@SQ SN:chr_3_validated LN:199501827
@SQ SN:chr_4_validated LN:191273063
@SQ SN:chr_5_validated LN:180857866
@SQ SN:chr_6_validated LN:170899992
@SQ SN:chr_7_validated LN:158821424
@SQ SN:chr_8_validated LN:146274826
@SQ SN:chr_9_validated LN:140273252
@SQ SN:chr_10_validated LN:135374737
@SQ SN:chr_11_validated LN:134452384
@SQ SN:chr_12_validated LN:132349534
@SQ SN:chr_13_validated LN:114142980
@SQ SN:chr_14_validated LN:106368585
@SQ SN:chr_15_validated LN:100338915
@SQ SN:chr_16_validated LN:88827254
@SQ SN:chr_17_validated LN:78774742
@SQ SN:chr_18_validated LN:76117153
@SQ SN:chr_19_validated LN:63811651
@SQ SN:chr_20_validated LN:62435964
@SQ SN:chr_21_validated LN:46944323
@SQ SN:chr_22_validated LN:49691432
@SQ SN:chr_X_validated LN:154913754
@SQ SN:chr_Y_validated LN:57772954
Last edited by wdt; 10-02-2009, 11:40 AM.
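The two `samtools view` results above fit with region names having to match an `@SQ` `SN:` value exactly: `chr10` is rejected, while `chr_10_validated` is accepted. A small sketch for listing the names a BAM header actually defines, assuming tab-delimited header text on standard input; `sq_names` is a hypothetical helper name.

```shell
# Hypothetical helper: print the SN: value of every @SQ line in
# tab-delimited SAM header text read from standard input.
sq_names() {
    awk -F'\t' '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) print substr($i, 4)   # strip the "SN:" prefix
        }
    '
}
```

Usage would look like `samtools view -H combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam | sq_names`, and any name it prints can be used as a region argument.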
-
The BAMs I work with are 80-200GB in size and have "BAI" (BAM indexes) that are about 5-10MB in size. 200 bytes seems rather small. Can you try "samtools view ${BAM_FILE}.sorted.bam chr10", which should start printing alignments at chromosome 10?
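For pipelines that index many lanes, a size check like this can flag a suspicious index automatically. This is only a sketch: the 1024-byte default threshold is an arbitrary assumption chosen to sit between the 200 bytes reported here and the megabyte-sized indexes mentioned above, and `index_suspiciously_small` is a hypothetical helper name.

```shell
# Hypothetical helper: exit 0 if the index file is smaller than a threshold
# (in bytes), 1 if it is not, and 2 if the file does not exist.
# Assumption: the 1024-byte default is arbitrary; tune it for your data.
index_suspiciously_small() {
    bai_file=$1
    min_bytes=${2:-1024}
    [ -f "$bai_file" ] || return 2
    size=$(wc -c < "$bai_file")
    size=$((size))                    # normalise any padding from wc
    [ "$size" -lt "$min_bytes" ]
}
```

For example, `index_suspiciously_small "${BAM_FILE}.sorted.bam.bai" && echo "index looks empty"` would print a warning for an index like the one described in this thread.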
-
SAM to .bai conversion
Hi,
I performed the following steps to go from a SAM file to a .bai index:
samtools import ${REF_LIST} ${SAM_FILE} ${BAM_FILE}
# Sort BAM
samtools sort ${BAM_FILE} ${BAM_FILE}.sorted
# Generated 20 GB sorted.bam file
# Index sorted BAM; output goes to the default file ${BAM_FILE}.sorted.bam.bai
samtools index ${BAM_FILE}.sorted.bam
The last step generated a .bai file with just 200 bytes in it. Is this the expected size?
I did not see any error messages.
Thanks!!!