  • UNCKidney
    replied
    Similar problem

    I am also getting a 200-byte file when I index. I used the same technique on the other 6 lanes, so I am not sure why this lane produces a 200-byte index file. What did you do to correct this?


  • wdt
    replied
    I fixed the problem by following Heng Li's suggestion on the samtools mailing list.

    The SAM file (generated from SOLiD GFF) had different FASTA IDs than the ones in REF_FASTA. Changing the names in REF_LIST to match the SAM entries did the trick.
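
A mismatch like this can be spotted before importing. The following is a minimal sketch, assuming a tab-separated SAM text file and a tab-separated reference list whose first column is the sequence name (for example a .fai file). The function name "missing_ref_names" and both file paths are hypothetical, not part of samtools.

```shell
# Print reference names used in a SAM file (column 3) that are absent from the
# reference list (column 1 of a tab-separated name/length file such as a .fai).
# Both file arguments are hypothetical paths supplied by the caller.
missing_ref_names() {
    sam_file=$1
    ref_list=$2
    awk -F'\t' '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: reference list
        /^@/ || $3 == "*"   { next }                  # skip header lines and reads with no reference
        !($3 in known) && !seen[$3]++ { print $3 }    # report each unknown name once
    ' "$ref_list" "$sam_file"
}

# Example call (paths are placeholders):
#   missing_ref_names reads.sam reference.fa.fai
```

Any name this prints is used by the alignments but unknown to the reference list, so one of the two has to be renamed before the import step.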


  • wdt
    replied
    Thanks, Nils, for the suggestion.

    Here are the first few lines. Is there something obvious I am missing?

    Thanks.


    samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam | head
    5692_30_692 16 * 1 255 50M * 0 0 ACTCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGT !$$$"@:7@@@<@@@@;@@@?@@@@@@@@@<@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T31102031030112131310303322331312020112001122201221 CQ:Z:=:@;;?:><5@7;9:=>9?&7=892:5:<(8/72*<8-7&=8)/,7.$$% MD:Z:1TC47
    690_1188_353 16 * 1 255 50M * 0 0 CCCCAACCCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAAC !=6-&'2?##058<'&67/.&&&&-8@@:664@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T11032001032001032001032000100000103000103100101000 CQ:Z:3;:8<49?>;,;8;:5:,+,/<2''&&&&)'1&',1(.##2.%#$*-1 MD:Z:3A46
    2580_1784_289 16 * 2 255 50M * 0 0 NACCCTAACCCTAACCCTAACCCTAACCCTAACCGACCCTCACCCTCACC !@@1@@@=>@="""@@@))@@=6@@@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10112200112200123010320010320010020011120010320012 CQ:Z:6>9:8>7;:86@(>:A;89<==6,7><1&8:8;1;,()59&8,8,&<' MD:Z:T49
    4592_669_181 16 * 2 255 50M * 0 0 AGCAACTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGTC !7'))*61@,'00:##.0@?:@@@<?@@@22@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T12110203103011213131130332233131202111201112210132 CQ:Z:;=<;=<<;;;=<8:<<=66<2;0:<$9577$<*'(#4'*',:/#4**)'1 MD:Z:3AA45
    276_439_1731 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCGAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@<@@@>7=@@7114?<,""?@/,&&"""+! RG:Z:EGAN CS:Z:T22301002301002301002301002301002310032110023311320 CQ:Z:=9:8:;8::367<895669/6,17,6)//6,,&,)7&')'9)'&&&&&&& MD:Z:36T13
    1109_603_622 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@@@@@<<((@?7)(8<16&&@8&,**@'&! RG:Z:EGAN CS:Z:T22301002301002301002301002300002300023000020000000 CQ:Z:@<=AA?>;><;>>9;68:85151631,1(61/)1(1,&1&;3&,3*95'& MD:Z:50
    5612_1858_1717 16 * 4 255 50M * 0 0 TAAAACTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCT !""%%%**@<186/11;>"""69<@@@@::@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T32001032001032001032001032000102200103200100210033 CQ:Z:=:<<>7=>7<8;:;379<18/,/5/51,.)/&,3)))'0))45**%&&)/ MD:Z:3GG45
    5818_1717_156 0 * 6 255 50M * 0 0 AACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCAAAA @@@@@@@@@@@@@@@@@@@;@@=@@8=@"""@@><>%%@9;>$$@('''! RG:Z:EGAN CS:Z:T30100230100230100230100230100001002302002300001000 CQ:Z:8:4<;99569A::11;;843):,233&83$+,80/.1%31)3,$:8(0'5 MD:Z:46TT2
    5482_1175_1664 0 * 7 255 50M * 0 0 CCCTAACCCTAACCCTTACCCTTAACCCTAACCCTAACCCTAACCCTAAC @@@@@@@@@@@4@@@@18>:===7?@147@@@@@''67@9&&&&?:&&7! RG:Z:EGAN CS:Z:T20023010023010020310020301002301002001002000002201 CQ:Z:1;686;53=25,):3;,&3,////)7,&/)9+6+9'1&21)&1&1/,&2& MD:Z:21C28
    3491_1092_1469 16 * 8 255 50M * 0 0 CCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCTAACC !"""<=@"""619.1?+<@<)3@=<@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10103200103200103200103200103000013000010000010000 CQ:Z:;5?A<>A;A@<<:=:<@687698397&853)47&&:1.,&189/6(5,)& MD:Z:50
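
Every record shown above has "*" in the third column (the reference name) while the fourth column (the position) is set. A small sketch for counting such records follows, assuming tab-separated SAM text on stdin as produced by "samtools view". The function name "count_unnamed_placed" is a hypothetical helper, not a samtools command.

```shell
# Count SAM records whose RNAME (column 3) is "*" even though POS (column 4)
# is greater than zero. Reads SAM text on stdin; header lines are skipped.
count_unnamed_placed() {
    awk -F'\t' '!/^@/ && $3 == "*" && $4 > 0 { n++ } END { print n + 0 }'
}

# Example call (assumes samtools is installed and the BAM file exists):
#   samtools view in.sorted.bam | count_unnamed_placed
```

A large count here would suggest that the alignments lost their reference names somewhere upstream, which may be why the index ends up nearly empty.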


  • nilshomer
    replied
    Does the following work?

    samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam

    This seems perfect for the samtools help mailing list ([email protected]).


  • wdt
    replied
    Thanks, Nils. Here are the outputs:

    samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr10

    [bam_view] fail to get the reference name. Abort!

    samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr_10_validated

    This generated nothing!

    I named the FASTA headers in the reference differently. Could this be the problem?
    Also, just FYI, this is single-fragment data, not mate-pair.


    Here is the header for the BAM:

    samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam -H
    @HD VN:1.0 SO:coordinate
    @PG ID:SOLID-GffToSam VN:1.0
    @RG ID:EGAN SM:COMBINED_556
    @SQ SN:chr_1_validated LN:247249719
    @SQ SN:chr_2_validated LN:242951149
    @SQ SN:chr_3_validated LN:199501827
    @SQ SN:chr_4_validated LN:191273063
    @SQ SN:chr_5_validated LN:180857866
    @SQ SN:chr_6_validated LN:170899992
    @SQ SN:chr_7_validated LN:158821424
    @SQ SN:chr_8_validated LN:146274826
    @SQ SN:chr_9_validated LN:140273252
    @SQ SN:chr_10_validated LN:135374737
    @SQ SN:chr_11_validated LN:134452384
    @SQ SN:chr_12_validated LN:132349534
    @SQ SN:chr_13_validated LN:114142980
    @SQ SN:chr_14_validated LN:106368585
    @SQ SN:chr_15_validated LN:100338915
    @SQ SN:chr_16_validated LN:88827254
    @SQ SN:chr_17_validated LN:78774742
    @SQ SN:chr_18_validated LN:76117153
    @SQ SN:chr_19_validated LN:63811651
    @SQ SN:chr_20_validated LN:62435964
    @SQ SN:chr_21_validated LN:46944323
    @SQ SN:chr_22_validated LN:49691432
    @SQ SN:chr_X_validated LN:154913754
    @SQ SN:chr_Y_validated LN:57772954
    Last edited by wdt; 10-02-2009, 11:40 AM.


  • nilshomer
    replied
    The BAMs I work with are 80-200 GB in size and have BAM index (BAI) files of about 5-10 MB. 200 bytes seems rather small. Can you try "samtools view ${BAM_FILE}.sorted.bam chr10", which should start printing alignments at chromosome 10?
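
The size comparison can be automated as a quick sanity check. This is only a sketch: the function name "check_bai_size" is hypothetical, the 1000-byte default threshold is an arbitrary illustrative value rather than a samtools rule, and the index is assumed to sit next to the BAM as "<file>.bam.bai".

```shell
# Warn when a BAM index looks implausibly small.
# $1: path to the BAM file; the index is assumed to be "$1.bai".
# $2: optional minimum size in bytes (arbitrary illustrative default: 1000).
check_bai_size() {
    bam=$1
    min_bytes=${2:-1000}
    bai="$bam.bai"
    if [ ! -f "$bai" ]; then
        echo "missing: $bai"
        return 2
    fi
    size=$(wc -c < "$bai" | tr -d ' ')
    if [ "$size" -lt "$min_bytes" ]; then
        echo "suspicious: $bai is only $size bytes"
        return 1
    fi
    echo "ok: $bai is $size bytes"
}

# Example call (path is a placeholder):
#   check_bai_size combined.sorted.bam
```

A "suspicious" result does not prove the index is broken; it only flags files worth inspecting with a region query like the one suggested above.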


  • wdt
    started a topic SAM to .bai conversion

    Hi,

    I performed the following steps to convert a SAM file to a sorted BAM and build its .bai index:

    samtools import ${REF_LIST} ${SAM_FILE} ${BAM_FILE}

    # Sort BAM
    samtools sort ${BAM_FILE} ${BAM_FILE}.sorted

    # Generated 20 GB sorted.bam file

    # Index sorted BAM, output in default file ${BAM_FILE}.sorted.bam.bai
    samtools index ${BAM_FILE}.sorted.bam

    The last step generated a .bai file of just 200 bytes. Is this the expected size?
    I did not see any error messages.

    Thanks!!!
