
  • allenyu
    replied
    Thanks! Now trying to use sorted reads first.



  • marcowanger
    replied
    Hi Allen,

    Yes, you need to sort your FASTQ input before running Novoalign. No luck, man.


    Originally posted by zee
    Hi Allenyu

    Try adding " --hdrhd 4" to your novoalign command in case there is more than 1 byte difference between the read names of a set of paired reads.
    Also note that read1 and read2 should be in order throughout your FASTQ input file. If this is not the case then most aligners will probably not do the right thing.



  • zee
    replied
    Hi Allenyu

    Try adding " --hdrhd 4" to your novoalign command in case there is more than 1 byte difference between the read names of a set of paired reads.
    Also note that read1 and read2 should be in order throughout your FASTQ input file. If this is not the case then most aligners will probably not do the right thing.
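
One way to check the ordering advice above before aligning is to walk both FASTQ files in lockstep and compare read names. This is a minimal Python sketch, not part of Novoalign. The helper names (`fastq_names`, `name_distance`, `find_mispaired`) and the `max_diff` threshold are hypothetical. It assumes plain four-line FASTQ records, and it only loosely mirrors the idea of tolerating a small difference between the names of the two mates.

```python
from itertools import zip_longest


def fastq_names(lines):
    """Yield the read name of each record, assuming strict 4-line FASTQ.

    The name is the header up to the first whitespace, without the '@'.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line[1:].split()[0]


def name_distance(a, b):
    """Count differing characters; any length difference is added on top."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def find_mispaired(r1_lines, r2_lines, max_diff=1):
    """Return (record_index, name1, name2) for records whose names disagree.

    max_diff is an assumed tolerance, e.g. 1 for a trailing /1 versus /2.
    A missing mate (files of unequal length) is reported with None.
    """
    bad = []
    pairs = zip_longest(fastq_names(r1_lines), fastq_names(r2_lines))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 is None or n2 is None or name_distance(n1, n2) > max_diff:
            bad.append((idx, n1, n2))
    return bad
```

With real data one would pass two open file handles, for example `find_mispaired(open("reads_1.fastq"), open("reads_2.fastq"))` (hypothetical file names). An empty result suggests the mates appear in the same order in both files.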



  • allenyu
    replied
    Originally posted by dpryan
    The original SAM file also looks to have truncated names. Your read names should all end in ":8:[\d]+:[\d]+:[\d]+" (or something like that), where [\d]+ is regex for a number. The SAM file that you posted looks to have 3 reads (according to read name), but 5 reads if you look at the sequences. Is there something screwed up in your original fastq files?
    Yes, you are right; it seems the read titles were screwed up by novoalign. The original read titles were fine.

    Code:
    @HWI-ST621:415:D197AACXX:7:1101:1179:2146 1:N:0:
    NCAGAATGAGCAATTAGAAATCCTCTGTNNTNNTAGNNNNCTGGAAATTAAACCAAGTGTATAATGCACCTAATGAAGTGTATGGTCTGANGTTTAANTAG
    +
    #1=DDFFFHHHHHJJJJJJJJJJJJJJI##2##1:C####00?DHGIJJJEHIHIEHCHFGIIJJJIGEEHHFEHFFFDDDFEEECDEDC#,5<@@C####
    @HWI-ST621:415:D197AACXX:7:1101:1185:2187 1:N:0:
    TTTGAACATCCCCACTAGGTTCTTTTCCATTGNCAANNNGGAGCATCAGCCAGTGAATCTGTTTCAGGTTTCCATTCTGCAGAACTCCTCCAAAGCATGTG
    +
    CCCFDFFFHHHHHEHIJJJCHHIIJJIIGGIG#1:C###00?DHIJHGIIJJJGHIEHIIIGDHGIJI@DHFH>AEHFFFFFFECCCCEDCDCCDDDCDCC



  • allenyu
    replied
    Originally posted by maubp
    Very strange. Was that a typo in the version of samtools (I have 0.1.18 on my machine), or do you really have an out of date copy?
    You are right, that was a typo. Thanks for spotting that.



  • dpryan
    replied
    The original SAM file also looks to have truncated names. Your read names should all end in ":8:[\d]+:[\d]+:[\d]+" (or something like that), where [\d]+ is regex for a number. The SAM file that you posted looks to have 3 reads (according to read name), but 5 reads if you look at the sequences. Is there something screwed up in your original fastq files?
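
The check described above can be automated. The following is a minimal Python sketch, assuming the read names are expected to end in lane 8 followed by three numeric fields, as suggested in the post. The function name `audit_read_names` is hypothetical, and the pattern would need adjusting for other lanes or naming schemes.

```python
import re
from collections import Counter

# Tail suggested in the post above: ":8:" followed by three numeric fields.
NAME_TAIL = re.compile(r":8:\d+:\d+:\d+$")


def audit_read_names(sam_lines):
    """Return (suspect_names, records_per_name) for a SAM file's lines.

    Header lines (starting with '@') and blank lines are skipped.
    A name is 'suspect' when it does not end in the expected tail.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        # QNAME is the first tab-separated field of an alignment line.
        counts[line.split("\t", 1)[0]] += 1
    suspect = sorted(name for name in counts if not NAME_TAIL.search(name))
    return suspect, counts
```

A name that carries more than two records is also worth a look in paired-end data, although secondary or supplementary alignments can legitimately produce that, so treat the counts as a hint rather than proof of truncation.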



  • maubp
    replied
    Very strange. Was that a typo in the version of samtools (I have 0.1.18 on my machine), or do you really have an out of date copy?



  • SAM/BAM sort by read names produces truncated read names

    Hi,

    I tried to sort the alignment file by read name, but it appears that truncated read names were produced. This was observed no matter which program I used: SAMtools sort (0.1.8), Picard SortSam (1.77) or Novosort (2.08).

    Here are the first few records of the original SAM file:
    Code:
    HWI-ST621:415:D197AACXX:8:1101:1        113     chr2    236798427       70      100M1S  chr8    3088040 0       ACCTCTGTTTCTAAGCAGTGGAATAGAATTGCTTATGGAATAGCCAGGTCATAGGATGTNATAANTTCCCTGGAAATCAGAGGGGAAAAGAAGCAAAACAN   C@?>?AC@:C@>CECDEE@ACFEBFFDEEHECDACADHFHFEHIJGJIGIHJJIHDB80#HF?1#GDJIHCIGGHHAIIIJJHEHJJIHHHHHFFFDD=1#        PG:Z:novoalign  RG:Z:LS148      AS:i:18 UQ:i:18 NM:i:2  MD:Z:59G4T35
    HWI-ST621:415:D197AACXX:8:1101:1        177     chr8    3088040 70      101M    chr2    236798427       0       AAATACATACATACACACAGACTGATTTTCTCTTCAGCAATATTTTAATGAAACCCCATACTGCAAATTACATAAACTAGTTAAAGTACACCAACCTCAAG   DEEDDDFDCEECEEDDBFFFDHHHFGHECJJIHFJJJIJJJIJHHGIHGDDGGJJJIIHGHIJJJIIJIGJJIIIFIIJJJJJJIIHFFAHHHDFFDFCCB        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST621:415:D197AACXX:8:1101:1223:2124        83      chr8    143208201       70      100M1S  =       143207998       -303    CGCTGAGAGCAAGGTGCCAGCAGGGTGGGCCCTTCTGGAGGCTCCGGCCGGGATCTGTTCCAGGCCACCCCCGCCTTCCGGCCATCCTCAGCTTGGCTCCN   >@CA>A:A>>>3(CA<AACDDDB<<?3?@9?CDCDCBCC?7<BBDBB@<93?DCCAA8<B?A<<DB7DCIGGBHGAHIIHFJJIEJIIHHHHHFFFDD=1#        PG:Z:novoalign  RG:Z:LS148      AS:i:47 UQ:i:47 NM:i:1  MD:Z:6C93       PQ:i:59 SM:i:70 AM:i:70
    HWI-ST621:415:D197AACXX:8:1101:1223:2124        163     chr8    143207998       70      92M     =       143208201       303     TTGTGGAGTCAGGTGTCCCTGGGGTCACGGTGACTGGCCAGGCGNGGGGAGCCAGGAGGCACACGGTCCTGGGCTCTNGCAGGGCTGGAGTG    @BBDFFADD?FHH@@EGGGGIIII@BCGHG8?DGHGB@FHHGAG#-<CC;@E?ACEE?B7?BCA?B;?BDDCB9??A#++28?B?B@B1<>A PG:Z:novoalign  RG:Z:LS148      AS:i:12 UQ:i:12 NM:i:2  MD:Z:44C32G14   PQ:i:59 SM:i:70 AM:i:70
    HWI-ST621:415:D197AACXX:8:1101:14       65      chr6    74783346        70      1S100M  chr1    1867309 0       NGATTAAGCAGCCAAGCTGTATCCTGAGGGAAACATGGGCAATGGAAAGCATCAGATTTCCTGGGTCAAAGCTATCCTGAGCTCAGGCACTGGGCTAACTG   #4=DFFFFGHHHHJJJJJJIJJJJJJJJJJGHIJIHIIJIGIIJJBFHIIIJJJJDIJJIHHIJJIGGHHHHHFFFFFFEDEEEEDDD@DDDDDDCDCDDD        PG:Z:novoalign  RG:Z:LS148      AS:i:6  UQ:i:6  NM:i:0  MD:Z:100
    HWI-ST621:415:D197AACXX:8:1101:14       129     chr1    1867309 70      101M    chr6    74783346        0       ACACACACACACACACACGAACTGCAGGGGGCTCTGGAGCCATGGAGTTAGAAAAGCTCTCTGAGAGGCCAGGTGTAGTGGCTCATGCCTGTAATCCCAGC   CCCFDFFFHHHHGJJJIJJIJJJJJFHIJIJFHIJJJDHEHHHHG@D?BDACCEDCBDDDDDDCDDDDBDBDB@CCCCCCBDDCCC@ACAC@>AB>CCACD        PG:Z:novoalign  RG:Z:LS148      AS:i:30 UQ:i:30 NM:i:1  MD:Z:68T32
    HWI-ST621:415:D197AACXX:8:1101:14       97      chr2    62756955        70      1S100M  chr6    74783591        0       NGTGCTGTTTGGTTTGTGTGTATTATATGGGTTTGGATTACAATAATTCCTCCCTTTTGTATAATGTTTTGCAGTTTTTAAAGCACTTCATGCTCTAAATC   #1=DDFFDHHGGFHIIHHEHGFGIDHHIIIIFGIIICGGEHHHIIIII>GGGIIIIIIIICFGHHGGHIIIIDAAEHHHEBDDFCEEECCDCCCCCC>ACC        PG:Z:novoalign  RG:Z:LS148      AS:i:6  UQ:i:6  NM:i:0  MD:Z:100
    HWI-ST621:415:D197AACXX:8:1101:14       145     chr6    74783591        70      101M    chr2    62756955        0       ATTTTTGTAAGTCACCAATGGTTGGATGTTGGCAGTTTCATAAGGTTCATTCTAATAGTTCCTGGGACACAAATGACTCGAAGTAGGTCAAGACAGGTTCA   <DDDDDDDDDDDDEEECCFDFFGHEHGJJIIJJJJIGHIHCIIIJIGCGIIGDIHEIIHGIJJJJIHIIJJIIHGBHHJIJJJJJJJJHHHFHFFFFD?C@        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST621:415:D197AACXX:8:1101:1        81      chr1    155944063       70      101M    chr11   19838477        0       CAGCTGTACCTGGCAGCAGCCCCTTCCCCAAGATGGTGACACCTCTGTCCACACCCTCTGTAATAGTGACCGGAGAGCCTGTGGAGCATTCCACCAGGATT   DDDEDAA:BCAA:DD@BDDDDB?@=BDEDEEDFFFD@;??=HHIIIIGJIHF<JIHFGBIHIJIIIIIHJJJJJJJIJJJJIJJJJJIHHHHHFFFFFCC@        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST621:415:D197AACXX:8:1101:1        161     chr11   19838477        70      101M    chr1    155944063       0       AGCCCCTTATGCAGAAAAAGGGACTCCACCTGGAGCCCTCTCTGGATCTACTTCTCCCAGATAAATCAGTCGGCTGTGTAATCTTTCAGGAAACCTGACCC   ??<DDFFFFHHDDDHIGDDAFE9FFGHGCHEGG9FGGHGGGGCFHBF*0BBCBGGE@GHGCHA@ECE@H;ADBFDCDDCCDD@CCC;33:32:595<9>3<        PG:Z:novoalign  RG:Z:LS148      AS:i:1  UQ:i:1  NM:i:0  MD:Z:101
    After sorting:
    Code:
    HWI-ST  81      chr7    83652142        70      82M     chr8    142160880       0       CTTTGTATTTACAGATACCACGGCCATTTTGCAATGTCCTCAGCACATAGTGGAAGCTGAACAAACAATCACATTTTCTAAT      @D<EA?7)==77@=7)('-'FF;FABB*0>EDB9DFDGDEBDEECC<FHHHBE@9HHEAB<;>FFDBBFA<DFA;A,B48;?   PG:Z:novoalign  RG:Z:LS148      AS:i:22 UQ:i:22 NM:i:1  MD:Z:76A5
    HWI-ST  65      chr9    120922414       70      101M    chr6    160312253       0       TCACTGAGTCTGATTGAAGCAACTGGCATTGGTGATCATACTTCAATATTTCTCTCATATTTGAAGTTAGAATTAGTTGATGTGAGATATTATATTAGCCT   @CCFFFFFHFHFAHHIDGHIJGIIJGHCGIJICFHIIIIIJIJJIIJGIIEIJHHGGIICGHIBGHFGHHGGHIDC@DHGIHGIGHHHHECBDFFFFFEDE        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST  81      chr2    46872242        70      101M    chr17   79461315        0       CATGGATTAAAATATTAAGTAATTTGATCTAGATGATTGTTTACAGTTTAACGCAAATACACTTAGTCTGTTCTGATTATTTACTCAAGGATTATATTACT   >C>:EDDFCDDFFDFFHHHHHHJIHGG=GIGJJIIIIJGIIIHIJHDGGHHJFIIJIIGC:JHHAIIFJJJIHGH@IJJJHHCGB>HGGHGHHFFFFF@C@        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST  65      chr8    103315908       70      93M     chr17   40205036        0       AGATATCTGAGAAACTGACCTAAATAAGCAATCTGAAAAGATTAAGGTTCCTTCAATTATTATACTACTTGTTCTCCAAATAACACACTAACT   <@@ADD>DDBA<FG?A43?@FFF:3AEB>DFECE91:C<CFCFCFFC::4?D>FCDDD<FC8DFEFDG88@.==C=4@D;7@:7?CCBDD@>@        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:93
    HWI-ST  89      chr16   61016706        70      101M    =       61016706        0       TGTTGAGTCAATGTAAGACCTTGGTAAGAATTCTTCAATTTAGACATGGCTAATTTTTAATGTCAACCACAGCTATTGAGGTACTTATATTAATTAACCTT   C?CECACCFFFFDDDE?=CCGGIIIGGEGIIIGGIIEGIIHHDBFGIGFIIIHGIIIIIGGIHG@CHHHGHHHGHDEIFIIGIHBIIIHHDDHEDEDF@?@        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST  97      chr12   16510044        70      101M    chr9    75346048        0       TAATAAAAATTCAGTTTTAACTATAGATGCCTTCTTCTCCTCTTGTGTTTGATTTATTGCTCCAAATGGGCCAACCTGGATGTCTATATTTCTTCCACTAA   CCCFFFFFHHHHHJIIGIIJJIJIIJJJJJJIEIIJJJJJHJIJGFGFHJJJIIJJJJJJJJIGJJJJIIJJJIJHJHHFHHBBEDFFCFEFEEEEDDDDD        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST  73      chr5    22843028        70      97M     =       22843028        0       TAACTGTGTTTACTTTTCTCAGTTTCTACCAGAGAAAAGGCAGGTGCATTTTTTTGGTATGTTTGTGTAAAGTGAATTTGGCTTTACTTTTTCAAAT       =?<DD>=;FHDFFHGE@EFH?EA<B4AA@EBGCC1?91*:8CFG0?@?<D@@B;AFB=7=3?CHEEBE77B@6>;(6;.;;@;?>A>5(5:@CC5@>    PG:Z:novoalign  RG:Z:LS148      AS:i:3  UQ:i:3  NM:i:0  MD:Z:97
    HWI-ST  73      chr6    152150636       70      101M    =       152150636       0       CATTTGTCATCATTACACGGTCATGGGAGTGCTAAGAAGACTTAAATGCAGGGCTACCACCCCTTCCCAATTCATCTTTTATCCATTTTATTTCTCTAAGG   @CCDDDDEHHFHHFBHGGHHAFEFFHIGG:?CFGIGIGGHHEGIEHIGHGDE@;B=FA@F@FGGGEEHECCFFEFFCECDECCCDDDEDDCC@BCC>CCCC        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    HWI-ST  113     chr7    63064316        30      101M    chr17   26080536        0       CCTGCTCATCTCAGGCCTGCCGGCTCCTCCACCTGCCTTTTCGAGTACCCTGGGAACCCCCCGAGGACAGGTGTCATCGGTTGCTTCATCTCACCATCCCT   A94+(:ACCC??@BB@@7DDBDB<2????@8;BDB@A@BCDBCCCA<-DCC>3?8DB=7@@IHCIIJIGIJIIIJGHHGGGGHGEIDIFFFFAFFFDF@@@        PG:Z:novoalign  RG:Z:LS148      AS:i:31 UQ:i:31 NM:i:1  MD:Z:42C58
    HWI-ST  89      chr4    96140737        70      101M    =       96140737        0       AACAACGAGCCTCACTAGGTGACGATTAGCTATGGTTTCCCTGGTCTATACTGGATTTGGGTTCATTGGTAAATCATTCTATTCATAGCAATACAAGATAT   <<A?8DDDDDDCCAEEEFFFFHHHHHFIJJJJIIIIIGIGHIFIGHIIGDGGIJIJJIIHIHIEHIIJJJJJJIIJJJIJJIIJJIJJHGHHHFFFFFB@@        PG:Z:novoalign  RG:Z:LS148      AS:i:0  UQ:i:0  NM:i:0  MD:Z:101
    Does anyone have any idea of what's wrong with the programs or data?

    Thanks a lot!

    Allen
