  • Brian Bushnell
    replied
    Originally posted by reventropy
    You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.
    That's exactly why I made the suggestion; there are a lot of poorly written tools that break read pairing, and that's usually the culprit.

    If you need to do quality or adapter trimming, I can suggest BBDuk, which is made to handle single or paired files and keeps reads together. It's extremely fast and uses a better quality-trimming algorithm than most alternatives, and it is more sensitive in adapter trimming (you can specify the number of mismatches allowed). You can also use it for contaminant removal (phiX, E. coli, various spike-ins or vectors).
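
To illustrate what "keeping reads together" means in practice, here is a minimal Python sketch of pair-aware trimming. It is not BBDuk's algorithm: it assumes Phred+33 qualities and a plain trailing-base trim, and the names and thresholds (`trim_tail`, `trim_pair`, `min_q`, `min_len`) are hypothetical, chosen only for this example. The point is that a pair is kept or dropped as a unit, so the two output files cannot fall out of step.

```python
def trim_tail(seq, qual, min_q=10, offset=33):
    """Remove trailing bases whose Phred score is below min_q.

    Assumes Phred+33 encoding; this is a simple right-end trim,
    not the algorithm of any particular tool.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_pair(rec1, rec2, min_len=20, min_q=10):
    """Trim both mates; return the trimmed pair, or None if either is too short.

    Each record is a (name, seq, qual) tuple. Dropping both mates together
    is what keeps the forward and reverse files in the same order.
    """
    name1, seq1, qual1 = rec1
    name2, seq2, qual2 = rec2
    seq1, qual1 = trim_tail(seq1, qual1, min_q)
    seq2, qual2 = trim_tail(seq2, qual2, min_q)
    if len(seq1) < min_len or len(seq2) < min_len:
        return None  # discard the whole pair, never just one mate
    return (name1, seq1, qual1), (name2, seq2, qual2)
```

A tool that instead writes each file independently would drop the short mate from one file only, which is the failure mode described above.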

  • reventropy
    replied
    Originally posted by Brian Bushnell
    I suggest you go back to the raw files, and map them without modifying them in any way. If you want to merge multiple datasets, you can do that after you have the sam/bam files.
    After looking into this some more, I'm not sure there is a way to feed multiple files into the Galaxy Tophat2 wrapper. Fortunately, it looks like they have a tool specifically for combining paired-end read files (which I swear I looked for before). We'll see if this works. As a backup, we'll run another instance of Tophat2 from the command line.

    You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.

  • Brian Bushnell
    replied
    I suggest you go back to the raw files, and map them without modifying them in any way. If you want to merge multiple datasets, you can do that after you have the sam/bam files.

  • reventropy
    replied
    Thanks for the response, yueluo. I ran it through a Galaxy wrapper, but I selected the first-strand option, so the wrapper should be passing the command on to Bowtie. I just spoke with a colleague who informed me that my paired-end reads appear to be out of order.

    For instance:

    Read 1, forward:
    1101:1432:2038 1:N:0:TGACCA
    Read 1, reverse:
    1101:1452:2018 2:N:0:TGACCA

    This may have happened when I concatenated the files, or it might just be how I received the sequencing data. Do you have any ideas about how I can re-sort by coordinates?
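
One way to confirm the problem and to re-pair the files is to compare read names record by record. The Python sketch below is only an illustration under stated assumptions: plain 4-line FASTQ records, mates that share everything before the first space in the header, and a reverse file small enough to hold in memory (which will not be true for hundreds of millions of reads, where a dedicated tool or a sort-based approach is more practical). The function names are hypothetical, and the demo sequences are shortened placeholders taken from the start of the reads in the first post.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_id(header):
    """Return the part of the header shared by both mates (no '@', no ' 1:N:0:...')."""
    return header[1:].split(" ", 1)[0]


def first_mismatch(fwd, rev):
    """Return (record_index, fwd_id, rev_id) for the first out-of-sync pair, else None.

    zip() stops at the shorter file, so differing record counts are not reported here.
    """
    for i, (r1, r2) in enumerate(zip(read_fastq(fwd), read_fastq(rev))):
        if read_id(r1[0]) != read_id(r2[0]):
            return i, read_id(r1[0]), read_id(r2[0])
    return None


def repair(fwd, rev):
    """Yield (fwd_record, rev_record) for reads present in both files, in forward order."""
    mates = {read_id(rec[0]): rec for rec in read_fastq(rev)}  # whole file in memory
    for r1 in read_fastq(fwd):
        r2 = mates.get(read_id(r1[0]))
        if r2 is not None:
            yield r1, r2


# Demo using read names from this thread (sequences and qualities shortened).
FWD = (
    "@HW-ST997:217:C3KKGACXX:4:1101:1432:2038 1:N:0:TGACCA\nTTCA\n+\nFF00\n"
    "@HW-ST997:217:C3KKGACXX:4:1101:1474:2051 1:N:0:TGACCA\nGAGG\n+\nFIFI\n"
)
REV = (
    "@HW-ST997:217:C3KKGACXX:4:1101:1452:2018 2:N:0:TGACCA\nTTAC\n+\nFFFF\n"
    "@HW-ST997:217:C3KKGACXX:4:1101:1474:2051 2:N:0:TGACCA\nAGTC\n+\nFFFI\n"
)
print(first_mismatch(io.StringIO(FWD), io.StringIO(REV)))
print([read_id(r1[0]) for r1, _ in repair(io.StringIO(FWD), io.StringIO(REV))])
```

Matching by name rather than sorting by coordinates also drops reads whose mate is missing, which may be what is needed if one file lost records during concatenation.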

  • yueluo
    replied
    What options did you use when running tophat/bowtie?
    Since you are using stranded data, you might want to check the '--library-type' option.
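
For a command-line run outside Galaxy, the option could be passed roughly as in the Python sketch below, which only assembles the argument list. The helper name, index name, file names, and output directory are hypothetical; the value `fr-firststrand` is an assumption about which protocol "first strand" refers to here and should be checked against the library prep. Several input files per mate are given as comma-separated lists, in the same order for both mates.

```python
def tophat2_command(index_base, fwd_files, rev_files,
                    library_type="fr-firststrand", out_dir="tophat_out"):
    """Build an argument list for a paired-end TopHat2 run.

    fwd_files and rev_files must list the split files in the same order,
    otherwise mates from different chunks would be paired with each other.
    """
    if len(fwd_files) != len(rev_files):
        raise ValueError("need the same number of forward and reverse files")
    return [
        "tophat2",
        "--library-type", library_type,
        "-o", out_dir,
        index_base,
        ",".join(fwd_files),
        ",".join(rev_files),
    ]


# Hypothetical file names for a dataset delivered in two chunks.
cmd = tophat2_command("genome_index",
                      ["chunk1_R1.fastq", "chunk2_R1.fastq"],
                      ["chunk1_R2.fastq", "chunk2_R2.fastq"])
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths are real.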

  • reventropy
    started a topic High discordant alignments

    High discordant alignments

    I've set up a Galaxy workflow for paired-end, first-stranded RNA-seq, and I've gotten some odd summary results from the Tophat2 alignment. At least I think they're odd, as I'm new to this.

    Left reads:
        Input : 218685181
        Mapped : 193500858 (88.5% of input)
            of these: 14727362 ( 7.6%) have multiple alignments (40016 have >20)
    Right reads:
        Input : 218685181
        Mapped : 196263585 (89.7% of input)
            of these: 14724480 ( 7.5%) have multiple alignments (40380 have >20)
    Unpaired reads:
        Input : 5950944
        Mapped : 5300035 (89.1% of input)
            of these: 227937 ( 4.3%) have multiple alignments (142 have >20)
    89.1% overall read mapping rate.

    Aligned pairs: 173668750
        of these: 13863688 ( 8.0%) have multiple alignments
        170432898 (98.1%) are discordant alignments
    1.5% concordant pair alignment rate.
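
The percentages in this summary can be reproduced from the raw counts. The short Python check below is a sketch based only on the numbers shown; that the concordant rate is taken relative to the input pairs (rather than the aligned pairs) is inferred from the arithmetic, not from TopHat's documentation. The variable names are made up for the example.

```python
# Counts copied from the summary above.
left_in, left_mapped = 218685181, 193500858
right_in, right_mapped = 218685181, 196263585
unpaired_in, unpaired_mapped = 5950944, 5300035
aligned_pairs, discordant_pairs = 173668750, 170432898

# Overall mapping rate: all mapped reads over all input reads.
overall_rate = (left_mapped + right_mapped + unpaired_mapped) / (
    left_in + right_in + unpaired_in
)

# Pairs that aligned and were not flagged discordant.
concordant_pairs = aligned_pairs - discordant_pairs

# Assumption: the reported rate uses the number of input pairs as denominator.
concordant_rate = concordant_pairs / left_in
discordant_fraction = discordant_pairs / aligned_pairs

print(f"overall read mapping rate:  {overall_rate:.1%}")
print(f"concordant pairs:           {concordant_pairs}")
print(f"concordant pair rate:       {concordant_rate:.1%}")
print(f"discordant / aligned pairs: {discordant_fraction:.1%}")
```

So only about 3.2 million of roughly 219 million input pairs aligned concordantly, even though close to 90% of the individual reads mapped.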

    Here's the flagstat output:


    490744296 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    490744296 + 0 mapped (100.00%:-nan%)
    486148534 + 0 paired in sequencing
    241299292 + 0 read1
    244849242 + 0 read2
    523372 + 0 properly paired (0.11%:-nan%)
    443477134 + 0 with itself and mate mapped
    42671400 + 0 singletons (8.78%:-nan%)
    418612688 + 0 with mate mapped to a different chr
    312416516 + 0 with mate mapped to a different chr (mapQ>=5)

    For the number of reads mapped, the number of concordant pairs seems extremely low. I'm wondering if I missed a parameter in Tophat or Bowtie. Notably, I have not set a read group identifier in Bowtie (is it necessary?), nor could I figure out how to from the Bowtie documentation. I also wonder if something could be awry with my fastq files, as they have been concatenated from a split dataset. Here are the first few reads from the forward and reverse data, respectively.

    @HW-ST997:217:C3KKGACXX:4:1101:1432:2038 1:N:0:TGACCA
    TTCATCTTTAGATAATGAATTATATCCAAGATCAGACTGGCCACCTGTACTAGATCTATCATCAGTAGCATATACTTTGATTAAACCCG
    +
    FF00B<<FFFFFFBBFFFBFIFBBF0BBFFFFBFFFFIF<FFF<FBFF7BBBB<<B<''<B<BBB<<BBBBBFFFBBF<<B<7B7<BBB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 1:N:0:TGACCA
    GAGGGAGTATAGGGCTGTGACTAGTATGTTGAGTCCTGTAAGTAGGAGAGTGATATTTGATCAGGAGAACGTGGTTACTAGCACAGAGA
    +
    FIFIIBFBBFFFIIFFFFFFFFFFFBFFIIIFFFIIIFFFFFFFFFBF<BBBBF0BFFFBFFBFFFFFFFBFBFBFB<BBBBBBBBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1451:2106 1:N:0:TGACCA
    ACTGGGAAACGTTCACGCTGGGTCCAGCATTTGCCATGGACAAGATGCCAGGACCCGTATGCTTCAGGATGAAGTTCTTGTCATCAAAT
    +
    FIIFFBBFFFFFFBB7<7BBFFF77BBFFIFFFIFBFFFIFFIIF<B<0<BB7BBBBB<BBBBBBBB0BBBB0<7<BBBB0'0B<B<BB




    @HW-ST997:217:C3KKGACXX:4:1101:1452:2018 2:N:0:TGACCA
    TTACCCCCATACTCCTTACACTATTCCTCATCANCCNACTAAAAATATTAAACACAAACTACCACCTACCTCCCTCACCAAAGCCCATA
    +
    FFFFFFFF7FFFIIIIIFFFFFFFIIFFFFFFB#0B#07<FFFIFFFFIFBFFIFFFFFFFFBFF<BB<BFFFFB<BBBBBFBFFB<BB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 2:N:0:TGACCA
    AGTCATTCTCATAATCGCCCACGGGCTTACATCNTCNTTACTATTCTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCA
    +
    FFFIIFFFIIFIIFFBFBFFFIIIIFFFIFFFF#0<#07<BBFFFBBFBFFBBFFFFFBFFFFFFFFFFFFFBBBBFFBFFBBBFBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1409:2234 2:N:0:TGACCA
    ATCTCAGAAAAGAAGACATGGAATATGCCCTGNNTANACTGGATGACACCAAATTCCGCTCTCATGAGGGTGAAACTTCCTACATCCGA
    +
    <BFFFIFFIIIBBFFBFBBFFFFF7FFFFFII##07#07BFFBFFBFFFIFFFBF7BBFFBBBBBBB<BB0<B<'7<BBBBBBBBBBB<
    Thanks in advance for any help!

    -Jeremy
