  • Inputting fastq files into Tophat2 without info on seq platform type

    I'm trying to use TopHat2 in Galaxy to map paired-end reads, but the drop-down for selecting the input files doesn't list any of the files I imported (they look fine in FastQC). It only recognizes them after I run FASTQ Groomer.

    I don't know which platform was used to sequence these, so I don't know which input type to choose in FASTQ Groomer. I tried Illumina 1.3-1.7 and then, separately, Sanger/Illumina. But when I used TopHat to map either set of groomed files, the results were terrible: with Illumina 1.3-1.7 I got 94.7% discordant alignments, and with Sanger/Illumina 0% of reads mapped. I'm assuming the problem is the format I'm converting to? The data have been used before for RNA-seq DGE analysis, so I'm assuming the reads themselves are fine.

    My question: how can I tell from the original FASTQ file which input type to choose in FASTQ Groomer? Any other helpful information is welcome too.

    Original FASTQ file (first record):
    @GWZHISEQ02:321YMKACXX:4:1101:1856:1996 1:N:0:ATCACG
    CACGATGATGGCCTTCGACGGCAAGTACGACTTCCCCCTGGACATCAGCGA
    +
    @@CFDDFFHHHHHJJHJIIIJDIJJDGHIIJJJIJJJJJIJIJJJGJJJHH

  • #2
    Those are Illumina reads, and they could be in either ASCII-64 (old Illumina) or ASCII-33 (Sanger) format; most likely ASCII-64, but I can't tell from that read. It may be possible to tell if you post more reads (particularly one with an 'N' base call).
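The range check described in post #2 can be sketched in Python. This is a heuristic written for illustration, not a Galaxy or FastQC function, and the thresholds assume the commonly cited ranges: Phred+33 data (Sanger/Illumina 1.8+) spans '!' to 'J' (ASCII 33 to 74), while the older Phred+64 variants span ';' to 'h' (ASCII 59 to 104). Any character below ';' rules out Phred+64, whereas a read whose characters all sit between '@' and 'J', like the first one posted, fits both.

```python
from typing import Iterable


def guess_phred_offset(quality_strings: Iterable[str]) -> str:
    """Heuristically classify FASTQ quality strings as Phred+33 or Phred+64.

    Assumed ranges: Sanger/Illumina 1.8+ (Phred+33) uses ASCII 33-74 ('!'-'J');
    Solexa/Illumina 1.3-1.7 (Phred+64 family) uses ASCII 59-104 (';'-'h').
    """
    lowest, highest = 255, 0
    for qual in quality_strings:
        for ch in qual:
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)
    if lowest < 59:       # below ';' never occurs in any Phred+64 variant
        return "phred33"
    if highest > 74:      # above 'J' is unusual for Illumina Phred+33 data
        return "phred64"
    return "ambiguous"    # observed range is valid under both encodings


first_read = "@@CFDDFFHHHHHJJHJIIIJDIJJDGHIIJJJIJJJJJIJIJJJGJJJHH"
n_read = "CCCFFFFFHHHHHJJJJJJJJJ#3AGIJJJJJJJJJJJJJJJJJJJJ####"
print(guess_phred_offset([first_read]))  # ambiguous
print(guess_phred_offset([n_read]))      # phred33
```

In practice you would pass it the quality lines (every fourth line) from the first few thousand records rather than a single read, which makes an 'ambiguous' result much less likely.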



    • #3
      Here's one with several Ns:

      @GWZHISEQ02:321YMKACXX:5:1101:5470:1986 1:N:0:ATCACG
      CTGGATATCAATAATGCTCTCCNTAGGGATATTTCCCGCAAATTTGANNNN
      +
      CCCFFFFFHHHHHJJJJJJJJJ#3AGIJJJJJJJJJJJJJJJJJJJJ####



      • #4
        That's strange: normally an N should be Q0 (!), not Q2 (#), but it does appear to be ASCII-33 (Sanger) data. I'm not sure why the reads are not mapping. You may want to BLAST some of them against a database such as NT to make sure they come from the correct organism.
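To see what post #4 is describing, the quality characters can be decoded by subtracting the ASCII offset. The sketch below assumes Phred+33, and the helper names are hypothetical. Under that offset '!' decodes to Q0 and '#' to Q2, so in the read from post #3 the four trailing Ns, which line up with '####', come out as Q2 rather than Q0.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to Phred scores (Q = ASCII code - offset)."""
    return [ord(ch) - offset for ch in quality]


def n_call_qualities(seq: str, quality: str, offset: int = 33) -> List[Tuple[int, int]]:
    """Return (0-based position, Phred score) for every 'N' base call in a read."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality, offset)
    return [(i, q) for i, (base, q) in enumerate(zip(seq, scores)) if base == "N"]


print(phred_scores("!#J"))                 # [0, 2, 41]
print(n_call_qualities("ACNTN", "II#I#"))  # [(2, 2), (4, 2)]
```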



        • #5
          You should not need to "groom" the data if they are already Sanger formatted. Just choose the "pencil" edit icon against the name of the dataset and manually set the data type to "fastqsanger" under "datatype" tab.

          You should do some QC/trimming though as that may be affecting your alignments.
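As a rough illustration of the 3' quality trimming suggested in post #5, the sketch below clips bases from the end of a read while their score is below a cutoff. It is similar in spirit to Trimmomatic's TRAILING step but is not Trimmomatic's implementation; the function name, the Phred+33 offset and the cutoff of 20 are assumptions chosen for the example.

```python
from typing import Tuple


def trim_trailing_low_quality(
    seq: str, quality: str, min_q: int = 20, offset: int = 33
) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], quality[:end]


print(trim_trailing_low_quality("ACGTNNNN", "IIII####"))  # ('ACGT', 'IIII')
```

Dedicated trimmers usually combine this with leading-end and sliding-window rules; this covers only the simplest case.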



          • #6
            I'm wondering if maybe they need to be adapter-trimmed? They all failed the Kmer Content module in FastQC.



            • #7
              In that case, probably yes! Though that's easiest to do if you know what kind of adapters were used.
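If the adapter type is unknown, one quick check is to count how many reads contain a known adapter prefix. The sketch below assumes two commonly used motifs: AGATCGGAAGAGC, the start of the standard Illumina TruSeq adapters, and CTGTCTCTTATA, the Nextera transposase sequence. The helper names are hypothetical, and exact substring matching will miss adapters that appear only partially at the 3' end, which dedicated tools such as Trimmomatic handle.

```python
from typing import Dict, Iterable, Mapping

# Assumed candidate motifs (adapter prefixes), not an exhaustive list.
CANDIDATE_ADAPTERS: Mapping[str, str] = {
    "Illumina TruSeq": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
}


def count_adapter_hits(
    sequences: Iterable[str], adapters: Mapping[str, str] = CANDIDATE_ADAPTERS
) -> Dict[str, int]:
    """Count reads containing each candidate adapter motif (exact match only)."""
    counts = {name: 0 for name in adapters}
    for seq in sequences:
        for name, motif in adapters.items():
            if motif in seq:
                counts[name] += 1
    return counts


def clip_adapter(seq: str, motif: str) -> str:
    """Remove the first occurrence of motif and everything 3' of it."""
    idx = seq.find(motif)
    return seq if idx == -1 else seq[:idx]


reads = ["ACGTACGTAGATCGGAAGAGCACACG", "TTGACCTGTCTCTTATACACAT", "ACGTACGTACGT"]
print(count_adapter_hits(reads))  # {'Illumina TruSeq': 1, 'Nextera': 1}
print(clip_adapter(reads[0], CANDIDATE_ADAPTERS["Illumina TruSeq"]))  # ACGTACGT
```

Whichever motif shows up in a noticeable fraction of reads is a reasonable guess for the adapter to give the trimmer.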



              • #8
                Originally posted by GenoMax
                You should not need to "groom" the data if they are already Sanger formatted. Just choose the "pencil" edit icon against the name of the dataset and manually set the data type to "fastqsanger" under "datatype" tab.
                I changed the dataset type to fastqsanger, but TopHat2 and Trimmomatic still don't recognize the files. When I open the drop-down in either tool (e.g. "RNA-Seq FASTQ file, forward reads"), there's nothing there.

                Edit: This is true for either paired-end (which is correct for my data) or single-end options.

                SOLUTION: I am dumb. I had accidentally changed them to fastqCsanger instead of fastqsanger.
                Last edited by skmotay; 10-08-2014, 12:44 PM.

