SEQanswers

  • Bowtie error

    Hello everybody,

    I'm new here, and I already have a question. Recently, we performed bacterial RNA-seq on an Illumina HiSeq. I got my data and aligned it using Bowtie2.

    Honestly, I had never touched Linux before; I installed Ubuntu and started from scratch. A week later I was able to run test alignments, and everything seemed to work perfectly. So a few hours ago I ran the full alignment of my first replicate, and things didn't go so well.

    Here is my command line:

    bowtie2 -q -t -p 6 -D 20 -x IndexFile -U FastqFile -S OutputFile

    It took 2 hours to align on a desktop with 8 GB of RAM and 6 cores, and it generated a 6 GB output file. However, the alignment reported an error:

    Error: Read HWI-ST766:125:COPRMACXX:6:1306:11865:38045 1:N:0:ATNACG has more quality values than read characters!
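The error says that, for this record, the quality line is longer than the sequence line; in a valid FASTQ record the two must be the same length. Below is a minimal awk sketch, assuming standard four-line records (header, sequence, '+', quality), that lists every record with this mismatch. The function name check_fastq_lengths is only an illustrative choice.

```shell
#!/bin/sh
# check_fastq_lengths FILE
# Prints "header-line-number<TAB>read-name<TAB>sequence-length<TAB>quality-length"
# for every record whose sequence and quality strings differ in length.
# Assumes standard 4-line FASTQ records.
check_fastq_lengths() {
    awk 'NR % 4 == 1 { name = $1; start = NR }   # header line: remember read name and position
         NR % 4 == 2 { seqlen = length($0) }     # sequence line: remember its length
         NR % 4 == 0 && length($0) != seqlen {   # quality line: compare with sequence length
             printf "%d\t%s\t%d\t%d\n", start, name, seqlen, length($0)
         }' "$1"
}
```

For example, check_fastq_lengths FastqFile. If nearly every record from some point onward is reported, a line has probably been lost or added there (for instance in a truncated or badly concatenated file), which shifts the four-line framing for everything after it.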

    I looked for posts about this error here but found nothing, and Google didn't help either. Does anybody know what this error means? Can I still trust my alignment?

    Thanks a lot people!

  • #2
    Oh yeah, another thing: I'm not sure whether I had to trim the 5' or 3' ends of my reads. We did single-end 50 bp RNA-seq, and it wasn't directional (so I guess the same adapters are on the 5' and 3' ends). For the alignment I used the sequence files exactly as I received them from our sequencing department, so I'm not sure whether trimming has already been done or whether I have to do it myself. Perhaps that's the problem...
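One rough way to tell whether reads were already trimmed is to look at their length distribution: untrimmed single-end 50 bp reads should all be 50 bases long, while adapter or quality trimming usually leaves a spread of shorter lengths. A minimal sketch, assuming standard four-line FASTQ records (read_length_histogram is a hypothetical name):

```shell
#!/bin/sh
# read_length_histogram FILE
# Prints "count length" for each distinct read length, shortest first.
# The sequence is line 2 of every 4-line record.
read_length_histogram() {
    awk 'NR % 4 == 2 { print length($0) }' "$1" | sort -n | uniq -c
}
```

A single line such as "N 50" would suggest the reads were not trimmed. FastQC's sequence length distribution module reports the same information; it does not by itself say whether adapters are present.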


    • #3
      I'm no expert, but I think the first thing you should do is find out how much your sequencing facility processes the reads before handing them over to you. Regardless of what they do, it's probably a good idea to check the quality scores with something like FastQC and to trim low-quality bases or adapter contamination with something like AdaptCut before aligning.

      Now, as for your specific problem, you should be able to look in the FASTQ file that contains your reads and see what's going on with the one read that's giving you the error. (Look up the grep command if you're not sure how to do that.) If it's just one read causing a problem and you can't easily see what the issue is, I would just delete that read.
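For a single read, grep can print the matching header together with the three lines that follow it, without opening the file in an editor. A minimal sketch; show_fastq_record is an illustrative name, and the -A option is supported by GNU and BSD grep but is not required by POSIX.

```shell
#!/bin/sh
# show_fastq_record FILE READ_ID
# Prints every line containing READ_ID (prefixed "N:") plus the 3 lines after it (prefixed "N-").
# -F matches READ_ID as a fixed string, so characters such as '.' are taken literally.
show_fastq_record() {
    grep -n -F -A 3 -- "$2" "$1"
}
```

For example, show_fastq_record FastqFile 'HWI-ST766:125:COPRMACXX:6:1306:11865:38045'. If the '+' separator line repeats the read name, as in some older Illumina files, grep will match that line too.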


      • #4
        JChase, thanks a lot for your answer. I will check with my sequencing facility how they process their output files, just to be sure. I've already run FastQC and the results look okay to me. I'll also look into trimming my sequences.

        Finally, I agree that the easiest way to deal with this problem is to delete that particular read from the FASTQ file. However, the file is about 3 GB, so it takes hours to open in gedit, and usually gedit crashes before it finishes opening. Is there a simple way to search within a file without actually opening it?

        In the meantime, I tried ignoring this error and continuing with SAM-to-BAM conversion and sorting, but got another error... I'll start a separate thread for that one.

        Thanks again!


        • #5
          I would use grep -n to get the line number, and then use sed to delete those lines from the file. Be sure you delete the line that the read name is on as well as the lines that belong to it, because a FASTQ record is not on a single line (in standard Illumina output each read spans four lines: header, sequence, '+' separator, and quality). sed can be slow on big files because it reads through the entire file, but that's better than trying to edit the file manually.
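The two steps can be combined in a small shell function. This is a sketch under some assumptions: standard four-line records, a read ID whose first match is the header line, and GNU sed (the N,+3 address form is a GNU extension, and BSD/macOS sed handles -i differently). delete_fastq_record is a hypothetical helper; since it edits the file in place, keeping a copy first is wise.

```shell
#!/bin/sh
# delete_fastq_record FILE READ_ID
# Deletes, in place, the 4-line FASTQ record whose header contains READ_ID.
# Assumes GNU sed and standard 4-line records.
delete_fastq_record() {
    # Line number of the first line containing READ_ID (grep -n prints "N:line").
    line=$(grep -n -F -m 1 -- "$2" "$1" | cut -d: -f1)
    if [ -z "$line" ]; then
        echo "read not found: $2" >&2
        return 1
    fi
    # Headers sit on lines 1, 5, 9, ...; refuse to delete if the match is elsewhere.
    if [ $(( (line - 1) % 4 )) -ne 0 ]; then
        echo "match on line $line is not a record header" >&2
        return 1
    fi
    # Delete the header line and the 3 lines after it.
    sed -i "${line},+3d" "$1"
}
```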


          • #6
            I had to delete some lines from a file once like this. I used the sed command. See this thread:


            • #7
              Thank you so much guys. I'm working on it right now, hope it will work!



              • #8
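                It worked perfectly. Commands:

                grep -n STRING_TO_SEARCH FastqFile (finds the line and prints its line number)

                and then:

                sed -i 2,+3d FastqFile: starting at line 2, this deletes that line and the 3 lines after it (in this example, lines 2, 3, 4 and 5), i.e. one complete four-line FASTQ record whose header is on line 2. The 2,+3 address form is a GNU sed feature.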
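Because sed -i overwrites the file, an alternative is to write a filtered copy instead. A minimal awk sketch, assuming standard four-line records and matching the read name against the first whitespace-separated field of each header (drop_fastq_record is an illustrative name): it reads the file once, needs no line numbers, and leaves the original untouched.

```shell
#!/bin/sh
# drop_fastq_record IN OUT READ_ID
# Copies IN to OUT, skipping the record whose header begins with "@READ_ID".
# Assumes standard 4-line FASTQ records.
drop_fastq_record() {
    awk -v id="@$3" '
        NR % 4 == 1 { skip = ($1 == id) }   # decide once per record, at the header line
        !skip                               # print the line unless this record is skipped
    ' "$1" > "$2"
}
```

For example, drop_fastq_record FastqFile FastqFile.fixed 'HWI-ST766:125:COPRMACXX:6:1306:11865:38045'.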

                Thanks again!

