  • travelk
    Member
    • Jul 2013
    • 20

    Removing contaminating reads

    Hey all,

    I have a very basic question that I just cannot seem to find a straight answer to despite scouring Google and SEQanswers.

    When I ran my dataset through Fastq_screen I discovered I have E. coli contamination. I ran my files in Tophat2 against the E.coli genome and determined that about 5% of my reads positively map to E. coli.

    I would like to remove these reads but I cannot figure out how.

    Most forum posts suggest removing them but do not specify exactly how. My initial thought was to take the unmapped.bam file from my E. coli alignment and use that as my "clean" data, but that means converting the data back to paired-end fastq format and adding various processing steps, so I'm not really sure what is left in that file. Naturally, I'd like to lose as little data as possible.

    Is there a simple way to take my original fastq file, remove the E. coli-positive reads, and end up with a clean data set?

    Or alternatively, does contamination like that not really matter, because those reads won't map to the mouse genome anyway?

    Obviously I'm pretty new to this and teaching myself so any advice, however simple it may seem, is much appreciated.

    Thanks for your help!
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Just align to the E coli genome with bowtie2, which can be told to write unmapped reads in fastq format. You can then directly align the resulting fastq files.

    • a.kmg
      Member
      • Aug 2014
      • 15

      #3
      You can run Bowtie2 on your fastq file against an E. coli index to filter out the reads that map to E. coli:

      bowtie2 -U rawFastq -p nbproc --un fastqFileWithoutEcoli -x indexEcoli -S ecoli_reads.sam
      This command processes the raw fastq file, writes the reads that do not align to E. coli to a new fastq file (--un = unaligned reads), and saves the alignments to E. coli in a SAM file (-S option). If you do not need the E. coli alignments, point -S at /dev/null; without -S, bowtie2 prints the SAM output to stdout.
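Since the original question mentions paired reads, here is a minimal sketch of how the same idea might look for paired-end data. It only builds and prints the bowtie2 command so it can be checked before running. The function name `filter_cmd`, the input file names, and the `clean` output prefix are hypothetical; `--un-conc` and the `%` placeholder (replaced by 1 or 2 for each mate) are standard bowtie2 options. Note that `--un-conc` keeps every pair that fails to align concordantly, which may include pairs where only one mate hits E. coli.

```shell
# Sketch: print a bowtie2 command that filters paired-end reads against a
# contaminant index. All argument values below are placeholders.
filter_cmd() {
    index=$1        # bowtie2 index prefix for the contaminant (e.g. E. coli)
    r1=$2           # mate 1 fastq
    r2=$3           # mate 2 fastq
    out_prefix=$4   # prefix for the filtered output files
    threads=${5:-4} # number of threads, default 4
    # --un-conc writes pairs that did not align concordantly;
    # -S /dev/null discards the contaminant alignments.
    printf 'bowtie2 -p %s -x %s -1 %s -2 %s --un-conc %s_R%%.fastq -S /dev/null\n' \
        "$threads" "$index" "$r1" "$r2" "$out_prefix"
}

# Print the command for inspection.
filter_cmd indexEcoli raw_R1.fastq raw_R2.fastq clean 4
```

Once the printed command looks right, it can be pasted into the terminal or piped to `sh`. This simple version assumes the paths contain no spaces.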

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        BBMap example (remain in the same directory or adjust paths accordingly). Will work on PC/Mac/*nix.

        1. Get Ecoli genome fasta file.

        2. Build an index for Ecoli genome using BBMap.

        Code:
        $ bbmap.sh ref=./E_coli_genome.fa
        3. Align against the Ecoli genome index saving reads that don't align to a new file (these are the reads you want, the outu file below)

        Code:
        $ bbmap.sh in=your_fastq_file path=./ outm=E_coli_reads.fastq outu=reads_you_want.fastq qin=33

        • travelk
          Member
          • Jul 2013
          • 20

          #5
          Thank you! This totally did the trick. I processed the files and then reran them through fastq_screen and all the E.coli reads were gone.

          And thank you a.kmg and GenoMax for your clear step by step instructions.

          I ended up using bowtie2 because that is what I've been working with so far. I initially got an error message but realized I was missing the -x in the command line. So, in the end I used:

          bowtie2 -U raw.fastq -p nbproc --un FileWithoutEcoli.fastq -x indexEcoli -S ecoli_reads.sam
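As a quick sanity check alongside rerunning fastq_screen, the read counts before and after filtering can be compared to see how much data was removed. The following is a rough sketch, assuming uncompressed fastq files with exactly four lines per record; the helper names `count_reads` and `percent_removed` are made up for illustration, and the file names in the example are the ones from the command above.

```shell
# Count reads in an uncompressed fastq file, assuming 4 lines per record.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print the percentage of reads removed between an original and a filtered file.
percent_removed() {
    before=$(count_reads "$1")
    after=$(count_reads "$2")
    awk -v b="$before" -v a="$after" \
        'BEGIN { if (b == 0) { print "NA" } else { printf "%.1f\n", 100 * (b - a) / b } }'
}

# Example:
# percent_removed raw.fastq FileWithoutEcoli.fastq
```

If about 5% of the reads mapped to E. coli, the reported percentage should come out close to that figure.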

          • nareshvasani
            Member
            • Apr 2013
            • 57

            #6
            Hi all,

            I am trying to filter rRNA reads out of my input file:

            Step1: Create Index

            #bowtie2-build rRNA.fasta rRNA.index

            Step 2: Align to the rRNA index in order to get an rRNA-free fastq file.

            #bowtie2-align -p 2 -k 1 -q -U /filter_clean.fastq --un fasqFileWithoutrRNA -x rRNA.index

            When I run second step, it comes with error saying:

            " bowtie2-align: option '--un' is ambiguous; possibilities: '--ungapped' '--unpaired' "

            So I replaced --un with --unpaired, but it is still not working.

            Can anyone please shed some light on this.

            I would really appreciate your help.

            Thanks,
            Naresh

            • jpnm
              Junior Member
              • May 2013
              • 1

              #7
              As far as I know, "--un" is not an option of bowtie2-align, only of bowtie2!

              So if you run the following, it should work fine!

              bowtie2 -p 2 -k 1 -q -U /filter_clean.fastq --un fasqFileWithoutrRNA -x rRNA.index

              I hope it helps!

              • nareshvasani
                Member
                • Apr 2013
                • 57

                #8
                Hi jpnm,

                Thanks for your input. I am trying your suggestion and will keep you posted.


                Naresh
