Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Un-align reads in BWA

    Hi All,
    I have RNA-seq data from human tissue samples. I want to align to RefSeq and then to UCSC and then to human genome in sequential order using BWA. But how can I store unalign sequences? For examples, unalign sequences from RefSeq reference alignment (which will be input sequences for UCSC)?
    Thank you so much for your help,
    Rakesh

  • #2
    Normally unaligned reads are stored in FASTQ, although you can also store unaligned reads in SAM/BAM which is supported by some alignment tools (including BWA).

    See also my blog post which might be of interest:
    I think it is time to retire the FASTQ file format in favour of storing unaligned reads in SAM/BAM format . I will try to explain, as thi...


    Are you asking in general how to go from aligned reads in SAM/BAM to unaligned reads?

    Comment


    • #3
      Hi Peter,

      Thanks for your detailed response and very interesting blog.

      Specifically, I want to align my RNA-seq data (in fastq format) using BWA with RefSeq reference database and at same time I want to store un-align reads (in fastq or SAM format) in separate file. As BWA does not have any option to store un-align reads. Can you please guide me?

      Thanks again for your kind help,

      Rakesh

      Comment


      • #4
        This is a perl script that will take your bam file and your original fastq file and generate an fastq file containing only your unaligned reads.

        One caveat, you can't have any unaligned reads in your bam file.

        I should also note that I did not write this script. If I remember correctly it was written by Simon Anders, the author of DESeq, DEXseq, and other useful things. I think he posted it on here sometime ago, but I have no idea which thread anymore.


        Code:
        #!/usr/bin/perl
        use warnings;
        use strict;
        
        my ($fastq,$sam,$outfile) = @ARGV;
        
        unless ($outfile) {
          die "Usage is filter_unmapped_reads.pl [FastQ file] [SAM File] [File for unmapped reads]\n";
        }
        
        if (-e $outfile) {
          die "Won't overwrite an existing file, delete it first!";
        }
        
        open (FASTQ,$fastq) or die "Can't open fastq file: $!";
        open (SAM,$sam) or die "Can't open SAM file: $!";
        open (OUT,'>',$outfile) or die "Can't write to $outfile: $!";
        
        my $ids = read_ids();
        
        filter_fastq($ids);
        
        close OUT or die "Can't write to $outfile: $!";
        
        
        sub filter_fastq {
        
          warn "Filtering FastQ file\n";
        
          my ($ids) = @_;
        
          while (<FASTQ>) {
        
            if (/^@(\S+)/) {
              my $id = $1;
        
              # Remove the end designator from paired end reads
              $id =~ s/\/\d+$//;
        
              my $seq = <FASTQ>;
              my $id2 = <FASTQ>;
              my $qual = <FASTQ>;
        
        
              unless (exists $ids->{$id}) {
        	print OUT $_,$seq,$id2,$qual;
              }
            }
            else {
              warn "Line '$_' should have been an id line, but wasn't\n";
            }
        
          }
        
        }
        
        
        sub read_ids {
        
          warn "Collecting mapped ids\n";
        
          my $ids;
        
          while (<SAM>) {
        
            next if (/^@/);
            my ($id) = split(/\t/);
            $ids->{$id} = 1;
          }
        
          return $ids;
        }
        To use:

        Code:
        $ perl unaligned_reads.pl <in.fastq> <in.bam> <out.fastq>

        Comment


        • #5
          Ah, here is the thread where Simon originally posted it:

          Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

          Comment


          • #6
            Hi Chadn,

            Thank you so much for your detailed response.

            Best wishes,

            Rakesh

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Latest Developments in Precision Medicine
              by seqadmin



              Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

              Somatic Genomics
              “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
              05-24-2024, 01:16 PM
            • seqadmin
              Recent Advances in Sequencing Analysis Tools
              by seqadmin


              The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
              05-06-2024, 07:48 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, Yesterday, 01:32 PM
            0 responses
            10 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-24-2024, 07:15 AM
            0 responses
            199 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-23-2024, 10:28 AM
            0 responses
            221 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 05-23-2024, 07:35 AM
            0 responses
            232 views
            0 likes
            Last Post seqadmin  
            Working...
            X