Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Novoalign alignment script help (perl)

    Hello
    Can anyone help with a perl script I have written to align multiple fastq files using novoalign. I am having trouble declaring the sample name into the script, so that once the alignments are complete, it is output with the correct sample name
    So far I have this:

    #!/usr/bin/perl
    use strict;

    my $sample;
    chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
    chomp (my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

    while ($#fastq_R1list>0) {

    my $fastq1=pop(@fastq_R1list);
    my $fastq2=pop(@fastq_R2list);

    if (! -e "$fastq1"){die "no fastq.gz file found for $fastq1\n";}
    if (! -e "$fastq2"){die "no fastq.gz file found for $fastq2\n";}

    my $sam="\$\'\@RG\\tID:${sample}\\tSM:${sample}\\tPL:ILLUMINA\'";

    system ( "/novoalign -t 100 -H -F ILM1.8 -g 65 -x 7 -o FullNW -c 1 -o SAM $sam -f ./trimmed/${sample}.fastq1.trimmed.gz ./trimmed/${sample}.fastq2.trimmed.gz -d /reference_genomes/novo/human_g1k_v37.k15.s2.novoindex -k -K ./trimmed/${sample}_novocalibration > /dev/null" );
    system ( "/novoalign -t 100 -H -F ILM1.8 -g 65 -x 7 -o FullNW -c 1 -o SAM $sam -f ./trimmed/${sample}.fastq1.trimmed.gz ./trimmed/${sample}.fastq2.trimmed.gz -d /reference_genomes/novo/human_g1k_v37.k15.s2.novoindex -k ./trimmed/${sample}_novocalibration > ./trimmed/${sample}.sam" );
    system ("rm ./trimmed/${sample}.fastq1.trimmed.gz ./trimmed/${sample}.fastq2.trimmed.gz ./trimmed/${sample}_novocalibration" );

    }
    #./trimmed is the output folder

    No sam files are being output even though the script runs. It's as though the sample names are not being declared but I though I did that with:
    my $sample;
    chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
    chomp (my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

    I am pretty new to perl! Any help is appreciated.

    Thanks
    Vanisha

  • #2
    Hi Vanisha,

    I don't know if this solves your problem, but it looks like you declared the variable $sample but did not attach a name to it. So $sample is still empty.

    Regards,
    Boetsie

    Comment


    • #3
      Hi Boetsie

      I have a sample list relating to $sample in the command:
      chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);

      When I do:
      chomp (my $sample = `ls ./sample`);

      it still doesn't identify the sample list to input into ${sample}

      Comment


      • #4
        Since you are using relative paths, are you running your perl script from a directory that is immediately above the "300bp_fastq" directory?

        As "boetsie" pointed out above there is nothing attached to $sample. Are you actually getting a listing if you run "ls ./300bp_fastq/${sample}*R1_001.fastq.gz" on the shell prompt?
        Last edited by GenoMax; 02-07-2013, 04:37 AM.

        Comment


        • #5
          yes, this is the output from the command line:

          ls ./300bp_fastq/${sample}*R1_001.fastq.gz | head
          ./300bp_fastq/sample1.fastq.gz
          ./300bp_fastq/sample2.fastq.gz
          ./300bp_fastq/sample3.fastq.gz
          ./300bp_fastq/sample4.fastq.gz
          etc..

          and I am running the script from directory above 300bp_fastq.

          Do I need to use use a loop with the sample list, as I have done with the fastq files?
          Last edited by Vanisha; 02-07-2013, 05:03 AM.

          Comment


          • #6
            chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
            chomp (my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

            Remove "chomp" and try the run.

            Comment


            • #7
              this is the start of the script now when removing chomp and declaring $sample

              my @sample= `ls ./300bp_fastq/*.gz`);
              my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
              my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

              I get these errors:
              syntax error at TrimAlignFastq.pl line 18, near "`ls ./300bp_fastq/*.gz`)"
              Global symbol "$sample" requires explicit package name at TrimAlignFastq.pl line 19.
              syntax error at TrimAlignFastq.pl line 19, near "`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`)"
              Global symbol "$sample" requires explicit package name at TrimAlignFastq.pl line 20.
              syntax error at TrimAlignFastq.pl line 20, near "`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`)"

              Comment


              • #8
                You could just read in the file './samples' (assuming you have the names of the samples in here on each line) and go through each line:

                open(IN,samples);
                while(my $sample = <IN>){
                chomp $sample;
                print "reading sample $sample...\n"

                my $file1 = "./300bp_fastq/${sample}*R1_001.fastq.gz" ;

                my $file2 = "./300bp_fastq/${sample}*R2_001.fastq.gz" ;
                etc...
                }

                Comment


                • #9
                  Or do something like:

                  open(IN, 'ls ./300bp_fastq/*.gz |');
                  while(my $sample = <IN>){
                  chomp $sample;
                  print "sample = $sample\n";
                  #etc...

                  }

                  Regards,
                  Boetsie

                  Comment


                  • #10
                    That's great - so I have read the samples in, but i now cannot specify read 1 and read 2:

                    CODE:
                    open(IN, 'ls ./300bp_fastq/*.gz |');
                    while(my $sample = <IN>){
                    chomp $sample;
                    print "sample = $sample\n";

                    my $fastq1 = "./300bp_fastq/${sample}*R1_00*.fastq.gz";
                    my $fastq2 = "./300bp_fastq/${sample}*R2_00*.fastq.gz";

                    OUTPUT:
                    sample = ./300bp_fastq/sample-1511075418_S97_L001_R1_001.fastq.gz
                    no fastq.gz file found for ./300bp_fastq/./300bp_fastq/sample-1511075418_S97_L001_R1_001.fastq.gz*R1_00*.fastq.gz


                    the output is the same even when I change the command to:
                    my $fastq1 = "./300bp_fastq/*R1_00*.fastq.gz";

                    Comment


                    • #11
                      Generally R1 and R2 would be listed one after the other. It will work in this case but you may want to explicitly match the R1/R2 in the name by pattern search just to be sure.

                      It looks like you are already picking up the full path in the $sample looking at the OUTPUT in post #10
                      OUTPUT:
                      sample = ./300bp_fastq/sample-1511075418_S97_L001_R1_001.fastq.gz
                      so you do not need to add the extra "./300bp_fastq/".
                      Last edited by GenoMax; 02-07-2013, 07:07 AM.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Recent Advances in Sequencing Analysis Tools
                        by seqadmin


                        The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                        05-06-2024, 07:48 AM
                      • seqadmin
                        Essential Discoveries and Tools in Epitranscriptomics
                        by seqadmin




                        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                        04-22-2024, 07:01 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 05-07-2024, 06:57 AM
                      0 responses
                      12 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-06-2024, 07:17 AM
                      0 responses
                      16 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-02-2024, 08:06 AM
                      0 responses
                      22 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-30-2024, 12:17 PM
                      0 responses
                      24 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X