Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Vanisha
    Member
    • Nov 2009
    • 21

    Novoalign alignment script help (perl)

    Hello
    Can anyone help with a perl script I have written to align multiple fastq files using novoalign. I am having trouble declaring the sample name into the script, so that once the alignments are complete, it is output with the correct sample name
    So far I have this:

    #!/usr/bin/perl
    use strict;

    my $sample;
    chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
    chomp (my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

    while ($#fastq_R1list>0) {

    my $fastq1=pop(@fastq_R1list);
    my $fastq2=pop(@fastq_R2list);

    if (! -e "$fastq1"){die "no fastq.gz file found for $fastq1\n";}
    if (! -e "$fastq2"){die "no fastq.gz file found for $fastq2\n";}

    my $sam="\$\'\@RG\\tID:${sample}\\tSM:${sample}\\tPL:ILLUMINA\'";

    system ( "/novoalign -t 100 -H -F ILM1.8 -g 65 -x 7 -o FullNW -c 1 -o SAM $sam -f ./trimmed/${sample}.fastq1.trimmed.gz ./trimmed/${sample}.fastq2.trimmed.gz -d /reference_genomes/novo/human_g1k_v37.k15.s2.novoindex -k -K ./trimmed/${sample}_novocalibration > /dev/null" );
    system ( "/novoalign -t 100 -H -F ILM1.8 -g 65 -x 7 -o FullNW -c 1 -o SAM $sam -f ./trimmed/${sample}.fastq1.trimmed.gz ./trimmed/${sample}.fastq2.trimmed.gz -d /reference_genomes/novo/human_g1k_v37.k15.s2.novoindex -k ./trimmed/${sample}_novocalibration > ./trimmed/${sample}.sam" );
    system ("rm ./trimmed/${sample}.fastq1.trimmed.gz ./trimmed/${sample}.fastq2.trimmed.gz ./trimmed/${sample}_novocalibration" );

    }
    #./trimmed is the output folder

    No sam files are being output even though the script runs. It's as though the sample names are not being declared but I though I did that with:
    my $sample;
    chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
    chomp (my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

    I am pretty new to perl! Any help is appreciated.

    Thanks
    Vanisha
  • boetsie
    Senior Member
    • Feb 2010
    • 245

    #2
    Hi Vanisha,

    I don't know if this solves your problem, but it looks like you declared the variable $sample but did not attach a name to it. So $sample is still empty.

    Regards,
    Boetsie

    Comment

    • Vanisha
      Member
      • Nov 2009
      • 21

      #3
      Hi Boetsie

      I have a sample list relating to $sample in the command:
      chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);

      When I do:
      chomp (my $sample = `ls ./sample`);

      it still doesn't identify the sample list to input into ${sample}

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Since you are using relative paths, are you running your perl script from a directory that is immediately above the "300bp_fastq" directory?

        As "boetsie" pointed out above there is nothing attached to $sample. Are you actually getting a listing if you run "ls ./300bp_fastq/${sample}*R1_001.fastq.gz" on the shell prompt?
        Last edited by GenoMax; 02-07-2013, 04:37 AM.

        Comment

        • Vanisha
          Member
          • Nov 2009
          • 21

          #5
          yes, this is the output from the command line:

          ls ./300bp_fastq/${sample}*R1_001.fastq.gz | head
          ./300bp_fastq/sample1.fastq.gz
          ./300bp_fastq/sample2.fastq.gz
          ./300bp_fastq/sample3.fastq.gz
          ./300bp_fastq/sample4.fastq.gz
          etc..

          and I am running the script from directory above 300bp_fastq.

          Do I need to use use a loop with the sample list, as I have done with the fastq files?
          Last edited by Vanisha; 02-07-2013, 05:03 AM.

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            chomp (my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
            chomp (my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

            Remove "chomp" and try the run.

            Comment

            • Vanisha
              Member
              • Nov 2009
              • 21

              #7
              this is the start of the script now when removing chomp and declaring $sample

              my @sample= `ls ./300bp_fastq/*.gz`);
              my @fastq_R1list=`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`);
              my @fastq_R2list=`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`);

              I get these errors:
              syntax error at TrimAlignFastq.pl line 18, near "`ls ./300bp_fastq/*.gz`)"
              Global symbol "$sample" requires explicit package name at TrimAlignFastq.pl line 19.
              syntax error at TrimAlignFastq.pl line 19, near "`ls ./300bp_fastq/${sample}*R1_001.fastq.gz`)"
              Global symbol "$sample" requires explicit package name at TrimAlignFastq.pl line 20.
              syntax error at TrimAlignFastq.pl line 20, near "`ls ./300bp_fastq/${sample}*R2_001.fastq.gz`)"

              Comment

              • boetsie
                Senior Member
                • Feb 2010
                • 245

                #8
                You could just read in the file './samples' (assuming you have the names of the samples in here on each line) and go through each line:

                open(IN,samples);
                while(my $sample = <IN>){
                chomp $sample;
                print "reading sample $sample...\n"

                my $file1 = "./300bp_fastq/${sample}*R1_001.fastq.gz" ;

                my $file2 = "./300bp_fastq/${sample}*R2_001.fastq.gz" ;
                etc...
                }

                Comment

                • boetsie
                  Senior Member
                  • Feb 2010
                  • 245

                  #9
                  Or do something like:

                  open(IN, 'ls ./300bp_fastq/*.gz |');
                  while(my $sample = <IN>){
                  chomp $sample;
                  print "sample = $sample\n";
                  #etc...

                  }

                  Regards,
                  Boetsie

                  Comment

                  • Vanisha
                    Member
                    • Nov 2009
                    • 21

                    #10
                    That's great - so I have read the samples in, but i now cannot specify read 1 and read 2:

                    CODE:
                    open(IN, 'ls ./300bp_fastq/*.gz |');
                    while(my $sample = <IN>){
                    chomp $sample;
                    print "sample = $sample\n";

                    my $fastq1 = "./300bp_fastq/${sample}*R1_00*.fastq.gz";
                    my $fastq2 = "./300bp_fastq/${sample}*R2_00*.fastq.gz";

                    OUTPUT:
                    sample = ./300bp_fastq/sample-1511075418_S97_L001_R1_001.fastq.gz
                    no fastq.gz file found for ./300bp_fastq/./300bp_fastq/sample-1511075418_S97_L001_R1_001.fastq.gz*R1_00*.fastq.gz


                    the output is the same even when I change the command to:
                    my $fastq1 = "./300bp_fastq/*R1_00*.fastq.gz";

                    Comment

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      Generally R1 and R2 would be listed one after the other. It will work in this case but you may want to explicitly match the R1/R2 in the name by pattern search just to be sure.

                      It looks like you are already picking up the full path in the $sample looking at the OUTPUT in post #10
                      OUTPUT:
                      sample = ./300bp_fastq/sample-1511075418_S97_L001_R1_001.fastq.gz
                      so you do not need to add the extra "./300bp_fastq/".
                      Last edited by GenoMax; 02-07-2013, 07:07 AM.

                      Comment

                      Latest Articles

                      Collapse

                      • SEQadmin2
                        Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                        by SEQadmin2


                        I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                        Here are nine questions we think about, in roughly the order they matter, before...
                        Yesterday, 07:11 AM
                      • SEQadmin2
                        From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                        by SEQadmin2


                        Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                        The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                        ...
                        06-02-2026, 10:05 AM
                      • SEQadmin2
                        Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                        by SEQadmin2


                        With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                        Introduction

                        Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                        05-22-2026, 06:42 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by SEQadmin2, 06-17-2026, 06:09 AM
                      0 responses
                      20 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-09-2026, 11:58 AM
                      0 responses
                      38 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-05-2026, 10:09 AM
                      0 responses
                      45 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-04-2026, 08:59 AM
                      0 responses
                      49 views
                      0 reactions
                      Last Post SEQadmin2  
                      Working...