I want to use Galaxy's fastq_filter tool on the command line.
Two questions up front, with the details further down. The tool comes with a filter template (quoted below). Is that Python? Is this how the XML turns user input into a filter script? Someone suggested I use the Galaxy API for this, but getting that set up might be just as much work as getting this script to run. I'm not opposed to it, but I want to take the easy way out, because I think this is the last Galaxy tool I have to run in my analysis before I move on to other things.
Basically, I already know which inputs fastq_filter.py requires, but I'm not sure how to generate two of them.
After reading the Python and XML files, you learn that the tool expects to be run with a line something like this:
Code:
fastq_filter.py $input_file $fastq_filter_file $output_file $output_file.files_path '${input_file.extension[len( 'fastq' ):]}'
- $input_file
- $fastq_filter_file: I don't know how to make this
- $output_file
- $output_file.files_path: I don't know what this is or how to avoid it
- ${input_file.extension[len( 'fastq' ):]}: seems to be some kind of check on the input file type? Not going to worry about this for now
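For what it's worth, the last argument reads like ordinary Python string slicing: it drops the leading "fastq" from the datatype name and keeps whatever follows. Here is a minimal sketch of that, plus putting the five arguments in the order shown above. The file names, the `filtered_files` directory, and the helper names `variant_suffix` and `build_command` are all made up for illustration, and I am only assuming that any scratch directory path will do for `$output_file.files_path`; I have not verified that.

```python
import sys


def variant_suffix(extension):
    """Mimic ${input_file.extension[len( 'fastq' ):]} from the command line."""
    # Slicing off the first len("fastq") characters leaves only the suffix.
    return extension[len("fastq"):]


def build_command(input_path, filter_script, output_path, files_path, extension):
    """Assemble the arguments in the same order as the tool's command line."""
    return [
        sys.executable,
        "fastq_filter.py",
        input_path,                 # $input_file
        filter_script,              # $fastq_filter_file
        output_path,                # $output_file
        files_path,                 # $output_file.files_path (assumed: a scratch dir)
        variant_suffix(extension),  # '${input_file.extension[len( 'fastq' ):]}'
    ]


# Hypothetical file names, only to show the shape of the call.
cmd = build_command(
    "reads.fastq", "my_filter.py", "filtered.fastq", "filtered_files", "fastqsanger"
)
print(" ".join(cmd))
```

With a datatype of `fastqsanger` the last argument becomes `sanger`; with plain `fastq` it becomes an empty string. The list is only printed here, not executed.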
The fastq_filter.ply is interesting. It contains something like this:
Code:
def fastq_read_pass_filter( fastq_read ):
    def mean( score_list ):
        return float( sum( score_list ) ) / float( len( score_list ) )
    if len( fastq_read ) < $min_size:
        return False
    if $max_size > 0 and len( fastq_read ) > $max_size:
        return False
    num_deviates = $max_num_deviants
    qual_scores = fastq_read.get_decimal_quality_scores()
    for qual_score in qual_scores:
        if qual_score < $min_quality or ( $max_quality > 0 and qual_score > $max_quality ):
            if num_deviates == 0:
                return False
            else:
                num_deviates -= 1
#if not $paired_end:
    qual_scores_split = [ qual_scores ]
#else:
    qual_scores_split = [ qual_scores[ 0:int( len( qual_scores ) / 2 ) ], qual_scores[ int( len( qual_scores ) / 2 ): ] ]
#end if
#for $fastq_filter in $fastq_filters:
    for split_scores in qual_scores_split:
        left_column_offset = $fastq_filter[ 'offset_type' ][ 'left_column_offset' ]
        right_column_offset = $fastq_filter[ 'offset_type' ][ 'right_column_offset' ]
#if $fastq_filter[ 'offset_type' ]['base_offset_type'] == 'offsets_percent':
        left_column_offset = int( round( float( left_column_offset ) / 100.0 * float( len( split_scores ) ) ) )
        right_column_offset = int( round( float( right_column_offset ) / 100.0 * float( len( split_scores ) ) ) )
#end if
        if right_column_offset > 0:
            split_scores = split_scores[ left_column_offset:-right_column_offset]
        else:
            split_scores = split_scores[ left_column_offset:]
        if split_scores: ##if a read doesn't have enough columns, it passes by default
            if not ( ${fastq_filter[ 'score_operation' ]}( split_scores ) $fastq_filter[ 'score_comparison' ] $fastq_filter[ 'score' ] ):
                return False
#end for
    return True
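To make the question concrete, here is what I imagine the template turns into once the $placeholders are filled in and the #if / #for lines are resolved. This is only a sketch under assumptions: I am guessing the # lines are template directives rather than Python comments, and every value below (minimum length 20, minimum quality 10, a single filter requiring mean >= 25 over the whole read, single-end reads) is made up. `FakeRead` is a hypothetical stub so the function can be tried without Galaxy; it is not the real read class.

```python
def fastq_read_pass_filter(fastq_read):
    def mean(score_list):
        return float(sum(score_list)) / float(len(score_list))

    if len(fastq_read) < 20:  # $min_size = 20 (made-up value)
        return False
    if 0 > 0 and len(fastq_read) > 0:  # $max_size = 0, so this never triggers
        return False
    num_deviates = 0  # $max_num_deviants = 0 (made-up value)
    qual_scores = fastq_read.get_decimal_quality_scores()
    for qual_score in qual_scores:
        # $min_quality = 10, $max_quality = 0 (made-up values)
        if qual_score < 10 or (0 > 0 and qual_score > 0):
            if num_deviates == 0:
                return False
            else:
                num_deviates -= 1
    # Branch kept for "#if not $paired_end" (assuming single-end reads)
    qual_scores_split = [qual_scores]
    # One pass of "#for $fastq_filter in $fastq_filters" with absolute offsets 0/0
    for split_scores in qual_scores_split:
        left_column_offset = 0
        right_column_offset = 0
        if right_column_offset > 0:
            split_scores = split_scores[left_column_offset:-right_column_offset]
        else:
            split_scores = split_scores[left_column_offset:]
        if split_scores:  # a read without enough columns passes by default
            # score_operation = mean, score_comparison = >=, score = 25 (made up)
            if not (mean(split_scores) >= 25):
                return False
    return True


class FakeRead:
    """Hypothetical stub exposing only what the filter function touches."""

    def __init__(self, scores):
        self.scores = scores

    def __len__(self):
        return len(self.scores)

    def get_decimal_quality_scores(self):
        return list(self.scores)


print(fastq_read_pass_filter(FakeRead([30] * 25)))  # long enough, high quality
print(fastq_read_pass_filter(FakeRead([30] * 10)))  # shorter than the minimum
```

If that reading is right, a file like this (without the stub) would be what gets passed as $fastq_filter_file, but that is exactly the part I would like someone to confirm.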
Any help and assistance would be appreciated.