Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • thh32
    Member
    • Feb 2014
    • 60

    How to iterate over reads in file with HTSeq?

    I am currently attempting to write a program in Python (v2.7.6) and am using HTSeq to read in fastq files and then run my code on it. However my issue is that my code runs on the first read from each input file (pair end data so 2 input files) and then stops.

    Does anyone know how to run code on read1 and iterate to read 2 etc?

    Code is :

    HTML Code:
    # Import functions
    import HTSeq
    import itertools
    import numpy
    from matplotlib import pyplot
    from itertools import groupby
    import operator
    
    #Open files
    output_file= open('output.fq', "w")
    Unmatched1= open ('Unmatched1.fq', "w")
    Unmatched2= open ('Unmatched2.fq', "w")
    
    # Counts lines in input file as 'count' for future use, if needed
    print "%d lines in your choosen file" % len(open("Real_test_1").readlines())
    count = '%d'  % len(open("Real_test_1").readlines())
    print "Reads below"
    print count
    
    # Reads in files for use by HTSeq
    fastq_file1 = HTSeq.FastqReader( "Real_test_1", "phred")
    fastq_file2 = HTSeq.FastqReader( "Real_test_2", "phred")
    
    
    #Get rev_comp and rename seq1
    for read in fastq_file2:
    	rc2o = read
    	for read in fastq_file2:                                          
    		rc2=read.get_reverse_complement()                             
    		print rc2                                                     
    
    		for read in fastq_file1:
    			rc1o = read
    		for read in fastq_file1:
    		    rc1 = read
    		    print rc1
    
    		#Reverse seq2 again so can be matched
    		rc2w = rc2[::-1]
    
    		rc1u = rc1
    
    
    
    		while len(rc1u) > 20:
    		    slide_merge(rc1u, rc2w)
    		    rc1u = rc1u[1:]
    
    		merging
    
    
    		max(merging.iteritems(), key=operator.itemgetter(1))[0]
    
    		highest = max(merging.iteritems(), key=operator.itemgetter(1))[0]
    
    		highest
    
    		len(highest)
    
    		remove = len(highest)
    
    		if remove > 8:
    			rc1r = rc1[:-remove]
    			rc3 = rc1r+rc2w
    			rc3.write_to_fastq_file(output_file)
    		else:
    			rc1o.write_to_fastq_file(Unmatched1)
    			rc2o.write_to_fastq_file(Unmatched2)

    I tried putting aload of the code in a for loop however all that happened was it ran 1000 times due to that being how many lines were in each input file but ran on the first line each time and iterate through the reads.

    Code:
    HTML Code:
    for read in fastq_file2:
    	rc2o = read
    	for read in fastq_file2:                                          
    		rc2=read.get_reverse_complement()                             
    		print rc2                                                     
    
    		for read in fastq_file1:
    			rc1o = read
    		for read in fastq_file1:
    		    rc1 = read
    		    print rc1
    
    		#Reverse seq2 again so can be matched
    		rc2w = rc2[::-1]
    
    		rc1u = rc1
    
    
    
    		while len(rc1u) > 20:
    		    slide_merge(rc1u, rc2w)
    		    rc1u = rc1u[1:]
    
    		merging
    
    
    		max(merging.iteritems(), key=operator.itemgetter(1))[0]
    
    		highest = max(merging.iteritems(), key=operator.itemgetter(1))[0]
    
    		highest
    
    		len(highest)
    
    		remove = len(highest)
    
    		if remove > 8:
    			rc1r = rc1[:-remove]
    			rc3 = rc1r+rc2w
    			rc3.write_to_fastq_file(output_file)
    		else:
    			rc1o.write_to_fastq_file(Unmatched1)
    			rc2o.write_to_fastq_file(Unmatched2)
    Any help would be greatly appreciated.

    Thanks,
    Tom
  • wolma
    Member
    • May 2014
    • 23

    #2
    ouff,
    as a start, replace your way too many for loops with:

    for read1, read2 in itertools.izip(fastq_file1, fastq_file2):

    then do all your processing inside this one loop.
    Good luck,
    Wolfgang

    Comment

    • gringer
      David Eccles (gringer)
      • May 2011
      • 845

      #3
      Can you give a rough idea of what your code is supposed to do? I'm used to using HTSeq to iterate over mapped BAM files or GTF files, not FASTQ files, so can't quite work out how HTSeq would be useful for you here. My guess based on a rough look at the code is a paired-end read merger, but the lines with only "empty" and "merging" (not commented) are very confusing -- I'm guessing they're for debugging purposes.

      I've got a few comments about the code syntax, which may or may not affect the outcome:
      • reading entire files into memory just to count lines [len(open("Real_test_1").readlines())] is a little inefficient, and will make your code much slower than it needs to be.
      • when using loops, never use the same variable for an inner loop as used for an outer loop [you have "read" used in a loop and its child, and then again in a couple more loops].
      Last edited by gringer; 06-07-2014, 04:32 AM.

      Comment

      Latest Articles

      Collapse

      • seqadmin
        New Genomics Tools and Methods Shared at AGBT 2025
        by seqadmin


        This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

        The Headliner
        The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
        03-03-2025, 01:39 PM
      • seqadmin
        Investigating the Gut Microbiome Through Diet and Spatial Biology
        by seqadmin




        The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
        02-24-2025, 06:31 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 03-20-2025, 05:03 AM
      0 responses
      17 views
      0 reactions
      Last Post seqadmin  
      Started by seqadmin, 03-19-2025, 07:27 AM
      0 responses
      18 views
      0 reactions
      Last Post seqadmin  
      Started by seqadmin, 03-18-2025, 12:50 PM
      0 responses
      19 views
      0 reactions
      Last Post seqadmin  
      Started by seqadmin, 03-03-2025, 01:15 PM
      0 responses
      185 views
      0 reactions
      Last Post seqadmin  
      Working...