Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting alignments

    I am using bowtie/tophat (allowing multimaps 10) to map my RNA-seq data.
    I did cuffdiff and I am trying other methods such as DESEQ and edgeR etc..
    The remaining methods need count data.
    Therefore, I tried HTSEQ. Before doing HTSEQ I sorted my file by query name using picard tools Sortsam.jar.
    The problem I am running into is after sorting my bam file I see reads with no names sometimes and I believe HTSEQ is giving me an error because of this:

    Error occured when processing SAM input (record #109 in file /Users/mparida/Oiko_Oceanic_Acid_stuff/dnacore454.healthcare.uiowa.edu/Project_Manak_code_RNAseq/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED_QUERY_NAME.bam):
    'pair_alignments' needs a sequence of paired-end alignments
    [Exception type: ValueError, raised in __init__.py:603]

    I am not sure what these alignments are and when I look at the alignment it looks the folllowing:
    89 scaffold_10 477964 50 88M * 0 0 AGACTTTTAACGGACTTTAACTCGAAATGGCCCGCAACAGCACCGTCAATGNCGTCGCCAAGTACTCTCGANCGAAGATGTATTCTCG CDCDDDDDDDDDDDDDECDCDDCEEDDBDDDDDDDDDDDDDDFFFHHHA;-#JJJIIIIIGHD?IHHFF?1#JJJJJJIIGJJJJHHH AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:51G19T16 YT:Z:UU NH:i:1

    Any insight into what these alignments might be and why they have no names in my alignment file.
    I can't even find this read from my raw data file.
    Rocky

  • #2
    My guess would be that you have some orphan alignments in the BAM files (i.e., you have some reads marked as being paired-end but lacking a mate in the file). HTSeq-count has a new option ("-r pos") that allows you to input a file that's coordinate sorted. Also, there's featureCounts, which I don't think has ever required name-sorted input (it's also faster than htseq-count).

    Comment


    • #3
      Hi dpryan
      Thanks for you reply. You may be right about the orphaned alignment. It still doesn't answer where does that read comes from. I couldn't find a read such as the above from my raw data file. Also as suggested I tried the following:
      a) python ~/Software/HTSeq-0.6.1/scripts/htseq-count -f bam -t gene -s no -a 20 $CONTROL1_1 $ANNOTATION
      Error occured when processing SAM input (record #109 in file /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED_QUERY_NAME.bam):
      'pair_alignments' needs a sequence of paired-end alignments

      b) python ~/Software/HTSeq-0.6.1/scripts/htseq-count -f bam -t gene -r pos -s no -a 20 $CONTROL1_1 $ANNOTATION
      Error:Error occured when processing SAM input (record #22 in file /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam)
      Sequence of paired-end alignments expected, but got single-end alignment.

      Do you suggest removing those orphaned alignments from my analysis that have no read IDs? Or is this another issue?

      Once again thanks a ton for your time and replies.
      Last edited by mparida85; 03-19-2014, 12:41 PM.

      Comment


      • #4
        What's the output if you run:

        Code:
        samtools view -cF 1 /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam
        htseq-count generally deals well with the orphaned reads (it just issues a warning, since how these should be handled is a bit ambiguous in the specification), but I wonder if mixing paired and single-end data in one file leads to this sort of error (in fact, looking through the source code suggests that that's the only circumstance that can lead to this).

        Comment


        • #5
          samtools view -cF 1 /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam = 524336
          Yes I merged both my PE and SE files together to create /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam

          Comment


          • #6
            htseq-count doesn't seem to handle that case. Perhaps featureCounts will, I've never tried. Otherwise, just count the paired-end and single-end separately.

            Comment


            • #7
              I was trying to process some paired-end RNASeq data and attempted to use the latest version of htseq (0.6.1). For some reason, it doesn't make it past a certain record complaining about expecting paired-end alignments (as your error suggests)

              .....
              .....
              .....
              52700000 SAM alignment record pairs processed.
              52800000 SAM alignment record pairs processed.
              52900000 SAM alignment record pairs processed.
              53000000 SAM alignment record pairs processed.
              Error occured when processing SAM input (line 53063638):
              'pair_alignments' needs a sequence of paired-end alignments
              [Exception type: ValueError, raised in __init__.py:603]

              What's annoying is that there are a little more than 53 million records in this alignment file which means it almost finished running. I don't see why htseq can't support reads with unmapped mates (singletons), as you proposed (I did get one warning while it was running for a particular instance where a mate was not found: "Warning: Read DGL9ZZQ1:503:C28YCACXX:1:1101:1213:73341 claims to have an aligned mate which could not be found in an adjacent line.").


              Looking at the SAM file line which threw the error - I don't see anything especially wrong :

              "samtools view accepted_hits.bam | sed '53063638q;d'

              DGL9ZZQ1:503:C28YCACXX:5:2104:4502:43742 161 Scaffold9997 1534367 50 101M Scaffold9970 1335931 0 CTCTGTTTCAGTAAATTCACATTCCTTTGACAAGATCACCGTCCATTAAATGCAACACTACAGCCCTTCATGTCTGGATTAGCACTTGCATAGACTTCGTA CCCFFDFFHHHHHJGHIJJJJIJIIJJJJJJIJJJJIHJJJIJJIIJJJJIJJJIJJJJJJJJJJJJJJIJJJJJJHHFHHHFFFFFEEEECEDDDDDDDC AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:98A2 YT:Z:UU NH:i:1"


              I have no idea how to get around this and have never come across this issue, having run htseq-count several times in the past

              Any help would be greatly appreciated

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Recent Advances in Sequencing Analysis Tools
                by seqadmin


                The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                05-06-2024, 07:48 AM
              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Today, 06:35 AM
              0 responses
              9 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Yesterday, 02:46 PM
              0 responses
              15 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-07-2024, 06:57 AM
              0 responses
              14 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-06-2024, 07:17 AM
              0 responses
              18 views
              0 likes
              Last Post seqadmin  
              Working...
              X