Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • fanli
    replied
    Originally posted by emolinari View Post
    sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
    This sorts your alignment by genomic position. You want to sort by read name:

    sort -k 1,1 hits.sam > hits.sam.sorted

    Leave a comment:


  • dlepe
    replied
    Originally posted by apredeus View Post
    flags 0/16 should be enough to tell the strand.

    did you try to visualize your bam (or small portion of it) in IGV or something like that? Make sure you have a picture you would expect to see.
    As I stated in the post previous to this one, it looks ok to me..

    Leave a comment:


  • dlepe
    replied
    Originally posted by kmcarr View Post
    Are you positive that the method/kit used to prepare the library is not strand-specific? The Life Tech SOLiD™ Total RNA-Seq Kit does generate strand-specific libraries. How was your library prepared?
    I'm not sure how the library was prepared but I was told that it was not stranded and when I look at my mapping file I see about half of the reads aligned to each of the strands which is what you'd expect for a no stranded protocol..

    Leave a comment:


  • emolinari
    replied
    Hi everyone,

    sorry for sneaking in this thread, but I guess you can definitively help and maybe it is also an easy one (- I am not a bioinformatician...yet I am trying to do my best!)


    I am doing RNA-seq on human samples. I have run several samples on 2x75 Illumina HiSeq200, then processed raw reads with Tophat, run samtools to convert bam into sam files, sorted sam file and proceeded with Htseq. *flagstat results from samtools are all ok (excellent mapping and paired mapping rate).


    Here my Htseq commands:

    htseq-count -m HTSeq.scripts.count -m intersection-nonempty -s yes -i gene_id file.sam annotation.gtf > HTSeq_counts.txt 2> HTSeq_sterr.txt

    Although HTSeq count provides the .txt file with all the ensemble ids and relative counts, the sterror file is a huge (3-4Gb) text file full of warnings:
    Warning: Read HWI-ST1144:606:H9U7WADXX:1:2208:15447:12819 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

    So I figured out that maybe I had to sort the sam file. I did it with this command, which I found in other threads:

    sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted

    Now I run HTSeq again (same commands, sorted sam file as input) and the sterror file comes out gigantic as before.

    I really don't know how to proceed...
    1) is this error file a relatively acceptable thing?
    2) am I doing something terribly wrong which I am not aware of?

    I appreciate any help...
    thanks!!!
    Manu

    Leave a comment:


  • liaojinyue
    replied
    Error: 'pair_alignments' needs a sequence of paired-end alignments

    Hi Simon,

    I'm new to HTSeq and encountered an error when analyzing our stranded paired-end RNA-seq data.

    Let me describe the procedure of my analysis.
    1. Use cutadapt to trim adapters and trim by quality
    2. Use tophat2 to map to the mouse genome
    3. Sort the accepted_hits.bam by name with samtools
    4. Error reported when I was trying to count the read using htseq-count

    'pair_alignments' needs a sequence of paired-end alignments.

    I search the forum, this error is reported when combining paired-end and single end read accepted_hits.bam into one input file for htseq-count. I know htseq-count handles orphan reads from tophat output .bam file well. Is it possible cutadapt causes some problem in our case? Thanks.

    Jason

    Leave a comment:


  • kmcarr
    replied
    Originally posted by dlepe View Post
    Since my protocol wasn't stranded I should be losing half the counts when --stranded=yes but as you can see this was not the case.. I tried the same for some Illumina data I have access to and got this, which I think its alright.
    Are you positive that the method/kit used to prepare the library is not strand-specific? The Life Tech SOLiD™ Total RNA-Seq Kit does generate strand-specific libraries. How was your library prepared?

    Leave a comment:


  • kmcarr
    replied
    Originally posted by apredeus View Post
    what command line options do you mean? stranded=yes is default behaviour, is it not? I'm using "-t exon" options and that's about it, really. Default strandedness and default "overlap" mode all work for me...
    Originally posted by gringer View Post
    The default options are fine for doing this.
    Be aware that there are two possible --stranded options: "yes" and "reverse". (I guess there's really 3 if you also count "no".) Which one is correct depends on the method used to create the strand specific RNA-Seq library. "yes" tells htseq that the first read for paired end reads (or only read for single end reads) will match the sense strand of the mRNA. "reverse" means that read 1 will match the anti-sense strand. If your libraries were prepared using an Illumina TruSeq Stranded RNA library prep kit (uses the dUTP method) then you should use '--stranded=reverse' for htseq. For other library prep kits/methods carefully check the manufacturer's documentation to determine which orientation the library is.

    Leave a comment:


  • apredeus
    replied
    flags 0/16 should be enough to tell the strand.

    did you try to visualize your bam (or small portion of it) in IGV or something like that? Make sure you have a picture you would expect to see.

    Leave a comment:


  • dlepe
    replied
    Any idea about post #161???

    Leave a comment:


  • apredeus
    replied
    great, thank you!

    Leave a comment:


  • gringer
    replied
    The default options are fine for doing this.

    Leave a comment:


  • apredeus
    replied
    what command line options do you mean? stranded=yes is default behaviour, is it not? I'm using "-t exon" options and that's about it, really. Default strandedness and default "overlap" mode all work for me...

    Leave a comment:


  • gringer
    replied
    Correct, assuming that all the right command line options had been set up.

    Leave a comment:


  • apredeus
    replied
    so if two genes' exons overlap but the genes are on the opposite strands, the read landing in the overlap would NOT be considered ambiguous?

    Leave a comment:


  • gringer
    replied
    Originally posted by apredeus View Post
    does it matter for "ambiguous" reads if they land on the right strand? I.e. for cases shown in the two last cases, if gene A and gene B are on opposite strands, and the library is stranded, there is no ambiguity actually. Is that taken into consideration?
    It is taken into consideration if it's set in the command-line options (on by default). For stranded counting, a read will only be considered ambiguous if the exon on the corresponding strand is shared by multiple genes (this can happen). Otherwise, an ambiguous count includes genes on the opposite strand as well (much more likely).

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 08:47 AM
0 responses
10 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
60 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
59 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
53 views
0 likes
Last Post seqadmin  
Working...
X