  • gringer
    replied
    Here's a rough idea of how to do bam2fastq:
    Code:
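    # FASTQ headers start with "@"; -F 0x900 skips secondary/supplementary alignments so each read appears once
    samtools view -F 0x900 file.bam | awk -F '\t' '{print "@"$1"\n"$10"\n+\n"$11}' > file.fastq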
    Unfortunately this will give you a fastq file with interleaved reads, which can be a little bit of a pain to use. You can use the filter function (-f / -F) of samtools view to get around that, reading through the BAM file twice:

    Code:
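    # -f 0x40 / -f 0x80 select first / second mates; -F 0x900 drops secondary/supplementary records
    samtools view -f 0x40 -F 0x900 file.bam | awk -F '\t' '{print "@"$1"\n"$10"\n+\n"$11}' > file_R1.fastq
    samtools view -f 0x80 -F 0x900 file.bam | awk -F '\t' '{print "@"$1"\n"$10"\n+\n"$11}' > file_R2.fastq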
    The SAM file format specification is your friend; see section 1.4.

    The process of BAM -> FASTQ -> Tophat is slower in terms of computer time, but from your description it sounds like it will be quicker in terms of bum-on-seat time.
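    One caveat with the commands above: for reads aligned to the reverse strand (FLAG 0x10), SAM stores SEQ reverse-complemented and QUAL reversed, so those reads come out in the wrong orientation, which can break the mate-pair orientation an aligner expects. The R1 and R2 files also need to list mates in the same order, which a coordinate-sorted BAM does not give you. Below is a minimal sketch that handles both, assuming POSIX awk, a name-sorted BAM, every pair having both mates present as primary records, and a real QUAL string on every record. `sam_to_fastq` is a hypothetical helper name, not a samtools command.

```shell
# Hypothetical helper: sam_to_fastq MATE_BIT < sam_text > fastq
#   MATE_BIT is 64 (0x40, first mate) or 128 (0x80, second mate).
# Assumes SAM records without header lines, each with a real QUAL string (not "*").
sam_to_fastq() {
  awk -F '\t' -v mate="$1" '
    # True if bit b (a power of two) is set in flag f; avoids the gawk-only and().
    function has_bit(f, b) { return int(f / b) % 2 == 1 }
    BEGIN {
      # Complement table for reverse-complementing; unknown symbols become N.
      split("A C G T N a c g t n", fwd, " ")
      split("T G C A N t g c a n", rev, " ")
      for (i = 1; i <= 10; i++) comp[fwd[i]] = rev[i]
    }
    {
      flag = $2 + 0
      # Keep the requested mate; drop secondary (0x100) and supplementary (0x800) records.
      if (!has_bit(flag, mate) || has_bit(flag, 256) || has_bit(flag, 2048)) next
      seq = $10; qual = $11
      if (has_bit(flag, 16)) {
        # Reverse strand (0x10): SAM stores SEQ reverse-complemented and QUAL reversed,
        # so undo both to recover the read in its original orientation.
        rs = ""; rq = ""
        for (i = length(seq); i >= 1; i--) {
          b = substr(seq, i, 1)
          c = (b in comp) ? comp[b] : "N"
          rs = rs c
          rq = rq substr(qual, i, 1)
        }
        seq = rs; qual = rq
      }
      print "@" $1 "\n" seq "\n+\n" qual
    }'
}
```

    For example, with samtools 1.x: name-sort first with `samtools sort -n -o file.namesorted.bam file.bam`, then run `samtools view file.namesorted.bam | sam_to_fastq 64 > file_R1.fastq` and the same with `128` for `file_R2.fastq`. Name sorting is what keeps the two files in step; testing bits with integer division keeps the script portable to awks that lack a bitwise `and()`.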
    Last edited by gringer; 06-08-2014, 03:08 AM.



  • dpryan
    replied
    The general idea is to:
    1. Iterate over the reads
    2. For each read, get its start and end position.
    3. If at least one of the exons could lie between those coordinates, get the CIGAR string.
    4. Parse the CIGAR string into a sequence of aligned regions (the blocks separated by N operations, i.e. skipped regions such as introns).
    5. For each region, note if it overlaps one of your exons. Add that to a vector or a data structure of your choice (you could even just use an integer as a bitmap).
    6. Once you've iterated through the aligned regions for a read of interest, look at the structure from the previous step and proceed as desired.


    That's the general idea. If your BAM file is coordinate sorted and indexed, then you can simply request the reads covering the regions of interest, which will make things a bit quicker.
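    The steps above can be sketched in the same shell/awk style as the bam2fastq commands in this thread. This is a minimal illustration, not dpryan's code: `count_exon_patterns` is a hypothetical name, the exon coordinates are placeholders you supply (1-based, inclusive), and it assumes the input is already restricted to the gene's chromosome. Each read gets a 3-bit bitmap (exon 6 = 1, exon 7 = 2, exon 8 = 4) built from its aligned CIGAR blocks, as in step 5. The output buckets roughly follow Adrian's conditions: `skip_exon7` (bitmap 5, condition 2), `exons_6_7` and `exons_7_8` (condition 1), and `exons_6_7_8` (condition 3). It treats any overlap as a hit, does not check that the N gaps end exactly at the annotated splice sites, and counts alignment records, so both mates of a pair may be counted.

```shell
# Hypothetical helper: count_exon_patterns E6_START E6_END E7_START E7_END E8_START E8_END
# Reads SAM records (no header lines) on stdin; exon coordinates are 1-based, inclusive.
count_exon_patterns() {
  awk -F '\t' -v s6="$1" -v e6="$2" -v s7="$3" -v e7="$4" -v s8="$5" -v e8="$6" '
    # True if bit b (a power of two) is set in f; avoids the gawk-only and().
    function has_bit(f, b) { return int(f / b) % 2 == 1 }
    BEGIN {
      es[1] = s6 + 0; ee[1] = e6 + 0; bit[1] = 1   # exon 6
      es[2] = s7 + 0; ee[2] = e7 + 0; bit[2] = 2   # exon 7
      es[3] = s8 + 0; ee[3] = e8 + 0; bit[3] = 4   # exon 8
    }
    {
      flag = $2 + 0
      # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
      if (has_bit(flag, 4) || has_bit(flag, 256) || has_bit(flag, 2048)) next
      pos = $4 + 0          # leftmost reference position of the alignment
      cigar = $6
      bitmap = 0
      # Walk the CIGAR one operation at a time.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        cigar = substr(cigar, RLENGTH + 1)
        if (op == "M" || op == "=" || op == "X" || op == "D") {
          # Aligned block [pos, pos + len - 1]: record every exon it overlaps.
          for (i = 1; i <= 3; i++)
            if (pos <= ee[i] && es[i] <= pos + len - 1 && !has_bit(bitmap, bit[i]))
              bitmap += bit[i]
          pos += len
        } else if (op == "N") {
          pos += len        # skipped region (intron): advance along the reference only
        }
        # I, S, H and P do not consume reference positions.
      }
      count[bitmap]++
    }
    END {
      printf "skip_exon7\t%d\n", count[5] + 0     # exons 6 and 8, not 7
      printf "exons_6_7\t%d\n", count[3] + 0
      printf "exons_7_8\t%d\n", count[6] + 0
      printf "exons_6_7_8\t%d\n", count[7] + 0
    }'
}
```

    With a coordinate-sorted, indexed BAM, the region query keeps the input small, e.g. `samtools index file.bam` followed by `samtools view file.bam chr1:100-599 | count_exon_patterns 100 199 300 399 500 599` (hypothetical chromosome and coordinates). A bitmap in a single integer, as dpryan suggests, makes the final classification a simple lookup on the value.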



  • adrian
    replied
    Thank you.
    Is there a particular function that I could use? If not, what would be the logic to get those read stats?

    thanks
    Adrian



  • dpryan
    replied
    The simplest method would be to just script this in pysam (or whatever language you prefer).

    BTW, you can convert the BAM file to fastq and realign that, but it's faster to just write a little python script.



  • adrian
    started a topic Bam file to junctions.bed

    Bam file to junctions.bed

    Hi:

    I received an aligned BAM file and do not have the raw sequence files.

    My aim is to count how many reads skip exon 7 of a gene and how many reads do not skip it, in addition to reads that span exons 6 and 7, and exons 7 and 8.

    Ex 6-------- Ex 7 -------- Ex 8
    ___________ __________ => Condition 1: reads that span exons 6-7, and reads that span exons 7-8
    ____//////////////////////___ => Condition 2: reads skipping exon 7
    ___________________ => Condition 3: reads that span exons 6 and 7 and end in exon 8



    Is there a way to get counts from my BAM file for the above 3 conditions?

    If I had the raw reads, I would use TopHat and deduce these from junctions.bed. However, the BAM file was not generated using TopHat and I don't have access to the raw sequences.

    Is there a way to get junctions.bed, perhaps by converting the BAM to FASTA and then realigning with TopHat? This could potentially break the paired-end structure, leading to a loss of reads (I am not so sure about this, though).

    Or

    Is there any other smart way to just count reads that skip exon 7?

    Appreciate any response. Thanks a lot.

    Adrian
