  • jjlaisnoopy
    replied
    Thank you all!



  • GenoMax
    replied
    Ah yes. Corrected.



  • sarvidsson
    replied
    Originally posted by GenoMax
    A command line solution. See if this works:

    Code:
    $ samtools view -h file.bam | awk -F'\t' '{OFS = "\n"; ORS = "\n";}{ if ($3 == "*" ) print "@"$1,$10,"+",$11}' > outfile.bam
    GenoMax, you probably mean to call the output file "outfile.fastq", right?
    Code:
    $ samtools view -h file.bam | awk -F'\t' '{OFS = "\n"; ORS = "\n";}{ if ($3 == "*" ) print "@"$1,$10,"+",$11}' > outfile.fastq



  • GenoMax
    replied
    A command line solution. See if this works:

    Code:
    $ samtools view -h file.bam | awk -F'\t' '{OFS = "\n"; ORS = "\n";}{ if ($3 == "*" ) print "@"$1,$10,"+",$11}' > outfile.fastq
    Last edited by GenoMax; 02-11-2015, 06:38 AM. Reason: correction
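For reference, the same awk filter can be wrapped in a small shell function that reads SAM text on stdin, so it can be combined with `samtools view -f 4`. This is a minimal sketch, not GenoMax's exact command: the function name `sam_unplaced_to_fastq` is hypothetical, and it assumes every unplaced read has RNAME "*" in column 3 and a real quality string in column 11. Header lines are skipped explicitly, which is safe because the SAM specification does not allow a QNAME to start with "@".

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn SAM records read from stdin into FASTQ,
# keeping only records whose RNAME (column 3) is "*", i.e. unmapped reads
# that were not placed next to a mapped mate.
sam_unplaced_to_fastq() {
  awk -F'\t' '
    BEGIN { OFS = "\n" }        # print each FASTQ field on its own line
    /^@/ { next }               # skip SAM header lines (QNAME cannot start with @)
    $3 == "*" { print "@" $1, $10, "+", $11 }
  '
}

# Example (file names are placeholders):
#   samtools view -f 4 file.bam | sam_unplaced_to_fastq > outfile.fastq
```

Filtering with `-f 4` first only reduces how much text awk has to scan; the `$3 == "*"` test is what leaves out the unmapped mates placed on chr1, chr2, and so on.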



  • dpryan
    replied
    Just write a quick little script in Python with pysam to do this. There isn't always a premade program for everything.



  • jjlaisnoopy
    replied
    All I want are the unmapped reads with no mate or with an unmapped mate, which are assigned to chrom "*", not the unmapped reads whose mate is mapped, which are assigned to chr1, chr2, ...
    Just the reads assigned to:
    "*" 0 0 42640654



  • GenoMax
    replied
    Are you asking why the number does not add up to 42640654? See a possible explanation here: https://www.biostars.org/p/18949/
    Last edited by GenoMax; 02-10-2015, 07:05 PM.



  • jjlaisnoopy
    replied
    Originally posted by GenoMax
    I tried the parameter -f 4 and then indexed the resulting BAM file.
    Here is the idxstats output:

    chrM 16571 0 32042
    chr1 249250621 0 1937746
    chr2 243199373 0 2244387
    chr3 198022430 0 1501432
    chr4 191154276 0 1825761
    chr5 180915260 0 1486923
    chr6 171115067 0 1273600
    chr7 159138663 0 1531851
    chr8 146364022 0 1315785
    chr9 141213431 0 1184324
    chr10 135534747 0 2018103
    chr11 135006516 0 1373030
    chr12 133851895 0 1180836
    chr13 115169878 0 616704
    chr14 107349540 0 758336
    chr15 102531392 0 624791
    chr16 90354753 0 991456
    chr17 81195210 0 916042
    chr18 78077248 0 1614444
    chr19 59128983 0 755016
    chr20 63025520 0 644231
    chr21 48129895 0 547857
    chr22 51304566 0 277926
    chrX 155270560 0 1365662
    chrY 59373566 0 642042
    * 0 0 42640654

    Any suggestions?
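One way to see where the -f 4 reads ended up is to total the idxstats columns. The sketch below assumes the usual four tab-separated idxstats columns (reference name, length, mapped reads, unmapped reads) and uses a hypothetical function name. Unmapped reads counted against a named chromosome are typically ones placed at their mapped mate's position, while the "*" row holds the unplaced ones asked about here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise `samtools idxstats` output read from stdin.
# Assumed columns: reference name, length, mapped reads, unmapped reads.
idxstats_unmapped_summary() {
  awk -F'\t' '
    $1 == "*" { unplaced += $4; next }   # unmapped reads with no position
    { mapped += $3; placed += $4 }       # per-contig mapped / placed-unmapped
    END {
      # %.0f avoids overflow in awks whose %d is limited to 32 bits
      printf "mapped\t%.0f\n", mapped
      printf "unmapped_placed\t%.0f\n", placed
      printf "unmapped_unplaced\t%.0f\n", unplaced
    }
  '
}

# Example (file name is a placeholder):
#   samtools idxstats file.bam | idxstats_unmapped_summary
```

Only the `unmapped_unplaced` total should match the number of reads a `$3 == "*"` filter extracts; the placed total is what makes a plain `-f 4` extraction come out larger.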



  • GenoMax
    replied
    Originally posted by jjlaisnoopy
    This is a good method to split a BAM file, but I have a question.

    * 0 0 42640654

    How can I get the 42640654 reads under "*", i.e. those that were not mapped to any contig?



  • jjlaisnoopy
    replied
    This is a good method to split a BAM file, but I have a question.
    The following is idxstats of a bam file:
    chrM 16571 2073252 32042
    chr1 249250621 115733016 1937746
    chr2 243199373 104133908 2244387
    chr3 198022430 96577573 1501432
    chr4 191154276 89582368 1825761
    chr5 180915260 94818025 1486923
    chr6 171115067 84533173 1273600
    chr7 159138663 71186849 1531851
    chr8 146364022 65630236 1315785
    chr9 141213431 59368028 1184324
    chr10 135534747 63503839 2018103
    chr11 135006516 59963670 1373030
    chr12 133851895 63898721 1180836
    chr13 115169878 41939790 616704
    chr14 107349540 43647215 758336
    chr15 102531392 39227879 624791
    chr16 90354753 42298502 991456
    chr17 81195210 49043800 916042
    chr18 78077248 75701725 1614444
    chr19 59128983 26119207 755016
    chr20 63025520 32668117 644231
    chr21 48129895 19226969 547857
    chr22 51304566 15797809 277926
    chrX 155270560 74715396 1365662
    chrY 59373566 2021162 642042
    * 0 0 42640654

    How can I get the 42640654 reads under "*", i.e. those that were not mapped to any contig?



  • jazz
    replied
    Thanks everyone. I will give these suggestions a try and let you know how it went.



  • syfo
    replied
    Originally posted by vivek_
    That's why you have the BAM index, right? So you are not reading the entire file to export each contig?

    For the sequentiality issue, you can extract the contig names from the BAM header into a file and loop over them:

    Code:
    samtools view -H input.bam | awk '{print $2}' | awk '{gsub(/SN\:/,""); print}'  > contigs.txt
    Watch out: you want neither the first ("@HD ...") nor the last ("@PG ...") line of the header; only the @SQ lines hold contig names.

    Try this instead:
    Code:
    samtools view -H all.bam | sed -n '/^@SQ/s/.*SN:\([^[:space:]]*\).*/\1/p' > contigs.list
    Or, if you prefer awk:
    Code:
    samtools view -H all.bam | awk '/^@SQ/{gsub(/SN:/,""); print $2}' > contigs.list
    or even (just for fun), dropping the final "*" row that idxstats reports for unplaced reads:
    Code:
    samtools idxstats all.bam | cut -f1 | grep -vx '\*' > contigs.list
    All those should give you the same list of contigs.

    Then,
    Code:
    while read -r c ; do
    echo "processing $c"
    samtools view -bh all.bam "$c" > "$c.bam"
    done < contigs.list
    But I agree it might take a while...
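If the header carries extra @SQ tags (M5, UR, AS and so on) or several @RG/@PG lines, positional tricks can break. A slightly more defensive sketch, with a hypothetical function name, keeps only @SQ lines and looks for the SN: tag in any field, relying on the SAM header layout of tab-separated TAG:VALUE fields rather than on SN being the second field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one reference name per line from a SAM header
# read on stdin. Only @SQ lines describe reference sequences, and the SN:
# tag is searched for in every field instead of assuming a fixed position.
sam_header_contigs() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "SN:") { print substr($i, 4); break }
    }
  '
}

# Example (file names are placeholders):
#   samtools view -H all.bam | sam_header_contigs > contigs.list
```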



  • vivek_
    replied
    That's why you have the BAM index, right? So you are not reading the entire file to export each contig?

    For the sequentiality issue, you can extract the contig names from the BAM header into a file and loop over them:

    Code:
    samtools view -H input.bam | awk '{print $2}' | awk '{gsub(/SN\:/,""); print}'  > contigs.txt



  • dpryan
    replied
    Originally posted by vivek_
    A Unix one-liner should work, right?

    Code:
    for i in {1..22};do samtools view -bh input.bam chr$i > chr$i.bam;done
    Depends. I've seen examples where the contigs weren't sequentially numbered (presumably due to some contigs being merged as later data came in).

    Also, for a file with a large number of contigs (have a look at some of the mouse lines from the Sanger Institute), looping over the whole file many, many times will get super slow. You could probably write a script to process the whole thing in one go in a fraction of the time.
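A minimal sketch of the one-pass idea, written as a hypothetical shell function around awk rather than as a Python/pysam script: it reads the SAM text once (header included), buffers the header, and writes each record to a per-reference SAM file, sending RNAME "*" records to unplaced.sam. It assumes one open file per contig is acceptable; some awk implementations cap the number of simultaneously open files, which may matter for assemblies with thousands of contigs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a SAM stream (stdin, header included) into one
# SAM file per reference name in a single pass. Every output file gets the
# full header; records with RNAME "*" go to unplaced.sam.
split_sam_by_contig() {
  local outdir=${1:-.}
  mkdir -p "$outdir"
  awk -F'\t' -v outdir="$outdir" '
    /^@/ { header = header $0 "\n"; next }   # buffer header lines
    {
      name = ($3 == "*") ? "unplaced" : $3
      file = outdir "/" name ".sam"
      if (!(file in seen)) {                 # first record for this reference
        seen[file] = 1
        printf "%s", header > file           # write the header once
      }
      print > file                           # file stays open, so this appends
    }
  '
}

# Example (names are placeholders):
#   samtools view -h all.bam | split_sam_by_contig by_contig
```

Each part can then be converted with `samtools view -b -o part.bam part.sam` if BAM output is needed. The whole input is decoded only once, which is the saving over one indexed query per contig.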



  • vivek_
    replied
    A Unix one-liner should work, right?

    Code:
    for i in {1..22};do samtools view -bh input.bam chr$i > chr$i.bam;done
    Last edited by vivek_; 06-05-2013, 01:16 PM.

