SEQanswers

  • sex/gender tags for BAM files?

    My analysis of next-generation sequencing data is gender/sex specific (should I consider the Y chromosome, or two X chromosomes?).

    I am unaware of any flag in .bam files (or samtools) for the gender of an individual (male or female). Is there a tag for gender that I don't know about?

    I am also unaware of any gender consideration in aligners such as BWA. Do these aligners in fact consider gender when aligning?

    Finally, I couldn't find any gender/sex information in databases such as 1000 Genomes. Is it hidden away somewhere?

  • #2
    I don't think aligner algorithms are gender specific and bam files do not have gender flags. The gender/relationship data is passed into NGS analysis using pedigree files.
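As a minimal sketch of how per-sample sex can travel alongside the BAM files through a pedigree file, assuming the PLINK-style PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then genotypes), in which sex is coded 1 = male, 2 = female and anything else = unknown. The helper name `sex_by_individual` and the sample IDs below are hypothetical.

```python
from typing import Dict

# PLINK-style PED files store sex in column 5: 1 = male, 2 = female, other = unknown.
PED_SEX_CODES = {"1": "male", "2": "female"}


def sex_by_individual(ped_text: str) -> Dict[str, str]:
    """Hypothetical helper: map individual ID (column 2) to 'male', 'female' or 'unknown'."""
    sexes: Dict[str, str] = {}
    for line in ped_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        individual_id, sex_code = fields[1], fields[4]
        sexes[individual_id] = PED_SEX_CODES.get(sex_code, "unknown")
    return sexes


if __name__ == "__main__":
    # Hypothetical trio plus one sample of unknown sex; -9 marks a missing phenotype.
    example_ped = "\n".join([
        "FAM1 child1 dad1 mum1 2 -9",
        "FAM1 dad1 0 0 1 -9",
        "FAM1 mum1 0 0 2 -9",
        "FAM2 sample4 0 0 0 -9",
    ])
    print(sex_by_individual(example_ped))
```

A downstream tool could then look up each BAM's sample (for example the SM tag of its @RG header line) in this mapping; pairing SM values with PED individual IDs is an assumption of this sketch, not something samtools does for you.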

    • #3
      Hi, I'm coming back to this question for another reason:
      It's true that reads from a female will give twice as many aligned reads on chrX as reads from a male... so for RNA-seq or ChIP-seq, this would introduce a bias in finding peaks or expressed transcripts, wouldn't it?

      Second, merging .bam files from male and female replicates with samtools generates an error of the type: different target sequence name: 'chrY' != 'chrM'
      because chrY is absent from the female .bam header...

      Any idea how I could merge these files? Should I only add a chrY line to the header of the female .bam? Is that sufficient? And how can I do it, since .bam files are compressed binary files...

      Thanks for your help!
      Last edited by spacup; 12-02-2013, 03:00 AM.
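The error message suggests that samtools merge compares the @SQ lines of the input headers position by position: in a male header ordered ..., chrX, chrY, chrM, the female header has chrM in the slot where chrY is expected. A minimal sketch that reproduces that comparison on header text (for example as printed by `samtools view -H`); the chromosome lengths are toy values, and `first_sq_mismatch` is a hypothetical helper, not samtools' own code.

```python
from typing import List, Optional, Tuple


def sq_entries(header_text: str) -> List[Tuple[str, int]]:
    """Return (SN, LN) for each @SQ line of a SAM header, in header order."""
    entries = []
    for line in header_text.splitlines():
        if line.startswith("@SQ\t"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            entries.append((tags["SN"], int(tags["LN"])))
    return entries


def first_sq_mismatch(header_a: str, header_b: str) -> Optional[Tuple[int, str, str]]:
    """Return (index, name in a, name in b) at the first differing @SQ position, else None."""
    names_a = [name for name, _ in sq_entries(header_a)]
    names_b = [name for name, _ in sq_entries(header_b)]
    for i in range(max(len(names_a), len(names_b))):
        name_a = names_a[i] if i < len(names_a) else "<missing>"
        name_b = names_b[i] if i < len(names_b) else "<missing>"
        if name_a != name_b:
            return i, name_a, name_b
    return None


if __name__ == "__main__":
    # Toy headers (hypothetical lengths); real genome headers have many more @SQ lines.
    male = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:800\n@SQ\tSN:chrY\tLN:300\n@SQ\tSN:chrM\tLN:16\n"
    female = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:800\n@SQ\tSN:chrM\tLN:16\n"
    print(first_sq_mismatch(male, female))  # (2, 'chrY', 'chrM')
```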

      • #4
        For downstream analysis, you would need to account for gender in your model fit (so "Counts ~ gender + SomeFactor ..."), which is done by associating samples with factors. If you align to gender-specific genomes and need to subsequently merge files, then simply reheader the female samples and use a genome sorted such that chrY is last. Reads in a BAM file refer to chromosomes by their position in the header, so appending chrY at the end leaves every existing position unchanged, and you can simply swap in a new header without needing to modify any of the reads.
        Last edited by dpryan; 12-04-2013, 02:21 AM.
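To make the formula concrete, here is a minimal sketch of the design matrix that a formula like `Counts ~ gender + SomeFactor` corresponds to under treatment (dummy) coding: an intercept column plus one 0/1 indicator per non-reference level. It assumes the alphabetically first level of each factor is the reference, similar to R's default for factors built from strings, though other tools may choose differently; the sample annotations and the `design_matrix` helper are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def design_matrix(samples: Sequence[Dict[str, str]],
                  factors: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Treatment-coded design matrix: intercept plus one indicator per non-reference level.

    The reference level of each factor is taken to be its alphabetically first level.
    """
    levels = {f: sorted({s[f] for s in samples}) for f in factors}
    columns = ["(Intercept)"] + [f + lvl for f in factors for lvl in levels[f][1:]]
    rows = [
        [1] + [int(s[f] == lvl) for f in factors for lvl in levels[f][1:]]
        for s in samples
    ]
    return columns, rows


if __name__ == "__main__":
    # Hypothetical annotation: two sexes crossed with a two-level experimental factor.
    samples = [
        {"gender": "female", "SomeFactor": "control"},
        {"gender": "male", "SomeFactor": "control"},
        {"gender": "female", "SomeFactor": "treated"},
        {"gender": "male", "SomeFactor": "treated"},
    ]
    columns, rows = design_matrix(samples, ["gender", "SomeFactor"])
    print(columns)  # ['(Intercept)', 'gendermale', 'SomeFactortreated']
    for row in rows:
        print(row)
```

Fitted per gene or per region, the `gendermale` coefficient roughly absorbs sex-driven differences (for example chrX dosage), so the `SomeFactortreated` effect is estimated after accounting for them.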

        • #5
          Thanks dpryan for your fast answer. The question is how to reheader the female sample, since .bam files are compressed binary files?

          EDIT: OK, I found this:
          samtools reheader <in.header.sam> <in.bam>

          I didn't know this command; I will try it!
          Last edited by spacup; 12-02-2013, 03:23 AM.
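Since `samtools view -H` prints a BAM header as plain SAM text, the edit that produces `<in.header.sam>` can be done on text before handing the file to `samtools reheader`. A minimal sketch, assuming (as dpryan suggests) that chrY is the last @SQ entry of the male header and every earlier @SQ line matches the female header exactly; `extend_header` and the toy headers are hypothetical.

```python
from typing import List, Tuple


def sq_entries(lines: List[str]) -> List[Tuple[str, int]]:
    """(SN, LN) for each @SQ line, in order."""
    entries = []
    for line in lines:
        if line.startswith("@SQ\t"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            entries.append((tags["SN"], int(tags["LN"])))
    return entries


def extend_header(female_header: str, male_header: str) -> str:
    """Hypothetical helper: add the male header's extra trailing @SQ lines to the female header.

    Refuses unless the female @SQ list (names and lengths) is an exact prefix of the
    male one, because BAM records refer to references by their index in that list.
    """
    female = female_header.splitlines()
    male = male_header.splitlines()
    female_sq, male_sq = sq_entries(female), sq_entries(male)
    if not female_sq or male_sq[:len(female_sq)] != female_sq:
        raise ValueError("female @SQ lines are not a prefix of the male @SQ lines")
    extra = [line for line in male if line.startswith("@SQ\t")][len(female_sq):]
    last_sq = max(i for i, line in enumerate(female) if line.startswith("@SQ\t"))
    # Keep @HD first and any @RG/@PG/@CO lines after the (now longer) @SQ block.
    return "\n".join(female[:last_sq + 1] + extra + female[last_sq + 1:]) + "\n"


if __name__ == "__main__":
    # Toy headers (hypothetical lengths) with chrY last in the male header.
    female = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:800\n@RG\tID:f1\tSM:female1\n"
    male = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:800\n@SQ\tSN:chrY\tLN:300\n@RG\tID:m1\tSM:male1\n"
    print(extend_header(female, male), end="")
```

With a male header ordered ..., chrX, chrY, chrM (as in the error quoted earlier in the thread), the prefix check fails and the sketch raises instead of producing a header that would silently relabel the chrM reads.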

          • #6
            I see the need to account for gender in downstream analysis, but for what kind of data would including or excluding chrY make a difference to the alignment?

            • #7
              One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.

              • #8
                Hi, I see that my previous message was not published...
                The idea is not to align the data: in a .bam file, the data are already mapped. I need to combine .bam files to perform peak calling.

                I downloaded ENCODE data for FAIRE analysis to try a peak caller, and ENCODE combines the replicates for its analysis. As I wanted to run a similar analysis with another peak caller for comparison, I wanted to combine their data too, but one replicate is male and the other is female.

                By the way, the command samtools reheader <in.header.sam> <in.bam> worked, thanks!
                But my results are quite different from ENCODE's...

                • #9
                  Just be very careful when you reheader a BAM file. If you change the order of the chromosomes then everything will be screwed up. If the order is such that chrY is last and everything else is the same, then things should work OK.
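The reason order matters is that each BAM record stores its reference as an integer index into the header's @SQ list, not as a name. A minimal sketch of a pre-flight check on header text, assuming toy headers; `reheader_is_safe` is a hypothetical helper that compares only the SN and LN of each @SQ line, not other @SQ tags.

```python
from typing import List, Tuple


def sq_entries(header_text: str) -> List[Tuple[str, int]]:
    """(SN, LN) for each @SQ line, in header order."""
    entries = []
    for line in header_text.splitlines():
        if line.startswith("@SQ\t"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            entries.append((tags["SN"], int(tags["LN"])))
    return entries


def reheader_is_safe(old_header: str, new_header: str) -> bool:
    """True if every existing reference keeps its index, name and length."""
    old, new = sq_entries(old_header), sq_entries(new_header)
    return new[:len(old)] == old


if __name__ == "__main__":
    # Toy headers with hypothetical lengths.
    old = "@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:800\n@SQ\tSN:chrM\tLN:16\n"
    chrY_last = old + "@SQ\tSN:chrY\tLN:300\n"
    chrY_before_chrM = (
        "@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:800\n"
        "@SQ\tSN:chrY\tLN:300\n@SQ\tSN:chrM\tLN:16\n"
    )
    ref_id = 2  # a read stored with reference index 2 is on chrM under the old header
    for label, header in [("chrY last", chrY_last), ("chrY before chrM", chrY_before_chrM)]:
        # Prints the check result and the name that index 2 would resolve to.
        print(label, reheader_is_safe(old, header), sq_entries(header)[ref_id][0])
```

In the second case the check fails and index 2 now resolves to chrY, which is exactly the silent relabelling described above.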

                  • #10
                    Originally posted by spacup
                    I downloaded ENCODE data for FAIRE analysis to try a peak caller, and ENCODE combines the replicates for its analysis. As I wanted to run a similar analysis with another peak caller for comparison, I wanted to combine their data too, but one replicate is male and the other is female.
                    I still don't understand. Why not just merge the files as they are? Why the reheader?

                    If the biological replicates are of different genders, maybe it would be best to exclude all reads mapping to chrX and chrY. The inactive copy of chrX will be quite different from the active one.
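If one follows this suggestion, the filter itself is a simple pass over SAM text (for example `samtools view -h` output) that drops records whose RNAME is chrX or chrY. A minimal sketch, assuming UCSC-style chromosome names; it checks only RNAME, so a read whose mate sits on chrX is kept with its mate fields unchanged, and `drop_sex_chromosomes` is a hypothetical helper.

```python
from typing import FrozenSet, Iterable, Iterator

# Assumes UCSC-style names; Ensembl-style references would use "X" and "Y".
SEX_CHROMOSOMES: FrozenSet[str] = frozenset({"chrX", "chrY"})


def drop_sex_chromosomes(sam_lines: Iterable[str],
                         excluded: FrozenSet[str] = SEX_CHROMOSOMES) -> Iterator[str]:
    """Yield SAM header lines unchanged and alignment lines whose RNAME is not excluded."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep @SQ lines so reference indices stay valid
            continue
        rname = line.split("\t", 3)[2]  # column 3 of a SAM record is RNAME
        if rname not in excluded:
            yield line


if __name__ == "__main__":
    # Toy records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    records = [
        "@SQ\tSN:chr1\tLN:1000\n",
        "@SQ\tSN:chrX\tLN:800\n",
        "@SQ\tSN:chrY\tLN:300\n",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t0\tchrX\t50\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r3\t16\tchrY\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    print("".join(drop_sex_chromosomes(records)), end="")
```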

                    • #11
                      Originally posted by dpryan
                      One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.
                      You would probably also create bias on other chromosomes, e.g. reads that would then map uniquely to chrX...

                      • #12
                        If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.

                        • #13
                          Originally posted by dpryan
                          If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.
                          If there are only female samples in the whole study, then I agree. Otherwise, the increased accuracy, which you couldn't achieve in male samples, might lead to false-positive results?

                          So ENCODE performs/provides alignments of female samples without chrY?

                          • #14
                            In this case, the difference in mapping would be accounted for in the downstream statistics. If you wanted to directly compare the genders, though, then this would be an issue. I actually haven't a clue whether the original samples were aligned to a genome lacking chrY or not. From what spacup wrote, that's just my presumption.
