  • How to convert sorted.txt files from the Illumina pipeline v1.3.4 to BAM or SAM?

    Hi all,

    How can I convert the sorted.txt file from the Illumina pipeline to a BAM or SAM file?

    Thanks.

  • #2
    Hi all,

    I found this tutorial: http://genomewiki.ucsc.edu/index.php/ABRF2010_Tutorial

    I downloaded the latest version of samtools and ran:

    export2sam.pl --read1=chr21_export.txt \
    | perl -wpe 's/(chr.*)\.fa/$1/' \
    > chr21.sam

    It fails with:

    ERROR: Unexpected number of fields in export record on line 1 of read1 export file. Found 16 fields but expected 22.

    My sorted.txt file contains only 16 fields, and the program is complaining about that. However, the example file given in the link above also contains only 16 fields.

    I thought the format might have changed, so I downloaded earlier versions of samtools (0.1.8, 0.1.9, etc.). They still fail, with errors like:

    Use of uninitialized value $t[21] in string eq at export2sam.pl line 279,

    An earlier version just dies at line 17.

    Can anyone help me? Thanks!
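
Before choosing a converter, it can help to confirm how many tab-separated fields each record really has. The following is a minimal Python sketch of that check, not part of samtools or the Illumina pipeline; the function name `field_count_histogram` and the demo input are hypothetical.

```python
from collections import Counter
import io


def field_count_histogram(lines, sep="\t"):
    """Count how many records have each number of tab-separated fields."""
    counts = Counter()
    for line in lines:
        # Strip only the line ending: trailing empty fields must still count.
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        counts[len(line.split(sep))] += 1
    return counts


# Hypothetical demo input: one 16-field record and one 22-field record.
demo = io.StringIO("\t".join(["x"] * 16) + "\n" + "\t".join(["x"] * 22) + "\n")
hist = field_count_histogram(demo)
for n_fields, n_records in sorted(hist.items()):
    print(f"{n_fields} fields: {n_records} record(s)")
```

For a real file, pass an open file handle instead of the demo `StringIO`. A file that reports only 16 fields is not in the 22-field layout that export2sam.pl asks for in the error message above.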



    • #3
      Hey, were you ever able to find a solution to your problem? I am running into the same issue.

      [jhpce01 /amber3/feinbergLab/personal/sramazan/chip-seq]$ /amber3/feinbergLab/personal/sramazan/perl/scripts/export2sam.pl --read1=GSM1053091_mm9.nac.inp1.sorted.txt
      @PG ID:export2sam.pl VN:2.3.1 CL:/amber3/feinbergLab/personal/sramazan/perl/scripts/export2sam.pl --read1=GSM1053091_mm9.nac.inp1.sorted.txt

      ERROR: Unexpected number of fields in export record on line 1 of read1 export file. Found 16 fields but expected 22.
      ...erroneous export record:
      HWI-EASXXX 1 2 35 1301 1347 0 1 ATGTAGCTAGAGACTTGAGCTCTGGGGGGTACTGGT aaa^]`aa`a_a_[_^`^`__`^^^][_XLQR[[]S chr10.fa 3003189 F 36 12



      • #4
        @Nino: See my post here: http://crazyhottommy.blogspot.com/20...-bam-file.html



        • #5
          @crazyhottommy: You should clarify on your blog post that your modifications are specifically targeted for human (?) data. If someone else has a different genome it would be incorrect to follow your procedure, as is.

          @Nino/@crazyhottommy: I am not sure what the downstream application is/was in your case, but you have to account for the Q-scores probably being in a non-Sanger format (this is old data). Most newer tools expect them to be in Sanger format.

          @Nino: Check your PM. I sent you a script yesterday to recreate the FASTQ sequence file. That may be a safer place to start. I can post it here if it works for you.
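
The Q-score point can be illustrated with a small Python sketch. It assumes the qualities are Phred+64 encoded, as in Illumina pipeline 1.3 and later, and re-encodes them as Phred+33 (Sanger). Older Solexa-scaled data uses a different score definition and is not handled here. The function name `phred64_to_phred33` is hypothetical; the example string is the quality field from the record in post #3.

```python
def phred64_to_phred33(qual):
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    out = []
    for ch in qual:
        score = ord(ch) - 64  # assumption: input uses an offset of 64
        if score < 0:
            # A negative score means the input is not plain Phred+64.
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        out.append(chr(score + 33))
    return "".join(out)


# Quality string taken from the export record quoted in post #3.
old = "aaa^]`aa`a_a_[_^`^`__`^^^][_XLQR[[]S"
new = phred64_to_phred33(old)
print(new)
```

The conversion is a constant shift of 31 character codes, so the string length and the per-base scores are unchanged; only the ASCII offset differs.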



          • #6
            @GenoMax: I received your PM. Could you please look at my response to see whether the file I am working on is an alignment file? Also, the data I am working with was downloaded from the NCBI website; apparently the people who uploaded it used the CASAVA Illumina pipeline (this is all the information that was given to me).



            • #7
              @GenoMax: Thanks, I updated the post accordingly.



              • #8
                @crazyhottommy

                Try this script. It produces a FASTQ file instead of a SAM/BAM file.

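                #!/usr/bin/perl

                use warnings;
                use strict;

                # Arguments: input export/sorted.txt file, output FASTQ file.
                my ($datafile, $outfile) = @ARGV;
                die "usage: $0 <datafile> <outfile>\n" unless defined $outfile;

                open(my $in, '<', $datafile) or die "can't open the datafile $datafile: $!\n";
                open(my $out, '>', $outfile) or die "can't open the output file $outfile: $!\n";

                while (my $line = <$in>) {
                    chomp $line;
                    # Fields 0-7 build the read ID, field 8 is the sequence, field 9 the quality string.
                    my @i = split(/\t/, $line);
                    # Join machine, run, lane, tile, x and y with ':' so that run and lane stay distinguishable.
                    print {$out} "@" . join(":", @i[0..5]) . "#$i[6]/$i[7]\n$i[8]\n+\n$i[9]\n";
                }
                close $in;
                close $out or die "can't close the output file $outfile: $!\n";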
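
As a worked example of the same field mapping, here is a minimal Python sketch applied to the record from post #3. The variable names for the first ten fields are labels chosen for illustration, the ':' between run and lane is an assumption, and the record is re-joined with tabs on the assumption that the contig field is empty.

```python
def export_record_to_fastq(line):
    """Build a four-line FASTQ record from one tab-separated export record."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, found {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual = fields[:10]
    # Assumption: ':' separates run and lane so the two values cannot run together.
    read_id = f"{machine}:{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    return f"@{read_id}\n{seq}\n+\n{qual}\n"


# Record from post #3, re-joined with tabs; the empty contig field is an assumption.
record = "\t".join([
    "HWI-EASXXX", "1", "2", "35", "1301", "1347", "0", "1",
    "ATGTAGCTAGAGACTTGAGCTCTGGGGGGTACTGGT",
    "aaa^]`aa`a_a_[_^`^`__`^^^][_XLQR[[]S",
    "chr10.fa", "", "3003189", "F", "36", "12",
])
print(export_record_to_fastq(record), end="")
```

The quality string is copied through unchanged, so any re-encoding to Sanger format would still have to happen as a separate step.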
