[ensembl-dev] issue with my script to fetch genes from a file using API

Hans-Rudolf Hotz hrh at fmi.ch
Wed Nov 2 16:35:37 GMT 2011


Hi Nathalie

Well, then I would write the loop differently. Instead of:

	my $genes = $slice->get_all_Genes;
	foreach my $gene( @{$genes} ){
    		print  OUT $coordinates[0],"\t",$coordinates[1],"\t",$coordinates[2],"\t",$coordinates[3],"\t",$coordinates[4],"\t",$coordinates[5],"\t",$gene->external_name,"\t",$coordinates[6],"\n";
	}

something like this (code not tested):

	print  OUT $coordinates[0],"\t",$coordinates[1],"\t",$coordinates[2],"\t",$coordinates[3],"\t",$coordinates[4],"\t",$coordinates[5],"\t";

	my $genes = $slice->get_all_Genes;
	foreach my $gene( @{$genes} ){
    		print  OUT $gene->external_name.",";
	}

	print  OUT "\t", $coordinates[6], "\n";
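
The loop above leaves a trailing comma after the last gene name. One way around that is to collect the names first and join them in one go. The following is only a minimal sketch: format_region_line is a hypothetical helper (not part of the Ensembl API), and the gene names are passed in as plain strings with made-up values; in the real script they would come from $gene->external_name for each gene in $slice->get_all_Genes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): builds one output line
# per region. $coordinates is an array ref holding the seven tab-separated
# input columns; $gene_names is an array ref of plain strings.
sub format_region_line {
    my ($coordinates, $gene_names) = @_;

    # Skip undefined names (assumption: a gene may lack an external name).
    my $gene_list = join(",", grep { defined $_ } @{$gene_names});

    # Columns 0-5, then the comma-separated gene list, then column 6.
    return join("\t", @{$coordinates}[0 .. 5], $gene_list, $coordinates->[6]) . "\n";
}

# Made-up values standing in for one row of the input file.
my @coordinates = ('1', 'loss', '100', '200', '100', '200', 'sample.txt');
print format_region_line(\@coordinates, ['GeneA', 'GeneB', undef, 'GeneC']);
```

Because join only places the separator between elements, a region with a single gene prints just that name, and a region without genes prints an empty column.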



Regards, Hans



On 11/02/2011 05:17 PM, Nathalie Conte wrote:
> Hans-Rudolf Hotz wrote:
>> Hi Nathalie
>>
>>
> Hi, thanks for your answer.
>> What do you mean you only get "1 gene name"?
>>
> Sorry, I probably wasn't clear in my previous email about the output I
> wanted. For each region, I would like the full gene list separated by
> commas, instead of a new line each time a gene is called. For example, if
> there are n genes within a region, I want them on the same line rather
> than on n lines.
> like this:
>
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 AC166644.1
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Crygf
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm8809
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm15659
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm8812
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Mtap2
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 SNORA26.5
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm10558
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 U1.85
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Unc80
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Rpe
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 1110028C15Rik
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm15789
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Acadl
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm15793
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 U7.115
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 SNORA17.101
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Myl1
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm10072
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
> 1 loss 3002738 2.0E+08 6.6E+07 6.7E+07 Gm15826
> PKBL_11_2a_PT_PapillenTumor_1469518286.txt,PPAB_4_2d_PT_251469517421.txt
>
> 1 loss 3002738 2.0E+08 3154599 3210686 Xkr4
> 1 loss 3002738 2.0E+08 3406028 3844616 Xkr4
>
>
>>
>> Well, I copy/pasted your script and ran it with a short version of
>> your file (i.e. the first five lines) and I got back the following two
>> lines:
>> 1 gain 38934531 38934531 38934531 38934531 Chst10
>> PPAB_21_9f_PT_251469518757.txt,PPAB_21_9g_PT_251469518753.txt,PPAB_27_9c_PT_251469518755.txt
>>
>> 1 gain 74668174 74678672 74668174 74678672 Stk36
>> PPAA_3_4g_PT_251469518838.txt,PPAB_3_6b_PT_251469518594.txt
>>
>> without any errors/problems.
>>
>>
>> If I run the complete file, I get your error as soon as I ask for
>> chromosome 20
>>
> Thanks for pointing this out. This is mouse, but chr20 is X and chr21 is Y.
>> maybe you should change
>>
>> my $slice_adaptor = $registry->get_adaptor('Mouse', 'Core', 'Slice');
>>
>> to
>>
>> my $slice_adaptor = $registry->get_adaptor('Human', 'Core', 'Slice');
>>
>>
>> and then your script runs through the whole file.
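
The "undefined value" error in this thread is consistent with fetch_by_region returning undef when the requested chromosome does not exist in the species. If the input file really numbers X as 20 and Y as 21, as stated above, the labels could be translated before the lookup. This is only a sketch under that assumption: normalise_chromosome and %chromosome_alias are hypothetical names, and the Ensembl calls are shown as comments because they need the database connection.

```perl
use strict;
use warnings;

# Assumption taken from the thread: the input file numbers X as 20 and Y as 21.
my %chromosome_alias = (20 => 'X', 21 => 'Y');

# Hypothetical helper: translate the file's chromosome label into the name
# used for the lookup; labels without an alias are returned unchanged.
sub normalise_chromosome {
    my ($label) = @_;
    return exists $chromosome_alias{$label} ? $chromosome_alias{$label} : $label;
}

print normalise_chromosome($_), "\n" for qw(1 19 20 21);

# In the real loop (not run here, since it needs the Ensembl database):
#   my $slice = $slice_adaptor->fetch_by_region('chromosome',
#       normalise_chromosome($chromosome), $start, $end);
#   unless (defined $slice) {
#       warn "No slice for $chromosome:$start-$end, skipping\n";
#       next;
#   }
```

Checking that $slice is defined before calling get_all_Genes lets the script skip an unknown region with a warning instead of stopping at the first one.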
>>
>>
>>
>> Regards, Hans
>>
> best,
> Nat
>>
>>
>> On 11/02/2011 04:18 PM, Nathalie Conte wrote:
>>> Hi, I have a file (see the attached format).
>>> The coordinates of the regions I want to retrieve genes from
>>> are in columns 1 (chromosome), 5 (start) and 6 (end).
>>> I used this script to parse the file and print the external gene
>>> names. In the output I get only 1 gene name and this message:
>>> Can't call method "get_all_Genes" on an undefined value at
>>> ./fetch_gene_API.pl line 34.
>>> I don't understand where this comes from, so I wanted to check whether you
>>> could help/advise.
>>> Thanks
>>> Nathalie
>>>
>>> #!/software/bin/perl
>>> use warnings;
>>> use strict;
>>> use Bio::EnsEMBL::Registry;
>>> use Bio::EnsEMBL::Utils::Sequence qw(reverse_comp);
>>> #use lib "/nfs/team82/nac/amy/may2011/lib";
>>>
>>> my $registry = "Bio::EnsEMBL::Registry";
>>> $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user
>>> => 'anonymous');
>>> my $file =
>>> "/nfs/team82/nac/Roland/November_2011/PT_mcrs0.25T_50_2_20M_sdundo.bind_API.txt";
>>>
>>>
>>>
>>> unless (open(REGIONS, $file)){
>>> print "Cannot open file \"$file\"\n\n";
>>> }
>>>
>>> my @regions = <REGIONS>;
>>>
>>> close REGIONS;
>>> open(OUT,">PT_mcrs0.25T_50_2_20M_sdundo.bind_API_fetchGene.txt");
>>> my $slice_adaptor = $registry->get_adaptor('Mouse', 'Core', 'Slice');
>>>
>>> foreach my $region(@regions){
>>> chomp $region;
>>>
>>> my @coordinates = split(/\t/, $region);
>>>
>>> my $chromosome = $coordinates[0];
>>> my $start = $coordinates[4];
>>> my $end = $coordinates[5];
>>>
>>> my $slice = $slice_adaptor->fetch_by_region('chromosome',$chromosome,
>>> $start, $end);
>>>
>>>
>>> my $genes = $slice->get_all_Genes;
>>> foreach my $gene( @{$genes} ){
>>> print OUT
>>> $coordinates[0],"\t",$coordinates[1],"\t",$coordinates[2],"\t",$coordinates[3],"\t",$coordinates[4],"\t",$coordinates[5],"\t",$gene->external_name,"\t",$coordinates[6],
>>>
>>> "\n";
>>>
>>> }
>>> }
>>>
>>>
>
>
>
>



