[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?
Yan P. Yuan
yuan at embl.de
Tue Jul 24 12:00:28 BST 2018
Hi,
Currently, I'm interested in genomic aligments between 2 species: homo sapience
and mus musculus.
If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor to fetch the
genomic alignments, the output misses the corresponding gene names (or involved
transcript names).
Can someone point out how to get the gene names via the used methods? Or a
different way to do it?
Thanks a lot.
Yan
PS:
The folllowing code was used:
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
use Bio::AlignIO;
my $reg = 'Bio::EnsEMBL::Registry';
$reg->load_registry_from_db(
-host=>'ensembldb.ensembl.org',
-port=>5306,
-user=>'anonymous',
);
my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
"MethodLinkSpeciesSet");
my $mlss =
$mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
['human', 'mouse']);
my $species_slice_adaptor = $reg->get_adaptor("mouse", "core", "Slice");
my $genomic_align_block_adaptor = $reg->get_adaptor("Multi", "compara",
"GenomicAlignBlock");
# Define a region to fetch
my ($species_start, $species_end) = (143702058, 143702091);
my $species_slice = $species_slice_adaptor->fetch_by_region("chromosome", 1,
$species_start, $species_end, -1);
my $all_genomic_align_blocks =
$genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
$species_slice);
my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
-fh=>\*STDOUT,
-format=>'clustalw',
-idlength=>20);
# print the restricted alignments
foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
my $restricted_gab =
$genomic_align_block->restrict_between_reference_positions($species_start,
$species_end);
foreach my $genomic_align ( @{ $restricted_gab->get_all_GenomicAligns()
} ) {
printf("%s %s:%d-%d:%s:%s\n", $genomic_align->genome_db->name,
$genomic_align->dnafrag->name, $genomic_align->dnafrag_start,
$genomic_align->dnafrag_end, $genomic_align->dnafrag_strand,
$genomic_align->aligned_sequence());
}
print "\n";
}
The output is then:
homo_sapiens 1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
mus_musculus 1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC
One way to get to the gene name/transcripts I've used is GeneMember-Adaptor &
Homology-Adaptor, but then I'd miss the genomic alignments.
This was taken from an ENSEMBL example:
use strict;
use warnings;
# using this script to get the ortholog of a given gene of mus musculus
use Bio::EnsEMBL::Registry;
my $reg = 'Bio::EnsEMBL::Registry';
$reg->load_registry_from_db(
-host=>'ensembldb.ensembl.org',
-port=>5306,
-user=>'anonymous',
);
# Getting the gene_member_adaptor
my $gene_member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
'compara', 'GeneMember');
my $gene_member =
$gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');
# Getting the homology adaptor
my $homology_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
'compara', 'Homology');
my $homologies = $homology_adaptor->fetch_all_by_Member($gene_member);
#homologies = hash array
foreach my $homology ( @{$homologies} ) {
foreach my $member ( @{$homology->get_all_Members} ) {
if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
print $member->description, "\n";
}
}
}
Output:
Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X Start:114017033
End:114017102 Acc:RF00088
Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11 Start:62853663
End:62853732 Acc:RF00088
Or is there a better way to get homology members only between 2 species?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/af252683/attachment.html>
More information about the Dev
mailing list