[ensembl-dev] Regulatory Features by CellType
Daniel Sobral
sobral at ebi.ac.uk
Tue Apr 12 17:28:22 BST 2011
Hi Fiona,
To fetch Regulatory Features, you should use the Regulatory Feature Adaptor.
You could use the fetch_all method, which is common to all adaptors:
my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all()};
This can take a while, as the adaptor fetches all the data associated with
all regulatory features.
Moreover, for the moment it also fetches all Regulatory Features,
irrespective of cell type.
You should probably consider dividing the query by slices and using
fetch_all_by_Slice instead.
This has the further advantage of only fetching, by default, the global
MultiCell regulatory features.
Note that annotation (e.g. Promoter-specific) exists only on the
cell-type-specific features, not on the MultiCell ones.
my @regulatory_features_slice =
@{$regulatory_feature_adaptor->fetch_all_by_Slice($slice)};
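As a rough sketch of the slice-by-slice approach, the helper below walks over all top-level slices and hands each regulatory feature to a callback, so only one slice's worth of features is held at a time. The name for_each_regfeat_by_slice and the callback interface are hypothetical, not part of the API. It assumes $slice_adaptor is a core Slice Adaptor (providing fetch_all('toplevel')) and $regfeat_adaptor is the Regulatory Feature Adaptor.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): visit regulatory
# features one slice at a time instead of calling fetch_all().
#   $slice_adaptor   - assumed to provide fetch_all('toplevel')
#   $regfeat_adaptor - assumed to provide fetch_all_by_Slice($slice)
#   $callback        - code ref invoked as $callback->($slice, $feature)
sub for_each_regfeat_by_slice {
    my ($slice_adaptor, $regfeat_adaptor, $callback) = @_;
    my $count = 0;
    foreach my $slice (@{ $slice_adaptor->fetch_all('toplevel') }) {
        # Fetch the features for this slice only.
        my $features = $regfeat_adaptor->fetch_all_by_Slice($slice);
        foreach my $rf (@{$features}) {
            $callback->($slice, $rf);
            $count++;
        }
    }
    return $count;    # total number of features visited
}
```

It could be called with a callback such as sub { my ($slice, $rf) = @_; print $rf->stable_id, "\n" }, which keeps memory use bounded by the largest single slice.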
For each of the features you retrieve like this, you can then explore
their cell-specific context.
This includes cell-specific annotation, as well as extra boundaries
deriving from cell-specific histone and polymerase data.
foreach my $rf (@regulatory_features_slice) {
    # Fetch every cell-type-specific version of this regulatory feature
    my @cell_specific_rfs =
        @{$regulatory_feature_adaptor->fetch_all_by_stable_ID($rf->stable_id)};
    foreach my $cell_specific_rf (@cell_specific_rfs) {
        # You could do the filtering here by feature_type->name
        print join("\t",
            $cell_specific_rf->stable_id,
            $cell_specific_rf->cell_type->name,
            $cell_specific_rf->feature_type->name), "\n";
    }
}
If you want to filter by annotation, for the moment you have to fetch
them all first and filter afterwards.
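A minimal sketch of that post-fetch filtering, assuming each feature object responds to feature_type->name as in the loop above. The helper name filter_by_feature_type is hypothetical, and the exact name strings (for example 'Promoter Associated') should be checked against what the database actually returns.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the features whose annotation
# (feature_type->name) matches one of the wanted names.
sub filter_by_feature_type {
    my ($features, @wanted_names) = @_;
    my %wanted = map { ($_ => 1) } @wanted_names;
    return [ grep { $wanted{ $_->feature_type->name } } @{$features} ];
}
```

For instance, filter_by_feature_type(\@cell_specific_rfs, 'Promoter Associated') would return an array reference holding only the features with that annotation.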
The basic use pattern that the API is currently centered on is to fetch
MultiCell Regulatory Features, and then for each individual regulatory
feature, to fetch specific cell-type information. It is not currently
optimized to fetch data from individual cell lines, although that is still
relatively easy to do. We are actively considering what users actually
want to do, and trying to design the API to make that easier and more
efficient.
Hope this helps.
Regards,
Daniel
On 12/04/2011 13:45, Fiona Nielsen wrote:
> Still digging into the Ensembl Regulatory Features...
>
> I am trying to retrieve the regulatory features defined on/by multiple
> cell lines, e.g. I am trying to retrieve a dataset similar to the
> BioMart query of:
> -
> Homo sapiens features (GRCh37.p2)
> Filters
> Feature Type : Gene Associated,Non-Gene Associated,Promoter
> Associated,RegulatoryFeature,Unclassified
> Cell Type : MultiCell
> Attributes
> Feature Set
> Feature Type
> Chromosome Name
> Start (bp)
> End (bp)
> Cell Type
> -
>
> However, the Feature Set Adaptor requires a CellType object to specify
> the cell type, and the CellTypeAdaptor does not work with the name
> 'MultiCell':
>
> my $ct_adaptor = $efg_db->get_CellTypeAdaptor();
> my $ct = $ct_adaptor->fetch_by_name('HeLa'); # does not work with 'MultiCell'
> my @rf_fsets = @{$fset_adaptor->fetch_all_by_CellType($ct)};
>
> foreach my $rf_fset(@rf_fsets){
> $returnstring .= $rf_fset->name.",";
> }
>
> What then is the best way to retrieve the regulatory features from Cell
> Type = 'MultiCell'?
> Is there a function that returns all possible Cell Type names? (by the
> hypothesis that these features might be named differently in the
> database)
>
> Next, if I want only a subset of the results above, e.g. only the
> Feature Type = 'Promoter Associated', do I then have to sort through
> the result myself, or is there another way to specify both of these
> filters through the API?
>
>
> All suggestions are appreciated,
>
> Thanks,
> -Fiona-