[ensembl-dev] Regulatory Features by CellType

Daniel Sobral sobral at ebi.ac.uk
Tue Apr 12 17:28:22 BST 2011


Hi Fiona,

To fetch Regulatory Features, you should use the Regulatory Feature Adaptor.
You could use the fetch_all function, which is common to all adaptors:

my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all()};

This can take a while, as the adaptor fetches all the data associated
with all regulatory features.
Moreover, for the moment it also fetches all Regulatory Features,
irrespective of cell type.

You should probably consider dividing the genome into slices and using
fetch_all_by_Slice.
This has the further advantage of fetching, by default, only the global
MultiCell regulatory features.
Note that annotation (e.g. Promoter-specific) exists only at the
cell-specific level.

my @regulatory_features_slice =
    @{$regulatory_feature_adaptor->fetch_all_by_Slice($slice)};
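As a rough sketch of the slice-by-slice approach: the helper below walks
the genome one top-level slice at a time and hands each feature to a
callback, so only one slice's worth of features is held at once. The sub
name process_by_slice and the callback signature are illustrative
assumptions, not part of the API; it only assumes that $slice_adaptor is
a core SliceAdaptor and $rf_adaptor is the Regulatory Feature Adaptor.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API).
# $slice_adaptor: a core SliceAdaptor
# $rf_adaptor:    a RegulatoryFeatureAdaptor
# $callback:      code ref called as $callback->($slice, $feature)
# Returns the total number of features seen.
sub process_by_slice {
    my ($slice_adaptor, $rf_adaptor, $callback) = @_;
    my $total = 0;

    # 'toplevel' asks for the highest-level sequence regions
    foreach my $slice (@{ $slice_adaptor->fetch_all('toplevel') }) {
        my $features = $rf_adaptor->fetch_all_by_Slice($slice);
        $callback->($slice, $_) for @{$features};
        $total += scalar @{$features};
    }
    return $total;
}
```

In a real script both adaptors would normally come from the Registry
(get_adaptor with the 'core' group for the slice adaptor and the
'funcgen' group for the regulatory feature adaptor). For human, the
top-level set also contains regions other than the main chromosomes, so
you may want to skip some slices by seq_region_name.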

For each of the features you retrieve like this, you can then explore 
their cell-specific context.
This includes cell-specific annotation, as well as extra boundaries 
deriving from cell-specific histone and polymerase data.

foreach my $rf (@regulatory_features_slice) {
    # Fetch every cell-type-specific version of this feature
    my @cell_specific_rfs =
        @{$regulatory_feature_adaptor->fetch_all_by_stable_ID($rf->stable_id)};
    foreach my $cell_specific_rf (@cell_specific_rfs) {
        # You could do the filtering here by feature_type->name
        print join("\t",
            $cell_specific_rf->stable_id,
            $cell_specific_rf->cell_type->name,
            $cell_specific_rf->feature_type->name), "\n";
    }
}

If you want to filter by their annotation, for the moment you have to
fetch them all first and filter afterwards.
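A minimal sketch of that post-fetch filtering, assuming each feature
responds to feature_type->name as in the loop above. The sub name
filter_by_feature_type is made up for illustration, and the names you
pass in must match whatever the database actually stores.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API).
# $features:     array ref of regulatory features already fetched
# @wanted_names: feature type names to keep
# Returns an array ref of the features whose feature_type->name matches.
sub filter_by_feature_type {
    my ($features, @wanted_names) = @_;
    my %wanted = map { $_ => 1 } @wanted_names;
    return [ grep { $wanted{ $_->feature_type->name } } @{$features} ];
}
```

For example, filter_by_feature_type(\@cell_specific_rfs, 'Promoter
Associated') would keep only the features carrying that annotation,
assuming 'Promoter Associated' is the name used in the database as it is
in the BioMart filter.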


The basic use pattern that the API is currently centered on is to fetch
MultiCell Regulatory Features, and then, for each individual regulatory
feature, to fetch specific cell-type information. It is not currently
optimized to fetch data from individual cell lines, although it is still
relatively easy to do. We are actively considering what users actually
want to do and trying to design the API to make it easier and more
efficient for them to do it.
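To illustrate that pattern for a single cell line, here is a sketch
that starts from MultiCell features and keeps only the versions built
for one named cell type. The sub name fetch_for_cell_type is an
illustrative assumption; the only API calls it relies on are
fetch_all_by_stable_ID, stable_id and cell_type->name, as used above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API).
# $rf_adaptor:         a RegulatoryFeatureAdaptor
# $multicell_features: array ref of MultiCell regulatory features
# $cell_type_name:     cell type name exactly as stored in the database
# Returns an array ref of the matching cell-specific features.
sub fetch_for_cell_type {
    my ($rf_adaptor, $multicell_features, $cell_type_name) = @_;
    my @selected;
    foreach my $rf (@{$multicell_features}) {
        # All versions of the feature that share this stable ID
        my $versions = $rf_adaptor->fetch_all_by_stable_ID($rf->stable_id);
        push @selected,
            grep { $_->cell_type->name eq $cell_type_name } @{$versions};
    }
    return \@selected;
}
```

This issues one query per MultiCell feature, so it is best combined with
slice-sized batches rather than run over the whole genome at once.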

Hope this helps.

Regards,
Daniel

On 12/04/2011 13:45, Fiona Nielsen wrote:
> Still digging into the Ensembl Regulatory Features...
>
> I am trying to retrieve the regulatory features defined on/by multiple
> cell lines, e.g. I am trying to retrieve a dataset similar to the
> BioMart query of:
> -
> Homo sapiens features (GRCh37.p2)
> Filters
>   Feature Type : Gene Associated,Non-Gene Associated,Promoter Associated,RegulatoryFeature,Unclassified
>   Cell Type : MultiCell
> Attributes
>   Feature Set
>   Feature Type
>   Chromosome Name
>   Start (bp)
>   End (bp)
>   Cell Type
> -
>
> However, the Feature Set Adaptor requires a CellType object to specify
> the cell type, and the CellTypeAdaptor does not work with the name
> 'MultiCell':
>
>      my $ct_adaptor = $efg_db->get_CellTypeAdaptor();
>      my $ct = $ct_adaptor->fetch_by_name('HeLa'); # does not work with 'MultiCell'
>      my @rf_fsets = @{$fset_adaptor->fetch_all_by_CellType($ct)};
>
>      foreach my $rf_fset(@rf_fsets){
>   	$returnstring .= $rf_fset->name.",";
>      }
>
> What then is the best way to retrieve the regulatory features from Cell
> Type = 'MultiCell'?
> Is there a function that returns all possible Cell Type names? (by the
> hypothesis that these features might be named differently in the
> database)
>
> Next, if I want only a subset of the results above, e.g. only the
> Feature Type = 'Promoter Associated', do I then have to sort through
> the result myself, or is there another way to specify both of these
> filters through the API?
>
>
> All suggestions are appreciated,
>
> Thanks,
> -Fiona-
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev




