[ensembl-dev] Dev Digest, Vol 39, Issue 19

Sam Seaver samseaver at gmail.com
Thu Sep 19 17:10:14 BST 2013


Dear Dan,

I missed your reply because I forgot that I have my email settings set to
"digest" and was also filtering these emails.

After a bit of digging, I reached a conclusion:

The version of the Chlamydomonas genome installed at EnsemblPlants is
actually version 3.1, whilst the version installed at the JGI (note, not
Phytozome itself) is version 3.0. You can see a separate set of files on
this page:

http://genome.jgi-psf.org/Chlre3/Chlre3.download.ftp.html

(it took me a while to realize that there were actually two different
sub-versions available)

The problem was exacerbated by the fact that some of the genes are almost
identical in both versions, but have different gene ids, so some recent
data I downloaded, containing Chlamydomonas annotation, was using gene ids
not found in version 3.1.
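
For anyone hitting the same problem, here is a minimal sketch of the kind
of check that exposes the mismatch: compare the gene ids used in a
downloaded data set against the ids present in a given annotation version.
It assumes the ids have already been extracted to one id per line; the
function names and the example ids are hypothetical and not taken from
either JGI or EnsemblPlants.

```python
from typing import Dict, Iterable, Set


def read_gene_ids(lines: Iterable[str]) -> Set[str]:
    """Collect non-empty, non-comment lines as a set of gene ids."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def compare_gene_ids(data_ids: Set[str], annotation_ids: Set[str]) -> Dict[str, object]:
    """Report which ids from the data set are present in / absent from the annotation."""
    missing = data_ids - annotation_ids
    found = data_ids & annotation_ids
    fraction_missing = len(missing) / len(data_ids) if data_ids else 0.0
    return {
        "found": sorted(found),
        "missing": sorted(missing),
        "fraction_missing": fraction_missing,
    }


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    data = read_gene_ids(["# ids used in downloaded data", "geneA", "geneB", "geneC"])
    annotation = read_gene_ids(["geneA", "geneX"])
    report = compare_gene_ids(data, annotation)
    print(report["missing"], round(report["fraction_missing"], 2))
```

Note that this only compares ids. Genes that are almost identical in
sequence but carry different ids in the two versions will still be reported
as missing, which is exactly the situation described above.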

On a related note, I see that Phytozome is planning on releasing v5.0. Does
EnsemblPlants have plans to include this version?

Thanks
Sam

On Tue, Sep 17, 2013 at 6:00 AM, <dev-request at ensembl.org> wrote:

> Today's Topics:
>
>    1. Missing genes from Chlamydomonas genome (Sam Seaver)
>    2. Re: [1000G #353572] How to get population names for SNP from
>       homo_sapiens_variation_73_37 - browser.1000genomes.org
>       (Patricia Buendia)
>    3. Re: Missing genes from Chlamydomonas genome (Dan Staines)
>    4. VEP ignoring SNVs when called alongside an insertion or
>       deletion (David Parry)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 16 Sep 2013 15:30:48 -0500
> From: Sam Seaver <samseaver at gmail.com>
> Subject: [ensembl-dev] Missing genes from Chlamydomonas genome
> To: Ensembl developers list <dev at ensembl.org>
> Message-ID: <CAGwzEpYYZTKm6PeWu244eSswxb7E7JzoOo69MBRqH=qCssm=Ag at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Dear Ensembl,
>
> I just found out that approximately 10% of the C. reinhardtii genome (v3)
> in the JGI database is missing from the C. reinhardtii genome installed at
> EnsemblPlants.
>
> Would anybody be able to explain this discrepancy for me?
>
> Thanks
> Sam Seaver
>
> --
> Postdoctoral Fellow
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 9700 S. Cass Avenue
> Argonne, IL 60439
>
> http://www.linkedin.com/pub/sam-seaver/0/412/168
> samseaver at gmail.com
> (773) 796-7144
>
> "We shall not cease from exploration
> And the end of all our exploring
> Will be to arrive where we started
> And know the place for the first time."
>    --T. S. Eliot
>
> ------------------------------
>
> Message: 2
> Date: Mon, 16 Sep 2013 17:54:48 -0400
> From: "Patricia Buendia" <paty at infotechsoft.com>
> Subject: Re: [ensembl-dev] [1000G #353572] How to get population names
>         for SNP from homo_sapiens_variation_73_37 -
>         browser.1000genomes.org
> To: <dev at ensembl.org>
> Cc: "'Laura Clarke via RT '"@sanger.ac.uk
> Message-ID: <002401ceb327$613fca30$23bf5e90$@com>
> Content-Type: text/plain; charset="utf-8"
>
> To the dev mailing list:
>
> I would very much appreciate getting some help with this question:
>
> I have a question regarding the Ensembl MySQL database. I do not want to
> use the API, but to query the database directly.
>
> What SQL statement would I use to obtain the records shown in
> http://browser.1000genomes.org/Homo_sapiens/Variation/Population?db=core;g=ENSG00000134242;r=1:114356433-114414381;source=dbSNP;v=rs114092230;vdb=variation;vf=27418953#_
>
> for SNP rs114092230?
>
> When running an SQL query linking the population, allele and variation
> tables in homo_sapiens_variation_73_37, I get only population.name=
> "1000GENOMES:pilot_1_YRI_low_coverage_panel" for that SNP, but the above
> link shows many more populations. How do I get the same data using an SQL
> statement?
>
> Paty
>
>
> -----Original Message-----
> From: Laura Clarke via RT [mailto:info at 1000genomes.org]
> Sent: Friday, September 13, 2013 2:33 PM
> To: paty at infotechsoft.com
> Subject: [1000G #353572] How to get population names for SNP from
> homo_sapiens_variation_73_37 - browser.1000genomes.org
>
> I would recommend reading the tutorial
>
> http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html
>
> and if that doesn't help email the dev mailing list
>
> thanks
>
> Laura
>
> On Fri Sep 13 18:35:28 2013, paty at infotechsoft.com wrote:
> > Thank you, Laura. So I just have to wait and don't need to send an
> > email to dev at ensembl.org?
> > Paty
> >
> > -----Original Message-----
> > From: Laura Clarke via RT [mailto:info at 1000genomes.org]
> > Sent: Friday, September 13, 2013 4:19 AM
> > To: paty at infotechsoft.com
> > Subject: [1000G #353572] How to get population names for SNP from
> > homo_sapiens_variation_73_37 - browser.1000genomes.org
> >
> > Fri Sep 13 09:19:18 2013: Request 353572 was acted upon.
> > Transaction: Taken by laura at ebi.ac.uk
> > Queue: 1000genomes
> > Subject: How to get population names for SNP from
> > homo_sapiens_variation_73_37 - browser.1000genomes.org
> > Owner: laura at ebi.ac.uk
> > Requestors: paty at infotechsoft.com
> > Status: new
> > Ticket <URL: https://rt.sanger.ac.uk/Ticket/Display.html?id=353572 >
> >
> >
> > Your ticket has been assigned to an engineer, as shown in the Owner
> > field above.
> >
> > Regards,
> > 1000 Genomes Project Helpdesk
> > info at 1000genomes.org
>
> ------------------------------
>
> Message: 3
> Date: Tue, 17 Sep 2013 08:51:43 +0100
> From: Dan Staines <dstaines at ebi.ac.uk>
> Subject: Re: [ensembl-dev] Missing genes from Chlamydomonas genome
> To: <dev at ensembl.org>
> Message-ID: <defe8ae1442616bb61cc0a98aa00bac7 at ebi.ac.uk>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 2013-09-16 21:30, Sam Seaver wrote:
> > Dear Ensembl,
> >
> > I just found out that approximately 10% of the C. reinhardtii genome
> > (v3) in the JGI database is missing from the C. reinhardtii genome
> > installed at EnsemblPlants.
> >
> > Would anybody be able to explain this discrepancy for me?
>
> Hi Sam,
>
> This genome was loaded from the assembly and annotation submitted to
> INSDC:
> http://www.ebi.ac.uk/ena/data/view/GCA_000002595.2
> The most likely explanation is that the JGI version has been updated
> more recently but has not been resubmitted. However, we'll do some more
> digging too, as there are some discrepancies in the numbers of submitted
> scaffolds that we need to examine.
>
> Thanks,
>
> Dan.
>
> --
> Dan Staines, PhD               Ensembl Genomes Technical Coordinator
> EMBL-EBI                       Tel: +44-(0)1223-492507
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 17 Sep 2013 10:22:24 +0100
> From: David Parry <D.A.Parry at leeds.ac.uk>
> Subject: [ensembl-dev] VEP ignoring SNVs when called alongside an
>         insertion or deletion
> To: "dev at ensembl.org" <dev at ensembl.org>
> Message-ID: <52381F50.8010805 at leeds.ac.uk>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>
> I apologize if I have misunderstood the caveats given regarding the VCF
> input format for the VEP, but I am observing unexpected behavior that I
> don't think is covered by the documentation. If I provide a multiallelic
> variant with both an insertion and a deletion call at the same site, the
> VEP correctly outputs both consequences. However, if a variant contains
> either an insertion or a deletion alongside a substitution, the VEP
> ignores the substitution variant. For example, the following variant in a
> VCF:
>
> 6       32634300        .       G       C,CTA
>
> gives the output:
>
> ## ENSEMBL VARIANT EFFECT PREDICTOR v73
> ## Output produced at 2013-09-17 09:57:41
> ## Connected to
> ## Using cache in /home/davidparry/.vep/homo_sapiens/73
> ## Using API version 73, DB version ?
> ## Extra column keys:
> ## DISTANCE : Shortest distance from variant to transcript
> #Uploaded_variation     Location        Allele  Gene    Feature
> Feature_type    Consequence     cDNA_position   CDS_position
> Protein_position        Amino_acids     Codons  Existing_variation
> Extra
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344
> ENST00000484729 Transcript
> frameshift_variant,NMD_transcript_variant,feature_elongation    115-116
> 84-85   28-29   -       -       -
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344
> ENST00000399082 Transcript      frameshift_variant,feature_elongation
> 129-130 84-85   28-29   -       -       -
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344
> ENST00000399084 Transcript      frameshift_variant,feature_elongation
> 263-264 84-85   28-29   -       -       -
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344
> ENST00000434651 Transcript      frameshift_variant,feature_elongation
> 171-172 84-85   28-29   -       -       -
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344
> ENST00000399079 Transcript      frameshift_variant,feature_elongation
> 141-142 84-85   28-29   -       -       -
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344
> ENST00000374943 Transcript      frameshift_variant,feature_elongation
> 161-162 84-85   28-29   -       -       -
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000241287
> ENST00000443574 Transcript      upstream_gene_variant   -       -
> -       -       -       -       DISTANCE=4073
> 6_32634301_-/-/TA       6:32634300-32634301     TA      ENSG00000179344
> ENST00000487676 Transcript
> non_coding_exon_variant,nc_transcript_variant,feature_elongation
> 115-116 -       -       -       -  -
>
> In this case the substitution variant is ignored and we only get a
> consequence for the insertion.  Similarly, for a deletion at the same
> site as a substitution:
>
> 6       32634300        .       GTA     G,CTA
>
> gives:
>
> ## ENSEMBL VARIANT EFFECT PREDICTOR v73
> ## Output produced at 2013-09-17 09:51:08
> ## Connected to
> ## Using cache in /home/davidparry/.vep/homo_sapiens/73
> ## Using API version 73, DB version ?
> ## Extra column keys:
> ## DISTANCE : Shortest distance from variant to transcript
> #Uploaded_variation     Location        Allele  Gene    Feature
> Feature_type    Consequence     cDNA_position   CDS_position
> Protein_position        Amino_acids     Codons  Existing_variation
> Extra
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344
> ENST00000484729 Transcript
> frameshift_variant,NMD_transcript_variant,feature_truncation    114-115
> 83-84   28      -       -       -
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344
> ENST00000399082 Transcript      frameshift_variant,feature_truncation
> 128-129 83-84   28      -       -       -
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344
> ENST00000399084 Transcript      frameshift_variant,feature_truncation
> 262-263 83-84   28      -       -       -
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344
> ENST00000434651 Transcript      frameshift_variant,feature_truncation
> 170-171 83-84   28      -       -       -
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344
> ENST00000399079 Transcript      frameshift_variant,feature_truncation
> 140-141 83-84   28      -       -       -
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344
> ENST00000374943 Transcript      frameshift_variant,feature_truncation
> 160-161 83-84   28      -       -       -
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000241287
> ENST00000443574 Transcript      upstream_gene_variant   -       -
> -       -       -       -       DISTANCE=4074
> 6_32634301_TA/-/TA      6:32634301-32634302     -       ENSG00000179344
> ENST00000487676 Transcript
> non_coding_exon_variant,nc_transcript_variant,feature_truncation
> 114-115 -       -       -       -  -
>
> ...we only get the consequence for the deletion.
>
> Generally I am processing multisample VCF files with VEP and outputting
> in VCF format.  I want to be able to assess the consequences for a given
> sample's genotype but this sometimes fails at sites like this where my
> script can't find an allele corresponding to the substitution in the VEP
> output.  A workaround would be to separate my indel and my substitution
> calls before running the VEP, but I wondered whether this is
> known/desired behaviour for this tool?
>
> The VEP is a really great tool, so it would be brilliant if there were a
> fix for this.
>
> Cheers,
>
> Dave


