Thursday, 19 May 2011

Well that KEGGing sucks - but how much?

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a hugely important database of genomic pathways and interactions. Countless molecular biologists have used it daily over the past 15 years, and it draws up to 200K unique website visitors per month.

Most of the data sources that KEGG integrates to build its database are freely available to all, without restriction. Even so, the KEGG database itself has traditionally carried a dual license: free for academic use, but requiring a paid license for commercial use through its licensing agent, Pathway Solutions. I'm no great lover of dual licenses, as they discourage commercial use and thereby restrict translational application of the resource 'for the good of humanity'. Well, two days ago KEGG announced that it would go even further, charging academics up to $5000 to download the database (starting July 1st).

Can we use this unfortunate circumstance to assess the impact of limiting access to an established resource such as KEGG? And can we do it using the scientific measure that really matters: citations? Given that KEGG have around 1000 citations per year and a 15-year trading record, the returns over the next few years should be very revealing.

Footnote: funding for large integrated databases is notoriously difficult to maintain over the long term, even though the resources themselves are enormously valuable. In February, NCBI tackled its budget challenges by throwing their SRA (Sequence Read Archive) toys out of the pram (on which I have commented before), whereas KEGG have been far more pragmatic in looking for alternative sources of funding. I have huge respect for both projects.


  1. Manuel Corpas writes about the cut in funding for the NCBI OMIM database:
    Would it have been preferable for OMIM to continue as a non-free database rather than face its current future of stagnation?

  2. Next? Well, TAIR (the Arabidopsis model organism database) has been looking shaky for a while.
    Others? Not sure. Is anyone keeping a list?