[iks-community] Introducing myself and the Nuxeo platform

Olivier Grisel ogrisel at nuxeo.com
Tue Oct 20 14:57:12 CEST 2009


Dear all,

My name is Olivier Grisel and I am an R&D Engineer at Nuxeo [1],
specializing in semantics-related features, with some background in
Machine Learning and Semantic Web technologies.

Nuxeo EP is an Open Source ECM (Enterprise Content Management)
platform based on a runtime component system with partial OSGi
compatibility. It ships with a default Document Management Seam/JSF web
application (Nuxeo DM) offering workspaces, document types, workflows,
versioning, access rights, publication, ... Nuxeo DM already features a
Jena-based knowledge base (triple store) to link documents together,
with external URIs or with comment threads. We also have an
XHTML-based Ajax annotation system that uses the RDF Annotea
standard as its data model. Furthermore, Nuxeo uses the Dublin Core
standard as the data model for the base document properties.

As part of the Scribo project [2], we are working on integrating
semantic knowledge extractors to semi-automatically enrich the
knowledge base with named entities and semantic relationships found in
unstructured text content using UIMA components. We plan to integrate
a CRF-based Named Entity extractor trained on multilingual corpora
such as Wikipedia. CRFs (Conditional Random Fields) are machine
learning models used in Natural Language Processing to label token
sequences.
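
As a rough illustration of what "labeling token sequences" produces, the
sketch below groups per-token labels into named entities. The BIO tagging
scheme (B-X begins an entity of type X, I-X continues it, O is outside),
the PER/ORG type names and the function name entities_from_bio are
assumptions made for this example; they are not taken from the Scribo or
UIMA components, and the labels are written by hand rather than predicted
by a trained CRF.

```python
def entities_from_bio(tokens, labels):
    """Group tokens tagged B-X / I-X into (entity_type, text) pairs.

    Hypothetical helper: the BIO scheme is assumed for this sketch.
    An I-X tag that does not continue an entity of type X is treated
    as outside any entity.
    """
    entities = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity starts: flush the one in progress, if any.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the entity in progress.
            current_tokens.append(token)
        else:
            # Outside any entity: flush and reset.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = ["Olivier", "Grisel", "works", "at", "Nuxeo"]
labels = ["B-PER", "I-PER", "O", "O", "B-ORG"]  # hand-written, not model output
print(entities_from_bio(tokens, labels))
```

Entity pairs of this kind are what would then be turned into statements
in the knowledge base.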

We are also working on a Digital Asset Management (multimedia
collections management) application and want it to extract semantic
metadata as automatically as possible, so that browsing the collection
in smart ways becomes trivial. To achieve this goal I started a Python
prototype / proof of concept that implements similarity-based search
for pictures [3]. It is based on a semantic hashing algorithm [5] that
takes GIST image descriptors (960 float dimensions) as input [4] and
gives a 64-bit binary code as output, enabling fast database lookups on
very large image collections.
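
To make the lookup side concrete, here is a minimal sketch of how short
binary codes can be compared once they exist. It does not reproduce the
semantic hashing model from [5]: the codes are assumed to be given, and
the function names (pack_bits, hamming_distance, search) and the linear
scan are illustrative choices only. Similar items are expected to get
codes that differ in few bits, so the Hamming distance serves as the
similarity measure.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into a single integer code."""
    code = 0
    for bit in bits:
        code = (code << 1) | (1 if bit else 0)
    return code


def hamming_distance(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def search(query_code, codes, max_distance=2):
    """Return indices of stored codes within max_distance bits of the
    query, closest first. A plain linear scan, for illustration only."""
    hits = []
    for index, code in enumerate(codes):
        distance = hamming_distance(query_code, code)
        if distance <= max_distance:
            hits.append((distance, index))
    return [index for distance, index in sorted(hits)]


# Toy 4-bit codes; a real collection would store one 64-bit code per image.
stored = [pack_bits([1, 0, 1, 0]), pack_bits([1, 0, 1, 1]), pack_bits([0, 1, 0, 1])]
print(search(pack_bits([1, 0, 1, 0]), stored, max_distance=1))
```

Because a 64-bit code fits in one machine word, the comparison is a XOR
followed by a bit count, which is what keeps the lookup cheap even on
large collections.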

The same kind of semantic hashing algorithm should also work on
textual content [6] described with sparse TF-IDF vectors. A
preliminary backlog of semantics-related features for the Nuxeo platform
can be found in our Jira instance [7].
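
For readers unfamiliar with the sparse TF-IDF representation mentioned
above, a minimal sketch follows. It uses one common weighting (relative
term frequency times log(N / document frequency)); other variants exist,
and this is not necessarily the one a later implementation would use.
The function name tfidf_vectors and the toy documents are made up for
the example.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per tokenized document.

    Terms that occur in every document get a zero weight under this
    variant and are simply left out, which keeps the vectors sparse.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
            if doc_freq[term] < n_docs
        })
    return vectors


docs = [["nuxeo", "document", "workflow"], ["nuxeo", "image", "search"]]
for vector in tfidf_vectors(docs):
    print(vector)
```

Each document is stored as a small dictionary instead of a dense vector
over the whole vocabulary; such vectors would play the role that the
GIST descriptors play for pictures.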

[1] http://www.nuxeo.com/en
[2] http://www.scribo.ws/
[3] http://wiki.iks-project.eu/index.php/User-stories#Story_03:_Similarity-based_Image_Search
[4] http://code.oliviergrisel.name/pyleargist/src/tip/README.txt
[5] http://code.oliviergrisel.name/libsgd/src/9f3f374becc8/examples/semantic_hashing.py
[6] http://wiki.iks-project.eu/index.php/User-stories#Story_09:_Similarity_based_document_search
[7] http://jira.nuxeo.org/secure/IssueNavigator.jspa?reset=true&pid=10273&status=1

Looking forward to meeting you all in Roma,

-- 
Olivier - http://twitter.com/ogrisel

