[iks-community] Semantic search engine kickoff

Bertrand Delacretaz bertrand.delacretaz at day.com
Fri Jul 17 18:28:00 CEST 2009


Hi Stéphane,

Sorry for taking so long to reply; I've been traveling, and I'll be on
vacation for the next two weeks, so this will be a slow start.

I've had a look at what you suggest and it seems like all the pieces
would be there to set up a semantic search engine + RDFa crawler for
IKS.

I didn't find a download link at http://www.swse.org. Is the software
available for installation on IKS servers? If so, are there any special
requirements for running the stack that you suggest?

I'll get back to this after my holidays, and start playing with the
suggested components to see how this all fits together.

Thanks!
-Bertrand



On Fri, Jul 3, 2009 at 1:25 PM, Stephane Corlosquet
<stephane.corlosquet at deri.org> wrote:
> ...Below is the architecture that DERI would like to suggest for the IKS
> Semantic Search Engine. The figure [1] shows a set of CMS sites complying
> with the best practices of RDF data publishing, which include RDFa, a local
> schema export (site vocabulary) and a SPARQL endpoint. We have worked on a set
> of modules for Drupal detailed in a technical report at [2], but their
> features could be generalized to other CMSs. The sites can request to be
> included in the IKS search engine via a form on the IKS search engine site
> or programmatically via a ping. Pings are also used in the case where a
> specific resource/page has been updated on a given site in order for the
> search engine to schedule a recrawl of the resource as soon as possible.
>
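
A minimal sketch of how the ping mechanism described above could feed the
crawler, assuming a simple in-memory priority queue. All names here
(RecrawlQueue, schedule, ping, next_url, the priority constants) are
hypothetical and are not part of SWSE or of the DERI Drupal modules.

```python
import heapq
import itertools

# Hypothetical priorities: a ping for an updated page outranks a request to
# add a new site, which outranks the routine recrawl of known pages.
PING_UPDATE, NEW_SITE, ROUTINE = 0, 1, 2


class RecrawlQueue:
    """Priority queue of URLs waiting to be crawled (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._best = {}                  # url -> most urgent priority queued
        self._order = itertools.count()  # tie-breaker: FIFO within a priority

    def schedule(self, url, priority=ROUTINE):
        # Ignore the request if the URL is already queued at an equal or
        # more urgent priority.
        if url in self._best and self._best[url] <= priority:
            return
        self._best[url] = priority
        heapq.heappush(self._heap, (priority, next(self._order), url))

    def ping(self, url):
        """A site reports that `url` changed: recrawl it as soon as possible."""
        self.schedule(url, PING_UPDATE)

    def next_url(self):
        """Return the most urgent URL, or None when nothing is pending."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            # Skip stale entries superseded by a more urgent one.
            if self._best.get(url) == priority:
                del self._best[url]
                return url
        return None
```

A ping for a page that is already queued for a routine crawl simply moves it
to the front; the older, less urgent entry is skipped when it is popped.
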
> The semantic search engine stack is composed of several layers of data
> gathering, parsing, validation and indexing. The search engine first gathers
> the data by crawling the sites; it then parses the RDF data with the any23
> parser [3], a Java library that extracts structured data in RDF format from
> a variety of Web documents (it supports microformats, RDFa and other common
> RDF serialization formats). If needed, the NxParser [4] cleans up the data and
> formats it as N-Quads [5]. Before a site can be included in the IKS search
> engine, it first goes through the RDFAlerts validator, which ensures the RDF
> data contained in the sites complies with the RDF publishing best practices.
> RDFAlerts also does some RDF consistency checking. Additionally, other
> IKS-specific policies regarding the sites included in the search engine
> could be added here. Finally, the SWSE engine [6] takes care of the indexing and
> storage of the data. Powered by YARS2, it provides distributed storage and
> retrieval facilities. Indexing structures are optimized for retrieval of RDF
> statements including context (quads) while minimizing the need for joins,
> plus Lucene fulltext indexing for efficient keyword searches. SWSE's SPARQL
> endpoint makes it possible to plug in any RDF visualization tool, e.g.
> VisiNav [7]. See the screencast at [8] (1'36) for the possibilities offered
> by VisiNav.
>
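
To make the layering above concrete, here is a rough sketch of the flow from
parsed statements to validated, indexed quads. It does not call any23,
NxParser, RDFAlerts or YARS2: the `extract` argument is a caller-supplied
stand-in for the parsing step, `validate` is a deliberately naive placeholder
for the validation step, and QuadIndex is a toy in-memory index. All of these
names are assumptions made for illustration.

```python
from collections import defaultdict


def is_iri(term):
    """True for terms written in the <...> form used by N-Quads."""
    return len(term) > 2 and term.startswith("<") and term.endswith(">")


def validate(quad):
    """Naive placeholder check: the subject is an IRI or blank node, and the
    predicate and context are IRIs. A real validator checks far more."""
    s, p, _o, c = quad
    return (is_iri(s) or s.startswith("_:")) and is_iri(p) and is_iri(c)


def to_nquad(quad):
    """Serialize (subject, predicate, object, context) as one N-Quads line.
    Terms are assumed to be already written in N-Quads syntax."""
    return "{} {} {} {} .".format(*quad)


class QuadIndex:
    """Toy index that keeps quads retrievable by subject and by context."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_context = defaultdict(set)

    def add(self, quad):
        self.by_subject[quad[0]].add(quad)
        self.by_context[quad[3]].add(quad)


def run_pipeline(pages, extract, index):
    """pages maps a URL to a fetched document; extract(document) yields
    (s, p, o) triples. The page URL becomes the context of each quad.
    Returns the quads that failed validation."""
    rejected = []
    for url, document in pages.items():
        for s, p, o in extract(document):
            quad = (s, p, o, f"<{url}>")
            if validate(quad):
                index.add(quad)
            else:
                rejected.append(quad)
    return rejected
```

Keeping the source page as the fourth element is what lets the index answer
"which statements came from this site?" without a join.
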
>
> [1] http://srvgal65.deri.ie/files/iks_search_engine_cloud.pdf
> [2] http://www.deri.ie/fileadmin/documents/DERI-TR-2009-04-30.pdf
> [3] http://code.google.com/p/any23/
> [4] http://sw.deri.org/2006/08/nxparser/
> [5] http://sw.deri.org/2008/07/n-quads/
> [6] http://www.swse.org/
> [7] http://visinav.deri.org/
> [8] http://www.youtube.com/watch?v=r4WgTRIRoa0
>
> Bertrand Delacretaz wrote:
>>
>> Hi,
>>
>> Time has flown and I haven't kicked off the semantic search engine
>> discussions yet, following up on our discussions at the Salzburg IKS
>> meeting.
>>
>> I'll be mostly offline next week, but I wanted to at least start the
>> discussion here, so that we can go forward.
>>
>> The idea is to start from the proposal at
>> http://www.interactive-knowledge.org/content/iks-search-engine-proposal
>> and prototype something that we can play with quickly.
>>
>> The first use case that I'd like us to implement looks like this:
>>
>> 0. Select a website that contains interesting data in microformats and/or
>> RDFa
>> 1. Add the homepage URL to the search engine crawler config
>> 2. Search engine crawls website, indexes full text and structured data
>> extracted from microformats and/or RDFa
>> 3. Simple UI allows for searching that data, both full-text and structured
>> 4. Structured data should be exportable in standard formats for
>> further processing with semantic tools
>>
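
As a rough illustration of steps 2 to 4 above, the sketch below indexes page
text and extracted properties in memory, answers both kinds of query, and
exports the structured data as N-Triples. It is not based on Lucene, Solr or
Jackrabbit; TinySearchIndex and its methods are hypothetical names, and the
structured data is assumed to arrive as a dict of property IRI to plain
string value for each page.

```python
import re
from collections import defaultdict


def _literal(value):
    """Quote a plain string as an N-Triples literal (minimal escaping only)."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


class TinySearchIndex:
    def __init__(self):
        self.text_index = defaultdict(set)  # token -> set of page URLs
        self.facts = []                     # (page URL, property IRI, value)

    def add_page(self, url, text, properties):
        """Step 2: index the full text and the extracted structured data."""
        for token in re.findall(r"\w+", text.lower()):
            self.text_index[token].add(url)
        for prop, value in properties.items():
            self.facts.append((url, prop, value))

    def search_text(self, query):
        """Step 3a: pages containing every word of the query."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        results = set(self.text_index.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self.text_index.get(token, set())
        return results

    def search_structured(self, prop, value):
        """Step 3b: pages where the given property has the given value."""
        return {url for url, p, v in self.facts if p == prop and v == value}

    def export_ntriples(self):
        """Step 4: export the structured data, one statement per line."""
        return "\n".join(
            f"<{url}> <{prop}> {_literal(value)} ." for url, prop, value in self.facts
        )
```

A real prototype would replace the in-memory structures with a proper index,
but the three operations map one-to-one onto the numbered steps.
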
>> If anyone knows of existing software that would allow us to set this
>> up with no or minimal programming work that would be cool (I don't). I
>> assume we can host that on IKS servers, though details of that have to
>> be finalized.
>>
>> If there's no existing software that does that, let's see what the
>> minimal steps are that allow us to implement this, just as a first
>> prototype that can be used as a basis for creating the next one. I'd
>> lean towards Lucene, Solr or Jackrabbit as those are the things that I
>> know best in this area, but this is all open.
>>
>> Comments are welcome, of course!
>>
>> -Bertrand (mostly offline until next Thursday July 2nd)
>> _______________________________________________
>> iks-community mailing list
>> iks-community at iks-project.eu
>> http://lists.iks-project.eu/cgi-bin/mailman/listinfo/iks-community
>>
>



-- 
-- Bertrand Delacretaz
-- Senior Developer, R&D
-- Day Software, www.day.com

