Webternity


Ortelio is working with the University of Passau (Digital Libraries and Web Information Systems lab) to add new semantic features to the Webternity crawler. We expect to release the new version of the crawler by the end of August 2016.

The Webternity crawler is based on a state-of-the-art blog crawling algorithm that can identify and retrieve the structural elements of blogs: title, post, date, author, thematic description, and subject.

The crawler retrieves semi-structured information, with these elements captured as separate fields, whereas current alternatives retrieve unstructured data.
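To make the difference between semi-structured and unstructured output concrete, here is a minimal Python sketch of what a record for one blog post might look like. The `BlogPost` class, its field names, and the JSON serialization are hypothetical illustrations, not Webternity's actual data model or output format. They simply map the six blog elements listed above to named fields, with any element that was not found left out of the record.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


# Hypothetical record type: the fields mirror the blog elements listed above,
# but this is NOT Webternity's actual output schema.
@dataclass
class BlogPost:
    title: str
    post: str
    date: Optional[str] = None                  # e.g. an ISO 8601 date string
    author: Optional[str] = None
    thematic_description: Optional[str] = None
    subject: Optional[str] = None

    def to_json(self) -> str:
        # Keep only the elements that were identified: a semi-structured
        # record carries labeled fields, and missing ones are simply absent.
        fields = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(fields, sort_keys=True)


# Unstructured alternative: the page content as one opaque string, where the
# author and date can only be recovered by further parsing.
unstructured = "My first post\nby Alice, 2016-08-01\nHello, world!"

# Semi-structured version of the same post: each element is directly addressable.
record = BlogPost(title="My first post", post="Hello, world!",
                  date="2016-08-01", author="Alice")
print(record.author)
print(record.to_json())
```

With the record, a consumer reads `record.author` directly instead of guessing where the author's name sits inside a block of text; fields such as `subject` that were not identified do not appear in the JSON output.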