Universidad Simón Bolívar

eswc


TUTORIAL DESCRIPTION                                           OUTLINE OF THE TUTORIAL

Motivation

In the context of the Cloud of Linked Data, a large number of very large linked RDF datasets have become available, and this number keeps growing. At the same time, scalable and efficient RDF engines that follow the traditional optimize-then-execute paradigm have been developed to access RDF data locally, and SPARQL endpoints have been implemented for remote query processing. However, given the size of existing datasets, the lack of statistics describing the available sources, and the unpredictable conditions of remote queries, existing solutions are still insufficient.
First, the most efficient RDF engines base their query processing algorithms on physical access and storage structures that are stored locally; however, because of the size of existing linked datasets, loading the data and their links is not always feasible. Second, remote Linked Data query processing can be extremely costly because of the lack of query planning; in addition, current techniques cannot adapt to unpredictable data transfers or data availability, so executions can fail. To overcome these limitations, physical query operators and execution engines need to be able to access remote data and to adapt query execution schedulers to data availability.
In this tutorial we present the basis of the adaptive query processing frameworks defined in the database area and their applicability in the Linked Data context. The tutorial targets any conference attendee who wants to learn about the limitations of existing RDF engines, adaptive query processing techniques, and how traditional RDF data management approaches can be extended to access Linked Data remotely and to adapt well to runtime conditions.

Target Audience

Researchers and practitioners who develop or use query engines to consume Linked Data.

Technical Requirements

Participants are expected to have only a basic understanding of RDF and SPARQL.

Content

The tutorial first covers traditional data management solutions that implement the optimize-then-execute paradigm and their pros and cons for Linked Data query processing; it describes the novel storage and access data structures, as well as the query optimization and execution techniques, implemented by state-of-the-art RDF engines. Then, adaptive frameworks defined in the database area to manage remote query processing are analyzed; adaptive operators such as symmetric hash joins (binary and n-ary), routing operators, and adaptive engines are studied. Finally, the applicability of adaptive techniques is illustrated with an adaptive query processing engine for SPARQL endpoints; we show the implemented physical operators and the query scheduler, as well as their performance.
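
As an illustration of the kind of adaptive operator mentioned above, the following is a minimal sketch of a binary symmetric hash join in Python. It is not the engine presented in the tutorial; the function name `symmetric_hash_join`, the `"L"`/`"R"` side labels, and the dictionary-based rows are assumptions made only for this example. The point it shows is that each arriving row is inserted into its own hash table and immediately probed against the other one, so answers are produced as soon as matching rows have arrived from both sources, whichever source is faster.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]


def symmetric_hash_join(arrivals: Iterable[Tuple[str, Row]], key: str) -> Iterator[Row]:
    """Join rows from two sources on `key`, emitting results incrementally.

    `arrivals` is an interleaved stream of (side, row) pairs, where side is
    "L" or "R" (hypothetical labels for the two remote sources).
    """
    # One hash table per input, indexed by the join key value.
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = row[key]
        # Step 1: insert the new row into its own side's table.
        tables[side][k].append(row)
        # Step 2: probe the other side's table and emit every match now.
        for match in tables[other].get(k, []):
            left, right = (row, match) if side == "L" else (match, row)
            yield {**left, **right}


# Rows arrive in an arbitrary interleaving, as they might from two endpoints.
arrivals = [
    ("L", {"s": "ex:a", "name": "Alice"}),
    ("R", {"s": "ex:b", "city": "Caracas"}),
    ("R", {"s": "ex:a", "city": "Madrid"}),
    ("L", {"s": "ex:b", "name": "Bob"}),
]
for result in symmetric_hash_join(arrivals, "s"):
    print(result)
```

Because neither input has to be fully read before the first answer is produced, the operator does not block when one source is slow, which is the property that makes it a natural fit for remote sources with unpredictable transfer rates.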


Schedule

Introduction (20 minutes):
• Traditional data management system architecture and its main components.
• Basic terminology.

Lecture 1: The Optimize-then-Execute Paradigm (50 minutes):
• Cost-based optimization techniques.
• Traditional iterator model architecture.
• Centralized data management physical operators.
• Centralized data management query engines.
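
For readers unfamiliar with the iterator model listed above, here is a minimal sketch in Python, assuming the usual open/next/close interface. The class names `Scan` and `Select` and the helper `run` are hypothetical and chosen only for this illustration; they do not correspond to any particular engine.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class Scan:
    """Leaf operator: returns the rows of an in-memory table one at a time."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class Select:
    """Filter operator: pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child: Scan, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self) -> None:
        self.child.close()


def run(plan: Select) -> List[Row]:
    """Drive a plan to completion by repeatedly calling next() on its root."""
    plan.open()
    out = []
    row = plan.next()
    while row is not None:
        out.append(row)
        row = plan.next()
    plan.close()
    return out
```

In this pull-based design the plan is fixed before execution starts and each operator waits for its child to deliver a row, which is why a slow or unavailable source can stall the whole plan.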

Lecture 2: Existing RDF Engines (50 minutes):
• Query optimization and execution techniques in existing RDF engines like RDF-3X and Jena.
• SPARQL endpoints and their execution model.
• Current Linked Data query processing approaches.

Coffee break (15 minutes)

Lecture 3: Adaptive Query Processing (50 minutes):
• Adaptive physical operators: symmetric hash joins and n-ary joins.
• Adaptive query processing schedulers and routing policies.
• Adaptive query engines.

Lecture 4: An Adaptive Paradigm for Linked Data (50 minutes):
• Requirements of physical operators for Linked Data query processing.
• An adaptive query engine for Linked Data.

Speakers

Edna Ruckhaus is a Full Professor in the Computer Science Department at the Universidad Simón Bolívar, Caracas, Venezuela, where she has taught several database courses at the undergraduate level. Prof. Ruckhaus has participated in several international projects supported by AECI (Spain). She has over 20 publications in international and national conferences and journals. She has served as a reviewer and as a Program Committee member for several international conferences.

Maria-Esther Vidal is a Full Professor in the Computer Science Department at the Universidad Simón Bolívar, Caracas, Venezuela, where she has taught several database and Semantic Web courses at the undergraduate and graduate levels. Prof. Vidal has participated in several international projects supported by NSF (USA), AECI (Spain) and CNRS (France). She has advised five PhD students and more than 45 master's and undergraduate students. Prof. Vidal has published more than 50 papers in international conferences and journals in the database and Semantic Web areas. She has served as a reviewer and as a Program Committee member for several international journals and conferences.