

Seminars

The following seminars will be held during ICDE 2011:

 

Seminar 1

April 12, 2011, Tuesday: 11:00 – 12:30

Location: Room 27 & 28

Non-Metric Similarity Search Problems in Very Large Collections

Benjamin Bustos (Univ. of Chile), Tomáš Skopal (Charles Univ. in Prague, Czech Republic)

Similarity search is a fundamental problem in many disciplines, such as multimedia databases, data mining, bioinformatics, computer vision, and pattern recognition, among others. The standard approach to implementing similarity search is to define a dissimilarity measure that satisfies the properties of a metric (strict positiveness, symmetry, and the triangle inequality), and then use it to query for similar objects in large data collections. The advantage of this approach is that there are many index structures (so-called metric access methods) that can be used to perform the queries efficiently. However, a recent survey has shown that similarity measures that do not satisfy the metric properties have been widely used for content-based retrieval, because these (usually) complex similarity measures are more effective, i.e., they return better results.
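
To make the metric/non-metric distinction concrete, here is a minimal Python sketch (not part of the tutorial material) that contrasts a metric distance (Euclidean) with a non-metric dissimilarity (squared Euclidean) by sampling triples of points and testing the triangle inequality; the function names and the sampling-based check are illustrative assumptions.

```python
import math
import random

def euclidean(x, y):
    """Metric distance: strict positiveness, symmetry, triangle inequality all hold."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Non-metric dissimilarity: symmetric, but it violates the triangle inequality."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def violates_triangle_inequality(dist, points, trials=10000):
    """Sample triples (x, y, z) and test whether d(x, z) <= d(x, y) + d(y, z) ever fails."""
    for _ in range(trials):
        x, y, z = random.sample(points, 3)
        if dist(x, z) > dist(x, y) + dist(y, z) + 1e-12:
            return True
    return False

points = [tuple(random.random() for _ in range(2)) for _ in range(200)]
print("Euclidean violates the triangle inequality:        ",
      violates_triangle_inequality(euclidean, points))          # False
print("Squared Euclidean violates the triangle inequality:",
      violates_triangle_inequality(squared_euclidean, points))  # True
```

Index structures that rely on the triangle inequality to prune the search space can no longer guarantee correct results once it is violated, which is why non-metric similarity search needs its own indexing techniques.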

The goal of this tutorial is to provide an interdisciplinary overview of non-metric similarity measures and their applications, focusing on their usage with very large data collections. We start the tutorial by presenting the basics of similarity search and the motivation for using non-metric similarity measures. Next, we present many general non-metric measures that can be used in a wide variety of application domains, and several different research areas that share the need for efficient similarity search algorithms in non-metric spaces. Then we address the efficiency issue, describing the current state of the art in non-metric indexing, both for general and for specific non-metric measures. Finally, we end the tutorial with a summary of the current techniques for searching with non-metric measures in large data collections, highlighting the current challenges for the database community on this topic.

 

Seminar 2

April 12, 2011, Tuesday: 14:00 – 18:00

Location: Room 27 & 28

Next Generation Data Integration for the Life Sciences

Sarah Cohen-Boulakia (Université Paris-Sud 11, France), Ulf Leser (Humboldt-Universität zu Berlin, Germany)

Ever since the advent of high-throughput biology (e.g., the Human Genome Project), integrating the large number of diverse biological data sets has been considered one of the most important prerequisites for advancement in the Life Sciences. Whereas the early days of research in this area were dominated by virtual integration systems (multi-/federated databases), the predominant architecture in use today is based on materialization. Results from research on data integration in the database community have been picked up only very reluctantly in the domain, and despite a decade of development on algorithms for query planning, schema matching, resource discovery, etc., systems are (still) built using ad-hoc techniques and a large amount of scripting. However, recent years have seen a shift in the understanding of what a data integration system actually should do, revitalizing research in this direction.

In this tutorial, we review the past and current state of data integration for the Life Sciences, discuss potential reasons for the slow uptake of research results, and present recent trends in detail: the use of scientific workflow systems for integration, the adoption of Semantic Web techniques, and the problem of ranking integrated search results.

 

Seminar 3

April 13, 2011, Wednesday: 14:30 – 16:00

Location: Room 27 & 28

Modern B-tree Techniques

Goetz Graefe (Hewlett-Packard Laboratories, USA), Harumi Kuno (Hewlett-Packard Laboratories, USA)

Less than 10 years after Bayer and McCreight introduced B-trees in 1970, and now more than a quarter century ago, Comer in 1979 called B-tree indexes ubiquitous. Gray and Reuter asserted in 1993 that B-trees are by far the most important access path structure in database and file systems. B-trees in various forms and variants are used in databases, key-value stores, information retrieval, and file systems. It could be said that the world’s information is at our fingertips because of B-trees.

Many students, researchers, and professionals know the basic facts about B-tree indexes. This basic knowledge includes their organization into nodes, with one root and many leaves, the uniform distance between the root and the leaves, their logarithmic height and logarithmic search effort, and their efficiency during insertions and deletions. This tutorial briefly reviews the basics but assumes that the audience is interested in more detailed information about modern B-tree techniques.
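
For readers who want to see those basics in code, the following is a minimal, purely illustrative sketch of point lookup in an in-memory B-tree-like structure; real B-trees (on-disk pages, node splitting, latching, logging) are considerably more involved, which is precisely the material the tutorial goes on to cover.

```python
from bisect import bisect_right

class Node:
    """Toy B-tree node. Interior nodes hold separator keys and children;
    a lookup for k descends into children[i], where i is the number of
    separator keys that are <= k. Leaves hold keys and their payloads."""
    def __init__(self, keys, children=None, values=None):
        self.keys = keys
        self.children = children   # None for leaf nodes
        self.values = values       # payloads, leaf nodes only

def search(node, key):
    """Descend from the root to a leaf: O(height) node visits, and the
    height is logarithmic in the number of indexed keys."""
    while node.children is not None:          # interior node
        i = bisect_right(node.keys, key)      # binary search within the node
        node = node.children[i]
    i = bisect_right(node.keys, key) - 1      # leaf node
    if i >= 0 and node.keys[i] == key:
        return node.values[i]
    return None

# A toy two-level tree: the root separates two leaves at key 10.
leaf1 = Node(keys=[1, 5, 7], values=["a", "b", "c"])
leaf2 = Node(keys=[10, 12, 20], values=["d", "e", "f"])
root = Node(keys=[10], children=[leaf1, leaf2])
print(search(root, 12))   # -> "e"
print(search(root, 3))    # -> None
```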

Not all relevant topics can be covered in a short time. The selection of topics is focused on current opportunities for B-tree indexes in novel contexts. Thus, the focus is on duplicate key values (including bitmaps, column storage, and compression), updates (including alternative B-tree structures, load utilities, and update execution plans), and the effects of novel hardware (including very large memory, flash storage, and memory hierarchies).

 

Seminar 4

April 14, 2011, Thursday: 11:00 – 12:30

Location: Room 27 & 28

Query Optimizer Plan Diagrams: Production, Reduction and Applications

Jayant R. Haritsa (Indian Institute of Science, India)

The automated optimization of declarative SQL queries is a classical problem that has been diligently addressed by the database community over several decades. However, due to its inherent complexities and challenges, the topic has largely remained a "black art", and the quality of the query optimizer continues to be a key differentiator between competing database products, with large technical teams involved in their design and implementation.

Over the past few years, a fresh perspective on the behavior of modern query optimizers has arisen through the introduction and development of the "plan diagram" concept. A plan diagram is a visual representation of the plan choices made by the optimizer over a space of input parameters, such as relational selectivities.  In this tutorial, we provide a detailed walk-through of plan diagrams, their processing, and their applications.
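
Conceptually, a plan diagram can be produced by sweeping a grid over the selectivity space and recording the plan the optimizer chooses at each grid point. The Python sketch below illustrates only this idea; optimizer_plan_for is a hypothetical stand-in for an actual call into a database engine (e.g., an EXPLAIN-style interface with forced selectivities) and is not a real API.

```python
def optimizer_plan_for(sel_a, sel_b):
    """Hypothetical stand-in: which plan would the optimizer pick for a query
    whose two predicates have these selectivities? A real implementation would
    ask the engine itself; the rules below are invented for illustration."""
    if sel_a * sel_b < 0.01:
        return "nested-loops+index"
    if sel_a < 0.5:
        return "hash-join"
    return "sort-merge-join"

def plan_diagram(resolution=10):
    """Evaluate the optimizer on a resolution x resolution selectivity grid."""
    grid = []
    for i in range(resolution):
        row = []
        for j in range(resolution):
            sel_a = (i + 0.5) / resolution
            sel_b = (j + 0.5) / resolution
            row.append(optimizer_plan_for(sel_a, sel_b))
        grid.append(row)
    return grid

diagram = plan_diagram()
plans = {p for row in diagram for p in row}
print(f"{len(plans)} distinct plans over the grid: {sorted(plans)}")
```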

We begin by showcasing a variety of plan diagrams that provide intriguing insights into current query optimizer implementations; they often appear similar to "cubist paintings", with a large number of plans covering the parameter space and possessing optimality regions characterized by highly intricate patterns and irregular boundaries. A suite of techniques for efficiently producing plan diagrams is then outlined. Subsequently, we present a suite of post-processing algorithms that take optimizer plan diagrams as input and output new diagrams with demonstrably superior query processing characteristics, such as robustness to estimation errors. Following up, we explain how these offline characteristics can be internalized in the query optimizer, resulting in an intrinsically improved optimizer that directly produces high-quality plan diagrams. Finally, we enumerate a variety of open technical problems and promising future research directions.
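
As a rough, self-contained illustration of the reduction idea, the sketch below greedily re-assigns each grid cell to a plan whose estimated cost at that cell stays within a tolerance of the optimal plan's cost, thereby shrinking the number of distinct plans in the diagram; the cost callback, the threshold, and the greedy strategy are invented for this example and do not represent the algorithms presented in the tutorial.

```python
def reduce_diagram(plan_grid, cost, threshold=0.2):
    """Greedy, illustrative reduction: at each grid cell, switch to the first
    plan (in a fixed order) whose estimated cost at this cell is within
    (1 + threshold) of the optimal plan's cost. `cost(plan, i, j)` is an
    assumed cost-estimation callback."""
    plans = sorted({p for row in plan_grid for p in row})
    reduced = []
    for i, row in enumerate(plan_grid):
        new_row = []
        for j, best_plan in enumerate(row):
            best_cost = cost(best_plan, i, j)
            replacement = best_plan
            for p in plans:
                if cost(p, i, j) <= (1 + threshold) * best_cost:
                    replacement = p
                    break
            new_row.append(replacement)
        reduced.append(new_row)
    return reduced

# Toy cost model: plan "A" is near-optimal everywhere, so the diagram
# collapses onto it after reduction.
toy_grid = [["A", "B"], ["B", "A"]]
toy_cost = lambda plan, i, j: 1.0 if plan == "A" else 0.95
print(reduce_diagram(toy_grid, toy_cost))   # -> [['A', 'A'], ['A', 'A']]
```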

All the plan diagrams in the tutorial are sourced from popular industrial-strength query optimizers operating on benchmark decision-support environments, and will be graphically displayed on the Picasso visualization platform.

 

Seminar 5

April 14, 2011, Thursday: 14:00 – 15:30

Location: Room 27 & 28

Schemas for Safe and Efficient XML Processing

Dario Colazzo (Université Paris-Sud – INRIA, France), Giorgio Ghelli (Università di Pisa, Italy), Carlo Sartiani (Università della Basilicata, Italy)

Schemas have always played a crucial role in database management. For traditional relational and object databases, schemas have a relatively simple structure, and this eases their use for optimizing and typechecking queries. In the context of XML databases, things change. Several different schema languages have been defined, tailored to different application classes. Moreover, XML schema languages are inherently more complex, as they host mechanisms for describing highly irregular and flexible structures. In this tutorial we will describe the theoretical models behind these languages and their formal properties, and present the complexity of the basic decision problems. We will explore some theoretical and practical applications of schemas for query processing; finally, we will discuss how decision problems can be efficiently solved, at the price of some restrictions on the expressible types.
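
To give a flavor of these decision problems, here is a small self-contained sketch (not drawn from the tutorial) that treats an element's content model as a regular expression over child-element names and decides membership, i.e., whether a given sequence of children is valid; real schema languages such as DTD, XML Schema, and RELAX NG add determinism constraints, types, and much more.

```python
import re

def content_model_to_regex(model):
    """Translate a toy content model like '(title, author+, year?)' into a
    regular expression over space-separated child-element names."""
    parts = []
    for item in model.strip("()").split(","):
        item = item.strip()
        if item.endswith("+"):
            parts.append(f"(?:{item[:-1]} )+")    # one or more
        elif item.endswith("?"):
            parts.append(f"(?:{item[:-1]} )?")    # optional
        elif item.endswith("*"):
            parts.append(f"(?:{item[:-1]} )*")    # zero or more
        else:
            parts.append(f"{item} ")              # exactly one
    return re.compile("^" + "".join(parts) + "$")

book = content_model_to_regex("(title, author+, year?)")
print(bool(book.match("title author author year ")))  # True: valid children
print(bool(book.match("title year ")))                # False: missing author
```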

 

Seminar 6

April 15, 2011, Friday: 9:00 – 10:30, 11:00 – 12:30

Location: Room 27 & 28

Keyword-based Search and Exploration on Databases

Yi Chen (Arizona State Univ., USA), Wei Wang (Univ. of New South Wales, Australia), Ziyang Liu (Arizona State Univ., USA)

Empowering users to access databases using simple keywords can relieve them from the steep learning curve of mastering a structured query language and understanding complex and possibly fast-evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword-based search and exploration on databases. Several topics will be discussed, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, result comparison, query cleaning and suggestion, performance optimization, and search quality evaluation. Various data models will be discussed, including relational data, XML data, graph-structured data, data streams, and workflows. Finally, we identify the challenges and opportunities for future research to advance the field.
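
As a minimal illustration of the basic mechanics, far short of the techniques covered in the tutorial, the sketch below builds an inverted index over the rows of a single hypothetical table and ranks rows by how many distinct query keywords they contain; the table, its contents, and the scoring are invented for the example.

```python
from collections import defaultdict

# Hypothetical table: row id -> textual content of the row.
rows = {
    1: "Modern B-tree Techniques Graefe Kuno",
    2: "Keyword Search over Relational Databases Chen",
    3: "Plan Diagrams Query Optimizer Haritsa",
}

# Inverted index: keyword -> set of row ids containing it.
index = defaultdict(set)
for rid, text in rows.items():
    for token in text.lower().split():
        index[token].add(rid)

def keyword_search(query, k=2):
    """Rank rows by the number of distinct query keywords they contain and
    return the top-k; a crude stand-in for real ranking functions."""
    scores = defaultdict(int)
    for keyword in query.lower().split():
        for rid in index.get(keyword, ()):
            scores[rid] += 1
    return sorted(scores.items(), key=lambda item: -item[1])[:k]

print(keyword_search("query plan diagrams"))  # -> [(3, 3)]
```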