HeisenData -- Towards a Next-Generation Uncertain Data Management System


Project Funding: FP7 Marie-Curie International Reintegration Grants
                             (FP7-PEOPLE-2009-RG, Reference No. 249217)
Project Duration: 1/3/2010 - 28/2/2014
Fellow & Scientist-in-Charge: Prof. Minos Garofalakis


Description

Several real-world applications need to manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings, e.g., for motion prediction and human behavior modeling; information-extraction tools can assign different possible labels with varying degrees of confidence to segments of text, due to the uncertainties and noise present in free-text data. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex correlation patterns present in real-life data. Unfortunately, to date, approaches to Probabilistic Database Systems (PDBSs) have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: Probabilities are typically associated with individual data tuples, with little or no support for capturing data correlations. This research proposal aims to design and build a novel, extensible PDBS that supports a broad class of statistical models and probabilistic-reasoning tools as first-class system objects, alongside a traditional relational-table store. Our proposed architecture will employ statistical models to effectively encode data-correlation patterns, and promote probabilistic inference as part of the standard database operator repertoire to support efficient and sound query processing. This tight coupling of relational databases and statistical models represents a major departure from conventional database systems, and many of the core system components need to be revisited and fundamentally re-thought. The proposed research will attack several of the key challenges arising in this novel PDBS paradigm (including query processing, query optimization, data summarization, extensibility, and model learning and evolution), build usable prototypes, and investigate key application domains (e.g., information extraction).



People: Minos Garofalakis | Katerina Ioannou | Vagelis Vazaios



Relevant Publications

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder. Other restrictions to copying individual documents may apply.
  1. Daisy Zhe Wang, Michael J. Franklin, Minos Garofalakis, Joseph M. Hellerstein, and Michael L. Wick. "Hybrid In-Database Inference for Declarative Information Extraction", Proceedings of ACM SIGMOD'2011, Athens, Greece, June 2011.

  2. Vibhor Rastogi, Nilesh Dalvi, and Minos Garofalakis. "Large-Scale Collective Entity Matching", Proceedings of VLDB'2011 (PVLDB, Vol. 4, No. 4), Seattle, Washington, August 2011.

  3. Graham Cormode and Minos Garofalakis. "Histograms and Wavelets on Probabilistic Data", IEEE Transactions on Knowledge and Data Engineering, 2010 ("Best of ICDE'2009" Special Issue).

  4. Daisy Zhe Wang, Michael J. Franklin, Minos Garofalakis, and Joseph M. Hellerstein. "Querying Probabilistic Information Extraction", Proceedings of VLDB'2010 (PVLDB, Vol. 3), Singapore, September 2010.   [Daisy's talk slides (pdf)]

  5. Daisy Zhe Wang, Eirinaios Michelakis, Michael J. Franklin, Minos Garofalakis, and Joseph M. Hellerstein. "Probabilistic Declarative Information Extraction", Proceedings of IEEE ICDE'2010, Long Beach, California, USA, March 2010.   (short paper)   [Daisy's talk slides (pdf)]

  6. Graham Cormode, Antonios Deligiannakis, Minos Garofalakis, and Andrew McGregor. "Probabilistic Histograms for Probabilistic Data", Proceedings of VLDB'2009 (PVLDB, Vol. 2), Lyon, France, August 2009.   [Antonis' talk slides (pdf)]

  7. Graham Cormode and Minos Garofalakis. "Histograms and Wavelets on Probabilistic Data", Proceedings of IEEE ICDE'2009, Shanghai, China, March 2009.   [Preliminary arxiv/CoRR Tech. Report version: arXiv:0806.1071v1 [cs.DB]]   [Graham's talk slides (pdf)]
    [** ICDE'2009 Best Paper Award **]

  8. Daisy Zhe Wang, Eirinaios Michelakis, Minos Garofalakis, and Joseph M. Hellerstein. "BAYESSTORE: Managing Large, Uncertain Data Repositories with Probabilistic Graphical Models", Proceedings of VLDB'2008 (PVLDB, Vol. 1), Auckland, New Zealand, August 2008. [Daisy's talk slides (pdf)]



