Yingyi Bu
Email: buyingyi@gmail.com
I'm a senior software engineer at Couchbase. Prior to that, I did my Ph.D. in Computer Science at the University of California, Irvine, advised by Prof. Michael J. Carey.
News:
- 2016/11/05 We just released Couchbase Analytics, which is built on top of Apache AsterixDB!
- 2016/04/10 AsterixDB has graduated and become an Apache top level project.
- 2016/04/05 Our tutorial "Big Graph Analytics Systems" has been accepted to SIGMOD'16!
- 2016/02/05 Our paper about scaling Google Fusion Tables has been accepted to ICDE'16!
- 2015/08/20 I defended my thesis and joined Couchbase.
- 2015/06/22 After 5 years of R&D on Algebricks, our paper about the framework has been accepted to SOCC'15!
- 2014/11/11 Our paper about the Facade compiler and runtime for Big Data applications has been accepted to ASPLOS'15!
- 2014/08/05 After 3 years of R&D on Pregelix, our paper about the system has been accepted to VLDB'15!
- 2014/08/04 After 5 years of R&D on AsterixDB, our paper about the system has been accepted to VLDB'15!
- 2014/07/06 Our paper about graph connectivity analysis on Pregel has been accepted to VLDB'15!
- 2014/06/23 Started a return internship at Google Research, working with Chris Olston and Peter Hawkins!
- 2014/05/09 The new Pregelix website is online now! Check out our juicy perf. numbers!
- 2013/06/11 I was awarded the 2013-2015 Google Fellowship in Structured Data!
- Archived news
Research Interests
My primary area of research interest is in building and evaluating Big Data management systems.
Current projects:
- AsterixDB. We are working towards an open source data-intensive computing platform, with new technologies for ingesting, storing, managing, indexing, querying, analyzing, and subscribing to large volumes of semi-structured data.
Past projects:
- Pregelix. Pregelix is an open-source implementation of Google's Pregel programming model. We architect the Pregel programming model on top of a general-purpose data-parallel execution engine, which leads to better scaling properties, out-of-core support, physical flexibility, and software simplicity.
- HaLoop. In HaLoop, we designed and implemented a modified version of the Hadoop MapReduce framework to efficiently support data-intensive iterative data analysis.
Publications (dblp entry) (Google Scholar)
In Proceedings of the 32nd IEEE International Conference on Data Engineering (ICDE 2016) (Industrial Track)
Helsinki, Finland, May 16 - May 20, 2016.
In Proceedings of the 2015 ACM SIGMOD/SIGOPS Symposium on Cloud Computing (SOCC 2015)
Kohala, Hawaii, August 27 - August 29, 2015.
In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2015)
Istanbul, Turkey, March 2015.
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine [PDF][PPT][Open Source System][Tech Report]
(In the News(1), In the News(2))
In Proceedings of the Very Large Database Endowment, Volume 8 (VLDB 2015)
Kohala, Hawaii, August 31 - September 5, 2015.
AsterixDB: A Scalable, Open Source BDMS [PDF][PPT][Open Source System][Tech Report]
(Started Apache incubation in March 2015) (In the News) (Press Release)
In Proceedings of the Very Large Database Endowment, Volume 7 (VLDB 2015)
Kohala, Hawaii, August 31 - September 5, 2015.
Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees [PDF][PPT][Implementation]
In Proceedings of the Very Large Database Endowment, Volume 7 (VLDB 2015)
Kohala, Hawaii, August 31 - September 5, 2015.
A Bloat-Aware Design for Big Data Applications [PDF][PPT][An independent Chinese translation]
(Open-source systems using our design paradigm: AsterixDB, Hyracks, Pregelix)
In Proceedings of the 2013 ACM SIGPLAN International Symposium on Memory Management (ISMM 2013)
Seattle, WA, June 20-21, 2013.
The HaLoop Approach to Large-Scale Iterative Data Analysis [PDF][Implementation]
The VLDB Journal (VLDBJ), Volume 21, Number 2, April 2012.
HaLoop: Efficient Iterative Data Processing on Large Clusters [PDF][PPT][Talk in Berkeley][Implementation]
(Best of VLDB 2010)
In Proceedings of the Very Large Database Endowment, Volume 3 (VLDB 2010)
Singapore, 11-17 September, 2010. (Acceptance Rate: 33/204 = 16.1%)
In Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2009)
Paris, France, June 28-July 1, 2009. (Acceptance Rate: 105/537 = 19.6%)
In Proceedings of the Very Large Database Endowment, Volume 1 (VLDB 2008)
Auckland, New Zealand, August 24-30, 2008. (Acceptance Rate: 46/273 = 16.8%)
WAT: Finding Top-K Discords in Time Series Database [PDF][Source Code]
In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM 2007)
Minneapolis, MN, USA, April 26-28, 2007. (Acceptance Rate: 25%)
System Demos and Posters
Pregelix: Dataflow-Based Big Graph Analytics [PDF][Open Source System]
In Proceedings of the 2013 ACM SIGMOD/SIGOPS Symposium on Cloud Computing (SOCC 2013)
Santa Clara, CA, October 1-3, 2013.
Comparing SSD-placement strategies to scale a Database-in-the-Cloud [PDF]
In Proceedings of the 2013 ACM SIGMOD/SIGOPS Symposium on Cloud Computing (SOCC 2013)
Santa Clara, CA, October 1-3, 2013.
ASTERIX: An Open Source System for "Big Data" Management and Analysis [PDF]
In Proceedings of the Very Large Database Endowment, Volume 5 (VLDB 2012)
Istanbul, Turkey, August 27-31, 2012.
Honors and Awards
- 2013-2015 Google Fellowship in Structured Data
- 2013-2014 Facebook Fellowship Finalist
- 2010 Yahoo! Key Scientific Challenge Award