All-pairs: An abstraction for data-intensive cloud computing
- 1 April 2008
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2008 IEEE International Symposium on Parallel and Distributed Processing
Abstract
Although modern parallel and distributed computing systems provide easy access to large amounts of computing power, it is not always easy for non-expert users to harness these large systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we propose that production systems should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction - all-pairs - that fits the needs of several data-intensive scientific applications. We demonstrate that an optimized all-pairs abstraction is both easier to use than the underlying system, and achieves performance orders of magnitude better than the obvious but naive approach, and twice as fast as a hand-optimized conventional approach.
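The abstract does not spell out the interface of the abstraction, so the following is only a minimal sketch under a stated assumption: that all-pairs takes two sets and a function and evaluates the function on every pair, returning a matrix of results. The names `all_pairs`, `set_a`, `set_b`, `func`, and `workers` are hypothetical, and the local thread pool merely stands in for the distributed system; this is not the authors' optimized implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def all_pairs(
    set_a: Sequence[A],
    set_b: Sequence[B],
    func: Callable[[A, B], R],
    workers: int = 4,  # hypothetical knob; a real system would schedule across machines
) -> List[List[R]]:
    """Return a matrix m with m[i][j] == func(set_a[i], set_b[j])."""
    # Enumerate every (row, column) index pair once, in row-major order.
    index_pairs = list(product(range(len(set_a)), range(len(set_b))))
    matrix: List[List[R]] = [[None] * len(set_b) for _ in set_a]  # type: ignore[list-item]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as its input.
        results = pool.map(lambda ij: func(set_a[ij[0]], set_b[ij[1]]), index_pairs)
        for (i, j), value in zip(index_pairs, results):
            matrix[i][j] = value
    return matrix


if __name__ == "__main__":
    # Toy comparison function: absolute difference between two numbers.
    print(all_pairs([1, 5], [2, 4, 9], lambda a, b: abs(a - b)))
```

The point of the sketch is the shape of the interface: the user states only the two sets and the comparison function, which leaves decisions such as data placement and job granularity to the system rather than to a hand-written batch of jobs.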