Keynote

TITLE The Architectural Reawakening and the Worldwide Flock
SPEAKER Erik Altman, IBM TJ Watson Research Center
Abstract

As has been widely noted, the end of Dennard scaling, coupled with rapid improvements in AI, has spurred a flowering of architectural ideas. Alas, that flowering has come at the expense of adherence to traditional instruction set architectures. Historically, the ISA has had a unifying effect on both hardware and software, enabling effortless cohesion among developers pursuing disparate paths to improve overall capability. This talk examines the results and speculates on likely directions. In particular, it hypothesizes that with appropriate network and parallel computing capabilities, we are in a position to have a "worldwide flock" of robots and cyberphysical systems that all learn from each other, enabling new capabilities not possible with today's more isolated approaches. Finally, the talk discusses how the economic potential of such a worldwide flock can enable it to serve as a new point of unification.

BIO

Dr. Altman has been with IBM for almost 25 years, mostly at the IBM T.J. Watson Research Center. He is currently pursuing multiple interests, including unsupervised anomaly detection in high-dimensional data, development and use of synthetic data, and understanding the impact of current AI trends on architecture and economics. In earlier research, Dr. Altman explored compilation issues, and with Dr. Kemal Ebcioglu pioneered DAISY, a dynamic binary translation system providing efficient full-system translation from PowerPC to an underlying VLIW architecture. Dr. Altman was also an original architect of the Cell processor in Sony's PlayStation 3. He has pursued a number of other broad research directions, including techniques for exploiting the massive and as-yet-untapped instruction-level parallelism that resides in most code. He also led the WAIT project, one of IBM's first SaaS (Software-as-a-Service) offerings, and the Liquid Metal project, which sought to extend the Java paradigm of write-once-run-anywhere beyond CPUs to GPUs, FPGAs, and other accelerators. Outside of research, Dr. Altman has acted as corporate technology advisor and director of technology, helping organize monthly readouts to the CEO and other senior IBM executives on key technology issues. He has also served as TA and Chief-of-Staff to multiple VPs.

Dr. Altman has also been active in the community, serving as Chair of ACM SIGMICRO, Chair of the ACM SIG Governing Board, ACM Secretary-Treasurer, and candidate for ACM President in 2016. He currently serves on the ACM Investment Committee. He has also served as Editor-in-Chief of IEEE Micro and on the Publications Board of the IEEE Computer Society, and as General Chair, Program Chair, or Program Vice-Chair for numerous conferences and workshops, including ISCA, PACT, and NPC.

Dr. Altman received an SB from MIT and Master's and PhD degrees from McGill University under the supervision of Prof. Guang R. Gao.

Invited Talk




TITLE Algebraic Structures and the Quest for Performance, Portability, and Productivity
SPEAKER Tim Mattson, Intel, USA
Abstract

If you want to design programming models that programmers might actually use, you need to understand what programmers need. I like to think of programmers' needs in terms of the three Ps of programming: Performance, Portability, and Productivity. Conventional wisdom suggests you can get two of the three Ps in one programming model, but you can't get all three.

In this talk, I will say a bit about the history of programming models and hopefully convince you that we can get all three Ps in one system. It’s been done before and I hope we can do it again. The key is to go back to the roots of computer science and think of the programming problem mathematically. We need to think about the fundamental algebras behind our programming models and from them, derive the programming models of the future.

I haven’t done this yet. I have some ideas based on my work on designing APIs for graph algorithms and storage engines for sparse arrays. Think of this talk as a call to action, a search for collaborators to take on the quest of a single programming model grounded in a rigorous algebraic formulation that helps programmers write lots of code (productive), runs on all relevant hardware (portable) and delivers what the hardware is capable of (performant).
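As one concrete illustration of what "grounded in a rigorous algebraic formulation" can mean, GraphBLAS-style formulations express graph traversal as linear algebra over a semiring. The sketch below is plain, illustrative Python (not a GraphBLAS implementation): it runs breadth-first search as repeated vector-matrix products over the boolean (OR, AND) semiring.

```python
# Sketch: BFS as repeated matrix-vector products over the boolean
# semiring (OR as "plus", AND as "times") -- the idea behind the
# GraphBLAS API. Illustrative only; real GraphBLAS uses sparse
# formats and masked operations.

def bfs_levels(adj, source):
    """adj[u][v] is True if there is an edge u->v.
    Returns the BFS level of each vertex (-1 if unreachable)."""
    n = len(adj)
    level = [-1] * n
    frontier = [False] * n
    frontier[source] = True
    depth = 0
    while any(frontier):
        # Record the level of newly reached vertices.
        for v in range(n):
            if frontier[v] and level[v] == -1:
                level[v] = depth
        # Next frontier = frontier "times" adjacency over (OR, AND),
        # masked to vertices not yet visited.
        nxt = [False] * n
        for u in range(n):
            if frontier[u]:
                for v in range(n):
                    if adj[u][v] and level[v] == -1:
                        nxt[v] = True
        frontier = nxt
        depth += 1
    return level
```

Swapping the semiring (e.g. min-plus instead of boolean) turns the same loop structure into shortest paths, which is exactly the kind of reuse an algebraic grounding buys.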

BIO

Tim Mattson is a parallel programmer obsessed with every variety of science (Ph.D. Chemistry, UCSC, 1985). He is a senior principal engineer in Intel’s parallel computing lab.

Tim has been with Intel since 1993 and has worked with brilliant people on great projects including: (1) the first TFLOP computer (ASCI Red), (2) MPI, OpenMP and OpenCL, (3) two different research processors (Intel's TFLOP chip and the 48 core SCC), (4) Data management systems (Polystore systems and Array-based storage engines), and (5) the GraphBLAS API for expressing graph algorithms as sparse linear algebra.

Tim is passionate about teaching. He's been teaching OpenMP longer than anyone on the planet, with OpenMP tutorials at every SC'XY conference but one since 1998. He has published five books on different aspects of parallel computing, the latest (published November 2019) titled "The OpenMP Common Core: Making OpenMP Simple Again".

TITLE Big Data Systems on Emerging Hardware: Our Ten Years’ Journey and Outlook of the Next Ten Years
SPEAKER Bingsheng He, National University of Singapore, Singapore
Abstract

In the post-Moore's-law era, heterogeneous architectures have been emerging for big data systems. In many big data systems, high performance is a must, not an option. We face challenges (and also opportunities) at all levels, ranging from sophisticated algorithms and procedures that mine the gold from massive data to high-performance computing (HPC) techniques and systems that deliver the useful data in time. How to fully unleash the power of heterogeneous architectures is a hot research topic in taming the performance challenges of big data applications. Our research has focused on the novel design and implementation of in-memory database management systems on emerging hardware (many-core CPUs, GPUs, FPGAs, etc.). Interestingly, we have also observed an interplay between emerging hardware and big data systems. In this talk, I will present our research efforts of the past 10 years and outline the research agenda, as well as the challenges and opportunities faced by the community.
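One small flavor of why data systems and emerging hardware interact: in-memory analytical engines typically store tables column-wise (structure of arrays), so a predicate scans one contiguous column, a layout that suits SIMD CPUs and GPUs alike. The sketch below is plain, illustrative Python; the table and function names are hypothetical, not from the systems discussed in this talk.

```python
# Sketch: columnar (structure-of-arrays) filtering. The predicate
# touches only the contiguous `prices` column, which is what makes
# this access pattern friendly to SIMD and GPU execution.

def select_ids_where_price_below(ids, prices, threshold):
    """Scan the price column once; return the ids of matching rows."""
    return [i for i, p in zip(ids, prices) if p < threshold]
```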

BIO

Dr. Bingsheng He is currently an Associate Professor and Vice-Dean (Research) at the School of Computing, National University of Singapore. Before that, he was a faculty member at Nanyang Technological University, Singapore (2010-2016), and held a research position in the Systems Research group of Microsoft Research Asia (2008-2010), where his major work was building high-performance cloud computing systems for Microsoft. He received his bachelor's degree from Shanghai Jiao Tong University (1999-2003) and his Ph.D. from the Hong Kong University of Science & Technology (2003-2008). His current research interests include cloud computing, database systems, and high-performance computing. His papers have been published in prestigious international journals (such as ACM TODS and IEEE TKDE/TPDS/TC) and proceedings (such as ACM SIGMOD, VLDB/PVLDB, ACM/IEEE Supercomputing, ACM HPDC, and ACM SoCC). He was awarded the IBM Ph.D. Fellowship (2007-2008) and an NVIDIA Academic Partnership (2010-2011). Since 2010, he has (co-)chaired a number of international conferences and workshops, including IEEE CloudCom 2014/2015, BigData Congress 2018, and ICDCS 2020. He has served on the editorial boards of international journals, including IEEE Transactions on Cloud Computing (IEEE TCC), IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), IEEE Transactions on Knowledge and Data Engineering (TKDE), the Springer Journal of Distributed and Parallel Databases (DAPD), and ACM Computing Surveys (CSUR). He received editorial excellence awards for his service on IEEE TCC and IEEE TPDS in 2019.

TITLE HPC Application Performance Optimization on Kunpeng System
SPEAKER Long Wang, Huawei, China
Abstract

High-performance computing on the ARM architecture is one of the hot topics currently discussed in the supercomputing community. In this talk, we will introduce the performance optimization practice and analysis carried out on Huawei Kunpeng 920 chips by the CSO Lab:

1) A performance-portability framework (based on DaCe from ETH Zurich), with a demonstration of a stencil-computing HPC application on the Kunpeng platform;

2) Performance optimization analysis of several open-source HPC applications on the Kunpeng 920;

3) Preliminary work on a performance-portable framework for HPC+AI computing and its applications.
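For context on point 1, a stencil kernel is the kind of loop nest that performance-portability frameworks such as DaCe specialize for a target platform. Below is a minimal 3-point Jacobi sweep in plain, illustrative Python (not Kunpeng- or DaCe-specific code; a tuned version would be vectorized and tiled for the target).

```python
# Sketch: one sweep of a 3-point 1D Jacobi stencil. Each interior
# point becomes the average of itself and its two neighbors; the
# boundary values stay fixed. This regular neighbor-access pattern
# is what stencil frameworks map onto caches, SIMD units, and cores.

def jacobi_step(u):
    """Return a new array after one Jacobi relaxation sweep."""
    n = len(u)
    out = list(u)  # copy keeps the boundary values unchanged
    for i in range(1, n - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out
```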

BIO

Wang Long received his Ph.D. in science from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences (2006). He has more than ten years of experience in HPC and AI application performance optimization. He is currently Director of the Computing System Optimization Lab at Huawei and chief expert for Huawei's intelligent computing technology planning.