In the era of PCs and single-core processors, floating-point operations (flops) were assumed to be expensive. The accepted "best" practice to date has therefore been to trade more frequent memory access for fewer flops; the philosophy of code and algorithm optimization has been to reduce the number of flops. However, with the current trend toward multi-core and many-core hardware, flops are becoming the next "free lunch", whereas data movement appears to be the new bottleneck, because the energy consumed by data movement is 100X that consumed by a flop. Hence, from the viewpoint of hardware efficiency and energy effectiveness, the more flops per unit of data movement, the better. Future extreme-scale applications will likely favor novel algorithms that maximize the number of flops per unit of data movement in order to take full advantage of the "free flops". This is in sharp contrast to the programming habits and ways of thinking of the past. An algorithm that was deemed expensive in the single-core era may turn out to be a good candidate on emerging computer architectures; tomorrow may well see a paradigm shift in what defines a "good" algorithm.
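To make "flops per unit of data movement" concrete, the following minimal sketch compares two textbook kernels under an idealized model. The function names, the 8-byte word size, and the assumption that every operand crosses the memory interface exactly once (perfect cache reuse) are illustrative choices, not measurements from any particular machine.

```python
# Idealized flops-per-byte estimates for two common kernels.
# Assumptions (illustrative only): 8-byte double-precision words, and each
# operand is moved between memory and the processor exactly once.

def axpy_intensity(n, bytes_per_word=8):
    """y = a*x + y on vectors of length n."""
    flops = 2 * n                          # one multiply and one add per element
    bytes_moved = 3 * n * bytes_per_word   # read x, read y, write y
    return flops / bytes_moved

def matmul_intensity(n, bytes_per_word=8):
    """C = A*B on dense n-by-n matrices."""
    flops = 2 * n ** 3                         # n multiply-add pairs per entry of C
    bytes_moved = 3 * n * n * bytes_per_word   # read A, read B, write C
    return flops / bytes_moved

if __name__ == "__main__":
    for n in (100, 1200):
        print(n, axpy_intensity(n), matmul_intensity(n))
```

In this model the vector update stays at a constant 1/12 flop per byte regardless of size, while the dense matrix product grows as n/12. The second kernel performs far more arithmetic, yet it is the better match for hardware on which flops are cheap and data movement is costly.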
As the power of modern supercomputing systems continues to advance at an exciting pace toward extreme scales, it is quite clear that the associated numerical software development challenges are also increasingly formidable. On the emerging architectures, memory and data motion present increasingly serious bottlenecks, as low-power requirements lead to systems with significant restrictions on available memory and communication bandwidth. Given the current trend of hardware change, if no change is made to the key numerical algorithms, wasting floating-point capability seems unavoidable. Consequently, computational science experts in multiple application domains will need to revisit key application algorithms and solvers, with the likelihood that new capabilities will be demanded in order to keep up with the dramatic architectural changes that accompany the impressive increases in compute power.
Co-design, in the most basic sense, engages the necessary collaborations among hardware designers, computer scientists, applied mathematicians, and computational science experts in multiple application domains to carry out the essential interdisciplinary research that will make it possible to harvest, in a timely way, the scientific and technological benefits as HPC hardware moves forward to extreme scales.
Following the successes of CoDesign 2011, 2012, 2013, 2014 and 2015, the sixth international workshop, CoDesign 2016, will be held on October 27-29, 2016 at SHANXI GUESTHOUSE, Xi'an, Shaanxi, China. It is an official part of HPC China, the largest annual domestic conference on HPC in China, and one of its most important international events.
The primary motivation for the international workshop is to enable productive and timely interdisciplinary discussions focused on stimulating progress in domain applications that engage extreme-scale computing. It features the new challenges and opportunities encountered in developing the software and hardware needed for computing at the extreme scale. By gathering insights from successful experiences in petascale applications, it is hoped that this workshop will help optimize a converged co-design path toward computing at the extreme scale and the associated big data challenges.
To provide some context and background information, the website for the previous international exascale Co-Design workshop hosted by China can be found here.