November 9th 2020
Mosharaf Chowdhury, Assistant Professor
Electrical Engineering & Computer Science
University of Michigan, USA
Keynote Title: Practical Memory Disaggregation: A Case Study in Network-Informed Data Systems Design
Abstract: Data systems leverage Big Data and AI tools to extract value out of data, be it local or distributed across the world. Over the last five years, we have demonstrated that network-informed data systems design can yield order-of-magnitude performance and efficiency improvements. In the process, we have established two complementary threads of research: (1) tailoring Big Data and AI applications to their underlying networks and (2) applying networking principles in designing new data systems. In this talk, I will provide an overview of the broader landscape and dive deep into a case study of the former: practical memory disaggregation.
High-performance data systems often over-provision memory because applications today cannot access otherwise unused memory beyond their machine boundaries, even when their performance grinds to a halt. But could they? Many have attempted to answer this question since the 1980s, but practical memory disaggregation has remained elusive. I will start by presenting the first scalable memory disaggregation system that allows any application to use remote memory without any changes to the application or the underlying operating system and without requiring new hardware. Yet even a one-microsecond remote access is almost an order of magnitude slower than accessing a local memory page. I will present a new prefetching algorithm to breach this latency barrier. Finally, I will introduce the fastest lock manager in the world, which processes billions of lock requests every second and enables safe concurrent remote page access. Our open-source solutions allow unmodified data systems to run with only 25% local memory with little to no performance loss.
Bio: Mosharaf Chowdhury is an Assistant Professor in the EECS Department at the University of Michigan, Ann Arbor. His current research focuses on application-infrastructure symbiosis across different layers of software and hardware stacks. Mosharaf invented coflows and is a co-creator of Apache Spark. Software artifacts from his research have been deployed in Microsoft, Facebook, Google, and Amazon datacenters. He has received an NSF CAREER award, the 2015 ACM SIGCOMM doctoral dissertation award, best paper awards at NSDI and ATC, multiple faculty fellowships and awards from Google, VMware, and Alibaba, as well as a Facebook fellowship and a Cheriton Scholarship. He has also been nominated for an NSDI community award and a University of Waterloo alumni gold medal. He received his PhD from the AMPLab at UC Berkeley in 2015.
November 10th 2020
Jie Zhang, Associate Professor
School of Computer Science and Engineering
Nanyang Technological University, Singapore
Keynote Title: Robust Trust Management for Cloud Service Provisioning
Abstract: The increasing number of service providers (e.g. cloud service providers) makes it challenging to select a provider for a specific service demand. Worse, service providers may not deliver their services as promised. A typical approach to this issue is to allow service consumers to model the trust of the providers and select trustworthy ones to interact with. However, trust models may themselves become a target of attacks, so it is important for trust management to be robust against them. In this talk, I will identify the kinds of attacks that can be launched against trust models and summarize various trust models that have been proposed to cope with the robustness issue. Remaining issues and future directions will also be discussed.
Bio: Jie Zhang is currently an Associate Professor in the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He obtained his Ph.D. from the Cheriton School of Computer Science at the University of Waterloo, Canada, and was the recipient of the Alumni Gold Medal at the 2009 Convocation Ceremony. The Gold Medal is awarded once a year to honour the top PhD graduate from the University of Waterloo. Jie Zhang’s research is in the general area of Artificial Intelligence and focuses on trust modeling and preference modeling for various emerging application domains (e.g. e-commerce, VANET, IoT, and collaborative systems). His papers have been published at top AI conferences (such as NeurIPS, AAAI and IJCAI) and in top networking and security journals (such as IEEE TIFS and IEEE TNSM). He has won several best paper awards at conferences such as IM, CNSM, and IFIPTM. Jie Zhang is also active in serving research communities. He is serving as Senior Editor of the Electronic Commerce Research and Applications journal and Associate Editor of IEEE TNSM. He has also served as General Chair and PC Chair for several international conferences.
November 11th 2020
Prof. Frank Wuerthwein
Executive Director Open Science Grid (OSG)
Professor of Physics and Data Science at UC San Diego
Lead of High Throughput Computing group at San Diego Supercomputer Center
Keynote Title: Near Exascale Computing in the Cloud: the use of GPU bursts for Multi-Messenger Astrophysics with IceCube Data
Abstract: On the Saturday before SC19, we performed a GPU cloud burst during which we purchased the entire for-sale capacity of all GPUs available across AWS, Azure, and GCP worldwide. We describe this activity from science motivation to preparations to execution, and the follow-up bursts. In addition, we will discuss the extent to which this is applicable to other sciences, as well as commercial applications. At peak, we operated a sustained workflow on 51,500 GPUs spanning 8 generations of NVIDIA GPUs and 28 cloud regions across three commercial providers. In a follow-up burst, we integrated cloud and on-prem, simplifying the workflow such that it could be repeated easily any time for any science application consistent with the distributed High Throughput Computing paradigm. In preparation for these bursts, we measured the networking bandwidth between regions within the same cloud providers, and from the cloud to various locations on-prem worldwide.
Bio: Prof. Wuerthwein is an experimental particle physicist analyzing data from the Large Hadron Collider. As executive director of OSG, he is responsible for providing an integrated data and compute platform that advances Open Science through distributed High Throughput Computing. The platform has been used in the detection of gravitational waves (Nobel Prize 2017) and the discovery of the Higgs boson (Nobel Prize 2013), as well as many other scientific achievements. His cyberinfrastructure research includes globally distributed cloud and on-prem data and compute integration.