Big Data and Extreme-scale Computing (BDEC2)
Common Digital Continuum Platform for Big Data and Extreme Scale Computing
First meeting BDEC2-1: November 28-30, 2018, Bloomington, Indiana, USA
  Evening reception on November 28 followed by two days of work
  Meeting focus: defining application requirements for a Common Digital Continuum Platform for Big Data and Extreme Scale Computing
  Meeting record: white papers, presentations, recordings
Next meetings: February 19-21, Kobe, Japan (with a focus on platform), followed by two in Europe, one in USA and one

Digital Continuum platform supporting Science and Engineering data research spanning HPC, Clouds, simulations, data analytics (Common Digital Continuum Platform for Big Data and Extreme Scale Computing)
  The first meeting focused on application requirements but kicked off other key parts of BDEC2 and included working groups on:
    Applications and Requirements
    Platform architecture/design
    Community (Academia, Government, Industry, Open Source, .orgs) building
  It included overview talks expanding on initial talks at the BoF, discussing overall issues and regional (Asia, Europe, US) perspectives on the Digital Continuum.
  There were two rounds of breakouts to start off the working groups
  35 white papers were submitted; many of these were presented among the 40 talks at BDEC2-1
  We planned working group activities with some virtual meetings in between the 6 in-person events
  The application working group held 5 Zoom meetings with 12-15 attendees each before Kobe.

Dan Reed: HPC and the Digital Continuum
  Why us?
    No one else is creating software services specifically for science
    Otherwise, we must adapt/adopt other solutions
  Why now?
    HPC, streaming data, and AI are the future
    We need to act rather than react
  Why fusion?
    Integration will enable new science
    It's more than workflows, containers, and libraries

Manish Parashar: Transforming Science through Cyberinfrastructure: Envisioning a Cyberinfrastructure Continuum
  NSF 10 Big Ideas
  Harnessing the Data Revolution (HDR): Many Dimensions of Data Investments at NSF is a convergence accelerator NSF 19-549
  DCL: Scalable Cyberinfrastructure to Accelerate Data-Driven Science and Engineering Research (NSF 18-076 DCL)
  Networking, Campus CI as the fundamental layer and underpinning of the CI Continuum: CC* (NSF 19-533), IRNC, CICI
  Advanced Computing Systems & Services: Adapting to the Rapid Evolution of Science and Engineering Research (NSF 19-534)
  Exploring Clouds for Acceleration of Science (E-CAS)

Rosa Badia: European perspective
  European Open Science Cloud
  International LOFAR telescope demonstrator
    Universe gets BIGGER as LOFAR telescope scientists discover 300,000 MORE galaxies (Feb 19 2019)
  PRACE scientific case
    Recurring core part of Nobel Prizes in Physics & Chemistry
    Saving billions with better weather forecasting
    Batteries & supercapacitors
    Improving human health with genomics, personalized medicine
    3-4% better fuel efficiency of aircraft & wind turbines every year
    Disrupting communication, transportation and manufacturing
    Design of future materials from scratch based on desired properties
    Artificial intelligence, machine learning, sensors, open data
  Edge to cloud projects and other initiatives
    mf2c smart fog hub use case

Satoshi Matsuoka: Japan Flagship 2020 Post K Supercomputer
  Arm system with up to 100x K performance
  Society 5.0 Apps
  Co-Design with 9 Application areas
  Convergence of HPC and AI
    Acceleration of Simulation with AI
    AI replacing simulation
    Acceleration of AI with HPC

Haohuan Fu (China): Big Data and Extreme Computing: Look Back and Look Ahead
  Climate Science has a Big Data Challenge!

Topics of White Papers and Application Talks
  Pathology
  Genome Alignment
  Interface of Machine Learning, Simulation and Observation
  Network Science
  Biomolecular Simulations
  Climate
  Material Science
  Weather; Data Assimilation
  Fusion
  Electrical Power Grid
  Real time race car monitoring
  Smart Cities
  Precision Agriculture
  Edge and Fog Computing
  UAV and Environmental monitoring
  Square Kilometer Array
  Light Sources & Experiment Control
  Continuum Platform: Modern Clouds, workflow, HPC, Data Transport, Benchmarking

BDEC2-1 Platform Presentations: Edge, Cyberinfrastructure and Cloud Technologies
  M. Beck, Glimpsing a Yottascale Data Ecosystem when the Fog Lifts (the Edge is the Computer)
  U. Ramachandran, Elevating the Edge to be a Peer of the Cloud (using the Fog)
  G. Antoniu, The Sigma Data Processing Architecture: Leveraging Future Data for Extreme-Scale Data Analytics to Enable High-Precision Decisions
  R. Badia, Workflow environments for advanced cyberinfrastructure platforms
  M. Tsuji, Toward integration of multi-SPMD programming model and advanced cyberinfrastructure platform
  C. Costa, Converged Ecosystem for Data Analytics and Extreme-Scale Computing
  D. Gannon, Pathways to Convergence: An Additional Scenario
  T. Hanawa, Advanced Cyberinfrastructure Platform Design
  O. Tatebe, Memory-Storage Hierarchy
  T. Kosar, OneDataShare: A Universal Data Sharing Building Block for Data-Intensive Applications

BDEC2 Application Working Group Strategy

  BDEC2 is designing a new platform that will be firmly rooted in the requirements of a broad range of forward-looking applications in science and engineering.
  One goal of BDEC2 is to discover and document those requirements that cover multiple use cases, so that we can use shared cyberinfrastructure.
  We tackle this by collecting individual use cases through several mechanisms, including white papers and presentations at our plenary meetings.
  We find common requirements by dividing the use cases into a few (eventually 3-6) broad buckets.
  Experts on individual use cases contribute the documentation of their use case and an analysis of the common and special features of the use cases in each bucket.
  The members of the platform group will be challenged to support the key features of each bucket.
  We intend to demonstrate the new platform with a set of international demonstration projects tackling a well-chosen exemplar in each bucket.

[Figure: Use Case 1 through Use Case N are documented ("Document Somehow") and categorized in many ways into Buckets of Use Cases (Big Data Analytics, HPC and ML, Edge to Cloud). Other labels in the figure: Benchmarks, Chosen App(s), Demos, Challenge, Platform.]

Use Case Survey based on NIST Big Data Survey
  Sample of the most basic fields. The form has more optional fields
  NIST had 54 use cases

  Overall Features                          Detailed Features
  Use Case Title                            Use Case Description
  Use Case Contacts                         Data Source, Volume, Velocity
  Use Case URL(s)                           Data Analytics and Computational Methods
  Pictures and Diagrams?                    Actors / Stakeholders
  Summary of Use Case and its Solution      Computer Systems Infrastructure
  Key words and Tags for classification     Security and Privacy Issues
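
As a rough illustration of how the basic survey fields above might be captured for later classification into buckets, here is a minimal Python sketch. The class name, attribute names, types and the missing_fields helper are all hypothetical; the slides only list the field labels and do not describe any actual form implementation.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class UseCaseSurveyEntry:
    """Hypothetical record mirroring the basic survey fields listed on the slide."""

    # Overall features
    title: str
    contacts: List[str]
    summary: str = ""
    urls: List[str] = field(default_factory=list)
    pictures_and_diagrams: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    # Detailed features
    description: str = ""
    data_source_volume_velocity: str = ""
    analytics_and_methods: str = ""
    actors_stakeholders: str = ""
    computer_systems_infrastructure: str = ""
    security_and_privacy: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative entry; the title and contact come from BDEC2App-21,
# the keywords are made up for this example.
entry = UseCaseSurveyEntry(
    title="Real-Time Anomaly Detection from Edge to HPC-Cloud",
    contacts=["J. Qiu"],
    keywords=["edge", "anomaly detection"],
)
print(entry.missing_fields())
```

A helper like missing_fields could be used to flag incomplete submissions before the working group compares use cases within a bucket.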

BDEC2 Bucket 1: Classic Observational Data plus ML
  Astronomy
    BDEC2App-1: M. Deegan, Big Data and Extreme Scale Computing, 2nd Series (BDEC2App) - Statement of Interest from the Square Kilometre Array Organisation (SKAO)
  Environmental Science
    BDEC2App-2: M. Rahnemoonfar, Semantic Segmentation of Underwater Sonar Imagery based on Deep Learning
    BDEC2App-3: M. Taufer, Cyberinfrastructure Tools for Precision Agriculture in the 21st Century
  Healthcare and Life sciences
    BDEC2App-4: J. Saltz, Multiscale Spatial Data and Deep Learning
    BDEC2App-5: R. Stevens, Exascale Deep Learning for Cancer
    BDEC2App-6: S. Chandrasekaran, Development of a parallel algorithm for whole genome alignment for rapid delivery of personalized genomics
    BDEC2App-7: M. Marathe, Pervasive, Personalized and Precision (P3) analytics for massive bio-social systems

Comments on Bucket 1
  The SC BOF included an Oil prospecting use case from David Keyes
  Instruments include Satellites, UAVs, Sensors (see edge examples), Light sources (X-ray, MRI, Microscope etc.), Telescopes, Accelerators, Tokamaks (Fusion), Computers (as in Control, Simulation, Data, ML Integration)
  Image-based Applications
    One cross-cutting theme is understanding Generalized (light, sound, other sensors such as temperature, chemistry, moisture) Images with 2D, 3D spatial and time dependence
    Modalities include Radar, MRI, Microscopes, Surveillance and other cameras, X-ray scattering, UAV hosted, and related non-optical sensor networks as in agriculture, wildfires, disaster monitoring and Oil exploration.
    GIS and geospatial properties are often relevant

BDEC2 Bucket 2: Control, Simulation, Data, ML Integration
  BDEC2App-8: W. Tang, New Models for Integrated Inquiry: Fusion Energy Exemplar
  BDEC2App-9: O. Beckstein, Convergence of data generation and analysis in the biomolecular simulation community
  BDEC2App-10: S. Denvil, From the production to the analysis phase: new approaches needed in climate modeling
  BDEC2App-11: T. Miyoshi, Prediction Science: The 5th Paradigm Fusing the Computational Science and Data Science (weather forecasting)
  See also Marathe and Stevens talks
  Material Science
    BDEC2App-12: K. Yager, Autonomous Experimentation as a Paradigm for Materials Discovery
    BDEC2App-13: L. Ward, Deep Learning, HPC, and Data for Materials Design
    BDEC2App-14: J. Ahrens, A vision for a validated distributed knowledge base of material behavior at extreme conditions using the Advanced Cyberinfrastructure Platform
    BDEC2App-15: T. Deutsch, Digital transition of Material Nano-Characterization.

Comments on Control, Simulation, Data, ML Integration
  Simulations often involve outside Data but always inside Data (from simulation itself).
  Fields covered include Materials (nano), Climate, Weather, Biomolecular, Virtual tissues (no use case written up)
  We can see ML wrapping simulations to achieve many goals. ML replaces functions and/or ML guides functions
    Initial Conditions
    Boundary Conditions
    Data assimilation
    Configuration: blocking, use of cache etc.
    Steering and Control
    Support multi-scale
  ML learns from previous simulations and so can predict function calls
  Digital Twins are a commercial link between simulation and systems
  There are fundamental simulations covered by the laws of physics and, increasingly, Complex System simulations with Bio (tissue) or social entities.

MLforHPC and HPCforML
  We distinguish between different interfaces for ML/DL and HPC.
  HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory guided machine learning), which are then used to understand experimental data or simulations.
    HPCrunsML: Using HPC to execute ML with high performance
    SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations.
  MLforHPC: Using ML to enhance HPC applications and systems. See review at
    MLautotuning: Using ML to configure (autotune) ML or HPC simulations.
    MLafterHPC: ML analyzing results of HPC, e.g., trajectory analysis in biomolecular simulations
    MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations. The same ML wrapper can also learn configurations as well as results
    MLControl: Using simulations (with HPC) in control of experiments and in objective driven computational campaigns, where simulation surrogates allow real-time predictions.

BDEC2 Bucket 3: Edge Computing
  Smart City and Related Edge Applications
    BDEC2App-16: P. Beckman, Edge to HPC Cloud
    BDEC2App-17: G. Ricart, Smart Community CyberInfrastructure at the Speed of Life
    BDEC2App-18: T. El-Ghazawi, Convergence of AI, Big Data, Computing and IOT (ABCI) - Smart City as an Application Driver and Virtual Intelligence Management (VIM)
    BDEC2App-19: M. Kondo, The Challenges and opportunities of BDEC systems for Smart Cities
  Other Edge Applications
    BDEC2App-20: A. Pothen, High-End Data Science and HPC for the Electrical Power Grid
    BDEC2App-21: J. Qiu, Real-Time Anomaly Detection from Edge to HPC-Cloud
  There are correlated edge devices, such as the power grid and nearby vehicles (racing, road). There are also largely independent edge devices that interact via databases, such as surveillance cameras.
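
Returning to the MLaroundHPC category defined earlier (learning from simulation runs to produce a surrogate), the idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the damped-oscillator function is a toy stand-in for an HPC simulation, and piecewise-linear interpolation stands in for a trained ML model. None of the names come from the BDEC2 material.

```python
from bisect import bisect_left
from typing import Callable, Iterable


def expensive_simulation(damping: float, steps: int = 2000) -> float:
    """Toy stand-in for an HPC simulation (hypothetical).

    Integrates x'' = -x - damping * x' from t=0 to t=10 with a
    semi-implicit Euler scheme and returns the final energy.
    """
    dt = 10.0 / steps
    x, v = 1.0, 0.0
    for _ in range(steps):
        v += (-x - damping * v) * dt
        x += v * dt
    return 0.5 * (x * x + v * v)


def train_surrogate(
    sim: Callable[[float], float], params: Iterable[float]
) -> Callable[[float], float]:
    """Run the simulation at sampled parameters and return a cheap surrogate.

    The surrogate is a piecewise-linear interpolant over the samples,
    used here in place of a real ML model.
    """
    xs = sorted(set(params))
    ys = [sim(p) for p in xs]

    def surrogate(p: float) -> float:
        if not xs[0] <= p <= xs[-1]:
            raise ValueError("parameter outside the sampled range")
        i = bisect_left(xs, p)
        if xs[i] == p:
            return ys[i]
        x0, x1 = xs[i - 1], xs[i]
        w = (p - x0) / (x1 - x0)
        return (1 - w) * ys[i - 1] + w * ys[i]

    return surrogate


# "Training": eleven simulation runs over the damping range [0, 0.5].
samples = [0.05 * k for k in range(11)]
surrogate = train_surrogate(expensive_simulation, samples)

# "Inference": the surrogate answers without running the simulation again.
print(surrogate(0.125), expensive_simulation(0.125))
```

The design point is the separation between an offline phase that pays for simulation runs and an online phase that answers queries cheaply, which is what makes surrogates usable for the real-time predictions mentioned under MLControl.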

BDEC2 Remaining Talks crossed all categories
  BDEC Ecosystem
    BDEC2App-22: I. Foster, Learning Systems for Deep Science (Central Hub for wisdom including models)
    BDEC2App-23: W. Gao, BigDataBench: A Scalable and Unified Big Data and AI Benchmark Suite (later meeting at IEEE Big Data December 2018 also involved MLPerf)

Conclusions of 5 Application Working Group Virtual Meetings
  They are documented with links to recordings at b8DQN0gQhMJ2ITtUgeG0gs/edit?usp=sharing
  Agreement on the three buckets as being roughly reasonable
  Documented discussion on details of application classes (such as the edge) and common services
  Discussion on the relation between the 23 BDEC2Apps and the 54 NIST use cases; the application (bucket) classification is consistent
  Point discussions such as synergy between astronomy and pathology as an image-based Bucket 1 Application
  No agreed next step; review results from Kobe to see what to do!

