Revolutionising HPC: Bursting from Cloud to On-Premise 


Published 27/08/2024 By NAG

Imagine a high-performance computing environment that shifts seamlessly between the cloud and on-premise infrastructure, dynamically optimising cost and performance while safeguarding your most sensitive data. Bursting from cloud to on-premise is no longer just a concept; it is a reality that is transforming industries under intense pressure to advance. This approach not only redefines how we think about scalability and security but also fits into existing hybrid cloud strategies, helping HPC environments become more efficient, flexible, and resilient.

Unlocking the Potential of Hybrid Cloud Solutions for Engineering and Science 

In today’s competitive landscape, industries such as aerospace, automotive, engineering, oil and gas, and nuclear energy are under immense pressure to get results quickly and efficiently. High-performance computing (HPC), whether cloud or on-premise, has become a crucial tool, enabling advanced simulations, complex computations, and large-scale data processing.  

Bursting to Cloud HPC: A Paradigm Shift 

Bursting to the cloud has allowed companies to draw on cloud resources during peak demand periods, reducing the need to invest in expensive on-premise hardware that would otherwise sit idle. This approach provides scalability, flexibility, and cost-efficiency, enabling organisations to handle spikes in workload without compromising performance or incurring prohibitive costs.

The New Frontier: Bursting from Cloud to On-Premise 

More recently, early adopters have become keen to explore a new approach: bursting from cloud to on-premise. It addresses several critical needs not solved by the better-known technique of bursting from on-premise to cloud. This strategy allows organisations to continue using their unique, customised systems while gaining the benefits of Cloud HPC.

Reluctance to adopt Cloud solutions often stems from the outdated idea that Cloud HPC isn’t as performant or secure as on-premise systems. This perception is no longer accurate. Modern cloud infrastructures offer robust security measures and performance levels that frequently surpass traditional on-premise systems. By adopting a bursting from cloud to on-premise approach, organisations can break free from the constraints of legacy thinking and embrace the full potential of Cloud HPC.

Additional Benefits of Bursting from Cloud to On-Premise: 

Staged Migration and Infrastructure Flexibility: Bursting from cloud to on-premise simplifies the process of migrating workloads. Organisations can keep using existing on-premise infrastructure without disrupting users while gradually reducing its size or decommissioning it. This approach also allows companies to keep sensitive data and models on-premise, ensuring compliance with regulatory requirements and safeguarding intellectual property.

Centralised and Resilient Infrastructure: By centralising infrastructure in the cloud, organisations can leverage cloud-native tools and services for enhanced automation, resilience, logging, and monitoring, thereby reducing overall operational risk. Additionally, with data primarily stored in the cloud, the autoscaling of pre- and post-processing services is no longer constrained by on-premise capabilities.

Cost Efficiency and Data Management: Bursting from cloud to on-premise shifts data transfer costs from outputs to inputs, reversing the typical egress/ingress cost structure. Because data is held in the cloud, it is the job inputs sent on-premise that leave the cloud, while results flow back in. Furthermore, with data stored in the cloud, businesses can take advantage of a wide array of cloud-native backup and archiving services, eliminating the need to continually expand on-premise storage.

Alignment with Cloud-First Strategies: For companies adopting a cloud-first strategy, bursting from cloud to on-premise integrates HPC into a unified IT infrastructure, creating a more streamlined and cohesive environment. This approach not only simplifies the management of HPC resources but also aligns with broader corporate IT initiatives, making the transition to cloud-based operations less complex. 

Elimination of Additional Tooling and Downtime: Leveraging cloud resources eliminates the need for additional third-party tools to manage bursting and ensures no downtime for updates or maintenance, enhancing the overall efficiency and reliability of the HPC environment. 

Unified User Experience: From the user’s perspective, this approach creates a more homogeneous environment, satisfying the need for cloud utilisation while presenting it as a single, integrated system, minimising the disruption to their work. This makes the HPC environment more cohesive and easier to manage. 
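The egress/ingress point under Cost Efficiency above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that cloud ingress is free and egress is charged per gigabyte, and the job sizes and the rate are made-up numbers, not figures from NAG or any provider. The names Job and egress_cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    input_gb: float   # data the job reads
    output_gb: float  # data the job writes


def egress_cost(jobs, egress_per_gb, burst_direction):
    """Return the cloud egress charge for a set of burst jobs.

    burst_direction (both cases assume ingress is free):
      "to_cloud"  - data lives on-premise and jobs run in the cloud,
                    so the outputs are what leave the cloud.
      "to_onprem" - data lives in the cloud and jobs run on-premise,
                    so the inputs are what leave the cloud.
    """
    if burst_direction == "to_cloud":
        gb_leaving_cloud = sum(job.output_gb for job in jobs)
    elif burst_direction == "to_onprem":
        gb_leaving_cloud = sum(job.input_gb for job in jobs)
    else:
        raise ValueError(f"unknown burst direction: {burst_direction!r}")
    return gb_leaving_cloud * egress_per_gb


if __name__ == "__main__":
    # Hypothetical simulation jobs: small inputs, large outputs.
    jobs = [Job(input_gb=2.0, output_gb=50.0), Job(input_gb=1.0, output_gb=30.0)]
    rate = 0.10  # assumed price per GB of egress, for illustration only
    for direction in ("to_cloud", "to_onprem"):
        print(direction, round(egress_cost(jobs, rate, direction), 2))
```

Under these assumptions the saving depends on inputs being smaller than outputs, which is common for simulation workloads but is not guaranteed. A workload with large inputs and small results would see the opposite effect.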

Embracing Hybrid Cloud: Opportunities and Challenges 

Before looking at an example of how one company, NAG, has delivered a cloud to on-premise bursting model, let’s look at some of the opportunities and challenges of hybrid cloud.

Opportunities of Hybrid Cloud: 

Scalability: Hybrid cloud solutions allow organisations to scale their computational resources up or down based on demand, which is crucial for industries with variable workloads. But how much do you need this extra resource? 

Cost Efficiency: Companies can optimise IT expenditure and often achieve massive savings through the best use of cloud resources vs. on-premise, taking into account existing and future demand, relative functionality, and sensitive operations. What is the best solution for your goals whilst remaining on budget? 

Data Security: Sensitive data can be kept on-premise, reducing the risk of exposure, while the cloud can be leveraged for less critical tasks. How critical is the security of your data, and what steps are you taking to ensure its protection? 

Performance Optimisation: Tasks requiring low latency and high performance can be executed on-premise while the cloud handles other computations, ensuring overall efficiency. How does your current infrastructure handle high-performance tasks, and where can improvements be made? 

Challenges of Hybrid Cloud: 

Integration Complexity: Ensuring seamless integration between cloud and on-premise systems can be technically challenging, requiring custom interfaces and workflows. 

Data Management: Managing data across hybrid environments can be difficult, especially with large datasets. Consistency, latency, and bandwidth issues need to be addressed. 

Cost Management: Hybrid solutions can be cost-effective but require careful monitoring and management to avoid unexpected expenses. 
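The bandwidth concern raised under Data Management can be sized with simple arithmetic before any architecture is chosen. The following is a rough sketch, not a measurement tool: the function name transfer_hours and the efficiency factor (the assumed fraction of nominal link speed actually achieved) are illustrative assumptions.

```python
def transfer_hours(dataset_gb, link_gbit_per_s, efficiency=0.8):
    """Estimate the hours needed to move a dataset over a network link.

    dataset_gb       - dataset size in gigabytes (decimal, 1 GB = 8 Gbit)
    link_gbit_per_s  - nominal link speed in gigabits per second
    efficiency       - assumed fraction of nominal speed achieved in practice
    """
    if dataset_gb < 0:
        raise ValueError("dataset_gb must be non-negative")
    if link_gbit_per_s <= 0:
        raise ValueError("link_gbit_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    gigabits = dataset_gb * 8
    seconds = gigabits / (link_gbit_per_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical case: 10 TB over a 10 Gbit/s link at 80% efficiency.
    print(f"{transfer_hours(10_000, 10):.2f} hours")
```

An estimate like this helps decide whether data can be staged on demand or needs to be moved ahead of the jobs that use it.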

NAG’s Hybrid Cloud, Bursting to On-prem Solution: A Case Study 

Successfully implementing a comprehensive HPC solution that fully realises these advantages and solves these challenges requires careful planning and expertise. NAG has been at the forefront of developing innovative HPC solutions, including the groundbreaking concept of bursting from cloud to on-premise.

This approach to HPC solutions emphasises performance engineering and optimisation. By working closely with clients, NAG ensures that HPC systems are tailored to performance and cost requirements.  

Step 1: Assessment and Planning 

The project began with an in-depth assessment and planning phase. This step involved close consultation with the client to fully understand their requirements and the specific workloads they needed to manage. Workload analysis was critical to determining the computational needs, data movement patterns, and potential bottlenecks. Based on this analysis, a detailed proposal was developed outlining the architecture, timelines, and expected outcomes. 
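As an illustration of what a workload analysis might compute, the sketch below totals core-hours and data volume per workload type from a list of job records. The record fields and the function name summarise_workloads are hypothetical; the article does not describe the tooling or data format NAG used in this phase.

```python
from collections import defaultdict


def summarise_workloads(records):
    """Aggregate job records by workload type.

    Each record is a dict with the (assumed) keys:
    workload, cores, runtime_hours, input_gb, output_gb.
    Returns {workload: {"jobs", "core_hours", "data_gb"}}.
    """
    totals = defaultdict(lambda: {"jobs": 0, "core_hours": 0.0, "data_gb": 0.0})
    for record in records:
        summary = totals[record["workload"]]
        summary["jobs"] += 1
        summary["core_hours"] += record["cores"] * record["runtime_hours"]
        summary["data_gb"] += record["input_gb"] + record["output_gb"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up job records for illustration only.
    records = [
        {"workload": "cfd", "cores": 64, "runtime_hours": 2.0,
         "input_gb": 10.0, "output_gb": 200.0},
        {"workload": "cfd", "cores": 128, "runtime_hours": 1.5,
         "input_gb": 5.0, "output_gb": 100.0},
        {"workload": "post", "cores": 8, "runtime_hours": 0.5,
         "input_gb": 100.0, "output_gb": 1.0},
    ]
    for name, summary in summarise_workloads(records).items():
        print(name, summary)
```

Summaries of this kind show which workloads dominate compute demand and which dominate data movement, which are the two questions the assessment phase sets out to answer.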

Step 2: Design Architecture 

The next phase focused on designing the architecture. The network configuration was carefully planned to support the data transfer requirements between cloud and on-premise systems, ensuring low latency and high throughput. Topology considerations included redundancy and failover capabilities to maintain system availability. The software stack included SLURM for workload management and other essential HPC tools and libraries. 

Step 3: Procurement and Setup 

With the architecture in place, we moved on to procurement and setup. The network setup required custom routing to ensure seamless connectivity between cloud and on-premise environments. A parallel file system was configured to provide efficient data access and distribution. A custom data movement solution was implemented to ensure that data was always available when needed, minimising latency and optimising performance. Importantly, all cloud infrastructure was deployed using an Infrastructure-as-Code (IaC) approach, ensuring consistency, repeatability, and scalability.

Step 4: Software Installation and Configuration 

The software installation and configuration phase was critical, particularly in the context of bursting from cloud to on-prem.  

The SLURM configuration was the most pivotal and innovative aspect of this phase. SLURM needed to be configured and customised to facilitate communication between cloud and on-premise nodes, enabling bidirectional data and task management. This ensured that jobs could be dynamically distributed across cloud and on-premise resources based on real-time demand. 

Building the images for the on-premise nodes was another critical step. These custom images were designed to communicate effectively with the SLURM controller in the cloud, enabling seamless integration and automated on-premise node provisioning. This automation was crucial for scalability and efficiency, allowing the system to quickly adapt to changing computational needs. 

Highlighted Techniques: 

SLURM customisation: Enabled two-way communication between cloud and on-premise nodes, a key innovation for the bursting system. 

Custom image building: Ensured on-premises nodes could integrate seamlessly with cloud-based SLURM, allowing for automated provisioning. 
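The article does not describe how NAG customised SLURM, so the following is not their method. It is a minimal sketch of one possible placement policy, assuming two hypothetical partitions named "cloud" and "onprem": jobs flagged as sensitive always run on-premise, and other jobs run in the cloud unless they ask for more nodes than an assumed cloud allowance has free. The only SLURM features used are the standard sbatch options --partition and --nodes; the command is built but not executed.

```python
def choose_partition(sensitive, nodes_requested, cloud_nodes_free):
    """Pick a partition name for a job (hypothetical policy).

    sensitive         - True if the job's data must stay on-premise
    nodes_requested   - number of nodes the job asks for
    cloud_nodes_free  - nodes still available under an assumed cloud allowance
    """
    if sensitive:
        return "onprem"
    if nodes_requested <= cloud_nodes_free:
        return "cloud"
    return "onprem"  # overflow bursts to the on-premise nodes


def sbatch_command(script, partition, nodes):
    """Build an sbatch argument list for the chosen partition."""
    return ["sbatch", f"--partition={partition}", f"--nodes={nodes}", script]


if __name__ == "__main__":
    # Hypothetical jobs: (script, sensitive, nodes requested)
    jobs = [("crash_sim.sh", True, 4), ("mesh_gen.sh", False, 2),
            ("sweep.sh", False, 20)]
    cloud_nodes_free = 10  # assumed allowance, for illustration only
    for script, sensitive, nodes in jobs:
        partition = choose_partition(sensitive, nodes, cloud_nodes_free)
        print(" ".join(sbatch_command(script, partition, nodes)))
```

In a real deployment a decision like this would more likely live in the scheduler configuration or a job submission plugin than in a wrapper script; the wrapper form is used here only to keep the logic visible.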

Step 5: Integration and Testing 

Finally, integration and testing were conducted to ensure the entire system functioned as expected. This included implementing corporate firewall rules to protect both cloud and on-premise environments. Authentication services were configured to work across the hybrid system, ensuring secure access and data integrity. Rigorous testing was performed to validate the bursting functionality, focusing on performance, reliability, and security.

Summary of the Case Study 

This case study highlights the meticulous planning, innovative engineering, and technical expertise required to develop a hybrid cloud solution that bursts from cloud to on-premise. By focusing on key elements such as SLURM customisation, custom image building, and automated provisioning, the project successfully delivered a solution that meets the high demands of many industries. 

The result is a flexible, scalable, and secure HPC environment that empowers organisations to fully leverage the benefits of cloud computing while maintaining control over critical data and workloads. 

Key Considerations for Implementing Comprehensive HPC Solutions 

So, what were the key focuses of this project? How did it all happen? We would recommend these steps to ensure optimal performance and prepare organisations for future technological advancements: 

Evaluation: Before adopting HPC, thorough benchmarking or proof-of-concept studies are crucial. Setting up test environments to run specific workloads provides valuable insights into performance metrics and cost implications, helping organisations make informed decisions about hybrid solutions.

Professional Implementation: Optimising HPC systems, whether on-premise, hybrid, or cloud-based, requires strategic and knowledgeable guidance. Expert services ensure systems are configured to achieve peak performance, facilitate smooth transitions to cloud environments, and design elastic HPC setups. 

Managed Services: Maintaining an HPC system involves comprehensive monitoring, regular updates, and reviews. Proactive management ensures systems run smoothly and efficiently, minimising downtime and allowing organisations to focus on innovation. 

Strategic Enhancement: Continuous assessment and optimisation of HPC environments are essential for sustained success. Leveraging historical data and feedback loops provides actionable insights, driving meaningful system improvements and aligning HPC infrastructure with evolving organisational objectives. 

What Does This All Mean for HPC Users? The Power of HPC in Industry 

HPC has revolutionised various industries by enabling advanced research and development. For example, HPC-driven simulations can significantly reduce the time and cost associated with crash testing and vehicle design in the automotive industry, allowing for thousands of virtual crash scenarios. Similarly, in the oil & gas sector, HPC accelerates seismic data analysis, enhancing exploration and production efficiency. In aerospace, HPC supports the design and testing of new aircraft, while nuclear energy relies on HPC for reactor design and safety analysis. 

The Evolution and Impact of HPC Technologies 

The rapid evolution of HPC technologies, driven by advancements in processor design, parallel computing, and machine learning, has significantly enhanced the capabilities of HPC systems. As Moore’s Law approaches its limits, the focus is shifting from hardware to software optimisation, making performance engineering crucial for maximising HPC efficiency. The integration of AI and ML with HPC is also opening new frontiers, enabling predictive maintenance, optimised supply chains, and advanced decision-making across various sectors. 

Embracing Hybrid Cloud: An Invitation to Innovate 

The hybrid cloud model represents a significant opportunity for industries to enhance their computational capabilities. Combining the best aspects of cloud and on-premise HPC allows organisations to achieve unprecedented efficiency, scalability, and performance. This approach enables businesses to be more agile, responding quickly to changing demands and market conditions. 

Summary and Conclusion 

Hybrid cloud solutions are transforming industries by merging the scalability of cloud computing with the control and performance of on-premise systems.  

The innovative concept of bursting from cloud to on-premise represents a significant advancement in HPC. By enabling organisations to keep sensitive tasks on-premise and utilise cloud resources for less critical workloads, this strategy addresses critical needs like compliance, data sovereignty, and performance optimisation. This capability is particularly beneficial for sectors such as aerospace and nuclear energy, where regulatory requirements and performance demands are stringent. 

Successfully implementing a comprehensive HPC solution requires careful planning and expertise. Thorough benchmarking helps organisations understand the feasibility and benefits of hybrid solutions. Strategic professional services ensure systems are configured for peak performance, while proactive managed services maintain system reliability and efficiency. Continuous strategic enhancement is essential to adapt to evolving technologies and business goals, ensuring HPC systems remain cost-effective and efficient. 

Mastering these elements is crucial for businesses aiming to achieve their strategic objectives with exceptional efficiency. Seamlessly integrating cloud and on-premise HPC environments so that workloads can burst in either direction not only meets current computational demands but also prepares businesses for future challenges and opportunities.

In conclusion, embracing hybrid cloud and the innovative strategy of bursting from cloud to on-premise is more than a technological upgrade; it is a strategic move towards sustained innovation, agility, and competitive advantage in the fast-evolving HPC landscape. 
