High Availability
What is it?
Availability is a simple concept that IT managers nonetheless often overlook: the equipment in your server room or data center should be online and accessible as much of the time as possible. High availability (HA) is the technical term for systems that offer more uptime (or less downtime) than the industry average. It is becoming a standard requirement for IT equipment in the digital age, when just about everything is connected to the internet or the Internet of Things (IoT).
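To make "more uptime, less downtime" concrete, availability is commonly expressed as the fraction of time a system is up. The short sketch below uses that standard definition to convert an availability target into hours of downtime per year. The function names and the example targets are illustrative assumptions, not figures from GIGABYTE.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of total time the system was up (0.0 to 1.0)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def annual_downtime_hours(avail: float) -> float:
    """Hours of downtime per year implied by an availability fraction."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Example targets chosen for illustration only.
    for target in (0.99, 0.999, 0.9999):
        hours = annual_downtime_hours(target)
        print(f"{target:.2%} available -> {hours:.2f} h of downtime per year")
```

Each additional "nine" in the target cuts the permitted downtime by a factor of ten, which is why small differences in the availability percentage matter so much in practice.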
Availability should be factored into the initial planning and budgeting stage. It is often necessary to buy redundant subsystems and components to mitigate risk and minimize downtime, even though this increases costs. Single points of failure (SPOF) must be eliminated. You also need to prepare a business continuity plan (BCP) for major outages. The BCP should cover how operations can continue during an outage, how normal operations can resume as soon as possible (sometimes called a disaster recovery plan, DRP), and how the business will be affected by the outage (sometimes called a business impact analysis, BIA).
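The value of redundancy can be estimated with basic reliability arithmetic. The sketch below assumes that component failures are independent, which real systems only approximate, and uses made-up availability figures. Components that must all work are multiplied together (series), while redundant components fail only if every copy fails (parallel).

```python
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel_availability(component_availability: float, count: int) -> float:
    """Availability of `count` redundant copies, assuming independent failures.

    The group is down only when all copies are down at the same time.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return 1.0 - (1.0 - component_availability) ** count


if __name__ == "__main__":
    psu = 0.99  # hypothetical availability of a single power supply
    single = parallel_availability(psu, 1)
    redundant = parallel_availability(psu, 2)
    print(f"one PSU:  {single:.4%}")
    print(f"two PSUs: {redundant:.4%}")
    # A single non-redundant part in the chain caps the whole system.
    print(f"system:   {series_availability([redundant, 0.995]):.4%}")
```

The last line illustrates why single points of failure matter: however much redundancy is added elsewhere, the system can never be more available than its weakest non-redundant component.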
Why do you need it?
Since so much of modern life is connected to the IoT, IT equipment should be designed for high availability. In cloud computing, cloud service providers (CSPs) should ensure their services are always available. For important research projects, equipment availability may be the key to achieving world-first scientific breakthroughs. And when smart city solutions like autonomous vehicles hit the streets, high availability will be necessary to keep the smart world of the future running smoothly.
How is GIGABYTE helpful?
GIGABYTE's server solutions offer these features to ensure high availability:
a. Smart Crises Management and Protection (SCMP): SCMP is a GIGABYTE-patented feature deployed in servers whose PSU designs are not fully redundant. If a PSU fails or the system overheats, SCMP forces the CPU into ultra-low power mode, which reduces the power load and prevents the system from shutting down unexpectedly.
b. Smart Ride Through (SmaRT): To prevent server downtime and data loss caused by power loss, GIGABYTE has implemented SmaRT in all of its server platforms. When a power loss event occurs, the system throttles to reduce the power load while remaining available. Capacitors within the power supply can provide power for ten to twenty milliseconds, which is enough time to transition to a backup power source for continued operation.
c. Dual ROM Architecture: If the ROM that stores the BMC and BIOS fails to boot, the system will reboot with the backup BMC and/or BIOS. Once the primary BMC is updated, the backup BMC's ROM is automatically updated to match through synchronization. The BIOS can be updated to the firmware version the user chooses.
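The fallback idea behind a dual ROM design can be illustrated with a small model. This is only a conceptual sketch and not GIGABYTE's firmware logic: the `FirmwareImage` class, both function names, and the version strings are hypothetical, and the rule that a backup is synchronized only from a bootable primary is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class FirmwareImage:
    """Hypothetical stand-in for a BMC or BIOS image stored in ROM."""
    name: str
    version: str
    bootable: bool = True


def select_boot_image(primary: FirmwareImage, backup: FirmwareImage) -> FirmwareImage:
    """Prefer the primary image and fall back to the backup if it cannot boot."""
    if primary.bootable:
        return primary
    if backup.bootable:
        return backup
    raise RuntimeError("neither the primary nor the backup image can boot")


def sync_backup(primary: FirmwareImage, backup: FirmwareImage) -> None:
    """Copy the primary's version to the backup after a successful update.

    Assumption for this sketch: never overwrite the backup from a primary
    that cannot boot, so a bad update cannot destroy the fallback copy.
    """
    if primary.bootable:
        backup.version = primary.version


if __name__ == "__main__":
    primary = FirmwareImage("primary BMC", "2.0")
    backup = FirmwareImage("backup BMC", "1.0")

    sync_backup(primary, backup)   # backup now mirrors version 2.0
    primary.bootable = False       # simulate a corrupted primary ROM
    chosen = select_boot_image(primary, backup)
    print(f"booting from {chosen.name} (version {chosen.version})")
```

Keeping two images removes the firmware ROM as a single point of failure: a corrupted primary leads to a reboot from the backup instead of a server that cannot start.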