Data center management is the collection of tasks performed by those responsible for managing the ongoing operation of a data center.[1] This includes business service management and planning for the future.
Historically, "data center management" was seen as something performed by employees, with the help of tools collectively called data center-infrastructure management (DCIM) tools.[2]
Both for in-house operation and outsourcing, service-level agreements must be managed to ensure data-availability.[3]
Data center management is a major and growing concern for a lengthening list of large companies that both compete and cooperate, including Dell,[4] Google,[5] HP,[6] IBM,[5] Intel[6] and Yahoo.[6]
Hardware/software vendors who are willing to live with coopetition[7][8] are working on projects such as "The Distributed Management Task Force" (DMTF)[9] with a goal of learning to "more effectively manage mixed Linux, Windows and cloud environments."
With the DMTF a decade old, the list of companies is growing, and also includes companies much smaller than IBM, Microsoft, et al.[10]
Another major concern is the cost of downtime: customer dissatisfaction and lost business,[13] as well as the "astonishing" yet hidden cost to personnel and productivity.[14]
Business-service management
Business-service management (BSM) treats IT as part of the larger enterprise strategy,[15] and helps fill the gap between business and IT.[16]
IBM notes that major problems often happen in the grey areas, particularly because of errors in the interfaces, and focuses on critical failures. Sufficient redundancy should keep failures in non-critical areas from affecting the business.[16] BSM, which is positioned above IT Service Management (ITSM), promotes a customer-centric and business-focused approach to service management, aligning business objectives with IT or ICT from strategy through to operations. Tools that help BSM include a modeling language[17] and a common dashboard, which together allow data center personnel to see problems before business customers do.[18]
Newer developments
Remote data center management[19] allows offsite experts to watch for situations needing their timely intervention at a lower cost than having such staff be onsite 24/7/365.
While some requirements for on-site hardware have been reduced,[20] spending in other hardware areas such as UPS may have to increase.[21]
Data center asset management
Data center asset management (also referred to as inventory management)[22] is the set of business practices that join financial, contractual and inventory functions to support life cycle management and strategic decision making for the IT environment. Assets include all elements of software and hardware that are found in the business environment.[23]
IT asset management generally uses automation to manage the discovery of assets[24] so inventory can be compared to license entitlements. Full business management of IT assets requires a repository of multiple types of information about the asset, as well as integration with other systems such as supply chain, help desk, procurement and HR systems and ITSM.
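The comparison of discovered inventory against license entitlements can be illustrated with a minimal Python sketch. The function name `license_gaps`, the product names and the counts are all hypothetical, chosen only for illustration; real discovery tools and entitlement records carry far more detail.

```python
from collections import Counter


def license_gaps(discovered_installs, entitlements):
    """Return (licenses owned - installs found) for every product.

    A negative value means more installs were discovered than licenses owned.
    The data shapes used here are illustrative assumptions.
    """
    installed = Counter(discovered_installs)
    products = sorted(set(installed) | set(entitlements))
    return {p: entitlements.get(p, 0) - installed[p] for p in products}


# Hypothetical discovery output: one entry per installation found.
discovered = ["db-server", "db-server", "db-server", "office-suite"]
# Hypothetical entitlement records: licenses owned per product.
owned = {"db-server": 2, "office-suite": 5, "cad-tool": 1}

gaps = license_gaps(discovered, owned)
shortfalls = [product for product, gap in gaps.items() if gap < 0]
print(gaps)
print("Under-licensed:", shortfalls)
```

A positive gap in this sketch marks unused licenses, which may be candidates for redeployment; a negative gap marks a compliance shortfall.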
Hardware asset management
Hardware asset management entails the management of the physical components of computers and computer networks, from acquisition through disposal.[24] Common business practices include request and approval process, procurement management, life cycle management, redeployment and disposal management. A key component is capturing the financial information about the hardware life cycle which aids the organization in making business decisions based on meaningful and measurable financial objectives.
Software Asset Management is a similar process, focusing on software assets, including licenses. Standards for this aspect of data center management are part of ISO/IEC 19770.
Data center infrastructure management
Data center-infrastructure management (DCIM) is the integration[25] of information technology (IT) and facility management disciplines[26] to centralize monitoring, management and intelligent capacity planning of a data center's critical systems. Achieved through the implementation of specialized software, hardware and sensors, DCIM enables a common, real-time monitoring and management platform for all interdependent systems across IT and facility infrastructures.
DCIM products can help data center managers identify and eliminate sources of risk[27] and improve availability of critical IT systems. They can also be used to identify interdependencies between facility and IT infrastructures to alert the facility manager to gaps in system redundancy, and provide dynamic, holistic benchmarks on power consumption and efficiency to measure the effectiveness of "green IT" initiatives.[28][29]
Important data center metrics include those for energy efficiency and for the utilization of servers, storage and staff. In too many cases, disk capacity is vastly underused and servers run at 20% utilization or less.[30] More effective automation tools can also increase the number of servers or virtual machines that a single admin can handle.
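As a rough illustration of the utilization and staffing metrics described above, the following Python sketch flags servers at or below the 20% level and computes a servers-per-admin ratio. The `Server` class, the function names and the sample fleet are assumptions made for this example, not part of any DCIM product.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def underused_servers(servers, threshold=0.20):
    """Names of servers running at or below the given utilization."""
    return [s.name for s in servers if s.utilization <= threshold]


def average_utilization(servers):
    """Mean utilization across the fleet."""
    if not servers:
        raise ValueError("fleet is empty")
    return sum(s.utilization for s in servers) / len(servers)


def servers_per_admin(server_count, admin_count):
    """How many servers each administrator handles on average."""
    if admin_count <= 0:
        raise ValueError("admin_count must be positive")
    return server_count / admin_count


# Hypothetical sample data.
fleet = [
    Server("web-01", 0.15),
    Server("web-02", 0.62),
    Server("db-01", 0.20),
    Server("batch-01", 0.08),
]

print("Underused:", underused_servers(fleet))
print("Average utilization:", round(average_utilization(fleet), 4))
print("Servers per admin:", servers_per_admin(len(fleet), 2))
```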
DCIM providers are increasingly linking with computational fluid dynamics providers to predict complex airflow patterns in the data center. The CFD component is necessary to quantify the impact of planned future changes on cooling resilience, capacity and efficiency.[31]
Operations
Information technology operations, or IT operations (IT ops), are the set of all processes and services managed by IT staff[32] for use by internal or external clients. The term refers to the application of operations management to the technology used to run the business.
Operations work can include responding to support tickets generated for maintenance work or customer issues.[33] Some operations teams provide on-call support, responding to incidents outside of normal business hours.[33]
As lights-out[34] operations have increased, fewer staff are located near corporate headquarters.[35][36] Gartner defines IT operations as "the people and management processes associated with IT service management to deliver the right set of services at the right quality and at competitive costs for customers."[37]
Technical support
Technical support (often shortened to tech support) refers to services that assist users of products and services. Within a corporation, these services are also known as help desks.[38] Organizations often arrange their technical support structure as a three-tier (plus two) system:[39]
Tier 4: Outside support for "items not directly serviced by the organization"
In-house employees and corporate customers reach varying levels of support for products and services, covering information and troubleshooting,[40] through channels such as toll-free numbers,[41] websites, instant messaging, or email.
Help desk professionalism
An ITIL-compliant help desk is usually a part of a bigger service desk unit, which is part of ITSM.[42]
Because incoming phone calls arrive at random, help desk agent schedules are often maintained using an Erlang C calculation.
Companies with custom application software may also have an applications team responsible for developing in-house software. The help desk may assign problems such as finding software bugs to the applications team. Requests for new features, or for information about the capabilities of in-house software, that come through the help desk are also assigned to applications groups.
The help desk staff and supporting IT staff may not all work from the same location. With remote access applications, technicians can solve many help desk issues from another work location or their home office. While on-site support is still needed to collaborate effectively on some issues, remote support provides greater flexibility.
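The Erlang C calculation used for agent scheduling can be sketched in Python as follows. This is a minimal illustration of the standard formula, not the method of any particular scheduling product; the function names, the 80% service-level target and the sample call volumes are assumptions chosen for the example.

```python
import math


def erlang_c(agents, offered_load):
    """Probability that an arriving call must wait (Erlang C).

    offered_load is the traffic in erlangs: arrival rate x average handle time.
    """
    if agents <= offered_load:
        return 1.0  # unstable queue: every caller waits
    # Build Erlang B iteratively (numerically stable), then convert to Erlang C.
    b = 1.0
    for n in range(1, agents + 1):
        b = offered_load * b / (n + offered_load * b)
    occupancy = offered_load / agents
    return b / (1.0 - occupancy * (1.0 - b))


def service_level(agents, calls_per_hour, handle_seconds, target_seconds):
    """Fraction of calls answered within target_seconds."""
    load = calls_per_hour * handle_seconds / 3600.0
    if agents <= load:
        return 0.0
    wait_probability = erlang_c(agents, load)
    decay = math.exp(-(agents - load) * target_seconds / handle_seconds)
    return 1.0 - wait_probability * decay


def agents_needed(calls_per_hour, handle_seconds, target_seconds, target_level=0.80):
    """Smallest number of agents that meets the assumed service-level target."""
    load = calls_per_hour * handle_seconds / 3600.0
    agents = max(1, math.floor(load) + 1)
    while service_level(agents, calls_per_hour, handle_seconds, target_seconds) < target_level:
        agents += 1
    return agents


# Hypothetical workload: 100 calls/hour, 3-minute handle time, answer within 20 s.
print(agents_needed(100, 180, 20))
```

The model assumes random (Poisson) arrivals, exponentially distributed handle times and callers who never abandon the queue, so its results are best treated as a starting estimate for a schedule.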
Some companies and organizations provide discussion boards for users of their products to interact; such forums allow companies to reduce their support costs[43] without losing the benefit of customer feedback.
Some fee-based service companies charge for premium technical support services.[44]
Outsourcing technical support
Many organizations have relocated their technical support departments or call centers to countries or regions with lower costs. Dell was among the first companies to outsource its technical support and customer service departments to India, in 2001, but it later reshored them.[45] There has also been growth in companies that specialize in providing technical support to other organizations. These are often referred to as MSPs (Managed Service Providers).[46]
For businesses needing to provide technical support, outsourcing allows them to maintain a high availability of service. Such need may result from peaks in call volumes during the day, periods of high activity due to introduction of new products or maintenance service packs, or the requirement to provide customers with a high level of service at a low cost to the business. It allows businesses to use specialized personnel whose technical knowledge base and experience may exceed the scope of the business, thus providing a higher level of technical support to their employees.
A common scam involves a cold caller claiming to be from the technical support department of a company such as Microsoft. Such cold calls are often made from call centers based in India to users in English-speaking countries, although increasingly these scams operate within the same country. The scammer instructs the user to download a remote desktop program and, once connected, uses social engineering techniques, typically involving Windows components, to persuade the victim that they must pay for the computer to be fixed. The scammer then steals money from the victim's credit card.[47]
Preventive maintenance (PM), also called preventative[48] maintenance, is ongoing scheduled[49] inspection[50] intended to detect and correct incipient failures, either before they occur or before they develop into major problems such as downtime.
Managing the capacity of a data center
Despite the increasing use of "the cloud" and what has been called "the Era of Infinite Capacity",[51] there is still a need for professional data center capacity planners.[52]
There is a need to know what will be needed, and when.[53] Data must continually be collected regarding usage of power/energy, computing power, data storage and networking/telecommunications. Plans must include awareness of cooling and space requirements.
Sometimes analysis of this data, and comparison to industry norms, can be outsourced.[53] The balance between data collection[54] and analysis depends on current utilization: below 50%, the focus can stay more on data collection. Beyond 75%, the focus must shift to analysis, in preparation for upgrades, replacements and expansions. The data center is a resource in its own right.[55]
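The 50% and 75% thresholds above can be expressed as a small Python sketch. The function name `planning_focus`, the sample resource figures and the "transition" label for the band between the two thresholds are assumptions of this example; the text states only what happens below 50% and beyond 75%.

```python
def planning_focus(used, capacity):
    """Suggest where capacity-planning effort should go for one resource."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = used / capacity
    if utilization < 0.50:
        return "data collection"
    if utilization > 0.75:
        return "analysis"
    # Assumption: the source does not name the middle band.
    return "transition"


# Hypothetical readings as (used, capacity) for the resources a planner tracks.
readings = {
    "power_kw": (180, 400),
    "storage_tb": (620, 1000),
    "rack_space_u": (1700, 2000),
}

for resource, (used, capacity) in readings.items():
    print(resource, "->", planning_focus(used, capacity))
```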
Top data centers and service providers
According to Cloudscene's Leaderboard for Q1 2018, data center operators are ranked "based on both data center density (total operated data centers)", as well as "the number of listed service providers in the facility". Cloud service providers are ranked based on "connectivity (the total number of PoPs) for the region." Chosen from a pool of more than 6,000 providers, the rankings are as follows:[56]
^Ben Zimmer (April 18, 2010). "Wellness". The New York Times. "Complaints about preventative go back to the late 18th century ... (Oxford English Dictionary dates preventive to 1626 and preventative to 1655) ... preventive has won"
^since this consumes both computing and storage resources
^J Xu; M Zhao; J Fortes; R Carpenter (2007). "On the use of fuzzy modeling in virtualized data center management". Fourth International Conference on Autonomic Computing (ICAC'07). p. 25. doi:10.1109/ICAC.2007.28. ISBN 978-0-7695-2779-6. S2CID 16153431.