As an experienced infrastructure architect working on a service-led pre-sales exercise, I have struggled to get to grips with what the overall solution looks like and have had to learn a lot about delivering IT Services in a very short time. This article is intended to express the basics of service management for architects who, like myself, are more familiar with technology than service. It’s not a complete reference, and a service architect may be dismayed by my interpretation of the processes; however, it is based on advice and guidance from a specialist service management consultancy and I am grateful to Keith Webb for his assistance in reviewing and editing the article.
The IT Infrastructure Library (ITIL) takes service management best practice and presents it in a format that can be applied to IT (and non-IT) service support and delivery. Organised as a number of processes, ITIL describes:
- Incident Management.
- Configuration Management.
- Problem Management.
- Change Management.
- Release Management.
- Service Level Management.
- Availability Management.
- Capacity Management.
- IT Service Continuity Management.
- Financial Management for IT Services.
Looking at the way that service is often implemented, end users may raise a service request or incident (for example via a website, or by calling a service desk) or incidents may be reported automatically (e.g. an alert generated by an IT system). The incident or service request will be reported to a service desk and an incident record created and stored in the service management tool. Typically, the service desk will have access (via the Configuration Management Database – CMDB) to user information (from various directories – e.g. the Human Resources system or Active Directory) and asset information – the incident record will link to these as appropriate.
From an incident management perspective, the service desk will own the relationship with the end user through to resolution and closure, managing the communications and ensuring that the incident/service request is resolved to the user’s satisfaction (and closed after a pre-defined interval if user contact is not possible). Drawing on resources such as a known error database, the service desk’s aim in incident management is to ensure that service is restored as soon as possible according to defined priorities and within timescales agreed with the business. This can be achieved by what is known as a first time fix (where the incident is resolved by, and does not leave, the service desk); however, some incidents will need to be assigned to second/third-line support teams for further tests/investigation. In such cases, it is the responsibility of the service desk to manage the incident through to resolution and closure, even if the incident is assigned elsewhere in the organisation.
Configuration management makes use of the CMDB to record information about any assets (typically in the live environment) that the organisation wants to control – this may include people, documentation, hardware, software, network equipment, locations, service management information, contractual information, “howtos”, etc. Whilst the assets may well be recorded in an asset database, this tends to be more about financial details (date purchased, end of warranty, etc.), and the CMDB builds on this asset data to provide a more complete record – including relationships between services and relationships between the configuration items which combine to provide a service. Some items may not exist in the asset database (e.g. a PC, a monitor or a printer may be considered too expensive to track and inexpensive to purchase, so might be managed as a consumable) and it is part of the configuration management role to agree with the organisation which items are to be controlled (and hence stored in the CMDB). The CMDB is a relational database, providing a hierarchical structure of the relationships between assets (e.g. a server is a piece of hardware with a number of individual components, and certain software installed, configured in a particular manner, and connected to various networks, to provide a service as defined in a particular document, etc.). These relationships can quickly become complex, and the CMDB does not necessarily store all the records (e.g. certain configuration items may exist in another repository, linked from the CMDB); the real value of the CMDB is its relational nature, which allows for extremely flexible reporting. In reality, whilst various products may feed into the CMDB, the CMDB itself tends to be built around a service management tool such as BMC Remedy or HP Service Center.
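To make the idea of related configuration items a little more concrete for those of us who think in technology terms, here is a minimal sketch in Python. It is not how any real service management tool stores its data; the class names, the `depends_on` field and the sample items are all my own assumptions for illustration. It shows the kind of question the relationships make it possible to answer: which items combine to provide a service, and which services are affected if one item fails.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConfigurationItem:
    """A controlled item in the CMDB (field names are illustrative assumptions)."""
    ci_id: str
    ci_type: str  # e.g. "service", "server", "software", "network", "document"
    name: str
    depends_on: List[str] = field(default_factory=list)  # ids of related items


class CMDB:
    """A toy in-memory CMDB; a real one would sit on a relational database."""

    def __init__(self):
        self.items = {}

    def add(self, item):
        self.items[item.ci_id] = item

    def components_of(self, ci_id):
        """Return ids of every item the given item depends on, directly or indirectly."""
        seen = []
        stack = list(self.items[ci_id].depends_on)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            stack.extend(self.items[current].depends_on)
        return sorted(seen)

    def services_affected_by(self, ci_id):
        """Return names of services that rely on the given item (impact reporting)."""
        return sorted(
            item.name
            for item in self.items.values()
            if item.ci_type == "service" and ci_id in self.components_of(item.ci_id)
        )


# Hypothetical sample data
cmdb = CMDB()
cmdb.add(ConfigurationItem("svc-mail", "service", "Email", ["srv-01", "doc-mail"]))
cmdb.add(ConfigurationItem("svc-intranet", "service", "Intranet", ["srv-01"]))
cmdb.add(ConfigurationItem("srv-01", "server", "Server 01", ["sw-os", "net-lan"]))
cmdb.add(ConfigurationItem("sw-os", "software", "Operating system"))
cmdb.add(ConfigurationItem("net-lan", "network", "Office LAN"))
cmdb.add(ConfigurationItem("doc-mail", "document", "Email service definition"))

print(cmdb.components_of("svc-mail"))
print(cmdb.services_affected_by("net-lan"))
```

The second query is the one that matters to the service desk and the change team: given a single network or server item, it lists the services that depend on it.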
Problem management examines incidents (either at one time, or over a period) and attempts to identify the root cause of a problem. Over time, this may feed into the incident management process but the two disciplines have distinctly differing attributes – whilst incident management is about providing quick fixes to restore service, problem management is about examining the underlying cause and may take an extended period of time. Problem management has both reactive and proactive aspects: the reactive aspect is concerned with solving problems in response to one or more incidents and the proactive aspect is concerned with preventing incidents from occurring in the first place.
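One simple way to picture the step from incidents to problems is trend analysis: counting how often the same kind of incident recurs against the same item. The Python sketch below is only an illustration of that idea; the incident records, the `problem_candidates` function and the threshold of three are assumptions of mine, not anything prescribed by ITIL.

```python
from collections import Counter

# Hypothetical incident records: (incident id, affected item, category)
incidents = [
    ("INC001", "srv-01", "disk full"),
    ("INC002", "pc-114", "password reset"),
    ("INC003", "srv-01", "disk full"),
    ("INC004", "srv-01", "disk full"),
    ("INC005", "prn-07", "paper jam"),
]


def problem_candidates(records, threshold=3):
    """Return (item, category) pairs that recur at least `threshold` times."""
    counts = Counter((item, category) for _, item, category in records)
    return [key for key, count in counts.items() if count >= threshold]


print(problem_candidates(incidents))
```

Each of the three "disk full" incidents could be closed quickly by the service desk, but taken together they suggest an underlying cause that is worth investigating as a problem.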
Where an incident leads to a change being required, a change record is created (in the service management tool, by the service desk or the change team) and the change management process is invoked. The change team will typically assess the risk, the impact of carrying out (or not carrying out) the change, its urgency, the requirement for it and its effect on other services, owning and managing any related testing and communication with users of the service until the request for change is rejected or implemented.
Sometimes, a number of changes may be packaged as a release, as part of the release management process. Release Management undertakes the planning, design, build, configuration and testing of hardware and software to create a set of release components for a live environment. A release may be as small as a tiny code change to a script or it could be a service pack, a number of software components, an application, or even a piece of hardware.
Service level management is the name given to the planning, coordinating, drafting, negotiating, agreeing, monitoring and reporting of services and their associated Service Level Agreements (SLAs) and service targets. It is also the on-going review of service achievements against those targets to ensure that the required and cost-justifiable service quality is maintained and gradually improved. As well as reporting on service targets (e.g. number of incidents which are resolved as first-time fixes, percentage availability of a service during the past month, etc.), service level management includes a degree of relationship management – agreeing the required levels of service with the customer. For example, a KPI for first-time fixes which is too high may be counterproductive as it will encourage rapid call closure rather than problem management (which would be expected to improve the overall level of service over time). Tools may be used to monitor or alert via alarms when service level limits (e.g. time to respond, time to resolve, etc.) are being approached for a given incident.
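The two measurements mentioned above (percentage availability over a period, and warning when a time-to-resolve limit is being approached) are easy to express in code. The following Python sketch is my own illustration rather than the behaviour of any particular tool: the function names, the four-hour target and the 80% warning threshold are all assumed values.

```python
from datetime import datetime, timedelta


def availability_percent(period, downtime):
    """Percentage of the reporting period during which the service was available."""
    return 100.0 * ((period - downtime) / period)


def sla_status(opened, now, target, warn_fraction=0.8):
    """Classify an open incident against its time-to-resolve target.

    warn_fraction is an assumed threshold: raise an alarm once that
    fraction of the target time has elapsed.
    """
    elapsed = now - opened
    if elapsed >= target:
        return "breached"
    if elapsed >= target * warn_fraction:
        return "at risk"
    return "ok"


# Hypothetical incident with a four-hour resolution target
opened = datetime(2024, 1, 8, 9, 0)
target = timedelta(hours=4)
for hours in (1, 3.5, 5):
    print(hours, sla_status(opened, opened + timedelta(hours=hours), target))

# Three hours of downtime in a 30-day month
print(round(availability_percent(timedelta(days=30), timedelta(hours=3)), 2))
```

In practice the targets would come from the agreed SLA for the service and the priority of the incident, not from constants in a script.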
Capacity management is about providing appropriate capacity (the right level, at the right time) in an infrastructure component or service and needs to be aware of the business plans as these may have a serious impact upon service. For example, a major marketing campaign may increase the load on a website or a call centre and appropriate capacity will need to be provided in order to continue to meet defined service levels.
Availability management is linked to capacity management but is aligned with service continuity. If a service is judged to be business-critical and requires high availability, then additional components may be provided to increase the resilience of the solution, but the use of such components needs to be balanced against the additional cost they introduce, which affects profitability (even if service is over-delivered against the agreed levels, it is unlikely that this will result in extra revenue). As new technologies which provide dynamic solutions to capacity and availability management (e.g. server virtualisation) become mainstream, a new generation of service management tools will be required to cope with dynamic discovery and reporting as the CMDB is constantly updated to reflect changes to controlled configuration items.
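The trade-off between resilience and cost can be illustrated with some simple arithmetic. The Python sketch below uses the standard calculation for redundant components, under the assumption that components fail independently of one another (which is often not true in real life, e.g. when both share a power supply). The 99% figure and the function names are illustrative assumptions only.

```python
def parallel_availability(component_availability, copies):
    """Availability of `copies` redundant components, assuming independent failures.

    The group is unavailable only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


single = 0.99  # assumed availability of one server
for copies in (1, 2, 3):
    print(copies, f"{parallel_availability(single, copies):.4%}")

# A service that needs a server AND a network link, each 99% available
print(f"{serial_availability([single, single]):.2%}")
```

Each extra copy adds roughly the same cost but a rapidly shrinking improvement in availability, which is why the required level needs to be agreed before the solution is designed.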
IT service continuity management is concerned with managing an organisation’s ability to continue to provide a pre-determined and agreed level of IT Service to support the minimum business requirements following an interruption to the business. As organisations become more dependent upon technology, which is now a core component of most business processes, continued availability of IT and the delivery of IT services is critical to their survival. This is accomplished by introducing risk reduction measures such as resilient systems, and recovery options including back-up facilities.
Financial management for IT services accounts for the cost of service provision, providing a valuation of the services being delivered, the valuation of the assets that are in use in enabling those services to be delivered and the costs of operating and supporting those services. Financial management also looks at how to recover those costs, where applicable, in a controlled manner. It provides sound stewardship of the monetary resources of the organisation and supports the organisation in planning and executing its business objectives.
ITIL 2 is a non-proprietary framework of service management best practices – it is not mandatory but organisations may increase their effectiveness by adopting and adapting the disciplines that are most appropriate to the environment. ITIL 3 is an extension of ITIL 2 (i.e. it includes all of ITIL 2 and more).
ISO/IEC 20000-1 is an auditable standard, aligned with ITIL 2, and organisations can gain or lose ISO/IEC 20000-1 accreditation as a result of their adherence (or otherwise) to the required standards.
A solution may consist of a number of service towers, each of which may implement various ITIL processes. In practice, the processes will be similar across the various towers; however, the specifics may vary. There will be economies of scale where multiple towers are implemented using the same tools (i.e. introducing service efficiencies) and the business will typically be able to supply volumetric data which will aid in determining the number of people required to deliver each service.