Data Center Migration – Planning is Essential
A growing number of our clients are looking to move their data center assets – to a new company-managed data center facility, into a hosted data center, into “the cloud”, or a hybrid of these. While migration is probably necessary and beneficial, it can also be one of the riskiest activities an IT team will undertake. In no other activity do you intentionally risk so much impact to so many IT assets and services, in a single act of bravado.
When these projects go wrong, they go wrong in a very bad way. The key to mitigating these risks is proper planning and management. Due to the relative infrequency of these projects, few technology managers and project managers have the opportunity to develop a strong experience base in these disciplines. We highly recommend that our clients engage a project manager with experience in data center migrations, who can apply a rigorous methodology for planning and managing the project to mitigate risks, minimize negative business impact and maximize the odds of success.
At Varrow, we apply a five-phase methodology for data center migrations.
Each phase builds on the knowledge gained in earlier phases, and enables a migration project with a high degree of predictability and a relatively low risk of unplanned negative business impact. These phases may be performed serially for small migration projects, or may be performed in a cascading parallel manner (as depicted above) for larger migration projects.
The Current State Assessment focuses on identifying and documenting the necessary details about the current environment that will be migrated. In most cases, the existing environment has evolved over time. There will be incomplete documentation and knowledge of exactly how it all works together, and what all the dependencies are. Comprehensive inventories of applications, services, devices, and circuits must be assembled and validated. Unless the environment is very simple, dependency mapping of applications, devices and network connectivity must be performed to ensure that these dependencies are not “broken” by the sequence of migration events. There are numerous tools which can assist in gathering this inventory information and identifying dependencies, but significant human effort is still required to validate and organize the findings of these tools. Technical, physical, and resource constraints should be identified and documented, as well as any known risks in the environment. The combination of all this information provides a basis for developing migration strategies and designing the post-migration future state environment.
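One simple validation step is to cross-check what the discovery tools observed against the inventory you assembled by hand. The sketch below is only an illustration: the function name, the asset names, and the idea of representing each observed dependency as a (source, target) pair are all assumptions, not the output format of any particular tool.

```python
def find_inventory_gaps(inventory, observed_dependencies):
    """Return assets that appear in observed dependencies but not in the inventory.

    inventory: iterable of asset names (hypothetical naming).
    observed_dependencies: iterable of (source, target) pairs, assumed to come
    from whatever discovery tooling is in use.
    """
    known = set(inventory)
    missing = set()
    for source, target in observed_dependencies:
        for name in (source, target):
            if name not in known:
                missing.add(name)
    return sorted(missing)


# Hypothetical example data.
inventory = ["crm-app", "crm-db", "mail"]
observed = [
    ("crm-app", "crm-db"),
    ("crm-app", "ldap-01"),  # ldap-01 was never inventoried
    ("mail", "ldap-01"),
]
gaps = find_inventory_gaps(inventory, observed)
print(gaps)
```

Anything this kind of check flags still needs a person to decide whether it is a forgotten asset, an external service, or noise in the tool output.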
Armed with a thorough understanding of the current environment, as well as the application and service portfolio, migration strategies are developed for each application and service. The objective of this phase is to develop strategies and approaches, not detailed plans or designs (yet). The details will be handled in the next phase. Take care not to get too bogged down in the details at this point – many of the details are likely to change later, anyway.
Business requirements for the overall migration, and business impact of downtime for each service and application must be well understood and documented. Current and future strategies for availability and resiliency of applications and services must also be understood. These strategies may enable migration with minimal user impact, or the migration project may be the time to address gaps in existing availability and resiliency capabilities. You may choose to implement and utilize new data replication and data center virtualization technologies to facilitate the migration.
For each application or service, and its enabling infrastructure, the appropriate migration strategy must be identified, based on what is technically viable, financially justifiable, and aligned with availability requirements. Will the service be turned off, physically moved, and then brought back up during an outage window? Will “swing” equipment be needed to build a temporary environment in the destination location to minimize downtime? Can virtualization technologies be leveraged to seamlessly migrate services between the sites? Or will a hybrid of these approaches be required based on the characteristics of the specific application and environment? This must be worked out on an application-by-application basis. It’s hard, it’s time-consuming, and it’s necessary.
Based on technical and logical dependencies, the applications, services and supporting infrastructure are grouped together into “move groups”. These are the smallest groupings of equipment and services that must be migrated as a unit to ensure dependencies are not broken. Multiple groups can be combined into a “wave”, a set of move groups that make sense to perform together as a sub-project or event. These groups and waves are preliminary at this point, and may be modified later as more detailed planning for the migration is completed.
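If you treat assets as nodes and dependencies as links between them, a first pass at move groups is simply the set of connected clusters in that graph. The sketch below assumes the strictest possible rule, that any dependency at all forces two assets to move together; in practice some dependencies can tolerate being stretched between sites, so treat the result as a starting point. The function and asset names are hypothetical.

```python
from collections import defaultdict


def build_move_groups(assets, dependencies):
    """Group assets so that every dependency stays inside a single group.

    assets: iterable of asset names.
    dependencies: iterable of (a, b) pairs meaning a and b depend on each other
    in some direction. Assumption: any dependency forces co-migration.
    """
    neighbors = defaultdict(set)
    for a, b in dependencies:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    groups = []
    for asset in sorted(assets):
        if asset in seen:
            continue
        # Walk everything reachable from this asset; that is one move group.
        stack = [asset]
        seen.add(asset)
        group = []
        while stack:
            current = stack.pop()
            group.append(current)
            for other in neighbors[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


# Hypothetical example data.
assets = ["crm-app", "crm-db", "mail", "file-server", "backup"]
dependencies = [("crm-app", "crm-db"), ("file-server", "backup")]
move_groups = build_move_groups(assets, dependencies)
for group in move_groups:
    print(group)
```

Combining these groups into waves remains a judgment call based on outage windows, staffing, and business priorities rather than something the graph alone can decide.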
For each group, appropriate risk mitigation strategies must be identified. How will we test the migration beforehand? How will we roll back to the prior state if a major unforeseen problem is encountered during the move event? Based on all of this work, a conceptual migration schedule can be developed. Again, this will change as planning progresses.
Now that we have an overall view of requirements, strategies, and a preliminary “game plan” for how to approach the migration, we roll up our sleeves and dive into the details. For all but the smallest migrations, these tasks will likely be distributed to multiple focused teams to perform design and planning of the relevant parts.
Final design of the destination data center and network must be performed. If you haven’t done it already, this is a good time to contract for data center space and required data circuits, as those elements may have significant lead times. If new network equipment or other IT infrastructure will be required, it’s also a good time to order those items which have long lead times. The migration project may be a good opportunity to further virtualize and consolidate your server and storage environment. If so, ensure that you’re planning for that in the future state.
Once the detailed design is understood, preliminary move groups, waves, and schedules can be re-evaluated and modified as appropriate. The project manager should perform detailed planning for all relevant activities required before, during, and after the actual migration event. Where physical relocation of equipment is required, the project manager should engage moving professionals with appropriate training, equipment, and insurance to perform the physical move. In some cases, it will be necessary to coordinate with equipment manufacturers to perform the physical move, or to handle re-certification of the equipment at the new site to ensure continuous support.
Good migration plans include very detailed schedules of who does what, and when. In time-critical move events, this may be scheduled to the minute! Transportation routes need to be identified, validated (you don’t want your truck stuck in unexpected construction traffic), and contingency plans need to be developed.
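To make "who does what, and when" concrete, here is a minimal sketch of a minute-by-minute runbook built from an ordered list of steps, with a check that the whole sequence fits inside the outage window. It assumes the steps run strictly one after another, and the owners, tasks, durations, and dates are invented for illustration.

```python
from datetime import datetime, timedelta


def build_runbook(window_start, steps):
    """Assign start and end times to steps that run one after another.

    steps: list of (owner, task, duration_in_minutes) tuples.
    Returns a list of (start, end, owner, task) tuples.
    """
    runbook = []
    current = window_start
    for owner, task, minutes in steps:
        end = current + timedelta(minutes=minutes)
        runbook.append((current, end, owner, task))
        current = end
    return runbook


def fits_window(runbook, window_end):
    """True if the last step finishes by the end of the outage window."""
    return not runbook or runbook[-1][1] <= window_end


# Hypothetical move event: window opens 22:00 and closes 06:00 the next day.
window_start = datetime(2024, 1, 6, 22, 0)
window_end = datetime(2024, 1, 7, 6, 0)
steps = [
    ("DBA", "Shut down database", 15),
    ("Movers", "Unrack and load truck", 90),
    ("Movers", "Transport to new site", 60),
    ("Network", "Rack, cable, and power on", 120),
    ("App team", "Validate services", 45),
]
runbook = build_runbook(window_start, steps)
for start, end, owner, task in runbook:
    print(f"{start:%H:%M}-{end:%H:%M}  {owner:<9} {task}")
print("Fits window:", fits_window(runbook, window_end))
```

The slack between the last step and the end of the window is your contingency budget, so it is worth seeing that number long before move day.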
Post-move planning covers decommissioning of old facilities, local migration off temporary “swing” equipment, and enhanced incident and problem management support to quickly remediate any end-user issues that arise from the migration.
When describing what you plan to do and how you plan to do it, you should never have to use the phrase “in theory”. You need to KNOW how things are going to work, come the day of move. Nothing should be left to chance, and nothing should be based on the way you “think” things will work. TEST EVERYTHING. Where there are “new” technical approaches being employed, perform a proof-of-concept early in the process (parallel with the strategies phase) to ensure that the theoretical approach actually can work in your environment. Not sure whether something will work over the new network connection or with new addressing or with limited bandwidth? TEST IT.
Prior to the day-of-move, meet with all participants of the move event and walk through the plan. You may even want to physically walk through and rehearse the move, to the extent practical. Will your storage array fit through the door? Can you pull something that weighs half a ton up a ramp with a hand cart? Leave nothing to chance.
When done right, the actual migration event(s) will be anti-climactic… because you’ll only be doing something you’ve already tested and rehearsed multiple times before. It’s not stressful, because you KNOW it’s going to work. You’ve proved it. And you have a plan that accounts for possible contingencies that may arise.
Ready? Really ready? Then DO IT!
Again, much of the pre-move activity will likely be happening in parallel with completion of prior phases. You can be building out the destination environment, connectivity, and network while completing detailed planning, testing, and remediation. Day-of-move and post-move activities should happen like clockwork, because you planned appropriately, and you confirmed and re-confirmed schedules with all participating employees and third parties before move day. On the day of move, you held an initial meeting with all parties to review the schedule AGAIN, to ensure everyone understood their part and their time constraints. The project manager tracked all activities as they happened, and had a communication channel (like a bridge line) open for addressing any problems or contingencies in real time. Easy, right?
If this sounds like a lot, that’s because it IS a lot. The greatest trap we see our clients fall into is only looking at the migration from a technical perspective, because often the person tasked with the project is a technical expert. There is great risk in ignoring the broader planning, testing, and logistics aspects of a data center migration. Don’t trivialize it. If you need help with the strategy, planning and project management tasks, get it. The success of your migration depends on it.