CU*ANSWERS HIGH AVAILABILITY PROGRAM REVIEW
08/07/2011 – 08/11/2011

Summary

As part of a robust business continuity program, CU*Answers actively maintains a High Availability (HA) system and assesses its capabilities by performing role-swap exercises and processing live from the HA business continuity system at regular intervals. We do this to assess our normal operational capabilities in a rollover situation, to test communications with third-party vendors, and to perform system maintenance. A summary of recent activities is included in the following review.
This particular role-swap event was especially noteworthy because it was our first extended rollover event hosted at our new Muskegon business continuity site. We ran from that location for four days, and although we anticipated some wrinkles (as is always the case when a major project such as this is initiated), the event was immensely successful.

Event Review

The role-swap to the high availability (HA) system commenced on Sunday 08/07/11 at 4:02 PM ET, and we rolled back to our primary production system at 9:48 PM ET on Thursday 08/11/11.

In preparation for the event, CU*Answers arranged for early Monday morning representation in the Client Service, Systems, and Internal Networking areas to address any issues arising from the rollover.
This was the first role-swap event in which all client processing was performed using the HA system located in Muskegon. Although we did not anticipate any issues, several route configurations and firewall changes had been made to the PCs used for operational processing in moving the units to the Muskegon site. Because of this, we elected to have only the night-time shift process from Muskegon while holding the day and evening shifts back at the 44th Street, Grand Rapids facility to process remotely from the HA system. This allowed us to completely test all processing, including PGP-encrypted FTP transmissions, IFS transmissions, Federal Reserve Bank ACH file processing for both CU*Answers online clients and in-house clients, end-of-day and beginning-of-day processing, and ISO subsystem start and end routines.
All processing was carried out from the recovery site for a span of four days. Having confirmed that all general daily processing can be completed without hindrance at Muskegon, the Operations Team will expand the scope of our next rollover exercise to include additional staffing onsite from all shifts.

CHALLENGES

  1. Several documents used by the Operations team were missing from the IFS drive. Replication was immediately shut down for the specific folder and all files were restored manually from the IFS drive on the CUAPROD system to the IFS drive on CUAHA1. The iSeries Administration Team researched with iTera and found an issue with replication of documents that have been renamed. A project to fix this issue has been started by iTera.
  2. Reports were unavailable to view in GOLD immediately after the rollover. An IP address needed to be redirected in the iTera application to route requests to the CU*Spy server from the HA system. The change was made in less than five minutes and reports were available as normal. Likewise, IP addresses needed to be reinitialized in the proper sequence on the initial roll. This was a planned occurrence, as additional address changes still had to be made in iTera mapping. Adjustments were made for the roll back to the production system and that process completed without any issues.
  3. ROBOT console and GUI interfaces used for monitoring by Operations were pointing to the production IP address. The correct IP address was entered at each workstation to correct this. This step has been added to the post-rollover procedures for Operations.
  4. On Monday the 8th the eFunds transmission erred out for the CU*Northwest clients. CU*Northwest had to configure a route to the Muskegon site on their primary firewall to allow traffic to pass to the HA system.
  5. Prior to processing at Muskegon, several notifications had been made to clients regarding the network route changes required to accommodate the addition of the Muskegon site. A few credit unions could not connect to the HA system on the morning of the 8th because they had not set up the route to the HA system. The Systems and Internal Networking Teams worked with each credit union individually to make the route configuration changes.
  6. The EOD save file backup erred out on the morning of the 9th. There was a section of code within the program which created a conflict because we were processing on the HA system. When running from the production system, a backup is created on the HA system and written to tape. Since we were rolled and already on the HA system, the program was trying to write to libraries on the HA system which were already in use because we were running end-of-day backups and processing on that system. Programming, Operations and the iSeries Admin team determined this was an unnecessary process and the code was removed from the program.
  7. The Run 3 Item Processing file could not be transmitted to the HA system for Operations to process. The file was manually pushed over to the HA system on the 8th. The ImageCenter application required that reverse DNS match the forward DNS entry. The reverse route was configured and we experienced no further issues while rolled.
  8. Operations was unable to transmit the daily reports and member messages to in-house eDOC clients. These clients were not configured to pass traffic through from the Muskegon site IP address. All routing issues were corrected manually by eDOC and Operations was able to transmit to all in-house clients by Thursday the 11th.
  9. Operations was unable to transmit the card maintenance file to FiServ. Internal Networking determined that we must manually enable/disable related firewall rules with each role-swap and will handle this change in the future. Once the rules were enabled we were able to send the maintenance file.
  10. Remote writers were still using the production system IP address. The iSeries Admin Team manually changed each writer to the correct address.
  11. Lingering stored procedures in CUBASEPTF resulted in transfer issues in the A2A services. The stored procedures were removed. In the future, libraries will be cleared on DEV and restored to PROD rather than just cleared on PROD, which should remove this issue.
  12. Operations was initially unable to establish a phone client connection to one PC and had issues in accessing the Internet from another PC; that aside, the overnight shift was able to perform all processing without difficulty.
  13. The DNS alias used by the A2A services didn't resolve properly. The configuration was updated to use the FQDN, which will be standard practice for configurations going forward. Also, possible replication issues caused stored procedures that had been moved from the DEV and PROD systems to exist in CUBASEPTF on the HA system. This caused an error when the A2A service tried to report the status code and confirmation that was returned from Magic Wrighter. All A2A transfers were submitted successfully by 12:30 PM on 8/8/11.
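
Items 7 and 13 above both came down to name resolution. As an illustration only, the following Python sketch shows one way a pre-rollover check could confirm that the reverse DNS entry for a host matches its forward entry. The function names and the placeholder host name are hypothetical and are not taken from any CU*Answers, ImageCenter, or iTera tooling; the sketch assumes only the resolver calls in the Python standard library.

```python
import socket
from typing import Callable, List


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses the hostname resolves to (forward DNS)."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def reverse_lookup(address: str) -> str:
    """Return the primary host name the address maps back to (reverse DNS)."""
    name, _, _ = socket.gethostbyaddr(address)
    return name


def reverse_matches_forward(
    hostname: str,
    forward: Callable[[str], List[str]] = forward_lookup,
    reverse: Callable[[str], str] = reverse_lookup,
) -> bool:
    """True only if every forward address maps back to the same host name.

    The resolver functions are parameters so the check can be exercised
    without touching a live DNS server.
    """
    target = hostname.rstrip(".").lower()
    try:
        addresses = forward(hostname)
        return bool(addresses) and all(
            reverse(addr).rstrip(".").lower() == target for addr in addresses
        )
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError,
        # so a failed lookup in either direction counts as a mismatch.
        return False
```

Run against the live resolver with a fully qualified name (for example, reverse_matches_forward("ha.example.test"), where the name is a placeholder), a check like this would flag a missing or mismatched reverse entry before a role-swap rather than during it. Passing a short alias instead of the FQDN would be expected to fail the comparison, which is consistent with the FQDN practice adopted in item 13.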

SUCCESSES

  1. The Operations overnight shift was able to perform all processing, 24/7, for the entire span of the four-day event from the Muskegon business continuity site.
  2. CU*Answers Operations was able to perform remote operational support processing on behalf of Tahquamenon CU while we ourselves were running from our High-Availability system.
  3. CU*Answers Operations was able to perform ACH processing support for CU*South while operating from the Muskegon HA facility.
  4. In the past, credit union writers in a released status would print reports left in their outQs with date stamps from prior rollovers. We added a pre-rollover purge of these outQs on the HA system, and as a result no outdated reports were printed.
  5. All ISO, shared branching, and in-house switches for all vendors performed as normal without any communication issues.
  6. All in-house credit unions were able to perform all normal day-to-day functions.
  7. All self-processor connections were tested and in an expected state.

CONTINUING EFFORTS AND RECOMMENDATIONS

  1. In the event of an actual disaster, CUA Operations will be unable to produce and ship the laser notices for credit union clients using this service, as we have only one inserter/sealer system, which is located at the 44th Street facility. Operations is working with Sage Direct on a contingency plan for notice production in the event of a disaster.
  2. Having proved we are able to complete all processes without incident using the PCs at the Muskegon site, Operations will expand our shift coverage at the HA site during future rollovers.
  3. A few credit unions were unable to connect to the HA system even though they were notified in three separate alerts. Jim Lawrence, Manager of Disaster Recovery/Business Resumption, will be meeting with multiple teams to discuss this and all other current procedures as CUA moves forward to improve our disaster recovery process.
 
