On August 16, 2019, an outage of U.S. Customs and Border Protection (CBP) information technology systems disrupted the processing of incoming international travelers at airports nationwide for as long as 2.5 hours. Such an outage risks the entry of unauthorized individuals who could threaten national security and public safety.
A similar outage occurred on January 2, 2017. The Office of Inspector General (OIG) reported on it and made recommendations to prevent a recurrence. OIG has now reviewed the August 2019 outage and found that CBP’s implementation of those recommendations minimized the length and severity of the outage but did not prevent it completely.
OIG praised CBP for establishing a more effective control structure for monitoring passenger screening systems, which enabled prompt action to identify and resolve the outage.
However, OIG found that CBP’s critical passenger applications were operating on an Oracle database device that was not properly configured and did not have up-to-date patches.
CBP initially reported that the outage was caused by missing code, which allowed a “software bug” to mismanage memory and degrade service. However, relying on support from Oracle contractors, OIG determined that an incorrect configuration setting on an Oracle Exadata device was the primary cause.
The watchdog determined that the August 2019 lapse occurred because the Oracle patch did not execute properly and CBP did not ensure its configuration management policies and procedures were followed and patches were applied promptly.
Nearly two years earlier, on October 19, 2017, Oracle had published a document concerning a known issue with Non-Uniform Memory Architecture (NUMA) support. The reported issue was that a “bug” could cause systems to experience a sharp increase in workload and/or a decrease in database performance. A CBP official told OIG reviewers that this issue was addressed between the last quarter of 2017 and the first quarter of 2018 on all 18 Exadata devices in the CBP data center except one. OIG notes, however, that it could not confirm the exact timing of the configuration setting update due to a lack of CBP documentation.
The outage resulted in longer wait times, delays for arriving passengers, and the need for CBP to revert to less effective backup systems to support passenger screening procedures. CBP personnel faced additional challenges during the outage, as they were unable to quickly access “offline” backup systems and were not fully prepared for backup procedures. OIG said this was due to inadequate training and ineffective communication from CBP headquarters during the outage.
CBP has standard operating procedures that outline protocols, known as mitigation procedures, to follow during an unscheduled system outage or a significant system slowdown. According to these procedures, a CBP port of entry Shift Supervisor should initiate mitigation procedures in the event that operations are adversely affected. Once such procedures are initiated, CBP officers should begin to use the backup systems within 30 minutes of an outage to continue screening incoming travelers. Specifically, when the Traveler Primary Arrival Client system is unavailable, CBP officers should use the Portable Automated Lookout System (PALS) instead. PALS provides a basic “watch list” of people designated as inadmissible to the United States. PALS is a basic, standalone application that does not interface with other technology when primary passenger screening systems are offline. Each month, the Office of Information and Technology (OIT) at CBP updates PALS data and distributes it electronically to all ports of entry to ensure they have the most up-to-date watch list data.
In keeping with CBP policy, some ports of entry used PALS during the August 16, 2019, outage. However, CBP officers surveyed by OIG at some airports said they had difficulty accessing and using the system when they tried to deploy it to process incoming international travelers. The CBP OIT practice of sending a PALS password monthly to points of contact at ports of entry did not prove effective, especially in exigent circumstances. For example, several CBP officers reported difficulty finding the correct password in their emails, and some could not find it at all. A number of officials also had trouble entering the PALS password, delaying their access to the system needed to sustain passenger screening operations. In addition, some CBP officers told OIG that they had insufficient PALS equipment, and others said they had difficulty using PALS because they had not been properly trained on the system.
OIG made five recommendations to CBP:
- Implement a verification process to ensure that configuration changes are fully implemented and patches are installed in a timely manner.
- Require new employee and recurring training for CBP staff performing passenger screening using Office of Field Operations outage mitigation applications, including deploying the PALS system.
- Require regular tests of outage mitigation applications such as PALS deployment procedures, and update those procedures based on the results.
- Ensure all CBP field offices and ports of entry are able to access outage mitigation applications such as the PALS system, by increasing awareness of the process to request necessary equipment and receive updated passwords for all workstations used to screen international passengers.
- Ensure all CBP field offices are aware of the National System Health Dashboard communication process for keeping field staff informed of system interruptions.
CBP concurred and said it has already taken steps to implement these recommendations.
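To illustrate what the first recommendation, a verification process for configuration changes and patches, might look like in practice, here is a minimal Python sketch. It is not CBP's or Oracle's tooling: the device names, the setting key `example_setting`, and the patch identifier `PATCH-A` are all hypothetical, and a real process would read the actual state from each device rather than from hand-built records.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Observed state of one device (all values here are hypothetical)."""
    name: str
    settings: dict[str, str]
    patches: set[str]


def find_drift(devices, required_settings, required_patches):
    """Return {device name: [problems]} for devices that deviate from the baseline."""
    report = {}
    for dev in devices:
        problems = []
        # Compare each required setting against what the device reports.
        for key, expected in required_settings.items():
            actual = dev.settings.get(key)
            if actual != expected:
                problems.append(
                    f"setting {key}: expected {expected!r}, found {actual!r}"
                )
        # Any required patch not present on the device is flagged.
        for patch in sorted(required_patches - dev.patches):
            problems.append(f"missing patch {patch}")
        if problems:
            report[dev.name] = problems
    return report


# Hypothetical baseline and fleet: two compliant devices and one that was missed.
baseline_settings = {"example_setting": "corrected"}
baseline_patches = {"PATCH-A"}
fleet = [
    DeviceState("device-01", {"example_setting": "corrected"}, {"PATCH-A"}),
    DeviceState("device-02", {"example_setting": "corrected"}, {"PATCH-A"}),
    DeviceState("device-03", {"example_setting": "original"}, set()),
]

report = find_drift(fleet, baseline_settings, baseline_patches)
for name, problems in report.items():
    print(name, problems)
```

Running a check like this after every change, and keeping its output, would also produce a dated record of when each device was brought into line, which is the kind of documentation OIG said it could not find.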