Maintenance System
-------------------

NTP 297-1001-106
PREL., ISSUE 01D01
May 25, 1978

MAINTENANCE SYSTEM
DMS-100/200

(c) NORTHERN TELECOM LIMITED, 1978

TABLE OF CONTENTS

1. GENERAL
2. MAINTENANCE SUB-SYSTEMS
   2.12 Maintenance Sub-System Interfaces
3. MAINTENANCE CONCEPT
   3.02 Man-Machine Interface
   3.04 Telescoping
4. ROUTING AND REPORTING SUB-SYSTEM
   4.02 Routing Mechanism
   4.04 Logging Mechanism
   4.06 Software/Hardware Alarm
5. MESSAGE TYPES
   5.01 Inter-Maintenance Sub-System messages
   5.04 Messages from Maintenance SS to Routing and Reporting SS
   5.06 Messages between VDU and Routing Mechanism
   5.10 Messages between remote Maintenance Centers and the Trunks Maintenance SS
   5.11 Messages between the I/O Message System or Alarm Detection SS and the Maintenance sub-systems
   5.12 Messages from the CC Maintenance SS to Software Maintenance SS
6. INTERACTION PROTOCOL
   6.02 Maintenance SS Interaction Protocol (MSIP)
   6.04 Man-Machine Protocol
7. REFERENCES
8. ABBREVIATIONS

FIGURES

FIGURE  TITLE
2-1     Block Diagram. Maintenance Sub-systems
2-2     VDU Screen
4-1     Routing and Reporting Sub-systems
6-1     Sub-system Interaction Protocol
6-2     Man-Machine Protocol

1.
GENERAL

1.01 The purpose of the Maintenance System for the DMS 100 Family of Digital Multiplex Switching (DMS) systems is to provide complete maintenance by detecting, analyzing, correcting and reporting errors in DMS software and hardware.

1.02 The Maintenance System consists of a number of functional areas, each having hardware elements and software resources, which interact through internal messages to perform the tasks involved in automatic maintenance.

1.03 Each functional area, referred to as a maintenance sub-system (SS), contains the software necessary for maintaining its own hardware. The functional areas correspond to the hardware and software components of DMS which are described in NTP 297-1001-100. Each maintenance SS locates and diagnoses errors occurring within the area of responsibility assigned to it.

1.04 The maintenance sub-systems are accessible to maintenance personnel through the Maintenance and Administration Position (MAP) or other terminal devices such as TTY. The MAP provides a Man-Machine Interface (MMI) through its Visual Display Unit (VDU) screen and keyboard. The VDU screen displays DMS status information in response to requests from maintenance personnel entered manually via the VDU keyboard. Refer to NTP 297-1001-110 for a detailed description of the MAP and its operating instructions.

2. MAINTENANCE SUB-SYSTEMS

2.01 Refer to Figure 2-1, which shows a block diagram of the DMS maintenance sub-systems and their connection with the other elements of the maintenance system. The maintenance sub-systems and their areas of responsibility are:

Software (SWERR) Maintenance SS

2.02 The Software maintenance SS is responsible for periodically checking the CC software to ensure that logical rules are followed (software sanity). The SWERR SS monitors the number of timeouts, the density of message flow, the amount of available system resources, and the percentage of memory used.
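As an illustration of the kind of periodic check described in paragraph 2.02, the following Python sketch compares the four monitored quantities against limits. It is not the SWERR SS implementation: the class name, field names, units and limit values are all assumptions made for the example, since this document does not state how the quantities are measured or what thresholds apply.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SanitySample:
    """One periodic reading of the quantities named in paragraph 2.02."""
    timeouts: int                # timeouts counted in the sampling interval
    messages_per_second: float   # density of message flow
    free_resources: int          # available system resources (arbitrary units)
    memory_used_percent: float   # percentage of memory in use


# Hypothetical limits, chosen only for this illustration.
MAX_TIMEOUTS = 10
MAX_MESSAGES_PER_SECOND = 5000.0
MIN_FREE_RESOURCES = 50
MAX_MEMORY_USED_PERCENT = 90.0


def sanity_violations(sample: SanitySample) -> List[str]:
    """Return the names of the monitored quantities that are out of limits."""
    violations = []
    if sample.timeouts > MAX_TIMEOUTS:
        violations.append("timeouts")
    if sample.messages_per_second > MAX_MESSAGES_PER_SECOND:
        violations.append("messages_per_second")
    if sample.free_resources < MIN_FREE_RESOURCES:
        violations.append("free_resources")
    if sample.memory_used_percent > MAX_MEMORY_USED_PERCENT:
        violations.append("memory_used_percent")
    return violations
```

An empty result means the sample is within all of the assumed limits; any other result names the quantities that would warrant a report.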
Central Control (CC) Maintenance SS

2.03 The CC maintenance SS is responsible for maintaining the hardware on the Central Processor Unit (CPU), Data Store (DS), and Program Store (PS) shelves. This includes cards such as the data port extenders and the associated control and interface cards. Also monitors the power converters and cooling unit. DS maintenance includes the memory extension (MEX) frame if such is in use. Reports changes in CPU status (active or inactive). Maintains links up to, but not including, the Central Message Controller (CMC).

Central Message Controller (CMC) Maintenance SS

2.04 Responsible for maintaining all hardware on the CMC shelves, plus the links to the Network Modules (NM) and Input-Output Controllers (IOC).

Input-Output Devices (IOD) Maintenance SS

2.05 Responsible for maintaining all hardware on the IOC shelves, including the device controller (DC) cards for the local IO devices (tape drives, TTY, VDU, etc.), the remote IO devices (via modems), and the common control cards. Also monitors the performance of the IO devices themselves and the power converter cards.

Network Module (NET) Maintenance SS

2.06 Responsible for maintaining all Network Modules (NM) and the links to the Peripheral Modules (PM). Also maintains the power converters and cooling units of all NM frames.

Peripheral Module (PM) Maintenance SS

2.07 The PM maintenance SS is responsible for all types of peripheral modules, such as the Trunk Module (TM), Line Module (LM), and Digital Carrier Module (DCM). In each type of PM, the common control cards, the interface cards to the NMs, and the power converters are maintained. Not included are the transmission interface cards to the trunks (TM), lines (LM), or digital carrier equipment (DCE).

Trunks (TRKS) Maintenance SS

2.08 Responsible for maintaining the trunk interface cards in the TM and the transmission facilities to the distant office.
In a DCM, the TRKS SS maintains the DCE interface cards and the transmission facilities via the digital carrier equipment to the distant office.

Lines (LNS) Maintenance SS

2.09 Responsible for maintaining the line interface cards in the LM and the transmission facilities to the subscriber's station equipment.

Traffic (TRAF) Maintenance SS

2.10 Responsible for detecting abnormal traffic and overload conditions. Generates data on traffic patterns for display on the VDU. Permits manual overload control. Applies automatic high-priority line.

External (EXT) Alarms Maintenance SS

2.11 Monitors the alarm circuits of any equipment outside the DMS system. Sends alarm indication to alarm hardware and VDU.

2.12 MAINTENANCE SUB-SYSTEM INTERFACES

2.13 Each of the maintenance sub-systems just described has three kinds of interfaces with other elements of the maintenance system.

(a) Inputs. From other maintenance sub-systems, from the I/O system and the alarm detection system.

(b) Outputs. To other maintenance sub-systems and to the routing and reporting sub-system.

(c) Manual. Via the VDU or TTY, as mentioned in para. 1.

Input-Output (I/O) Message System

2.14 The I/O Message System handles the reception and routing of internal messages between components of the DMS. All messages contain an error indicator. If the indicator shows no error, the message proceeds without the maintenance system being involved. If an error is indicated, the error indication is routed to the maintenance SS whose area of responsibility covers the source of the error. The I/O System is a major source of error detection inputs to the maintenance sub-systems. Refer to NTP 297-1001-104 for details of the I/O Message System.

Alarm Detection Sub-System

2.15 The alarm detection SS is a software/hardware entity which performs the following functions:

(a) Receives hardware-detected alarms from within the DMS system via alarm scan points.
(b) Receives software alarms via error information from the I/O system.

(c) Interprets the type of alarm and its level of severity.

(d) Routes alarm messages to the responsible maintenance SS for action.

(e) Sends messages to the routing and reporting SS for status display update (VDU, TTY).

(f) Resets alarm conditions when a problem has been resolved, or on a manual input from the VDU keyboard.

Routing and Reporting Sub-system

2.16 The routing and reporting SS provides an interface between the maintenance sub-systems, the VDU and the alarm hardware (visible and audible alarm devices). The action of the routing and reporting SS is described in Section 4.

3. MAINTENANCE CONCEPT

3.01 The basic concept of the DMS Maintenance System is that each maintenance SS has the responsibility to locate and diagnose an error condition which is presented to it. The maintenance SS must first determine whether the error lies within its own area of responsibility. If it does, the error is diagnosed to determine what item is causing it, and a message is sent to the routing and reporting SS. If the error is not within the maintenance SS itself, the error message is passed to the next appropriate SS, which takes similar action until the error is finally located.

3.02 MAN-MACHINE INTERFACE (MMI)

3.03 Maintenance personnel can access each maintenance SS via the VDU keyboard and obtain status information as a display on the VDU screen. An additional or alternative MMI can also be provided by TTY.

3.04 TELESCOPING

3.05 The MAP maintenance function uses the technique of 'telescoping' to examine the operation of the DMS. Telescoping permits ever-increasing detail about system status or troubles to be obtained, starting at the maintenance sub-system level and descending to lower levels until the fault is eventually traced to a replaceable component.

3.06 Refer to Figure 2-2.
The screen of the VDU displays various levels of status information in the System Status and Command Interpreter areas. The Command Menu Area displays a list of possible functions or commands which the maintenance personnel can perform at each level of interrogation. The use of menus minimizes the necessity for memorizing the commands or referring to documentation. Commands entered are repeated in the Input Echo area.

4. ROUTING AND REPORTING SUB-SYSTEM

4.01 Refer to Figure 4-1. The routing and reporting SS provides an interface between the VDU, the readout and storage devices, and the maintenance sub-systems. The main functions of this sub-system are as follows:

(a) Assigns priorities to messages incoming from the maintenance sub-systems, based on the type and severity of the fault.

(b) Assigns output routes based on the message type and routing information contained therein.

(c) Drives software and hardware alarms based on the severity of the fault information contained in the message.

(d) Retrieves information from a message logging mechanism. The request for this information is usually originated by the maintenance personnel.

(e) Monitors the number of instances of a specific message and then outputs successive instances of that message, a 'threshold message', or both.

The routing and reporting sub-system is divided into three parts as follows:

- Routing Mechanism
- Logging Mechanism
- Software/Hardware Alarm

ROUTING MECHANISM

4.02 The core of the routing and reporting sub-system is the routing mechanism, which interfaces with the VDU, the maintenance sub-systems, the alarm sub-system and the logging mechanism. The main function of the routing mechanism is to receive messages from the different sub-systems and decide whether the message should be:

(a) Routed immediately to one or more VDUs.

(b) Stored in the logging mechanism for future retrieval.

(c) Routed for status update; i.e.
if a sub-system reports a change in its resource status or organization, the routing mechanism will decide which VDU requires that level of status change.

(d) Routed to the alarm sub-system in order to take appropriate action based on the type and severity of the alarm message.

(e) Re-routed to an alternative I/O device if the intended device is out of service.

4.03 The routing mechanism also checks the threshold of alarms which are not associated with any particular maintenance sub-system.

LOGGING MECHANISM

4.04 The logging mechanism can display and store data in any of the following forms:

(a) Hardcopy on printer.

(b) Storage device.

(c) Memory device at a remote terminal capable of performing system correlation.

4.05 The routing mechanism determines whether the alarm message should be directed to a printer or to a storage device. Maintenance personnel can at any time request retrieval of all the information pertaining to a particular alarm. This method prevents the maintenance personnel from being flooded with irrelevant data on topics other than the problem at hand.

SOFTWARE/HARDWARE ALARM

4.06 The software/hardware alarm receives the status messages from maintenance sub-systems and acts as follows, depending on the severity of the alarm:

(a) Activates the proper audio and visual alarm (e.g. bell and light).

(b) Updates the top level status display of the appropriate man-machine terminal. This is done on a continuous update basis.

(c) Resets alarm conditions whenever a problem has been cleared.

5. MESSAGE TYPES

INTER-MAINTENANCE SUB-SYSTEM MESSAGES

5.01 See Figure 2-1. Transient errors always stay in the maintenance sub-system where they occurred. Only when an error persists and is not resolved is it communicated to another maintenance sub-system via inter-maintenance sub-system messages.
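The rule in paragraph 5.01 can be pictured as a counter kept inside each maintenance sub-system. The Python sketch below is one possible reading, not the DMS mechanism: this document does not say how persistence is judged, so the repeat limit, the class name and the method names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict

# Hypothetical number of repeats after which an error counts as persistent.
PERSISTENCE_LIMIT = 3


class LocalErrorTracker:
    """Keeps transient errors local and flags only persistent ones."""

    def __init__(self, limit: int = PERSISTENCE_LIMIT) -> None:
        self.limit = limit
        self.counts: DefaultDict[str, int] = defaultdict(int)

    def record_error(self, component: str) -> bool:
        """Count an error on a component.

        Returns True once the error has repeated often enough to be
        communicated to another maintenance sub-system.
        """
        self.counts[component] += 1
        return self.counts[component] >= self.limit

    def record_recovery(self, component: str) -> None:
        """The error was resolved locally, so its history is discarded."""
        self.counts.pop(component, None)
```

With the assumed limit of 3, the first two errors on a component stay local and the third is flagged for an inter-maintenance sub-system message. A recovery in between resets the count.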
5.02 There are two reasons for communication to take place between two maintenance sub-systems:

(a) A sub-system is giving an error indication to another sub-system. These are termed fault messages.

(b) A sub-system is notifying another sub-system that a component is no longer available for service. These are termed status messages.

5.03 The receiving sub-system is responsible for taking the appropriate action on the message. Furthermore, it is responsible for communicating information concerning its activity to the routing and reporting SS. Note that the transmitting sub-system's responsibilities concerning problems related to other sub-systems terminate as soon as it has disposed of the information.

MESSAGES FROM THE MAINTENANCE SUB-SYSTEMS TO THE ROUTING AND REPORTING SUB-SYSTEM

5.04 Refer to Figure 4-1. Any maintenance sub-system can find it necessary to send one of five types of messages to the routing and reporting sub-system. In the usual order of occurrence, the five types of messages are:

(a) Error - an abnormal event has occurred in the sub-system.

(b) Diagnostic - results from an attempt by the sub-system to test a given aspect.

(c) Action - an alteration has occurred in the status or organization of the sub-system's resources.

(d) Exception - a report that a threshold has been exceeded.

(e) Information - a report on the instantaneous status of a sub-system.

5.05 These messages are passed to the routing and reporting sub-system in packed (i.e. internal) format. There are two kinds of messages involved:

(a) Specific responses to requests from the command interpreter.

(b) Unsolicited messages which are submitted to the routing and reporting sub-system without any specific destination from the sub-system's point of view.

MESSAGES BETWEEN THE VDU AND THE ROUTING MECHANISM

5.06 Input from the VDU moves through the command interpreter (CI). This process is responsible for interpreting the common language definitions of system components into internal form.
This translated result and the request or command are then passed on to:

(a) the relevant maintenance sub-system, or

(b) the routing mechanism, or

(c) the logging mechanism.

5.07 In case (b) above, the input either performs routing and reporting functions such as setting thresholds, or is passed on to the logging mechanism or the software/hardware alarm system. In case (c), the input is used to retrieve stored information for display.

5.08 Output to the VDU always emerges from the routing and reporting sub-system. This includes responses to requests or commands from the command interpreter to the maintenance sub-system.

5.09 Outputs and inputs to any other devices (e.g. printer) are organized in the same way as for the VDU.

MESSAGES BETWEEN REMOTE MAINTENANCE CENTERS AND TRUNKS MAINTENANCE SS

5.10 An external interface to the trunk maintenance SS is provided. It is through this facility that messages pass between DMS and remote maintenance and administration centers.

MESSAGES BETWEEN THE I/O MESSAGE SYSTEM OR ALARM DETECTION SUB-SYSTEM AND THE MAINTENANCE SUB-SYSTEMS

5.11 Major sources of indications to the maintenance sub-systems are the messages from the I/O message system or the alarm detection sub-system. Problems detected by these two systems are routed to the relevant maintenance SS.

MESSAGES FROM THE CC MAINTENANCE SS TO SOFTWARE MAINTENANCE SS

5.12 When anomalies relating to the software environment itself occur in the CC maintenance SS, the CC maintenance SS passes a message to the Software maintenance SS. This is the only type of communication between the Software maintenance SS and any other sub-system.

6. INTERACTION PROTOCOL

6.01 This section defines the maintenance sub-system interaction protocol and the man-machine protocol. Refer to Figure 6-1.

MAINTENANCE SUB-SYSTEM INTERACTION PROTOCOL (MSIP)

6.02 MSIP is initiated by a detection of a fault (FD).
When the I/O system, the alarm detection SS, or any of the maintenance sub-systems detects a fault, the maintenance sub-system in whose area the problem falls is responsible for running a diagnostic procedure. If the diagnosis is successful and the problem is not transient, an error message is sent to the routing and reporting SS. If the diagnosis fails, a diagnostic message is sent to the routing and reporting SS. The maintenance sub-system is capable of initiating an action when it can determine the cause of the failure. If it does so, an action message (possibly followed by a status message) is sent to the routing and reporting SS.

6.03 MSIP can also be initiated by a manual request (MR) from the VDU. Three types of messages can be requested by the VDU:

(a) Action request (AR) - which may result in a status change and hence a status update on the VDU.

(b) Status request (SR) - which is a simple request for information.

(c) Diagnostic request (DR) - which is a simple request to run a diagnostic. No action is taken as a function of the result of the diagnosis when the request comes from the MAP.

MAN-MACHINE PROTOCOL

6.04 Refer to Figure 6-2. Having observed a status change on the system VDU display, the maintenance personnel can request sublevel information showing overall sub-system status. Further information is retrievable from the logging mechanism on diagnostic, error and status messages stored therein. Based on this information the fault is repaired, and the maintenance personnel then request that the component be returned to service. This request results in a diagnostic being run; if the diagnostic is successful, the repaired component is returned to service. The associated action and status messages are also generated at this time. If the diagnostic fails, a diagnostic message is sent, and the repair attempt and the diagnostic are repeated.
The fault will continue to be displayed as an alarm on the VDU, even if the audible alarm is off, until the diagnostic is successful.

7. REFERENCES

System Description. DMS 100 Family of Digital Multiplex Switching Systems    NTP 297-1001-100
Maintenance and Administration Position                                      NTP 297-1001-110
I/O Message System                                                           NTP 297-1001-104

8. ABBREVIATIONS

AF    - Access Fault
AR    - Action Request
CC    - Central Control (CPU, DS, PS)
CI    - Command Interpreter
CMC   - Central Message Controller
CP    - Call Processing
CPU   - Central Processor Unit
DC    - Device Controller cards (part of IOC)
DR    - Diagnostic Request
DS    - Data Store
FD    - Fault Detection
IOC   - Input-Output Controller
LM    - Line Module
MAP   - Maintenance and Administration Position
MMI   - Man-Machine Interface
MR    - Manual Request
MSIP  - Maintenance Sub-system Interaction Protocol
NM    - Network Module
NMC   - Network Message Controller
ODM   - Office Data Modification
PM    - Peripheral Modules (LM, TM)
PS    - Program Store
SR    - Status Request
SS    - Sub-system
TL    - Trunk/Line Maintenance
TM    - Trunk Module
TTY   - Teletype or printer
VDU   - Visual Display Unit
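ILLUSTRATIVE SKETCH

As a supplement to the maintenance concept in paragraph 3.01, the Python sketch below shows an error being offered to successive maintenance sub-systems until one finds it within its own area of responsibility. It is a minimal illustration, not the DMS implementation: the class, the function and the order in which sub-systems are tried are assumptions, and only the component assignments (CC for CPU, DS and PS; CMC for the CMC; NET for the NM) are taken from Section 2.

```python
from typing import List, Optional, Set, Tuple


class MaintenanceSubSystem:
    """A maintenance SS with a fixed area of responsibility."""

    def __init__(self, name: str, components: Set[str]) -> None:
        self.name = name
        self.components = components

    def owns(self, component: str) -> bool:
        """True if the component lies within this SS's area."""
        return component in self.components


def locate_error(
    chain: List[MaintenanceSubSystem], component: str
) -> Tuple[Optional[str], List[str]]:
    """Offer the error to each SS in turn until one claims it.

    Returns the name of the SS that owns the component (or None if no
    SS in the chain does) and the names of the SS that passed it on.
    """
    passed_on: List[str] = []
    for ss in chain:
        if ss.owns(component):
            return ss.name, passed_on
        passed_on.append(ss.name)
    return None, passed_on


# Hypothetical ordering of three of the sub-systems from Section 2.
chain = [
    MaintenanceSubSystem("CC", {"CPU", "DS", "PS"}),
    MaintenanceSubSystem("CMC", {"CMC"}),
    MaintenanceSubSystem("NET", {"NM"}),
]
```

In this sketch an error on an NM is passed on by the CC and CMC sub-systems before the NET sub-system claims it. In the system described here, the claiming sub-system would then diagnose the error and send a message to the routing and reporting SS.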