Dependability

Vol 17, No 1 (2017)
https://doi.org/10.21683/1729-2646-2017-17-1

STRUCTURAL RELIABILITY. THE THEORY AND PRACTICE

pp. 4-10
Abstract

Aim. Ensuring the dependability of flexible space structures requires an unambiguous interpretation of the term “dependability”, because each of the many factors that affect operating performance must be taken into account. In this case, neither the parametric nor the functional definition of dependability given in GOST 27.002 is acceptable. The functional definition does not require in-depth knowledge of the physical principles of flexible structure operation, or the identification and management of the factors that can cause failures. The parametric definition does not allow a complete parametric description of a product, because the explanation of the term “dependability” explicitly assumes that some factors are “impossible” or “unnecessary” to characterize by parameters.

Methods. The contradiction between the parametric and functional definitions of dependability can be resolved with the hypothesis of confluence of the parametric and functional approaches to dependability. According to this hypothesis, if all the parameters that characterize a product's ability to perform the required functions continuously maintain their values over time in specified modes and conditions of operation, maintenance, storage and transportation, then the product's composite dependability indicator also maintains its values over time under the same modes and conditions. Under this hypothesis, omissions in the parametric description of a product in operation are not permissible. The parametric description must therefore include not only the parameters but also the indicators that cannot be measured technically yet can be evaluated quantitatively. For example, the probability of an event can be evaluated on a scale from 0 to 1.

Results. A parametric description of a flexible structure covers all the parameters and indicators that characterize its ability to perform the required functions. Parameter values are expressed in different units and indicator values are abstract numbers. To make their “addition” possible, all of them are brought to a common numerical form. For that purpose, each parameter and indicator is evaluated by the probability that its value stays within the specified limits over the operation time. The probabilities found in this way can be reduced to a single generalized dependability indicator with a dependability structure diagram, which takes into account the functional connections between elements that operate with a certain reliability in a specific sequence.
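As a rough illustration of this last step, the sketch below folds per-parameter probabilities of staying within limits into one indicator. It uses only series and parallel blocks and assumes statistically independent elements. The function names `series` and `parallel` are hypothetical. The article's own structure diagram may contain other connection types.

```python
import math

def _check(probabilities):
    # Each input is a probability that a parameter/indicator stays within its limits.
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")

def series(probabilities):
    """All elements must stay within limits (independent elements assumed)."""
    _check(probabilities)
    return math.prod(probabilities)

def parallel(probabilities):
    """At least one of the redundant elements must stay within limits."""
    _check(probabilities)
    return 1.0 - math.prod(1.0 - p for p in probabilities)

# Hypothetical example: two parameters in series with a duplicated indicator.
generalized = series([0.99, 0.98, parallel([0.9, 0.9])])
print(round(generalized, 6))
```

The numbers in the example are placeholders, not data from the article.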

Conclusions. The article shows that parametric and functional dependability can be understood uniformly, because they are connected in meaning, concepts, definition and methodology. Flexible structure dependability tasks require every detail to be taken into account. For these tasks, the parametric definition of the term “dependability” given in GOST 27.002 can be used with the addition of just two words. The definition of “dependability” that is necessary and sufficient for flexible structures can therefore read as follows: “Dependability is the property of an object to maintain in time and within the set limits the values of all parameters and/or indicators that characterize the ability of the system to perform the required functions in specified modes and conditions of operation, maintenance, storage and transportation”.

pp. 11-16
Abstract

Automated control equipment is being progressively integrated into the power grid. Automated process control in the electric power industry involves controlling the position of electrical switching devices, monitoring their current status and displaying numeric data on currents, voltages, etc. Disruptions in the automated operations control facilities (AOCF) cause defects in power systems and grid equipment. AOCF failures impair the condition monitoring of power grids and the operation of switching devices. When real-time remote control of power supply installations is impossible, the power provider cannot guarantee continuous power supply. The article analyzes the AOCF equipment currently operating as part of distributed systems at power substations. It examines the advantages and drawbacks of specific facilities and suggests methods to increase the dependability of equipment operation in a 35-110 kV distributed power supply network. An AOCF equipment certification procedure is proposed. The article also suggests providing process control documentation for power supply facilities that contain operator process control facilities (OPCF). This documentation would be stored in maintenance areas as hard copies and on the IT portal as scans. Having it available at power supply facilities increases the labor productivity of the engineering personnel who perform operational checks and accident recovery. In addition to the mandatory set of substation documents (for the operational, maintenance and RPEA personnel), the article suggests equipping substations with OPCF equipment diagrams. This measure reduces the time spent and the errors made during maintenance and repair of automated supervisory and process equipment at power supply facilities. It also enables remote management of system operation recovery (power supply, resetting of sensors, controllers, data collection and communication devices, etc.).
The efficiency of operational checks by engineering personnel increases. The absence of emergencies ensures uninterrupted power supply to all categories of consumers and thus increases the overall investment potential of the power supply industry. Fail-safe equipment operation is therefore a clear factor in Russian technology development. It is also consistent with the Rosseti regulations on the common engineering policy in the integrated power grid.

pp. 17-21
Abstract

Redundancy is one of the primary ways of improving dependability, and structural redundancy in particular is used. It can ensure fail-safe operation of elements, devices and systems, and fail-safety can mitigate both faults and failures. The paper examines how dependability can be increased with so-called sliding redundancy. In this scheme, a system of n main elements is kept healthy by m redundant elements, each of which can replace any of the main elements. The paper proposes improving sliding redundancy by recovering working elements from several failed elements that have retained some functionality (a basis). For example, by Post's theorem a logical (Boolean) function forms a basis by itself if it is not zero-preserving, not one-preserving, not self-dual, not linear and not monotone. The author previously proposed so-called functionally complete tolerant logical functions (FCTF). These functions not only possess functional completeness but also retain it under a specified failure model. A failed element then remains functionally complete, although with reduced capabilities. For example, an FCTF implemented with a 2AND-2OR-NOT element may degrade into a 2OR-NOT element. In this case, recovering the original function requires several 2OR-NOT elements. However, diagnosing such elements and reconfiguring them after a failure is problematic. The approach can also be applied to the logic recovery of programmable logic devices (PLD) based on so-called look-up tables (LUT). A LUT is a memory device built on 16:1 multiplexers implemented as a tree of pass transistors. If a LUT fails, its healthy half can still be used. Standard PLD facilities include local and global connection matrices, and reconfiguration through them can transform such “semi-LUTs” into LUTs whose functions are equivalent to the original ones.
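The completeness condition cited above can be checked mechanically from a truth table. The sketch below tests membership in Post's five closed classes for a single n-ary function and judges linearity from its Zhegalkin (algebraic normal form) coefficients. The function names are hypothetical. The test cases model 2OR-NOT as NOR and 2AND-2OR-NOT as NOT((a AND b) OR (c AND d)), the usual AND-OR-INVERT gate. This gate model is an assumption, and the check says nothing about the FCTF failure model itself.

```python
from itertools import product

def anf_coefficients(f, n):
    """Zhegalkin (ANF) coefficients of f, indexed by a bitmask of variables."""
    coeffs = [f(*((m >> i) & 1 for i in range(n))) & 1 for m in range(2 ** n)]
    for i in range(n):  # in-place Moebius transform over GF(2)
        for m in range(2 ** n):
            if m & (1 << i):
                coeffs[m] ^= coeffs[m ^ (1 << i)]
    return coeffs

def post_classes(f, n):
    """Membership of the n-ary Boolean function f in Post's five closed classes."""
    rows = list(product((0, 1), repeat=n))
    val = {x: f(*x) & 1 for x in rows}
    return {
        "zero_preserving": val[(0,) * n] == 0,
        "one_preserving": val[(1,) * n] == 1,
        "self_dual": all(val[x] != val[tuple(1 - b for b in x)] for x in rows),
        "monotone": all(val[x] <= val[y] for x in rows for y in rows
                        if all(a <= b for a, b in zip(x, y))),
        # Linear iff no monomial of degree >= 2 appears in the ANF.
        "linear": all(c == 0 for m, c in enumerate(anf_coefficients(f, n))
                      if bin(m).count("1") >= 2),
    }

def is_complete_alone(f, n):
    """Post's criterion for one function: complete iff it lies in none of the classes."""
    return not any(post_classes(f, n).values())

nor2 = lambda a, b: 1 - (a | b)                          # 2OR-NOT
aoi = lambda a, b, c, d: 1 - ((a & b) | (c & d))         # assumed 2AND-2OR-NOT model
print(is_complete_alone(nor2, 2), is_complete_alone(aoi, 4))
```

Both gates pass the check, which agrees with the abstract's remark that a 2AND-2OR-NOT element degrading to 2OR-NOT keeps a basis.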
This is equivalent to increasing the number of redundant elements. Sliding redundancy with recovery of elements from several failed ones that have retained a basis can be used in critical applications where repair or replacement of elements is impossible. The article proposes a formula that takes such recovery into account, analyzes the special features of this kind of redundancy and evaluates its dependability benefits.
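A minimal sketch of the reliability gain follows. The first function is the textbook sliding-redundancy formula: the system works if at least n of the n+m identical, independent elements are healthy, assuming ideal switching. The second function is a hypothetical extension, not the article's formula, which the abstract does not give. It assumes that a failed element keeps a usable basis with probability `beta`, and that any `q` such elements can be reconfigured into one working element, for example q = 2 for two "semi-LUTs". All names and parameters are illustrative.

```python
from math import comb

def sliding_redundancy(p, n, m):
    """P(at least n of n+m independent elements, each healthy with probability p)."""
    total = n + m
    return sum(comb(total, k) * p**k * (1 - p)**(total - k) for k in range(n, total + 1))

def sliding_with_recovery(p, beta, q, n, m):
    """Hypothetical model: each failed element keeps a basis with probability beta,
    and any q failed-with-basis elements together replace one working element."""
    total = n + m
    result = 0.0
    for h in range(total + 1):              # healthy elements
        for j in range(total - h + 1):      # failed elements that kept the basis
            d = total - h - j               # failed elements with no usable basis
            if h + j // q >= n:             # recovered elements count as spares
                ways = comb(total, h) * comb(total - h, j)
                result += (ways * p**h * ((1 - p) * beta)**j
                           * ((1 - p) * (1 - beta))**d)
    return result

print(sliding_redundancy(0.9, 4, 1), sliding_with_recovery(0.9, 0.5, 2, 4, 1))
```

With `beta = 0` the second function reduces to the first. Any positive `beta` acts like extra spares, which mirrors the abstract's point that recovery is equivalent to increasing the number of redundant elements.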

pp. 22-26
Abstract

Aim. Resistive position sensors based on wire-wound potentiometers are used in various control systems. Traditionally, their dependability indicators are confirmed by appropriate dependability tests or by tests of comparable products. Test data for comparable products may be unavailable, or the product's design and materials may change significantly. In these cases, a method for short-term dependability testing and forecasting of dependability indicators is required. Dependability indicators are to be calculated from statistical information on how properties and parameters vary during dependability testing. The calculation also draws on research findings about the physical mechanisms and kinetics of the processes that cause such variation.

Methods. Analysis of the physical processes that cause catastrophic changes in resistive position sensors has shown that electrical loading creates thermal and electric fields. These fields give rise to electrokinetic, thermoelectric and thermodiffusion effects. In all cases, the rates of the physical and chemical processes depend on the material temperature and are described by the Arrhenius equation. The research established that variations in sensor impedance are largely determined by the processes occurring in the resistive element. The time dependence of impedance can be described by a logarithmic, exponential or polynomial function.
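The Arrhenius dependence is what links short, hot tests to long service at normal temperature. The sketch below computes a process rate and the resulting acceleration factor between two temperatures, assuming that a single thermally activated mechanism dominates. The activation energy and temperatures in the example are hypothetical placeholders, not values from the article.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_rate(prefactor, activation_energy_ev, temp_k):
    """Process rate k = A * exp(-Ea / (k_B * T)), with T in kelvin."""
    return prefactor * math.exp(-activation_energy_ev / (BOLTZMANN_EV * temp_k))

def acceleration_factor(activation_energy_ev, use_temp_k, test_temp_k):
    """How many times faster the process runs at the test temperature (A cancels)."""
    return math.exp(activation_energy_ev / BOLTZMANN_EV
                    * (1.0 / use_temp_k - 1.0 / test_temp_k))

# Hypothetical values: Ea = 0.7 eV, use at 40 C, test at 85 C.
af = acceleration_factor(0.7, 313.15, 358.15)
equivalent_hours = 1000 * af  # 1000 test hours ~ this many hours at use temperature
print(round(af, 1), round(equivalent_hours))
```

Under these assumptions, each hour at the test temperature stands in for `af` hours of use. The real conversion depends on the mechanism-specific models the article derives.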

Results. The authors built mathematical models of the physical and chemical processes occurring in resistive position sensors in operation. These models made it possible to develop a scientifically grounded calculation and experimental method for short-term reliability testing. The method specifies the thermal and electrical modes, the durability testing conditions and the test duration. The test results are then processed statistically to forecast dependability indicators. The gamma-percentile time to failure and the failure rate are evaluated by forecasting the degradation of the acceptance criterion values. The acceptance criterion values obtained during the tests are approximated as a function of time by a straight line, an exponential curve or a polynomial. The form of the approximating curve is chosen analytically, based on the adopted model of the physical and chemical processes occurring in potentiometers in operation. The gamma-percentile time to failure is the time at which the extrapolated approximating curve (straight line) reaches the acceptance criterion limit set by the performance specifications and technical regulations.
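The extrapolation step can be sketched as follows. Fit the chosen curve to the acceptance criterion measured during the test, then solve for the time at which the curve reaches the specified limit. Only the straight-line and exponential forms are shown. The exponential form is fitted as a straight line in log scale, which is one common choice and an assumption here. Which trajectory is fitted is also an assumption, for instance a percentile of the criterion over the tested samples that corresponds to γ. The abstract does not state this. Function names are hypothetical.

```python
import math

def _least_squares(xs, ys):
    """Ordinary least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def time_to_limit(times, values, limit, model="linear"):
    """Extrapolate the fitted drift of an acceptance criterion to the time it reaches `limit`."""
    if model == "linear":
        a, b = _least_squares(times, values)
        target = limit
    elif model == "exponential":
        # v(t) = c * exp(b * t)  =>  ln v = ln c + b * t (values and limit must be > 0)
        a, b = _least_squares(times, [math.log(v) for v in values])
        target = math.log(limit)
    else:
        raise ValueError(f"unknown model: {model}")
    if b == 0:
        raise ValueError("no drift: the fitted curve never reaches the limit")
    return (target - a) / b

# Hypothetical drift data: criterion grows by 0.001 per hour from 1.0, limit 2.0.
print(time_to_limit([0, 100, 200, 300], [1.0, 1.1, 1.2, 1.3], 2.0))
```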

Conclusions. The short-term and long-term reliability test data presented agree with the calculated values of dependability indicators, which confirms that the developed calculation method is applicable. The proposed method makes it possible to reduce the scope and duration of costly dependability tests.

pp. 27-31
Abstract

The article deals with determining the dependability of manufactured radioelectronic systems. This task belongs to the class of a posteriori analysis. To identify the dependability characteristics of equipment, a posteriori analysis is performed after a pilot batch has been produced, and its first stage is the statistical test (ST). There are many test procedures. They differ primarily in the test termination rule and in whether failed systems are replaced with healthy ones. The termination rules are r (until r systems fail), T (until operating time T is reached), n (until all systems fail) and mixed rules. Such tests are necessary because a designer does not have complete a priori information at the design stage. Without it, the dependability indicators cannot be identified in advance with sufficient accuracy. An important source of dependability information is a system for collecting data on product operational performance. There are two primary types of dependability tests. The first is the determinative test, intended to evaluate dependability indicators, which is typical for mass-produced products. The second is the control test, designed to verify that a system's dependability indicators comply with the specifications. This paper addresses the first type and describes statistical testing of radioelectronic systems under various test procedures. The mean time to failure is usually estimated by the maximum likelihood method. The likelihood function is constructed from the test data, and the estimate $\hat{t}^*$ of the parameter t* is the argument value at which the likelihood function is maximal.
The estimate $\hat{t}^*$ of the mean time to failure is a point estimate of the parameter t*. It is itself a random variable and, in a specific test, can take any positive value from 0 to ∞. Therefore, in addition to the point estimate, an interval estimate of the parameter is usually made. This means identifying a confidence interval $(t^*_l, t^*_u)$ that contains the value of t* with a specified probability, where $t^*_l$ and $t^*_u$ are the lower and upper limits of the confidence interval, respectively. The article considers two procedures for testing pilot batches of radioelectronic systems. For each procedure, it determines the estimate of the mean time to failure and the confidence interval of the mean time to failure. It is shown that for estimating the mean time to failure, test procedure [n, V, r] is more efficient than procedure [n, B, r].
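The abstract gives no formulas, so the sketch below uses the standard textbook results for exponentially distributed time to failure when the test stops at the r-th failure. The maximum likelihood estimate is total time on test divided by r. A two-sided confidence interval comes from the chi-square distribution with 2r degrees of freedom. The sketch also assumes the usual Russian notation, in which B (Б) means testing without replacement of failed systems and V (В) means testing with replacement. Function names are hypothetical, and the chi-square quantile is computed by bisection to keep the block dependency-free.

```python
import math

def chi2_cdf_even(x, r):
    """CDF of the chi-square distribution with 2r degrees of freedom (Poisson-sum form)."""
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term = total = 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even(p, r):
    """p-quantile of chi-square with 2r degrees of freedom, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, r) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, r) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def total_time_on_test(failure_times, n, with_replacement):
    """Accumulated operating time of n systems when the test stops at the r-th failure."""
    t = sorted(failure_times)
    if with_replacement:                       # assumed [n, V, r]: failed systems replaced
        return n * t[-1]
    return sum(t) + (n - len(t)) * t[-1]       # assumed [n, B, r]: no replacement

def mttf_interval(failure_times, n, with_replacement, confidence=0.9):
    """MLE of the mean time to failure and a two-sided confidence interval."""
    r = len(failure_times)
    total = total_time_on_test(failure_times, n, with_replacement)
    alpha = 1.0 - confidence
    estimate = total / r
    lower = 2.0 * total / chi2_quantile_even(1.0 - alpha / 2.0, r)
    upper = 2.0 * total / chi2_quantile_even(alpha / 2.0, r)
    return estimate, lower, upper

# Hypothetical data: 10 systems on test, stopped at the 3rd failure.
print(mttf_interval([100, 200, 300], 10, with_replacement=False))
```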

FUNCTIONAL RELIABILITY. THE THEORY AND PRACTICE

pp. 32-39
Abstract

The aim of the article is to develop a method for quantitative evaluation of the stability risks of hardware and software systems under simulated information technology interference, with simulation of the actual management process cycle. The article shows the relevance and importance of methods for evaluating these risks under targeted and coordinated information technology interference. Information technology interference is understood as targeted and coordinated hardware-software and software actions aimed at temporary disruption of operation or logical defeat of hardware and software systems. Successful interference is made possible by vulnerabilities in hardware and software systems, such as IP and MAC addresses and communication equipment ports accessible to an intruder. The method is based on the following. Risk evaluation is performed either on a test bed with a fixed interference simulation system, or at operational facilities with a portable one. The risk of destabilization of hardware and software systems is evaluated experimentally as a combination of the frequency and the consequences of successful information technology interference. The preliminary risk evaluation helps choose the information protection solutions that eliminate potential vulnerabilities. The residual risk is evaluated from the ability of hardware and software systems to eliminate the consequences of interference through various built-in resilience features.
The research resulted in a method for evaluating the security risks of hardware and software systems under information technology interference. The method is a logical sequence of steps:
- risk analysis of information technology interference;
- identification of vulnerabilities;
- simulation of system operation under interference at the trial facility;
- selection of the best information protection and fault tolerance facilities;
- preliminary and final evaluation of system stability risks.

As part of the method, probabilistic and temporal indicators for evaluating hardware and software system stability risks were developed. They enable analysis of recovery from combined information technology interference and selection of rational information protection and fault tolerance measures. The method also uses a cubic analysis scheme for eliminating vulnerabilities of critical elements of hardware and software systems. Depending on the frequency of interference, this scheme identifies the tolerable risk levels and the levels of the Open Systems Interconnection (OSI) reference model at which a vulnerability must be eliminated. A certificate of evaluation of system stability risks, depending on the frequency of successful interference, was also developed. In conclusion, the method uses knowledge of potential vulnerabilities and experimental studies to identify probabilistic values of security risks. These values determine the most hazardous threats and guide the adoption of appropriate information protection measures.
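The abstract defines risk as a combination of the frequency and the consequences of successful interference. The sketch below assumes the common product form, frequency × consequence. It models residual risk by discounting the share of successful interferences whose consequences the built-in resilience features eliminate. The `Threat` class, its fields and the tolerable-risk threshold are hypothetical. The article's probabilistic and temporal indicators and its cubic scheme are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    frequency: float             # successful interferences per unit time (assumed units)
    consequence: float           # loss per successful interference (assumed units)
    recovery_share: float = 0.0  # share of successes neutralized by built-in resilience

    def risk(self) -> float:
        """Preliminary risk as frequency times consequence."""
        return self.frequency * self.consequence

    def residual_risk(self) -> float:
        """Risk left after the system's own recovery features act."""
        return self.risk() * (1.0 - self.recovery_share)

def rank_threats(threats, tolerable):
    """Threats sorted by residual risk, flagged if above the tolerable level."""
    ranked = sorted(threats, key=lambda t: t.residual_risk(), reverse=True)
    return [(t.name, t.residual_risk(), t.residual_risk() > tolerable) for t in ranked]

# Hypothetical threats and numbers, for illustration only.
print(rank_threats([Threat("port scan", 2.0, 10.0, 0.5), Threat("MAC spoofing", 1.0, 5.0)], 8.0))
```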

ACCOUNTS

pp. 53-58
Abstract

Ensuring dependable and safe operation of NPP facilities is highly relevant. The proportion of equipment at the end of its assigned service life in the nuclear power industry is very high, so dependability analysis of NPP elements and systems is required. This analysis raises a number of problems, e.g. evaluating the residual life of equipment and justifying life extension decisions. It is also necessary to provide spare parts for elements and systems, select maintenance strategies, etc. All this increases the value of dependability analysis of nuclear power facilities. It also creates a need for methods to analyze statistical data on the operation of NPP elements, subsystems and systems in order to identify their performance parameters. Nuclear power plants collect information on the operation of various facilities, e.g. failures and defects of system components, maintenance procedures, operating modes, storage conditions, etc. The information provided by NPPs has a number of distinctive features. These are caused by censoring of failure data, insufficient operating time within the given observation interval and the limited volume of available data. All these factors cause uncertainty in the resulting estimates and, consequently, lower-than-optimal accuracy of the calculated dependability characteristics. When the dependability of facilities in operation is evaluated, some facilities and systems often do not fail during the observation period. Such situations require statistical dependability analysis based on so-called right-censored samples, whose distinctive feature is that the inspected item does not fail within the observation period.
In some cases, the operating times of specific facilities are unknown. For instance, performance data may not have been collected at the initial stage of facility operation, with the decision to collect data taken later. In this case, the method must account for the information missing from the initial stage. The volume of information is limited because nuclear energy facilities belong to highly dependable equipment, so failures are rare events. Therefore, all the available information must be used to make dependability estimates more reliable. This yields more accurate results that can be used to calculate NPP facility service life. The purpose of this article is to show how the method of repeated sample is applied and to examine its efficiency. The main focus is on missed data that are to be recovered. The authors provide results of estimating the parameter of the exponential distribution law from right-censored and missed data. The suggested method of repeated sample is compared with the bootstrap method and the mean substitution method. To estimate the exponential distribution law parameter, the authors suggest using the maximum likelihood method. Statistical characteristics are calculated, and all the calculations and results are based on test cases.
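For right-censored data, the maximum likelihood estimate of the exponential parameter has a simple closed form. It equals the number of observed failures divided by the total operating time of all items, both failed and censored. The sketch below shows this estimate together with a generic percentile bootstrap interval for comparison. The bootstrap here is the ordinary nonparametric one, not the authors' method of repeated sample, which the abstract does not describe. The missed-data recovery step is not shown. Function names and data are hypothetical.

```python
import random

def exp_rate_mle(times, failed):
    """MLE of the exponential failure rate from right-censored data.

    times:  operating time of each item (to failure or to end of observation)
    failed: True if the item failed, False if it was censored
    """
    r = sum(1 for f in failed if f)
    if r == 0:
        raise ValueError("no failures observed: the MLE is not defined")
    return r / sum(times)

def bootstrap_interval(times, failed, n_boot=2000, confidence=0.9, seed=1):
    """Percentile bootstrap interval for the rate (generic bootstrap, for comparison)."""
    rng = random.Random(seed)
    pairs = list(zip(times, failed))
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        if any(f for _, f in sample):  # skip resamples that contain no failures
            ts, fs = zip(*sample)
            estimates.append(exp_rate_mle(ts, fs))
    estimates.sort()
    k = len(estimates) - 1
    alpha = 1.0 - confidence
    return estimates[int(alpha / 2 * k)], estimates[int((1 - alpha / 2) * k)]

# Hypothetical sample: two failures and two censored items.
hours = [100, 200, 300, 400]
did_fail = [True, False, True, False]
print(exp_rate_mle(hours, did_fail), bootstrap_interval(hours, did_fail))
```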

FUNCTIONAL SAFETY. THE THEORY AND PRACTICE

pp. 40-45
Abstract

Aim. Industrial safety (OS) is the state of protection of operating personnel from the harmful effects of manufacturing processes, energy, equipment, objects, conditions and schedule of work [1]. In railway transportation, OS is evaluated most efficiently with composite indicators, one of which is the risk assessment indicator. This is also reflected in Russian legislation, which requires evaluation of fire, occupational and other types of risks that affect industrial safety. According to GOST 33433-2015 [2], risk is a combination of the probability and the consequences of an event. The most complicated task in risk assessment is choosing a model for evaluating the probability of an undesired event. The model's results must be practically applicable to planning risk compensation measures. A large number of probability evaluation methods currently exist, and they fall into two large groups: expert and quantitative. Expert methods have several well-known shortcomings, while quantitative methods require the construction of a system of equations or an analytical model. For railway facilities, analytical models of probability evaluation are of particular interest because they make explicit the factors the model takes into account. The aim of the article is to formalize the analytical method for evaluating the probability that a railway facility transitions into a hazardous state (in the context of industrial safety).

Methods. Undesirable events that cause industrial safety incidents at railway facilities are random and can be represented as a random process. A system's development includes the transition of objects from a safe state into hazardous (undesirable) states, i.e. the change of system state over time. Under some assumptions, this random process can be described by a semi-Markov process. In general, constructing and solving semi-Markov models comes down to building a system of homogeneous differential equations, which always involves mathematical difficulties. Reference [3] shows that semi-Markov models can be represented and solved with a connected graph model. Such models are highly visual. They make it possible to formalize the required system states and the paths of transition from safe into hazardous states. The main problem in modelling random changes of industrial safety state is the need to identify the complete list of hazardous states and their preceding non-hazardous or pre-hazardous states. The processes typical of railway facilities are characterized by a multitude of states that cause various events. The concept of “state” usually characterizes an instantaneous image, a “cross-section” of a system. Thus, the first stage of constructing and solving the model is to identify the finite sets of safe and hazardous states of the railway facility under consideration. These sets are identified with the known hazardous state criterion [4]. Changes of a railway system's industrial safety state are random in time. The article therefore describes system operation with a semi-Markov process, assuming that its discrete part is described by an embedded Markov chain. The set of system states and their connections are represented by a directed state graph with defined topological concepts [3].
For the constructed model, the article proves a theorem that determines the probability of system transition from an initial non-hazardous state into a hazardous state, and gives a formula for calculating this probability.
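In a semi-Markov process, the probability of eventually reaching a given absorbing (hazardous) state depends only on the embedded Markov chain. The sojourn-time distributions affect only when that state is reached. The sketch below computes these probabilities with the standard absorbing-chain relation (I - Q)B = R. This is a generic technique, not the article's graph theorem, which the abstract does not spell out. The state names and transition probabilities are hypothetical.

```python
def absorption_probabilities(p, transient, absorbing):
    """Probability of ending in each absorbing state, for each transient start state.

    p[i][j] is the embedded-chain probability of jumping from state i to state j
    (a dict of dicts; missing entries are zero). Solves (I - Q) B = R.
    """
    n = len(transient)
    a = [[(1.0 if i == j else 0.0) - p.get(si, {}).get(sj, 0.0)
          for j, sj in enumerate(transient)]
         for i, si in enumerate(transient)]
    b = [[p.get(si, {}).get(s, 0.0) for s in absorbing] for si in transient]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular system: some states can never be absorbed")
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
                b[r] = [x - f * y for x, y in zip(b[r], b[col])]
    return {si: {s: b[i][k] / a[i][i] for k, s in enumerate(absorbing)}
            for i, si in enumerate(transient)}

# Hypothetical embedded chain: S0 = normal, S1 = pre-hazardous,
# H1 and H2 = two different hazardous (absorbing) states.
chain = {"S0": {"S1": 1.0},
         "S1": {"S0": 0.6, "H1": 0.3, "H2": 0.1}}
probs = absorption_probabilities(chain, ["S0", "S1"], ["H1", "H2"])
print(probs["S0"])
```

In this toy chain the system keeps returning to S0 until it is absorbed, so from S0 the hazard H1 is reached with probability 0.3 / (1 - 0.6) = 0.75.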

Results. The graph method for evaluating industrial safety at railway facilities developed in this article includes two parts: rules for constructing a graph of system safety states, and a tool for evaluating the probability of system transition into a specific hazardous state. The graph is the basis of a practical method for calculating and forecasting industrial safety incidents. The article proves a theorem that determines the probability of system transition from an initial non-hazardous state into a hazardous state. It also gives an example that applies the graph method to evaluate the probability of fire at a fixed facility. The proposed probability evaluation method can be used in planning industrial safety measures, by specifying new states or rules of transition into associated states.

pp. 46-52
Abstract

The aim of this article is the analytical evaluation of dependability and reliability indicators of vital facility supervision and control systems. Such indicators include the probability of no failure, the collective failure rate, the wrong-side and right-side failure rates and the average service life. The article considers systems with different redundancy structures (2-oo-2, 2-oo-3, 2-oo-2-by-2) in which failed equipment (channels) can be restored without interrupting operation. The paper covers safety and reliability mechanisms such as interchannel data comparison, mutual channel blocking and protection against the hazardous development of failures by mutual channel blocking.

Methods. To achieve this goal, the article proposes a mathematical functional model based on homogeneous continuous-time Markov chains with absorbing states. The states of the chain reflect the number of good channels in the system. The state transition rates are determined by the equipment failure rates and repair rates of each channel, taking into account interchannel data comparison and failed channel blocking. Protection may be absent because of events such as non-detection of a failure by the supervision facilities, malfunction of the blocking mechanisms or delayed protection tripping. In that case, the failure of a channel (or channels) causes the failure of the whole system and forces the Markov chain into the absorbing state. The probabilities of transition into the absorbing state are divided into the probabilities of transition into the right-side failure state and into the wrong-side failure state. A failure may occur without guaranteed protection against its possible negative consequences while the system continues operating. Such a failure may send undue inputs to the system's executive mechanisms, so under the worst-case assumption it is deemed wrong-side. The probability of each state of the chain is found by solving a system of Kolmogorov-Chapman differential equations. From these probabilities, the collective failure rate, the average service life and the right-side and wrong-side failure rates are determined. To make the methods easier to use, the authors provide approximate formulas for the failure rates together with their approximation errors.
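The sketch below illustrates the kind of chain described above on a hypothetical 2oo3 system. It uses a channel failure rate `lam`, a repair rate `mu`, and a coverage `c`, the probability that a failure is detected and the channel blocked. Following the worst-case assumption in the text, undetected failures go to the wrong-side absorbing state. A detected failure with only two good channels left is assumed to cause a safe (right-side) stop. Instead of integrating the Kolmogorov equations over time, the sketch solves the equivalent linear systems. Mean time to absorption m satisfies (-Q_T) m = 1, and absorption probabilities satisfy (-Q_T) B = R, where Q_T is the transient block of the generator. The state layout and all names are assumptions, not the article's model.

```python
def solve(a, b):
    """Solve A X = B (B may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    a = [row[:] for row in a]
    b = [row[:] for row in b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
                b[r] = [x - f * y for x, y in zip(b[r], b[col])]
    return [[x / a[i][i] for x in b[i]] for i in range(n)]

def two_out_of_three(lam, mu, c):
    """Hypothetical 2oo3 chain with per-channel failure rate lam, repair rate mu
    and coverage c (probability that a failure is detected and the channel blocked).

    Transient states: 0 = three good channels, 1 = two good channels (one blocked).
    Absorbing states: RS (right-side failure), WS (wrong-side failure).
    """
    # Negated transient block of the generator, -Q_T (row sums = total outflow).
    neg_q = [[3 * lam, -3 * c * lam],
             [-mu, mu + 2 * lam]]
    # Right-hand sides: ones (mean time to absorption), then rates into RS and WS.
    rhs = [[1.0, 0.0, 3 * (1 - c) * lam],
           [1.0, 2 * c * lam, 2 * (1 - c) * lam]]
    x = solve(neg_q, rhs)
    mean_life, p_rs, p_ws = x[0]  # starting from three good channels
    return {"mean_life": mean_life, "p_right_side": p_rs, "p_wrong_side": p_ws}

# Hypothetical rates per hour: lam = 1e-4, mu = 0.1, coverage 0.99.
print(two_out_of_three(1e-4, 0.1, 0.99))
```

Setting `mu = 0` and `c = 1` gives a sanity check: the mean life becomes 1/(3λ) + 1/(2λ), and every failure ends right-side.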

Results. A mathematical model of the operation of a multichannel microprocessor system has been developed. Formulas have been obtained for calculating system state probabilities, average service life and wrong-side and right-side failure rates. These formulas allow evaluating the safety and fault tolerance of various systems with hot standby and recovery of operability during operation. The formulas for system state probabilities also make it possible to derive additional safety and reliability indicators if needed. The article shows that failure rates can also be calculated with simplified formulas.

Conclusions. The formulas given in the article can be used to evaluate the reliability, safety and longevity indicators of microprocessor-based supervision and control circuits of vital facilities (ship-borne technical facilities, trackside equipment at railway stations and on open lines, fixed power facilities, etc.). During development, they help find a rational system organization by comparing the performance of structures with various degrees of redundancy. When a system is adapted for use at various facilities or modernized, these formulas enable analytical calculation of the above indicators.



ISSN 1729-2646 (Print)
ISSN 2500-3909 (Online)