Voting algorithms for large scale fault-tolerant systems

Fault tolerance is the feature of computing systems which can continue their normal operation in the presence of fault(s). In line with this, various methods have been proposed in the last three decades. Redundancy is one of the methods in building fault-tolerant systems which is implementable at bo...

Full description

Saved in:
Bibliographic Details
Main Author: Karimi, Abbas
Format: Thesis
Language:English
Published: 2011
Subjects:
Online Access:http://psasir.upm.edu.my/id/eprint/42337/1/FK%202011%20105R.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
id my-upm-ir.42337
record_format uketd_dc
spelling my-upm-ir.423372016-03-15T05:01:15Z Voting algorithms for large scale fault-tolerant systems 2011-08 Karimi, Abbas Fault tolerance is the feature of computing systems which can continue their normal operation in the presence of fault(s). In line with this, various methods have been proposed in the last three decades. Redundancy is one of the methods in building fault-tolerant systems which is implementable at both hardware and software levels. In these systems, voting algorithms are extensively employed to arbitrate among the results of N redundant hardware modules or software versions. They have a wide range of application in which the goal is to decrease the probability of the system hazardous behavior. So far, various voting methods are proposed which are mostly proper for small-scale systems. In this research, we proposed optimal algorithms using Divide and Conquer, Brent’s theorem and parallel algorithms, appropriate for today’s large scale systems such as satellite processing systems, traffic control, weather forecasting which all face a large quantity of processing input data. To do so, we have introduced a new sequential m-out-of-n algorithm with prediction ability which is known as enhanced m-out-of-n voter. Using a proper test-bed, after 10000 run times, we compared our newly proposed algorithm with the basic algorithm in terms of reliability and availability. By extracting various plots from different aspects, we demonstrated that, compared to the basic algorithm, our algorithm has higher reliability and availability as the quantity of the input data increases. Then, we introduced an appropriate system suitable for analytical modeling. It is called Predictive Hybrid m-out-of-n redundancy (PHmn) which is applicable for such systems as X-by-wire. To investigate the reliability and availability of this structure, discrete Markov models were obtained for reliability and availability analysis. The results of analytical modeling, based on different values of N, M, λ (failure rate), and μ (repair rate) demonstrated that the availability and reliability of the analytical modeling verify simulation result. Among basic voting algorithms, average voter and weighted average voter have higher availability but unfortunately they have higher time and calculation complexity in large scale systems. To solve this problem and gain benefits of this algorithm, we employed parallel algorithm technique and by using optimal number of processors, we could propose optimal algorithms known as Parallel Average Voting and Parallel Weighted Average Voting which both have optimal time complexity and less calculation cost. Since Plurality voting, among the popular and widely applied algorithms, has more correct responses even more than the most known and practical voting algorithm like majority and by relying on the benefit of parallel algorithms, we proposed parallel plurality voting with the minimum number of processors in an optimal time compared to its sequential type. In addition to having all the features of sequential algorithm, this algorithm has far less time complexity and has higher processing speed in voting process in large scale systems. In a nut shell, we tried to introduce voting algorithms and structures suitable for large scale fault-tolerant systems which have optimal and proper time complexity (in parallel voting algorithms) and more reliability and availability (in enhanced m-out-of-n voting algorithm) compared to the basic types. Fault-tolerant computing Computer systems - Reliability Algorithms 2011-08 Thesis http://psasir.upm.edu.my/id/eprint/42337/ http://psasir.upm.edu.my/id/eprint/42337/1/FK%202011%20105R.pdf application/pdf en public phd doctoral Universiti Putra Malaysia Fault-tolerant computing Computer systems - Reliability Algorithms
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
topic Fault-tolerant computing
Computer systems - Reliability
Algorithms
spellingShingle Fault-tolerant computing
Computer systems - Reliability
Algorithms
Karimi, Abbas
Voting algorithms for large scale fault-tolerant systems
description Fault tolerance is the feature of computing systems which can continue their normal operation in the presence of fault(s). In line with this, various methods have been proposed in the last three decades. Redundancy is one of the methods in building fault-tolerant systems which is implementable at both hardware and software levels. In these systems, voting algorithms are extensively employed to arbitrate among the results of N redundant hardware modules or software versions. They have a wide range of application in which the goal is to decrease the probability of the system hazardous behavior. So far, various voting methods are proposed which are mostly proper for small-scale systems. In this research, we proposed optimal algorithms using Divide and Conquer, Brent’s theorem and parallel algorithms, appropriate for today’s large scale systems such as satellite processing systems, traffic control, weather forecasting which all face a large quantity of processing input data. To do so, we have introduced a new sequential m-out-of-n algorithm with prediction ability which is known as enhanced m-out-of-n voter. Using a proper test-bed, after 10000 run times, we compared our newly proposed algorithm with the basic algorithm in terms of reliability and availability. By extracting various plots from different aspects, we demonstrated that, compared to the basic algorithm, our algorithm has higher reliability and availability as the quantity of the input data increases. Then, we introduced an appropriate system suitable for analytical modeling. It is called Predictive Hybrid m-out-of-n redundancy (PHmn) which is applicable for such systems as X-by-wire. To investigate the reliability and availability of this structure, discrete Markov models were obtained for reliability and availability analysis. The results of analytical modeling, based on different values of N, M, λ (failure rate), and μ (repair rate) demonstrated that the availability and reliability of the analytical modeling verify simulation result. Among basic voting algorithms, average voter and weighted average voter have higher availability but unfortunately they have higher time and calculation complexity in large scale systems. To solve this problem and gain benefits of this algorithm, we employed parallel algorithm technique and by using optimal number of processors, we could propose optimal algorithms known as Parallel Average Voting and Parallel Weighted Average Voting which both have optimal time complexity and less calculation cost. Since Plurality voting, among the popular and widely applied algorithms, has more correct responses even more than the most known and practical voting algorithm like majority and by relying on the benefit of parallel algorithms, we proposed parallel plurality voting with the minimum number of processors in an optimal time compared to its sequential type. In addition to having all the features of sequential algorithm, this algorithm has far less time complexity and has higher processing speed in voting process in large scale systems. In a nut shell, we tried to introduce voting algorithms and structures suitable for large scale fault-tolerant systems which have optimal and proper time complexity (in parallel voting algorithms) and more reliability and availability (in enhanced m-out-of-n voting algorithm) compared to the basic types.
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Karimi, Abbas
author_facet Karimi, Abbas
author_sort Karimi, Abbas
title Voting algorithms for large scale fault-tolerant systems
title_short Voting algorithms for large scale fault-tolerant systems
title_full Voting algorithms for large scale fault-tolerant systems
title_fullStr Voting algorithms for large scale fault-tolerant systems
title_full_unstemmed Voting algorithms for large scale fault-tolerant systems
title_sort voting algorithms for large scale fault-tolerant systems
granting_institution Universiti Putra Malaysia
publishDate 2011
url http://psasir.upm.edu.my/id/eprint/42337/1/FK%202011%20105R.pdf
_version_ 1747811907678502912