# Bayesian Methods for Data Analysis in Transcription Networks Term Paper

**Pages:** 6 (1735 words) · **Bibliography Sources:** 10 · **File:** .docx · **Level:** Master's · **Topic:** Education - Mathematics

The Bayesian method refers to methods in probability and statistics, particularly those based on the degree-of-belief interpretation of probability as opposed to the frequency interpretation. In Bayesian statistics, a probability is assigned to a hypothesis, whereas in frequentist statistics the hypothesis is tested without being assigned a probability. Bayesian probability may be seen as an extension of logic that deals with uncertain situations and statements. To evaluate the probability of a hypothesis, the Bayesian statistician specifies a prior probability, updates it in light of the observed data, and then applies standard calculations and formulae to obtain the posterior probability of the hypothesis.

In general, Bayesian methods are characterized by the following concepts:

1. The use of hierarchical models (models in which the parameters of a simple model are themselves given probability distributions, so that the structure can be expressed more richly than in a simple Bayesian analysis) as well as marginalization over nuisance parameters (i.e. summing or integrating the joint distribution over parameters that are not of direct interest but must be included because the parameters of interest depend on them).

2. The sequential use of the Bayes formula: when more data become available after a posterior distribution has been calculated, that posterior becomes the prior for the next update.

3. As opposed to frequentist statistics, where a hypothesis is a statement that must be either true or false, Bayesian statistics assigns a probability to a hypothesis. Bayes' theorem, in short, tells us that the posterior probability of the hypothesis is proportional to the likelihood of the observed data multiplied by the prior probability of the hypothesis, and is expressed as follows:

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$

Here $H$ denotes the hypothesis; one always starts off with some prior belief about the truth status of $H$, referred to as $P(H)$. $D$ refers to the outcome of the experiment, $P(D \mid H)$ is the likelihood of that outcome if the hypothesis is true, and $P(D)$ is the overall probability of the outcome. Bayes' theorem determines what the probability of the hypothesis should be once $D$ is known. This probability is referred to as the posterior probability, $P(H \mid D)$.
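The sequential use of the Bayes formula described above can be sketched in a few lines of Python. This is a minimal illustration for a single binary hypothesis, not part of the original paper; the function names and the numerical values of the prior and likelihoods are assumptions chosen only to make the arithmetic easy to follow.

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """Return P(H | D) given P(H), P(D | H) and P(D | not H)."""
    # P(D) is obtained by summing over the two cases H and not H.
    evidence = likelihood_h * prior + likelihood_not_h * (1.0 - prior)
    return likelihood_h * prior / evidence


def sequential_update(prior, observations):
    """Apply Bayes' theorem once per observation.

    After each observation the posterior becomes the prior for the next one.
    """
    belief = prior
    for likelihood_h, likelihood_not_h in observations:
        belief = bayes_update(belief, likelihood_h, likelihood_not_h)
    return belief


# Hypothetical numbers: an even prior and two independent observations,
# each more likely under H (0.8) than under not H (0.3).
prior = 0.5
observations = [(0.8, 0.3), (0.8, 0.3)]

after_one = bayes_update(prior, 0.8, 0.3)            # 8/11, about 0.727
posterior = sequential_update(prior, observations)   # 64/73, about 0.877
print(after_one, posterior)
```

Each observation pushes the belief in H upward, and the second update starts from the result of the first rather than from the original prior.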

Bayesian methodology relies heavily on graphical models to represent the conditional independence structure between random variables. Two categories of models are used: Bayesian networks and Markov networks. The former, also known as 'directed acyclic graphical models', represent a factorization of the joint probability of all random variables.

The purpose of these models is to provide algorithms with which to discover and analyze structure in complex systems, to summarize that structure concisely and extract information from unstructured data, to construct computer models of the systems, and to utilize those models effectively.

In computing, Bayesian inference has applications in artificial intelligence and in expert systems. Since graphical models allow for efficient simulation algorithms, Bayesian methods and networks are being applied in a growing number of areas, and there is a growing connection between Bayesian methods and simulation-based Monte Carlo techniques. It is for this reason that applications of 'directed acyclic graphical models' have extended to phylogenetics and to the modeling of genetic regulatory networks, also referred to as transcription networks. Bayesian systems are beneficial in that they allow for mapping and estimating demographic and evolutionary parameters simultaneously. Significant research in this field is the subject of this essay.

## Genetic Transcription and Bayesian Methodology

Bayesian networks are used to model dependencies and conditional independencies among genetic random variables $x_i$.

To formalize the structure of a genetic regulatory system with Bayesian networks, Pearl (1988) and Friedman and colleagues (2000) used directed acyclic graphs as models. The vertices of the graph represent genes and correspond to random variables $x_i$. If $i$ is a gene, $x_i$ describes the expression level of that gene. For each $x_i$, a conditional distribution $p(x_i \mid \mathrm{parents}(x_i))$ is defined, where the parents correspond to the variables that directly regulate $i$. The combination of the graph and the conditional distributions $p(x_i \mid \mathrm{parents}(x_i))$ specifies a joint probability distribution $p(x)$.
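The way the graph and the conditional distributions combine into a joint distribution can be written as the product $p(x) = \prod_i p(x_i \mid \mathrm{parents}(x_i))$. The sketch below illustrates this for a toy network of three binary genes. The gene names, the graph, and every probability in the tables are hypothetical values chosen for illustration; they do not come from any of the cited studies.

```python
from itertools import product

# Hypothetical network: g1 regulates g2, and g1 and g2 both regulate g3.
parents = {"g1": (), "g2": ("g1",), "g3": ("g1", "g2")}

# cpt[gene][parent_values] = P(gene is expressed | parent_values); values are made up.
cpt = {
    "g1": {(): 0.3},
    "g2": {(0,): 0.2, (1,): 0.9},
    "g3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.95},
}


def joint_probability(assignment):
    """Multiply one conditional probability per gene: p(x_i | parents(x_i))."""
    prob = 1.0
    for gene, gene_parents in parents.items():
        parent_values = tuple(assignment[p] for p in gene_parents)
        p_on = cpt[gene][parent_values]
        prob *= p_on if assignment[gene] == 1 else 1.0 - p_on
    return prob


all_on = joint_probability({"g1": 1, "g2": 1, "g3": 1})  # 0.3 * 0.9 * 0.95

# Summing the factorized joint over all 2**3 expression states gives 1.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
print(all_on, total)
```

The point of the factorization is economy: each gene needs a table only over its own parents, rather than one table over every combination of all genes.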

The work of Friedman et al. (2000) produced promising results and laid the foundation for the use of Bayesian networks in the study of genetic transcription, but the authors felt that larger datasets were required for greater success in inferring networks from expression data. Hartemink et al. (2001) subsequently incorporated hidden variables into the network, which could capture the influence of unobserved factors (e.g. protein levels) and support probabilistic statements about them, with the resulting predictions being tested later.

The conditional independency $I(x_i; Y \mid Z)$ is the case where $x_i$ is independent of $Y$ given $Z$, where $Y$ and $Z$ denote sets of variables. Under the Markov assumption, each variable is independent of its non-descendants given its parents in the graph, and further conditional independencies are implied by these. Two graphs (i.e. two Bayesian networks) are equivalent if they imply the same set of independencies. Equivalent graphs share the same underlying undirected structure, but may disagree on the direction of some of the edges (Friedman et al., 2000).
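A conditional independency can be checked numerically on a small example. The sketch below builds the joint distribution of a three-variable chain x → z → y and tests whether x and y are independent given z, and whether they are independent when z is ignored. The chain, the variable names, and all probabilities are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical chain x -> z -> y over binary variables; all numbers are made up.
p_x = {0: 0.6, 1: 0.4}
p_z_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p_z_given_x[x][z]
p_y_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # p_y_given_z[z][y]

# Joint distribution from the chain factorization p(x) p(z | x) p(y | z).
joint = {
    (x, z, y): p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]
    for x, z, y in product((0, 1), repeat=3)
}


def marginal(keep):
    """Sum the joint over every variable whose index in (x, z, y) is not in keep."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def independent_given_z(tol=1e-12):
    """I(x; y | z) holds iff P(x, z, y) * P(z) == P(x, z) * P(z, y) everywhere."""
    p_z, p_xz, p_zy = marginal((1,)), marginal((0, 1)), marginal((1, 2))
    return all(
        abs(p * p_z[(z,)] - p_xz[(x, z)] * p_zy[(z, y)]) <= tol
        for (x, z, y), p in joint.items()
    )


def independent_marginally(tol=1e-12):
    """x and y are independent iff P(x, y) == P(x) * P(y) everywhere."""
    p_x_m, p_y_m, p_xy = marginal((0,)), marginal((2,)), marginal((0, 2))
    return all(
        abs(p_xy[(x, y)] - p_x_m[(x,)] * p_y_m[(y,)]) <= tol
        for x, y in product((0, 1), repeat=2)
    )


print(independent_given_z())     # True: z separates x from y
print(independent_marginally())  # False: x influences y through z
```

The reversed chain y → z → x and the fork x ← z → y imply this same independency, which is why such graphs fall into one equivalence class and cannot be told apart from the independencies alone.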

Non-linear relationships between genes were introduced into these models by Imoto et al. (2000), who suggested that the expression of a target gene need not depend linearly on that of its parent genes.

Learning techniques for Bayesian networks make it possible to find the network, or equivalence class of networks, that best matches $D$, where $D$ refers to the expression data, a set of independent observations of $X$ (Heckerman, 1998). Because the space of possible networks is too large to search exhaustively, heuristic search methods must be used, and the network they return is not guaranteed to be the best one.

Friedman and colleagues (2000) proposed a heuristic algorithm for the inference of Bayesian networks from expression data that can deal with this dilemma. Instead of searching for a single network, or a single equivalence class of networks, they focus on features that are shared by high-scoring networks, in particular order relations and Markov relations between $x_i$ and $x_j$. An order relation may indicate a causal connection between the corresponding genes. Pe'er et al. (2001) extended the method to deal with genetic mutations and to consider additional features such as activation, inhibition, and mediation relations between the variables.
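The idea of looking at features shared by many high-scoring networks, rather than at one network, can be sketched as follows. This is a simplified illustration and not the algorithm of Friedman et al.: it assumes a set of candidate networks is already available as lists of directed edges, and it treats an order relation as "gene a is an ancestor of gene b" in each individual graph. The networks and gene names are hypothetical.

```python
def is_ancestor(edges, source, target):
    """Return True if a directed path source -> ... -> target exists."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child == target:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def order_confidence(networks, source, target):
    """Fraction of candidate networks in which source precedes target."""
    hits = sum(is_ancestor(edges, source, target) for edges in networks)
    return hits / len(networks)


# Hypothetical high-scoring networks over three genes (directed edges).
networks = [
    {("g1", "g2"), ("g2", "g3")},
    {("g1", "g3"), ("g3", "g2")},
    {("g2", "g1"), ("g1", "g3")},
    {("g1", "g2"), ("g1", "g3")},
]

print(order_confidence(networks, "g1", "g3"))  # 1.0
print(order_confidence(networks, "g1", "g2"))  # 0.75
print(order_confidence(networks, "g2", "g3"))  # 0.5
```

A feature that appears in nearly all candidate networks, such as g1 preceding g3 here, is a more trustworthy finding than any single network structure taken on its own.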

In 1998, Spellman and colleagues compiled a cell-cycle expression data set. Pe'er et al. (2001) extended the analysis of Markov relations to investigate this data set. Friedman and colleagues, inspecting the high-confidence order relations in the data, found that only a few genes dominate the order, which suggests that these dominant genes regulate the cell-cycle process. Many of them are, indeed, involved in the initiation and control of cellular processes.

## Most Recent Research

The most recent research has focused on improving the application of Bayesian methods to modeling genetic regulatory networks (GRN), for instance by incorporating prior knowledge such as known dependencies between variables of the system.

Zhou et al. (2005) also improved the accuracy of the Boolean system when applied to GRN by introducing a new approach that would, at the same time, reduce computational complexity. This approach limits potential cell-cycle regulators to genes whose expression has been shown to change earlier than, or simultaneously with, that of the target genes.
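The restriction on candidate regulators can be pictured with a very small sketch. It is only an illustration of the stated idea, not the procedure of Zhou et al.: it assumes that each gene has already been assigned the time point at which its expression first changes, and the gene names and time points are hypothetical.

```python
def candidate_regulators(change_time, target):
    """Keep genes whose expression changes no later than the target's does."""
    target_time = change_time[target]
    return sorted(
        gene
        for gene, time in change_time.items()
        if gene != target and time <= target_time
    )


# Hypothetical time points (e.g. sample index) of first expression change.
change_time = {"g1": 0, "g2": 2, "g3": 2, "g4": 5}

print(candidate_regulators(change_time, "g3"))  # ['g1', 'g2']
print(candidate_regulators(change_time, "g1"))  # []
```

Pruning the candidates in this way shrinks the set of parent combinations that a structure search has to score for each target gene.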

Other studies discuss the effect of the type of expression data (e.g. discrete vs. continuous) on the results; the possibility of utilizing synthetic data with incremental processes; the role of prior knowledge in improving the Bayesian model; ways of improving the overall performance of the model; and the possibility of studying implicit, or concealed, variables of the model.

## Challenges and Future Research Directions

Certain issues are still unresolved. These include the following:

1. Discrete vs. continuous expression data: Researchers disagree over which of the two is preferable, since each has advantages and disadvantages. The question is which representation is best at suppressing noise.

2. Noise: Noise is inherent in microarray experiments because of the large number of confounding variables involved. Although Boolean methodology is inherently equipped to deal with this noise (and is, therefore, well suited to general modeling), disagreement persists over whether to use continuous or discrete data.

3. Other problems include the role of prior knowledge in Boolean models; attempts to limit the number of regulators of the target gene in order to narrow the search space; and attempts to understand the conditional stages and roles of the gene regulators, including whether an active regulator will act as an activator or an inhibitor.
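The trade-off between discrete and continuous expression data listed above can be made concrete with a short sketch. The function below maps continuous expression values onto three levels (under-expressed, unchanged, over-expressed). The threshold of 0.5 and the sample values are assumptions chosen for illustration, not values taken from the studies discussed.

```python
def discretize(log_ratios, threshold=0.5):
    """Map continuous expression log-ratios to -1, 0 or 1.

    Values within +/- threshold of zero are treated as unchanged (0).
    The default threshold is an arbitrary illustrative choice.
    """
    levels = []
    for value in log_ratios:
        if value > threshold:
            levels.append(1)
        elif value < -threshold:
            levels.append(-1)
        else:
            levels.append(0)
    return levels


# Hypothetical measurements for one gene across four samples.
print(discretize([-1.2, -0.1, 0.3, 0.9]))  # [-1, 0, 0, 1]

# Two nearly identical measurements can land in different states.
print(discretize([0.49, 0.51]))            # [0, 1]
```

Discretization absorbs small fluctuations around zero, which helps against noise, but it discards magnitude and makes values close to the threshold unstable. This is one way to see why neither representation is clearly preferable.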

## Conclusion

Bayesian networks facilitate genetic research in many ways, particularly since they make it possible to exploit the unprecedented growth in gene expression data. Until now, traditional research has focused on a single gene, protein, or reaction at a time, but the use of 'directed acyclic graphical models' has enabled patterns among thousands of genes to be discovered simultaneously. Not only is this important for furthering scientific understanding of genes at the cellular level, but the knowledge can also be put to medical use, where genetic predisposition to disease can be predicted and where findings from Bayesian models can serve as diagnostic markers. Moreover, the resulting discoveries can help scientists develop better treatment options for various diseases and better understand other, seemingly enigmatic diseases.

Bayesian networks, when applied to modeling genetic transcriptions, are more attractive than any other mathematical or scientific approach, since their solid basis in statistics enables…

### How to Cite "Bayesian Methods for Data Analysis in Transcription Networks" Term Paper in a Bibliography:

APA Style

Bayesian Methods for Data Analysis in Transcription Networks. (2011, February 18). Retrieved May 25, 2020, from https://www.essaytown.com/subjects/paper/bayesian-methods-data-analysis-transcription/6512

MLA Format

"Bayesian Methods for Data Analysis in Transcription Networks." 18 February 2011. Web. 25 May 2020. <https://www.essaytown.com/subjects/paper/bayesian-methods-data-analysis-transcription/6512>.

Chicago Style

"Bayesian Methods for Data Analysis in Transcription Networks." Essaytown.com. February 18, 2011. Accessed May 25, 2020. https://www.essaytown.com/subjects/paper/bayesian-methods-data-analysis-transcription/6512.