type
Post
status
Published
date
Feb 27, 2023
slug
summary
tags
Program Analysis
category
Program Analysis
icon
password
Property
Feb 27, 2023 11:24 AM

Introduction

A Program Dependence Graph (PDG) is a representation of the dependencies between various statements in a program. It can be used to analyze the behavior of a program and identify potential optimizations or bugs. The nodes in a PDG represent statements in the program, and edges represent dependencies between the statements. A dependency can be either a data dependency or a control dependency. Data dependence graphs have provided some optimizing compilers with an explicit representation of the definition-use relationships implicitly present in a source program. A control flow graph has been the usual representation for the control flow relationships of a program; the control conditions on which an operation depends can be derived from such a graph. An undesirable property of a control flow graph, however, is a fixed sequencing of operations that need not hold.
Informal definition: The program dependence graph explicitly represents both the essential data relationships, as present in the data dependence graph, and the essential control relationships, without the unnecessary sequencing present in the control flow graph. These dependence relationships determine the necessary sequencing between operations, exposing potential parallelism.

Terminology

A directed graph G = (N, E) is a finite set of nodes N together with a set E of ordered pairs of nodes called edges. If (n, m) is an edge, then n is a predecessor of m and m is a successor of n.
A graph G' = (N', E') is a subgraph of G if N' is contained in N and E' is contained in E.
A path from n to m in G is a finite sequence of nodes u = n, u1, u2, ..., uk = m such that (n, u1), (u1, u2), ..., (uk-1, m) are edges in E.
The reverse graph of a graph G = (N, E) is a graph (N, E'), where E' consists of edges (n, m) such that (m, n) is in E.
A strongly connected region of a graph G is a subgraph of G in which there is a path from every node to every other node.
notion image

Program Dependence Graph

The Program Dependence Graph (PDG) is a graph-based representation of a computer program that captures the dependencies among program statements and data values. In a PDG, the program is represented as a graph, with each node in the graph representing a statement or a predicate expression (or operators and operands) in the program. The edges in the graph are used to represent the dependencies between these nodes.
  • A predicate expression usually contains one or more boolean operators (such as AND, OR, NOT), comparison operators (such as ==, >, <, >=, <=), or logical expressions (such as if-then-else statements). The result of evaluating a predicate expression is a boolean value, either true or false. Predicate expressions are often used in programming languages to control the flow of execution in programs. For example, in an if statement, the condition is a predicate expression that determines whether or not the code inside the if block should be executed. Predicate expressions are also used in loops, switch statements, and other control structures to determine when and how many times to execute certain code blocks.
  • In mathematics, a predicate is either a relation or the boolean-valued function that amounts to the characteristic function or the indicator function of such a relation.
    • A function P: X→ {true, false} is called a predicate on X. When P is a predicate on X, we sometimes say P is a property of X.
  • For instance, in the sentence, "Mike is eating", we have the subject, 'Mike', and the predicate, 'is eating'. In the context of computer science, we aren't interested in stating a fact, but rather, in testing a true/false condition for the purpose of deciding whether to do something.
    • Person mike; if (!mike.isEating()) feedPerson(mike);
      The isEating() member of mike (an instance of Person) is a predicate. It returns true or false for the assertion that the person (mike in this case) is eating. The predicate is being used to decide whether or not to feed the person.
The edges in the PDG represent both data and control dependencies. Data dependencies indicate that a node depends on the value produced by another node in the program. For example, if statement S1 reads a variable x and statement S2 writes to x, there is a data dependency from S2 to S1. Control dependencies, on the other hand, indicate that the execution of a node depends on the outcome of another node's execution. For example, if statement S1 is an if statement and statement S2 is inside the if block, there is a control dependency from S1 to S2.
The set of all dependences for a program may be viewed as inducing a partial ordering on the statements and predicates in the program that must be followed to preserve the semantics of the original program.
Dependences arise as the result of two separate effects.
— First, a dependence exists between two statements whenever a variable appearing in one statement may have an incorrect value if the two statements are reversed. For example (data dependence),
x = 5 y = x + 3
In this code, the second statement (y = x + 3) depends on the first statement (x = 5), because it uses the value of x to compute the value of y. If we were to reverse the order of the statements, like this:
y = x + 3 x = 5
Then the value of y would be incorrect, because x would not yet have been assigned the value of 5 when the second statement is executed. So in this case, the two statements are dependent on each other, and must be executed in the correct order for the program to work correctly.
— Second, a dependence exists between a statement and the predicate whose value immediately controls the execution of the statement (control dependence),
This means that if the value of the predicate changes, then the execution of the statement may also change.
For example, consider the following code:
if x < 5: y = 2*x else: y = x/2
In this code, there is a dependence between the if statement and the predicate x < 5. The value of the predicate controls which branch of the if statement is executed. If the value of x is less than 5, then the first branch (y = 2*x) is executed, and if it is greater than or equal to 5, then the second branch (y = x/2) is executed.
So in this case, there is a dependence between the statement and the predicate that controls its execution. If the value of x changes, then the value of the predicate may also change, which in turn may cause the execution of the statement to change.

Control Dependence

Methodology: CFG → CDG (Control Dependence Graph)
Def. A node V is post-dominated by a node W in G if every directed path from V to STOP (not including V) contains W.
  • In a directed graph G, "post-dominance" refers to a relationship between nodes where one node (W) "post-dominates" another node (V) if every directed path from V to the end of the program (called STOP) contains W, and W is not equal to V.
    • In other words, if we start at node V and follow any path in the graph that doesn't include V, we will eventually reach STOP, and at some point along that path, we will encounter node W. This means that node W has control over the execution of node V, because any path that leads to the end of the program must go through W.
      Post-dominance is a useful concept in program analysis and optimization, as it can help identify parts of a program that are not affected by changes made in other parts of the program. For example, if a node W post-dominates a node V, we know that any change to the code in node V will not affect the behavior of the code in node W, because W will always be executed after V.
Def. Let G be a control flow graph. Let X and Y be nodes in G. Y is control dependent on X iff (1) there exists a directed path P from X to Y with any Z in P (excluding X and Y) post-dominated by Y and (2) X is not post-dominated by Y.
If Y is control dependent on X then X must have two exits. Following one of the exits from X always results in Y being executed, while taking the other exit may result in Y not being executed.
(1) → 1: There exists a directed path P from node X to node Y such that any node z in P (excluding X and Y) is post-dominated by node Y. In other words, any node along the path P, other than X and Y, is controlled by Y.
(2) → 2: Nothing I can tell.
Condition 1 of the definition of control dependence can be satisfied by a path consisting of a single edge. This means that if there is a direct edge from node X to node Y in the control flow graph G, and no other nodes are between X and Y, then Y is control dependent on X.
Condition 2 of the definition is always satisfied when X and Y are the same node. This allows loops to be correctly accommodated by the definition, because a node can be control dependent on itself if it contains a loop.
Note that the transitive closure of the definition corresponds to the notion of the range of a branch.
  • The range of the branch that starts at node x is the set of all nodes that can be reached from node x by following a path that satisfies the conditions of control dependence (direct case), namely that all nodes in the path (excluding the start and end nodes) are post-dominated by the end node y.
When applied to a loop in the control flow graph, the definition of control dependence determines a strongly connected region (SCR) of control dependences, whose nodes consist of predicates that determine an exit from the loop. The other nodes in the control flow graph loop not in the SCR of control dependences lie on some path (one end of a dependence edge) of control dependence edges from a node in the SCR. (without the statement sequence in code, compared to CFG)
Example:
notion image
Where Region Nodes R1 to R6 are used to explicitly show the control dependences for auxiliary understanding. So, you can remove them all (e.g., remove R3. Then we need to connect 1 to 3 and 1 to 6. 1 → 2 means 2 is control dependent on 1).

Determining Control Dependence

TBD.
What is ChatGPT?NLP-Related