# Random Horn formulas and propagation connectivity for directed hypergraphs.

1 Introduction

Horn formulas are a subclass of CNF expressions, where every clause contains at most one unnegated variable. This class is tractable in the sense that many problems that are hard for CNF expressions in general are polynomially solvable for Horn formulas (such as satisfiability and equivalence). It is partly for this reason that Horn formulas are of basic importance in artificial intelligence and other areas. Random Horn formulas have been studied in [DBC01, DV06, Ist02, LMST09, MIDV07].

A Horn formula is definite if it consists of clauses containing exactly one unnegated variable. We consider definite Horn formulas with clauses of size 3, i.e., with clauses of the form ([bar.a] [disjunction] [bar.b] [disjunction] [bar.c]), which can also be written as a, b [right arrow] c. Here a and b form the body of the clause and c is the head of the clause. Implication between a definite Horn formula [phi] and a definite Horn clause C can be decided by forward chaining: mark variables in the body of C and, while there is a clause in [phi] with all its body variables marked, mark its head variable as well. Then C is implied by [phi] if its head gets marked.

We consider random definite Horn formulas with clauses of size 3 over n variables, where every clause is included with probability p. It follows directly from the results of [LMST09] that p = (2 ln n)/n is a threshold probability for the following property: every pair of variables implies every other variable (see also [DBC01] for a related result).

In this paper we consider the property that some pair of variables implies every other variable. This property is closely related to the property of propagation connectivity for 3-uniform undirected hypergraphs, introduced recently by Berke and Onsjo [BO09a]. They consider a marking process like forward chaining, except that now a vertex can be marked if it is contained in an edge whose other two vertices are already marked. A 3-uniform undirected hypergraph is propagation connected if there is a pair of vertices such that the marking process, starting from that pair, marks every vertex. Berke and Onsjo showed that for p < 1/(n [(log n).sup.2]) random hypergraphs are a.a.s. not propagation connected [BO09a] and for p > 1/(n [(log n).sup.0.4]) random hypergraphs are a.a.s. propagation connected [BO09b]. The first result proves a lower bound for the transition from random hypergraphs being a.a.s. not propagation connected to random hypergraphs being a.a.s. propagation connected, and the second result proves an upper bound for the transition. We use the terms lower and upper bound in a similar sense throughout the paper.

The Horn formula property mentioned above is equivalent to propagation connectivity for directed 3-uniform hypergraphs, where by a directed hypergraph we mean a hypergraph with each edge having a distinguished vertex called its head, and the other vertices called its body. (This is one of the possible definitions of a directed hypergraph. There are several other variants.) In the rest of the paper we use the terminology of propagation connectivity for directed hypergraphs instead of Horn formulas. We show that random directed 3-uniform hypergraphs for p [less than or equal to] 1/(11n ln n) are a.a.s. not propagation connected and for p [greater than or equal to] (5 ln ln n)/(n ln n) are a.a.s. propagation connected. The proofs are based on two versions of the "fanning-out process" (see, e.g., [Kar90, JLR00]). For the upper bound we start the process by exploring a subset of the vertices and finding a maximal degree pair within that subset.

For the undirected hypergraph version of the problem, Coja-Oghlan, Onsjo and Watanabe concurrently and independently proved lower and upper bounds, both of the order 1/(n ln n) [COOW10]. It appears that their argument can be adapted to yield order 1/(n ln n) lower and upper bounds in the directed case as well. The proofs we present here are simpler.

The lower and upper bounds are presented in Section 3 and 4. In the closing section we mention a few open problems.

2 Preliminaries

We consider 3-uniform directed hypergraphs H with directed edges of the form u, v [right arrow] w. The pair (u, v) is the body of the edge and w is the head of the edge. Note that the body is an unordered pair. The degree of a pair (u, v) is the number of vertices w that form an edge u, v [right arrow] w with the pair. We refer to vertex w as a successor of (u, v). The (u, v)-propagation connected component (or simply (u, v)-component) of H is the set of vertices marked by the marking process starting with (u, v).

The probability model where a random directed hypergraph is formed over the vertex set [n] = {1,...,n} by including each edge u, v [right arrow] w independently with probability p is denoted by DH(n,p). For any monotone increasing property of directed hypergraphs the probability that the property holds for a random directed hypergraph drawn from DH(n,p) is a monotone non-decreasing function of p (see [Bol01, Th.2.1]).

We use the following versions of the Chernoff bounds [JLR00].

Proposition 2.1 If X [member of] BIN(n,p) and t [greater than or equal to] 0 then

(a) Pr[X [greater than or equal to] E[X] + t] [less than or equal to] exp{-[t.sup.2]/2(E[X]+t/3)},

(b) Pr[X [less than or equal to] E[X] - t] [less than or equal to] exp{-[t.sup.2]/2E[X]}.

3 A lower bound

We first give a lower bound for probabilities p such that a random directed hypergraph from DH(n,p) is a.a.s. propagation connected.

Theorem 3.1 Let p [less than or equal to] 1/(11n ln n). In a random directed hypergraph from DH(n,p) a.a.s. every propagation connected component has size at most 11 ln n.

Proof: By monotonicity we may assume p = 1/(11n ln n). The following process is used to explore H [member of] DH(n,p). Start with two sets [A.sub.0] = {u,v} and [B.sub.0] = 0. The sets [A.sub.i] and [B.sub.i] represent the sets of discovered vertices and saturated pairs at iteration i respectively, and put [m.sub.i] = [absolute value of [A.sub.i]]. At iteration i of the process consider vertices [u.sub.i] and [v.sub.i] such that [u.sub.i],[v.sub.i] [member of] [A.sub.i-1] and ([u.sub.i],[v.sub.i]) [not member of] [B.sub.i-1]. Find every edge [u.sub.i],[v.sub.i] [right arrow] w where w [not member of] [A.sub.i-1]. Construct the set [A.sub.i] so that it contains all vertices in set [A.sub.i-1] plus all vertices w, where w is the head of an edge that was found in step i. Construct the set [B.sub.i] by [B.sub.i] = [B.sub.i-1] [union] {([u.sub.i],[v.sub.i])}. When every pair in [A.sub.i] is saturated, we have discovered all the vertices in the component, and from then on we put [A.sub.j] = [A.sub.i], [B.sub.j] = [B.sub.i] for every j > i.

We need to show that this process stabilizes after a small number of steps with high probability. Define [X.sub.i] to be the number of successors, in V \ [A.sub.i-1], of the pair ([u.sub.i],[v.sub.i]) to be saturated. Each edge with body ([u.sub.i],[v.sub.i]) and head in V [A.sub.i-1] is in the hypergraph with probability p, independently of the presence or absence of any other edge. Furthermore each such edge is considered at most once in the process. Thus [X.sub.i] [member of] BIN(n - [m.sub.i-1],p).

Let k = [11 ln n]. If the process generates at least k vertices then this must happen in the first [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iterations. Thus the probability of generating at least k vertices is at most

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (1)

Let [X.sup.+.sub.i] [member of] BIN(n,p) and replace the upper limit in the summation (1) by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for convenience. Then, noting that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and as such has mean [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], the probability (1) can be upper bounded by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Using the values of p and k we note that np ~ 1/k, giving [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then the Chernoff bound (Proposition 2.1(a)) with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] gives the upper bound exp {-(3k)/16} ~ exp {-(33 ln n)/16} = o([n.sup.-2]), which implies the theorem by the union bound.

4 An upper bound

In this section we give a sufficient condition for probabilities p such that a random directed hypergraph from DH(n,p) is a.a.s. propagation connected.

Theorem 4.1 For p [greater than or equal to] (5 ln ln n)/(n ln n) a random directed hypergraph from DH(n,p) is a.a.s. propagation connected.

Proof: By monotonicity we may assume p = (5 ln ln n)/(n ln n). We use a modification of the process described above. First we consider all edges over the first n/4 vertices and find a highest-degree pair (u,v) in that subset. Starting from the successors of that pair we find a sufficiently large part of the rest of the component using a variant of the original process organized into phases as follows.

Let m = [(ln n)/(ln ln n)] and assume that we found a pair (u,v) with m successors [w.sub.1],...,[w.sub.m] among the first n/4 vertices. Let [A.sub.0] = {[w.sub.1],...,[w.sub.m]} be the initial set of discovered vertices and let [C.sub.0] be the (3/4)n vertices not considered so far, forming the initial set of available vertices. In iteration i of the new process we pick an arbitrary set [D.sub.i-1] [subset or equal to] [C.sub.i-1] of n/2 available vertices, and we find all edges u,v [right arrow] w, where u,v [member of] [A.sub.i-1] and w [member of] [D.sub.i-1] If there are at least m distinct successors in [D.sub.i-1] then let [A.sub.i] be any m of these and put [C.sub.i] = [C.sub.i-1] \ [A.sub.i]. Otherwise let [A.sub.j] = [A.sub.i-1] for every j [greater than or equal to] i. We run this process for [ln n ln ln n] iterations.

The following lemma, analogous to bounds for graphs (see [Bol01, Ch.3]), gives a bound for the maximal degree of a pair in H [member of] DH(n,p). This lemma is stated for the smaller and simpler probability 1/(n ln n), but applies also to larger p by monotonicity.

Lemma 4.2 If p = 1/(n ln n), then the maximum degree of H [member of] DH(n,p) is a.a.s. at least (ln 4n)/(ln ln 4n).

Proof: Let d = [(ln 4n)/(ln ln 4n)] and let the random variable [Y.sub.ij] be the number of successors of pair (i,j) in H. Then [Y.sub.ij] [member of] BIN (n - 2,p) and since we are dealing with directed edges, the variables [Y.sub.ij] are independent. Thus the probability that every degree is smaller than d is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

if n is sufficiently large. For the last inequality we used the fact that [(1 - p).sup.n-2-d] = 1 - o(1). Using 1 - x < [e.sup.-x], we need to show that

[(pn/d).sup.d][n.sup.2] [right arrow] [infinity]

This follows by taking logarithms and using the definitions of p and d. Specifically we use (ln 4n)/(ln ln 4n) [less than or equal to] d [less than or equal to] (2 ln 4n)/(ln ln 4n) to get

d ln(pn/d) + 2 ln n > (ln 4n)(ln ln ln 4n - ln 2)/ln ln 4n + (ln n) (1 - ln ln n/ln ln 4n) - (ln 4)(1 + ln ln n/ln ln 4n).

The expression on the right tends to infinity, since the first term tends to infinity, the second term is positive and the third term has a constant limit.

We also use a version of a lemma of [BO09b] showing that a.a.s. every component is either small or contains every vertex. Such a statement holds for several probabilities p, but we state it here for p = (5 ln ln n)/(n ln n), as this property is not monotone. This lemma is similar to the gap theorem in [Kar90] and its proof is included for completeness.

Lemma 4.3 ([BO09b]) If p = (5 ln ln n)/(n ln n) then a.a.s. every propagation connected component has either size n or size less than [(ln n).sup.2].

Proof: If a set of vertices is a propagation connected component then there can be no edges with body in the component and head outside. Thus the probability that there is a component of size k is at most

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

We show that for [(ln n).sup.2] [less than or equal to] k [less than or equal to] n - 1 this quantity is o(1/n).

If n/2 [less than or equal to] k [less than or equal to] n - 1 then replacing [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] gives the upper bound

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

As [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], the probability is upper bounded by exp{-[OMEGA] (n ln ln n/ln n)}.

Else [(ln n).sup.2] [less than or equal to] k < n/2 and the analogous calculation gives the upper bound

exp{-k(ln k + p(k - 1)(n - k)/2 - (ln n + 1))}.

Here n - k can be replaced by n/2 and then substituting the values of p and k we can lower bound ln k + p(k - 1)(n - k)/2 - (ln n + 1) by [OMEGA](ln n). Since k [greater than or equal to] [(ln n).sup.2] we get an upper bound of the form exp{-[OMEGA]([(ln n).sup.3])}.

Returning to the proof of Theorem 4.1, let us say that we are successful if we find a pair of degree m among the first n/4 vertices, we can run the iterative process for [ln n ln ln n] iterations, always finding m new vertices, and the event described in Lemma 4.3 occurs. In this case, after the last iteration we found a component of size [(ln n).sup.2], and by Lemma 4.3 the hypergraph is propagation connected.

The number [Z.sub.i] of edges added in the ith iteration has distribution [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Using the Chernoff bound (Proposition 2.1(b)) for the probability that there are fewer than m such edges we get

Pr[[Z.sub.i] < m] [less than or equal to] exp{-[(E[[Z.sub.i]] - m).sup.2]/2E[[Z.sub.i]]} < exp{-ln n/41 ln ln n}. (2)

The last bound is dueto E[[Z.sub.i]] ~ [m.sup.2]np/4 ~ (5 ln n)/(4 ln ln n) and E[[Z.sub.i]] - m ~ (ln n)/(4 ln ln n) which gives

[(E[[Z.sub.i]] - m).sup.2]/2E[[Z.sub.i]] ~ ln n/40 ln ln n.

Since we saturate more than one pair in an iteration it is possible that the same vertex is discovered by more than one edge. The probability of such a conflict is at most

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

If [Z.sub.i] [greater than or equal to] m and there are no conflicts in iteration i then we found at least m new vertices in that iteration. Hence, using Lemmas 4.2, 4.3 and the bounds (2) and (3), the probability of failure is

o(1) + O([(ln n).sup.3]/n ln ln n + exp{-ln n/41 ln ln n}ln n ln ln n) = o(1).

The approach used in this paper seems to require edge probabilities of the order of magnitude given in Theorem 4.1, as the probability should be large enough to produce a sufficient number of newly discovered vertices, starting from an initial set of size determined by the maximum degree calculation of Lemma 4.2.

5 Open Problems

It follows from the result of [LMST09] mentioned in the introduction and Theorem 3.1 that the fraction of pairs (u,v) such that the (u,v)-component has size n grows from 0 to 1 over the interval between p = [OMEGA](1/(n ln n)) and O(ln n/n). It would be interesting to have more detailed information about growth over this interval.

Forward chaining can be studied for different ranges of the parameters. For example, [MIDV07] gives phase transition results on forward chaining where the marking process is started from a positive fraction of the vertices and p is of the form c/[n.sup.2]. Forward chaining could be studied beyond just determining the size of the component produced. The process producing a propagation connected component can also be viewed, using the terminology of propositional logic, as a resolution derivation of consequences, or using the terminology of directed graphs, as a hyperpath [AIL+10]. Considering combinatorial parameters of the structure of a propagation connected component, such as its depth, could also be of interest from the point of view of knowledge base applications.

The Internet and other large networks motivated the study of random graphs in models different from the standard models of fixed edge probabilities or fixed number of edges. For evolving knowledge bases, modeled by random Horn formulas, we are not aware of any such work. A possible choice would be to consider random subformulas of a given formula (corresponding to 'true' knowledge). For random graphs such a model has been studied, e.g., in [CH07].

6 Acknowledgements

The authors would like to thank the referee for a careful reading of this paper and useful suggestions.

References

[AIL+10] Giorgio Ausiello, Giuseppe F. Italiano, Luigi Laura, Umberto Nanni, and Fabiano Sarracco. Classification and traversal algorithmic techniques for optimization problems on directed hyperpaths. Technical Report n.18, Dipartimento di Informatica e Sistemistica "Antonio Ruberti", Universita di Roma "La Sapienza", 2010.

[BO09a] Robert Berke and Mikael Onsjo. Propagation connectivity of random hypergraphs. In Stochastic Algorithms: Foundations and Applications, SAGA, pages 117-126. Springer LNCS 5792, 2009. An update appears in [BO09b].

[BO09b] Robert Berke and Mikael Onsjo. Propagation connectivity of random hypergraphs. Technical Report 1342-2812, Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, October 2009.

[Bol01] Bela Bollobas. Random Graphs. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2001.

[CH07] Fan R. K. Chung and Paul Horn. The spectral gap of a random subgraph of a graph. Internet Mathematics, 4(2):225-244, 2007.

[COOW10] Amin Coja-Oghlan, Mikael Onsjo, and Osamu Watanabe. Propagation connectivity of random hypergraphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, volume 6302 of Lecture Notes in Computer Science, pages 490-503. Springer Berlin/Heidelberg, 2010.

[DBC01] Paul E. Dunne and Trevor J. M. Bench-Capon. A sharp threshold for the phase transition of a restricted satisfiability problem for Horn clauses. J. Log. Algebr. Program., 47(1):1-14, 2001.

[DV06] Demetrios D. Demopoulos and Moshe Y. Vardi. The phase transition in the random HornSAT problem. In Computational Complexity and Statistical Physics (Santa Fe Institute Studies in the Sciences of Complexity Proceedings), pages 195-220. Oxford University Press, Inc., 2006.

[Ist02] Gabriel Istrate. The phase transition in random Horn satisfiability and its algorithmic implications. Random Struct. Algorithms, 20(4):483-506, 2002.

[JLR00] Svante Janson, Tomasz Luczak, and Andrzej Rucmski. Random Graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley-Interscience, 2000.

[Kar90] Richard M. Karp. The transitive closure of a random digraph. Random Struct. Algorithms, 1(1):73-94, 1990.

[LMST09] Marina Langlois, Dhruv Mubayi, Robert H. Sloan, and Gyorgy Turan. Combinatorial problems for Horn clauses. In Graph Theory, Computational Intelligence and Thought, pages 54-65. Springer LNCS 5420, 2009.

[MIDV07] Cristopher Moore, Gabriel Istrate, Demetrios D. Demopoulos, and Moshe Y. Vardi. A continuous-discontinuous second-order transition in the satisfiability of random Horn-SAT formulas. Random Struct. Algorithms, 31(2):173-185, 2007.

Robert H. Sloan (1)

Despina Stasi (1)

Gyorgy Turan (1,2)

(1) University of Illinois at Chicago

(2) Hungarian Academy of Sciences and University of Szeged, Research Group on Artificial Intelligence

([dagger]) This material is based upon work supported by the National Science Foundation under Grant No. CCF-0916708.

received 17th October 2010, revised 21st March 2012, accepted 1st August 2012.