# Permutations of context-free, ET0L and indexed languages.

For a language L, we consider its cyclic closure, and more generally the language [C.sup.k] (L), which consists of all words obtained by partitioning words from L into k factors and permuting them. We prove that the classes of ET0L and EDT0L languages are closed under the operators C . This both sharpens and generalises Brandstadt's result that if L is context-free then [C.sup.k] (L) is context-sensitive and not context-free in general for k [greater than or equal to] 3. We also show that the cyclic closure of an indexed language is indexed.Keywords: ET0L, EDT0L, indexed, context-free, cyclic closure

1 Introduction

In this note we investigate closure properties of context-free, ET0L, EDT0L and indexed languages under the operation of permuting a finite number of factors. Let [S.sub.k] denote the set of permutations on k letters. We sharpen a result of Brandstadt (1981) who proved that if L is context-free (respectively one-counter, linear) then the language

C (L) = {[w.sub.[sigma]](1)... [w.sub.[sigma](k)] | [w.sub.1]... [w.sub.k] [member of] L,[sigma] [zeta] [S.sub.k]}

is not context-free (respectively one-counter, linear) in general for k [greater than or equal to] 3. In our main result, Theorem 2.3, we prove that if L is ET0L (respectively EDT0L), then [C.sup.k](L) is also ET0L (respectively EDT0L). Since context-free languages are ET0L, it follows that if L is context-free, then [C.sup.k](L) is ET0L. Brandstadt (1981) proved that regular, context-sensitive and recursively enumerable languages are closed under [C.sup.k], so our results extend this list to include ET0L and EDT0L.

The language [C.sup.2](L) is simply the cyclic closure of L, given by

cyc(L) = {[w.sub.2][w.sub.1] | [w.sub.1][w.sub.2] [member of] L}.

Maslov (1973); Oshiba (1972) proved that the cyclic closure of a context-free language is context-free. In Theorem 3.3 we show that the same is true for indexed languages.

The cyclic closure of a language, as well as the generalization [C.sup.k], are natural operations on languages, which can prove useful in determining whether a language belongs to a certain class. These operations are particularly relevant when studying languages attached to conjugacy in groups and semigroups (see Ciobanu et al. (2016)).

2 Permutations of ET0L and EDT0L languages

The acronym ET0L (respectively EDT0L) refers to Extended, Table, 0 interaction, and Lindenmayer (respectively Deterministic). There is a vast literature on Lindenmayer systems, see Rozenberg and Salomaa (1986), with various acronyms such as D0L, DT0L, ET0L, HDT0L and so forth. The following inclusions hold: EDT0L [subset] ET0L [subset] indexed, and context-free [subset] ET0L. Furthermore, the classes of EDT0L and context-free languages are incomparable.

Definition 2.1 (ET0L) An ET0L-system is a tuple H = (V, A, [DELTA], I), where

1. V is a finite alphabet,

2.A [[subset].bar] V is the subset of terminal symbols,

3. [DELTA] = {P1,..., [P.sub.n]} is a finite set of tables, meaning each [P.sub.i] is a finite subset of V x V*, and

4. I [[subset].bar] V* is a finite set of axioms.

A word over V is called a sentential form (of H). For u, v [member of] V*, we write u [[??].sub.H,i] v if u = [c.sub.1]... [c.sub.m] for some [c.sub.1],..., [c.sub.m] [member of] V and v = [v.sub.1],..., [v.sub.m] for some [v.sub.1],..., [v.sub.m] [member of] V* with ([c.sub.j], [v.sub.j]) [member of] [P.sub.i] for every j [member of] {1,..., m}. We write u [??] H v if u [[??].sub.H,i] v for some i [member of] {1,..., n}. If there exist sentential forms [u.sub.0],... ,[u.sub.k] with ui [[??].sub.H] [u.sub.i+1] for 0 [less than or equal to] i [less than or equal to] n - 1, then we write [u.sub.0] [[??]*.sub.H] [u.sub.k]. The language generated by H is defined as

L(H) = {v [member of] A* | w [[??]*.sub.H] v for some w [member of] I}.

A language is ET0L if it is equal to L(H)for some ET0L system H.

We may write c [right arrow] v [member of] P to mean (c, v) [member of] P. We call (c, v) a rule for c, and use the convention that if for some c [member of] V no rule for c is specified in P, then P contains the rule (c, c).

Definition 2.2 (EDT0L) An EDT0L-system is an ET0L system where in each table there is exactly one rule for each letter in V. A language is EDT0L if it is equal to L(H) for some EDT0L system H.

In this section we prove the following:

Theorem 2.3 Let A be a finite alphabet. If L [[subset].bar] A* is ET0L (respectively EDT0L) then [C.sup.k](L) is ET0L (respectively EDT0L).

Proof: We start by showing that if #0,... , [#.sub.k] are distinct symbols not in A and L is ET0L (respectively EDT0L) then so is

L' = {[#.sub.0][w.sub.1]#1... [#.sub.k-1][w.sub.k][#.sub.k] | [w.sub.1]... [w.sub.k] [member of] L}.

This will be done in Lemma 2.5 below. We then prove in Proposition 2.9 that if L1 is an ET0L (respectively EDT0L) language where each word in L1 has two symbols a, b appearing exactly once, then L2 = {uabwv | uavbw [member of] [L.sub.1]} is ET0L (respectively EDT0L). For each permutation [sigma] [member of] [S.sub.k] we apply this result to L' for

(a, b) [#.sub.[sigma](1) -1], [#.sub.[sigma](1)],... , [#.sub.[sigma](k) - 1], [#.sub.[sigma](k)])

to obtain the ET0L (respectively EDT0L) language

[L.sub.[sigma]] = {[#.sub.0][#.sub.1]... [#.sub.k][w.sub.[sigma](1)]... [w.sub.[sigma](k)] | [#.sub.0][w.sub.1][#.sub.1]... [#.sub.k-1][w.sub.k][#.sub.k] [member of] L'}.

We obtain [C.sup.k](L) by applying erasing homomorphisms to remove the [#.sub.i], and taking the union over all [sigma] [member of] [S.sup.k]. Since ET0L (respectively EDT0L) languages are closed under homomorphism and finite union, this shows that [C.sup.k](L) is ET0L (respectively EDT0L).

Thus the proof will be complete once we established the above facts. []

Lemma 2.4 If L [[subset].bar] A* is EDT0L and # is a symbol not in A then the language

[L.sub.#] = {u#v | uv [member of] L}

is EDT0L.

Proof: Let H = (V, A, [DELTA], I) be an EDT0L system with L = L(H). Without loss of generality we can assume I [[subset].bar] V. Define an EDT0L system [H.sub.#] = ([V.sub.#],A [union] {#}, [[DELTA].sub.#],[I.sub.#]) as follows: [V.sub.#] is the disjoint union V [union] {[c.sub.#] | c [member of] V}, [I.sub.#] = {[s.sub.#] | s [member of] I}, and m = ma[x.sub.P[member of][DELTA]]{|w| | (c, w) [member of] P}, the length of the longest right-hand side of any table. Furthermore, we define [[DELTA].sub.#] to be the disjoint union [DELTA] [union] {[P.sub.i,#], [P.sub.#,i] | P [member of] [DELTA], i [member of] [0, m]}, where

[P.sub.i,#] := {[c.sub.#] [right arrow] u[d.sub.#]v | c [right arrow] udv [member of] P, |u| = i,d [member of] V} [union] P, [P.sub.#,i] := {[c.sub.#] [right arrow] u#v | c [right arrow] uv [member of] P, |u| = i} [union] P. (1)

We point out that if c [right arrow]* [epsilon] [member of] P, where [epsilon] denotes the empty word, then [P.sub.#,0] = {[c.sub.#] [right arrow] #}, so {[c.sub.#] [right arrow] # | c [right arrow] [epsilon] [member of] P} will be included in [[DELTA].sub.#].

The new system remains finite since we have added a finite number of new letters and tables, and deterministic since letters [v.sub.#] appear exactly once on the left side of each rule in the new tables.

Each word in L([H.sub.#]) is obtained starting with [s.sub.#] [member of] [I.sub.#] and applying tables of the form [P.sub.i,#] some number of times, until at some point, since A [union] {#} does not contain any letter with subscript #, a table of the form [P.sub.# ,i] must be applied. Before this point there is precisely one letter in the sentential form with subscript #, and after there are no letters with subscript #. Also, if uv [member of] L(H), then there is some a [member of] I with a [[??]*.sub.H] uv, and by construction [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

Lemma 2.5 If L [member of] A is ET0L (respectively EDT0L) and #0,... , [#.sub.n] are distinct symbols not in A, then

L' = {[#.sub.0][u.sub.1][#.sub.1]... [u.sub.n][#.sub.n] | [u.sub.1]... [u.sub.n] [member of] L}

is ET0L (respectively EDT0L).

Proof: Since ET0L languages are closed under rational transduction (Rozenberg and Salomaa (1986)), the result is immediate for ET0L. In contrast, the EDT0L languages are not closed under inverse homo-morphism (for example, the language {[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] | n [member of] N} is EDT0L and its inverse homomorphic image {w [member of] {a, b}* | [there exists]n [member of] N([|w|.sub.a] = [2.sup.n])} is not (Ehrenfeucht and Rozenberg (1974), Example 3). Instead, we apply Lemma 2.4 n + 1 times to insert single copies of the [#.sub.i], then intersect with the regular language {[#.sub.0][u.sub.1][#.sub.1]... [u.sub.n][#.sub.n] | [u.sub.i] [member of] A*} to ensure that the [#.sub.i] appear in the correct order. []

Definition 2.6 ((a, b)-language) Let T be a finite alphabet and a,b [member of] T distinct symbols. We say that w [member of] T* is an (a, b)-wordifw [member of] X* aX*bX*, where X = T\{a, b}. A language L [[subset].bar] T* of (a, b)-words is called an (a, b)-language.

We define a function [pi] on (a, b)-words as follows. If w = xaybz [member of] T*, then [pi](w) = xabzy. For an (a, b)-language L, we set [pi](L) = {[pi](w) | w [member of] L}.

Suppose L is an (a, b)-language and H = (V, T, [DELTA], I) is an ET0L or EDT0L system with L = L(H).

Definition 2.7 ((a, b)-morphism) A morphism [??] : [V.sup.#] [right arrow] [{a, b}.sup.#] is called an (a, b)-morphism (for H) if

(1) [??](a) = a, [??](b) = b, and [??](c) = [epsilon] for c [member of] T \ {a, b}, and

(2) if u, v [member of] V* with u [[??].sub.H] v then [??](u) = [??] (v).

Lemma 2.8 Let L be an ET0L (respectively EDT0L) language that is an (a, b)-language. Then L can be generated by some ET0L-system (respectively EDT0L-system) that admits an (a, b)-morphism.

Proof: Suppose L is generated by H = (V, [TAU], [DELTA], I), where a, b [member of] T and [DELTA] = {[P.sub.1],..., [P.sub.n]}. Without loss of generality, we may assume that I [[subset].bar] V. We define a new ET0L (respectively EDT0L) system H' = (V',[TAU],[DELTA]',I') as follows. Let F = {[epsilon],a,b, ab} be the set of factors of ab. Let V' = (V x F ) [union] [TAU] be the new alphabet and define the morphism [??]: V* [right arrow] {a, b}* by [??]((c, f)) = f for (c, f) [member of] V [union] T, [??](a) = a, [??](b) = b and [??](c) = [epsilon] for c [member of] T \ {a, b}.

The role of the F-component of a symbol (c, f) in V is to store the [??]-image of the terminal word to be derived from c. Since H generates only (a, b)-words, we choose as axioms I' = I x {ab}. The role of the tables is to distribute the two letters (in the F-component) in each word along a production.

In the ET0L case, the new set of tables is [DELTA]' = {[P'.sub.1],..., [P'.sub.n[eta]], [P'.sub.n+1]}, where

[P'.sub.i] = {(c, f) [right arrow] ([c.sub.1], [f.sub.1])... (cm, [f.sub.m]) | c [right arrow] [c.sub.1]... [c.sub.m] [member of] [P.sub.i], f = [f.sub.1]... [f.sub.m]}

for each i [member of] {1,..., n} and

[P'.sub.n+1] = {(a, a) [right arrow] a, (b, b) [right arrow] b} [union] {(c, [epsilon]) [right arrow] c | c [member of] T\ {a, b}} [union] {c [right arrow]c|c[member of] [TAU]}.

In the EDT0L case, we introduce a separate table for each choice of a factorisation f = [f.sub.1]... [f.sub.l] for each f [member of] F, where t is the maximal length of any right-hand side in H.

The idea underlying the definition of the tables [P'.sub.i] is that we make multiple copies of each rule in [P.sub.i] based on the choices for how to partition f and distribute the factors among the [c.sub.i] 's.

We claim now that H' = (V, T, [DELTA]', I') admits the morphism [??]. Property (1) follows from the definition of [??], and property (2) from the definition of the tables above.

Let [psi]: V* [right arrow] V* be the 'first coordinate projection' morphism with [psi]((c, f)) = c for (c, f) [member of] V x F and [psi](c) = c for c [member of] T.

For the inclusion L(H') [[subset].bar] L(H), note that u [[??].sub.H'] v implies [psi](u) [[??].sub.H] [psi](v) or [psi](u) = [psi](v), so in any case [psi](u) [[??]*.sub.H] [psi](v). Thus, if v [member of] L(H') with w [[??]*.sub.H], v and w [member of] I', then [psi](w) [[??].sub.H] [psi](v) and [psi](w) [member of] I, hence v = [psi] (v) [member of] L(H). This implies L(H') [[subset].bar] L(H).

For the inclusion L(H) [[subset].bar] L(H'), a straightforward induction on n yields the following claim: If [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] with u [member of] V* and an (a, b)-word v [member of] T*, then we have u' [[??]*.sub.H], v for some u' [member of] V'* such that [psi](u') = u and [??](u') = ab. We apply this to a derivation s [[??]*.sub.H] v with s [member of] I. Then our claim yields an s' [member of] V* with s' [[??]*.sub.H], v, [psi](s') = s [member of] I, and [??](s') = ab. This means s' [member of] I' and thus v [member of] L(H'). []

Proposition 2.9 Let L be an (a,b)-language that is ET0L (respectively EDT0L). Then [pi](L) is ET0L (respectively EDT0L).

Proof: Let L = L(H), where H = (V, T, [DELTA], I). By Lemma 2.8, we may assume that there is an (a, b)-morphism [??] for H. We now use [??] to define a map similar to [pi] on words over V. A word w [member of] V* is said to be an (a, b)-form (short for (a, b)-sentential-form) if [??](w) = ab. Such a word is either of the form xCy, where r, s [member of] V* and C [member of] V, with [??](x) = [??](y) = [epsilon] and [??](C) = ab; or it is of the form xAyBz with x, y, z [member of] V* and A, B [member of] V with [??](x) = [??](y) = [??](z) = [epsilon] and [??](A) = a, [??](B) = b. In the former case, w is called fused, in the latter it is called split.

Let p, q be symbols with p,q [??] V. We define the function [pi] on (a, b)-forms as follows. If w is fused, then [pi](w) = wpq. If w is split with w = xAyBz as above, then [pi](w) = xABzpyq. In other words, the factor between a and b in w will be moved between p and q. For a set L of (a, b)-forms, we set [pi](L) = {[pi](w) w [membre of] L}. Note that [pi] differs from [pi] by introducing the letters p, q. This will simplify the ensuing construction.

The idea is to construct an ET0L (respectively EDT0L) system H' = (V, T', [DELTA]', I'), in which V is the disjoint union V [union] {p, q} and T' = T [union] {p, q}, such that for (a, b)-forms u,v [member of] [V.sup.#], we have

u [[??].sub.H]v if and only if [pi](u) [[??].sub.H], [pi](v) (2)

Moreover, for each (a, b)-form u [member of] V* and v' [member of] V* with [pi](u) [[??].sub.H], v', there is an (a, b)-form v [member of] V* such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

For example, if the derivation [pi] (xAy B z) = xABzpyq [[??].sub.H'] x'A'B'z'py'q holds (the split-split case for u and v), then xAyBz [[??].sub.H] x'A'y'B'z', and similar implications hold in the other cases.

We define I' as I' = {[pi](w) | w [member of] I}, hence equation (2) implies [pi](L(H)) [[subset].bar] L(H') and equation (3) implies L(H') [[subset].bar] [pi](L(H)). Together, we have L(H') = [pi](L(H)), meaning [pi](L(H)) is an ET0L (respectively EDT0L) language. Furthermore, we have [pi](L(H)) = [psi]([pi](L(H))), where [psi] is the homomorphism that erases p, q. Thus, since the classes of ET0L and EDT0L languages are closed under homomorphic images, proving equations (2), (3) implies that [pi](L(H)) is an ET0L (respectively EDT0L) language and hence Proposition 2.9.

As before, we write [DELTA] = {[P.sub.1],..., [P.sub.n]}. Let l be the maximal length of a right-hand side in the productions of H, and let [V.sup.[less than or equal to]l] denote the set of all words in V* of length at most l. The set [DELTA]' consists of the following tables:

[P'.sub.i ]for each 1 [less than or equal to] i [less than or equal to] n, [P'.sub.i,w ]for each 1 [less than or equal to] i [less than or equal to] n and w [member of] [V.sup.[less than or equal to]l] with [??](w) = [epsilon], [P'.sub.i,u,v ]for each 1 [less than or equal to] i [less than or equal to] n and u, v [member of] [V.sup.[less than or equal to]l] with [??](u) = [??](v) = [epsilon],

which we describe next. The table [P'.sub.i] allows H' to mimic (in the sense of (2)) steps in Pi that start in a fused word and result in a fused word. Each table [P'.sub.i] comprises the following productions:

A[right arrow] z for each A [right arrow] z [member of] [P.sub.i] with [??](A) = [epsilon],

C [right arrow] xDy for each C [right arrow] xDy [member of] [P.sub.i] with D [member of] V and [??](C) = [??](D) = ab,

p [right arrow] p,

q [right arrow]q.

The table [P'.sub.i,w] mimics all steps of Pi where a fused word is turned into a split one, such that between the introduced A,B [member of] V, [??](A) = a, [??](B) = b, the word w is inserted. It contains the following productions:

A [right arrow] z for each A [right arrow] z [member of] [P.sub.i] with [??](A) = [epsilon],

C [right arrow] xABy for each C [right arrow] xAw By [member of] [P.sub.i] with [??](C) = ab, [??](A) = a, and [??](B) = b,

p [right arrow] pw, q [right arrow] q.

Finally, the table [P'.sub.i,uv] mimics a step of [P.sub.i] that starts in a split word and produces a split one, such that (i) the symbol A with [??](A) = a generates u to its right and (ii) the symbol B with [??](B) = b generates v to its left. It consists of the productions

A [right arrow] z for each A [right arrow] z [member of] Pi with [??](A) = [epsilon], A [right arrow] xA' for each A [right arrow] xA'u [member of] Pi with [??](A) = [??](A') = a, B [right arrow] B'y for each B [right arrow] vB'y [member of] Pi with [??](B) = [??](B') = b, p [right arrow] pu, q [right arrow] vq.

It can be verified straightforwardly that with these tables, equations (2), (3) are satisfied. In addition, if the table Pi has exactly one rule for each letter in V then [P'.sub.i], [P'.sub.i,w] and [P'.sub.iu,v] has exactly one rule for each letter in V, so if H is EDT0L then so is H'. We have thus proven Proposition 2.9. []

3 Cyclic closure of indexed is indexed

Recall that an indexed language is one that is generated by the following type of grammar:

Definition 3.1 (Indexed grammar; Aho (1968)) An indexed grammar is a 5-tuple (N, T, I, P, S) such that

1. N, T, I are three mutually disjoint sets of symbols, called nonterminals, terminals and indices (or flags) respectively.

2. S [member of] N is the start symbol.

3. P is a finite set of productions, each having the form of one of the following:

(a) A [right arrow] [B.sup.f].

(b) [A.sup.f] [right arrow] v.

(c) A [right arrow] u.

where A, B [member of] N, f [member of] I and u, v [member of] (N [union] T)*.

As usual in grammars, indexed grammars successively transform sentential forms, which are defined as follows. An atom is either a terminal letter x [member of] T or a pair (A, [gamma]) with A [member of] N and [gamma] [member of] I*. Such a pair (A, [gamma]) is also denoted [A.sup.[gamma]]. A sentential form of an indexed grammar is a (finite) sequence of atoms. In particular, every string over T is a sentential form. The language defined by an indexed grammar is the set of all strings of terminals that can be obtained by successively applying production rules starting from the sentential form S. Let A [member of] N, [gamma] [member of] I*. Define a letter homomorphism [[pi].sub.[gamma]] by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In contrast to ETOL systems, where each step replaces every symbol in the sentential form, indexed grammars transform only one atom per step. Production rules transform sentential forms as follows. For an atom [A.sup.[gamma]] in the sentential form:

1. applying A [right arrow] [B.sup.f] replaces one occurrence of [A.sup.[gamma]] by [B.sup.f[gamma]]

2. if [gamma] = f [delta] with f [member of] I, applying [A.sup.f] [right arrow] v replaces one occurrence of [A.sup.[gamma]] (with [gamma] [member of] I*) by [[pi].sub.[delta]] (v)

3. applying A [right arrow] u replaces one occurrence of [A.sup.[gamma]] by [[pi].sub.[gamma]] (u).

We call the operation of successively applying productions starting from the sentential form S and terminating at a string u [member of] T* a derivation of u. We use the notation [??] to denote a sequence of productions within a derivation, and call such a sequence a subderivation. Sometimes we abuse notation and write u [right arrow] v for sentential forms u and v to denote that v results from u by applying one rule.

We represent a derivation S [??] u [member of] T* pictorially using a parse tree, which is defined in the same way as for context-free grammars (see for example Hopcroft and Ullman (1979) page 83) with root labeled by S, internal nodes labeled by [A.sup.[omega]] for A [member of] N and [omega] [member of] I* and leaves labeled by T [union] {[epsilon]}.

A path-skeleton of a parse tree is the (labeled) 1-neighbourhood of some path from the root vertex to a leaf. See Figure 1 for an example.

Definition 3.2 (Normal form) An indexed grammar (N, T, I, P, S) is in normal form if all productions are of one of the following types:

1. A [right arrow] [B.sup.f]

2. [A.sup.f] [right arrow] B

3. A [right arrow] BC

4. A [right arrow] a

where A, B,C [member of] N, f [member of] I and a [member of] T.

An indexed grammar can be put into normal form as follows. For each production [A.sup.f] [right arrow] v with v [??] N , introduce a new nonterminal B, add productions [A.sup.f] [right arrow] B,B [right arrow] v, and remove [A.sup.f] [right arrow] v. By the same arguments used for Chomsky normal form, each production A [right arrow] u without flags can be replaced by a set of productions of type 3 and 4 above.

Maslov (1973); Oshiba (1972) proved that the cyclic closure of a context-free language is context-free. A sketch of a proof of this fact is given in the solution to Exercise 6.4 (c) in Hopcroft and Ullman (1979), and we generalise the approach taken there to show that the class of indexed languages is also closed under the cyclic closure operation.

Theorem 3.3 If L is indexed, then cyc(L) is indexed.

Proof: The idea of the proof is to take the parse-tree of a derivation of [w.sub.1][w.sub.2] [member of] L in [GAMMA] and "turn it upside down", using the leaf corresponding to the first letter of the word [w.sub.2] as the new start symbol.

Let [GAMMA] = (N, T, I, P, S) be an indexed grammar for L in normal form. If w = [a.sub.1]... [a.sub.n] [member of] L with [a.sub.i] [member of] T and we wish to generate the cyclic permutation [a.sub.k]... [a.sub.n][a.sub.1]... [a.sub.k-1] of w, take some parse tree for w in [gamma] and draw the unique path F from the start symbol S to [a.sub.k]. Consider the path-skeleton for F.

In the example given in Figure 1, the desired word [a.sub.k]... [a.sub.n][a.sub.1]... [a.sub.k-1] can be derived from the string [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], using productions in P.

Therefore we wish to enlarge the grammar to generate all strings

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are the labels of the vertices lying immediately to the left of F (in top to bottom order), and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] the labels of the vertices lying immediately to the right of F (in bottom to top order). We do this by introducing new 'hatted' nonterminals, with which we label all the vertices along the path F, and new productions which are the reverse of the old productions 'with hats on'. By first nondeterministically guessing the flag on the nonterminal immediately preceding [a.sub.k], we are able to essentially generate the path-skeleton in reverse.

The grammar for cyc(L) is given by [GAMMA]' = (N', T', I', P U P, [S.sub.0]), where T' = T, I' = I U {$} (where $ is a new symbol not in I), [S.sub.0] [member of] N' N is the new start symbol, and N' and P are as follows. Let N be the set of symbols obtained from N by placing a hat on them. Then the disjoint union N' = N [union] N [union] {[S.sub.0], S} is the new set of nonterminals.

The productions P are as follows:

1. [S.sub.0] [right arrow] S, [S.sub.0] [right arrow] [S.sup.$] , [S.sup.$] [right arrow] [epsilon]

[FIGURE 1 OMITTED]

2. for each f [member of] I, a production S [right arrow] [S.sup.f]

3. for each production A [right arrow] a in P, a production S [right arrow] aA

4. for each production A [right arrow] [B.sup.f] in P, a production [B.sup.f] [right arrow] A

5. for each production [A.sup.f] [right arrow] B in P, a production B[right arrow] [A.sup.f]

6. for each production A [right arrow] BC in P, productions B [right arrow] CA and C [right arrow] AB

Note that the new grammar is no longer in normal form.

Informally, the new grammar operates as follows. Let w = [w.sub.1][w.sub.2] [member of] L and suppose we wish to produce [w.sub.2][w.sub.1]. If a derivation starts with [S.sub.0] [right arrow] S, then the word produced is some word from L. (This corresponds to the case when one of the [w.sub.i] is empty.) Otherwise derivations start with [S.sub.0] [right arrow] [S.sup.$], followed by some sequence of productions S [right arrow] [S.sup.f], building up a flag word on S. This is how we nondeterministically guess the flag label [gamma] on the second last node of the path-skeleton. After this we apply a production S [right arrow] aA, where a is the first letter of w2 (labelling the end leaf of the path-skeleton) and A is the nonterminal labelling the second last vertex of the path-skeleton. Note that the flag label [gamma]$ is transferred to A. After this point, productions of types 4, 5, and 6 are applied to simulate going in reverse along the path-skeleton, at each step producing a sentential form with exactly one hatted symbol. The only way to remove the hat symbol is to apply the production [S.sup.$] [right arrow] [epsilon]. Observe that all flags on nonterminals in a derivation starting from [S.sub.0] [right arrow] [S.sup.$] are words in I*$, and since $ is always at the right end of a flag it does not interfere with any productions from P, so in particular rules A [right arrow] a to the sides of the path-skeleton produce the same strings of terminals as they do in [GAMMA].

We will show by induction on n that in this new grammar, if A,[A.sub.1],..., [A.sub.n] [member of] N then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4)

if and only if

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5)

for all 1 [less than or equal to] i [less than or equal to] n.

To see why this will suffice, suppose first that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

in the original grammar [GAMMA]. So [A.sub.i] [right arrow] a is in P. Then in the new grammar

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Each [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] produces exactly the same set of words in [GAMMA]' as [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] produces in [GAMMA]. Hence every cyclic permutation of a word in L is in the new language.

Conversely, suppose [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and that this subderivation does not start with [S.sub.0] [right arrow] S. Then the subderivation begins with [S.sub.0] [right arrow] [S.sup.$] [??] [S.sup.u] [right arrow] a[A.sup.u] for some u [member of] I*$, A [member of] N. Once a 'hatted' symbol has been introduced, the only way to get rid of the hat is via the production [S.sup.$] [right arrow] [epsilon]. Hence we must have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for some 0 [less than or equal to] j [less than or equal to] n (with the factor before or after S being empty if j = 0 or j = n respectively).

But then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and so if a word is produced by the new grammar, some cyclic permutation of that word is in L.

We finish by giving the inductive proof of the equivalence of (4) and (5). For the case n = 1, the productions of type 5 and 6 in the definition of the grammar for cyc(L) show that [A.sup.w] [??] [B.sup.u] if and only if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. For the case n = 2, we have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] if and only if at some point in the parse tree, we see a subtree labeled [X.sup.t] [right arrow] [Y.sup.t][Z.sup.t], with [A.sup.w] [??] [X.sup.t], [Y.sup.t] [??] [B.sup.u] and [Z.sup.t] [??] [C.sup.v]. The productions in these last three subderivations are all of the form D [right arrow] [E.sup.f] or [D.sup.f] [right arrow] E, so they are equivalent to [X.sup.t] [??] [A.sup.w], [B.sup.u] [??] [Y.sup.t] and [C.sup.v] [??] [Z.sup.t]. Also X [right arrow] Y Z if and only if Y [right arrow] ZX and Z [right arrow] XY. Putting these together, we have [A.sup.w] [??] [B.sup.u][C.sup.v] if and only if

[B.sup.u] [??] [Y.sup.t] [right arrow] [Z.sup.t][X.sup.t] [??] [C.sup.v] [A.sup.w]

and

[C.sup.v] [??] [Z.sup.t] [right arrow] [X.sup.t][Y.sup.t] [??] [A.sup.w][B.sup.u],

as required.

Now for n > 2, suppose our statement is true for k < n. Then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] if and only if for each 1 [less than or equal to] i [less than or equal to] n there are [X.sub.i], [Y.sub.i],[Z.sub.i] [member of] N and t [member of] I* such that [X.sub.i] [right arrow] [Y.sub.i][Z.sub.i] and for some 1 [less than or equal to] j [less than or equal to] n either

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], or

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

We will consider only the second of these, as it is the slightly more complicated one and the first is very similar. The right hand side of the displayed subderivation has fewer than n terms, so by our assumption, this subderivation is valid if and only if

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

But this, together with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], is equivalent to the existence of a derivation

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Here, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] follows from the equivalence of (4) and (5) for n = 1. []

4 Concluding remarks

The results in this paper raise the question whether for an indexed language L the language [C.sup.k](L) is indexed as well, or if not, to which class of languages (within context-sensitive) it belongs.

A consequence of our main result (Theorem 2.3) is that permutations of context-free languages are indexed (a different proof of this based on parse trees can be found in Brough et al. (2015)). It would be interesting to consider the possible extension of this result to the OI- and IO-hierarchies (Damm (1982), Damm and Goerdt (1986)) of languages built out of automata or grammars that extend the pushdown automata and indexed grammars, respectively. They define level-n grammars inductively, allowing the flags at level n to carry up to n levels of parameters in the form of flags. Thus level-0 grammars generate context-free languages, and level-1 grammars produce indexed languages. We conjecture that the class of level-n languages is closed under cyclic closure, and also that if L is a level-n language then [C.sup.k](L) is a level-(n + 1) language.

References

A. V. Aho. Indexed grammars--an extension of context-free grammars. J. Assoc. Comput. Mach., 15: 647-671, 1968.

A. Brandstadt. Closure properties of certain families of formal languages with respect to a generalization of cyclic closure. RAIRO Inform. Theor, 15(3):233-252, 1981.

T. Brough, L. Ciobanu, and M. Elder. Permutation closures of context-free and indexed languages. http://arxiv.org/abs/1412.5512, 2015.

L. Ciobanu, S. Hermiller, D. Holt, and S. Rees. Conjugacy languages in groups. Israel Journal of Mathematics, 211(1):311-347, 2016.

W. Damm. The IO- and OI-hierarchies. Theoret. Comput. Sci., 20(2):95-207, 1982.

W. Damm and A. Goerdt. An automata-theoretical characterization of the OI-hierarchy. Inform. and Control, 71(1-2):1-32, 1986.

A. Ehrenfeucht and G. Rozenberg. The number of occurrences of letters versus their distribution in some EOL languages. Information and Control, 26:256-271, 1974.

A. Ehrenfeucht and G. Rozenberg. On inverse homomorphic images of deterministic ETOL languages. In Automata, languages, development, pages 179-189. North-Holland, Amsterdam, 1976.

J. E. Hopcroft and J. D. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley Publishing Co., Reading, Mass., 1979. Addison-Wesley Series in Computer Science.

A. N. Maslov. The cyclic shift of languages. Problemy Peredaci Informacii, 9(4):81-87, 1973.

T. Oshiba. Closure property of the family of context-free languages under the cyclic shift operation. Electron. Commun. Japan, 55(4):119-122, 1972.

G. Rozenberg and A. Salomaa. The Book of L. Springer, 1986.

Tara Brough (1*) Laura Ciobanu (2[dagger]) Murray Elder (3[double dagger]) Georg Zetzsche (4[section])

(1) Universidade de Lisboa, Portugal

(2) University of Neuchatel, Switzerland

(3)The University of Newcastle, Australia

(4) LSV, CNRS & ENS Cachan, Universite Paris-Saclay, France

received 5th Jan. 2015, revised 19th Apr. 2016, accepted 12th May 2016.

(*) Research primarily carried out while employed at the University of St Andrews, Scotland. Visit to second author supported by London Mathematical Society Scheme 4 grant 41348.

([dagger]) Supported by Swiss National Science Foundation Professorship FN PP00P2-144681/1

([double dagger]) Supported by Australian Research Council grant FT110100178

([section]) Supported by a fellowship within the Postdoc-Program of the German Academic Exchange Service (DAAD)

Printer friendly Cite/link Email Feedback | |

Author: | Brough, Tara; Ciobanu, Laura; Elder, Murray; Zetzsche, Georg |
---|---|

Publication: | DMTCS Proceedings |

Article Type: | Report |

Date: | Jan 1, 2016 |

Words: | 6291 |

Previous Article: | Statistics for 3-letter patterns with repetitions in compositions. |

Next Article: | A proof of Zhil'tsov's theorem on decidability of equational theory of epigroups. |

Topics: |