Underspecified Beta Reduction

Manuel Bodirsky, Katrin Erk, Joachim Niehren
Programming Systems Lab, Saarland University
D-66041 Saarbrücken, Germany
www.ps.uni-sb.de/~{bodirsky|erk|niehren}

Alexander Koller
Department of Computational Linguistics, Saarland University
D-66041 Saarbrücken, Germany

[email protected] ambiguous sentences, traditional semantics construction produces large numbersofhigher-orderformulas, which must then be β-reduced individ-ually. Underspecified versions can pro- duce compact descriptions of all read- ings, but it is not known how to perform β-reduction on these descriptions. We show how to do this using β-reduction constraints in the constraint language for λ-structures (CLLS).1IntroductionTraditional approaches to semantics construction (Montague, 1974; Cooper, 1983) employ formu- las of higher-order logic to derive semantic rep- resentations compositionally; then β-reduction is applied to simplify these representations. When theinputsentenceisambiguous, theseapproaches require all readings to be enumerated and β- reduced individually. For large numbers of read-ings, this is both inefficient and unelegant.Existing underspecification approaches (Reyle, 1993; van Deemter and Peters, 1996; Pinkal, 1996; Bos, 1996) provide a partial solution to this problem. They delay the enumeration of the read- ings and represent them all at once in a single,compact description. An underspecification for- malism that is particularly well suited for describ- ing higher-order formulas is the Constraint Lan- guage for Lambda Structures, CLLS (Egg et al., 2001; Erk et al., 2001). CLLS descriptions can be derived compositionally and have been used to deal with a rich class of linguistic phenomena (Koller et al., 2000; Koller and Niehren, 2000).They are based on dominance constraints (Mar- cus et al., 1983; Rambow et al., 1995) and extend them with parallelism (Erk and Niehren, 2000) and binding constraints. However, lifting β-reduction to an operation onunderspecified descriptions is not trivial, and to our knowledge it is not known how this can be done. Such an operation –which we will call un-derspecified β-reduction – would essentially β- reduce all described formulas at once by deriv- ing a description of the reduced formulas. 
In this paper, we show how underspecified β-reductions can be performed in the framework of CLLS. Our approach extends the work presented in (Bodirsky et al., 2001), which defines β-reduction constraints and shows how to obtain a complete solution procedure by reducing them to parallelism constraints in CLLS. The problem with this previous work is that it is often necessary to perform local disambiguations. Here we add a new mechanism which, for a large class of descriptions, permits us to perform underspecified β-reduction steps without disambiguating, and is still complete for the general problem.

Plan. We start with a few examples to show what underspecified β-reduction should do, and why it is not trivial. We then introduce CLLS and β-reduction constraints. In the core of the paper we present a procedure for underspecified β-reduction and apply it to illustrative examples.

2 Examples

In this section, we show what underspecified β-reduction should do, and why the task is nontrivial. Consider first the ambiguous sentence 'Every student didn't pay attention'. In first-order logic, the two readings can be represented as

[email protected] lam ∀ → @

[email protected] [email protected] payattvar¬R1 Y0Y3Y6 Y7Y1 Y2Y4Y5∀ → @

[email protected] [email protected] payattvarvar¬R2Z8Z0Z3Z5Z6 Z7∀ → @

[email protected] payattvar¬R3Figure 1: Underspecified β-reduction steps for ‘Every student did not pay attention’@ @

[email protected] payattvar¬R0X0Figure 2: Description of ‘Every student did not pay attention’∀x(std x → ¬(payatt x)) ¬(∀x(std x → payatt x))A classical compositional semantics constructionfirst derives these two readings in the form of two HOL-formulas:(Every std) λx (¬payatt x) ¬((Every std) λx (payatt x))where Every is an abbreviation for the termEvery = λPλQ(∀x(P x → Q x))An underspecified description of both readings is shown in Figure 2. For now, notice that the graph has all the symbols of the two HOL formulas as node labels, that variable binding is indicated by dashedarrows, andthattherearedottedlinesindi-cating an “outscopes” relation; we will fill in the details in Section 3. Now we want to reduce the description in Fig-ure 2 as far as possible. The first β-reduction step, with the redex at X0is straightforward.Eventhough the description is underspecified, the re- ducing part is a completely known λ-term. The result is shown on the left-hand side of Figure 1. Here we have just one redex, starting at Y0, which binds a single variable. The next reduction step is less obvious: The ¬ operator could either be- long to the context (the part between R1and Y0)@ lam

[email protected] bvara¬

[email protected] b a¬X YU VFigure 3: Problems with rewriting of descriptionsor to the argument (below Y4). Still, it is not dif-ficult to give a correct description of the result:it is shown in the middle of Fig. 1. For the final step, which takes us to the rightmost description, the redex starts at Z8. Note that now the ¬ might be part of the body or part of the context of this redex. The end result is precisely a description ofthe two readings as first-order formulas.So far, the problem does not look too difficult. Twice, we did not know what exactly the parts of the redex were, but it was still easy to derive cor- rect descriptions of the reducts. But this is not always the case. Consider Figure 3, an abstract but simple example. In the left description, there are two possible positions for the ¬: above X or below Y . Proceeding na¨ ı vely as above, we arrive at the right-hand description in Fig. 3. But this de-scription is also satisfied by the term f(¬(b(a))), which cannot be obtained by reducing any of the terms described on the left-hand side. More gen- erally, the na¨ ı ve “graph rewriting” approach is unsound; the resulting descriptions can have too many readings. Similar problems arise in (more complicated) examples from semantics, such as the coordination in Fig. 8.The underspecified β-reduction operation we propose here does not rewrite descriptions. In- stead, we describe the result of the step using a“β-reduction constraint” that ensures that the re- duced terms are captured correctly. Then we use a saturation calculus to make the description more explicit.3Tree descriptions in CLLSIn this section, we briefly recall the definition of the constraint language for λ-structures (CLLS). A more thorough and complete introduction can be found in (Egg et al., 2001). We assume a signature Σ = {f,g,.} of function symbols, each equipped with an arityar(f) ≥ 0. A tree θ consists of a finite set of nodes π ∈ Dθ, each of which is labeled by a sym- bol Lθ(π) ∈ Σ. 
Each node π has a sequence of children π1, …, πn ∈ Dθ, where n = ar(Lθ(π)) is the arity of the label of π. A single node, the root of θ, is not the child of any other node.

3.1 Lambda structures

The idea behind λ-structures is that a λ-term can be considered as a pair of a tree which represents the structure of the term and a binding function encoding variable binding. We assume Σ contains symbols var (arity 0, for variables), lam (arity 1, for abstraction), @ (arity 2, for application), and analogous labels for the logical connectives.

Definition 1. A λ-structure τ is a pair (θ, λ) of a tree θ and a binding function λ that maps every node π with label var to a node with label lam, ∀, or ∃ dominating π.
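As an illustration of Definition 1, a λ-structure can be encoded concretely as a labeled tree together with a binding function. The following sketch is our own illustrative encoding (not part of CLLS): nodes are represented as tuples of child indices, so the root is the empty tuple and dominance is a path-prefix test.

```python
# Illustrative encoding of a lambda-structure: labels maps path-encoded
# nodes to symbols, binding maps var nodes to their binder nodes.
ARITY = {"lam": 1, "f": 1, "var": 0}

def lam_structure_identity_example():
    """Tree and binding function for the term  lam x. f(x)  from the text.
    Nodes are paths: () is the root, (0,) its first child, and so on."""
    labels = {(): "lam", (0,): "f", (0, 0): "var"}
    binding = {(0, 0): ()}  # the var node is bound by the lam at the root
    return labels, binding

def check_lambda_structure(labels, binding, arity=ARITY):
    """Check the conditions of Definition 1: the tree is well-formed for the
    given arities, and every var node is bound by a dominating binder node."""
    for node, label in labels.items():
        for i in range(arity[label]):        # each node has ar(label) children
            assert node + (i,) in labels
    for var_node, binder in binding.items():
        assert labels[var_node] == "var"
        assert labels[binder] in ("lam", "forall", "exists")
        assert var_node[: len(binder)] == binder   # binder dominates the var
    return True
```

The dominance check is just a prefix test on paths, which is the property exploited in the formal relations defined next.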

[email protected] binding function λ explicitly maps nodes representing variables to the nodes representing their binders. When we draw λ-structures, we rep- resent the binding function using dashed arrows, as in the picture to the right, which represents the λ-term λx.f(x). A λ-structure corresponds uniquely to a closed λ-term modulo α-renaming.We will freelyconsider λ-structures as first-order model struc- tures with domain Dθ.This structure defines the following relations.The labeling relation π:f(π1,.,πn) holds in θ if Lθ(π) = f and πi= πi for all 1 ≤ i ≤ n. The dominance re- lation π/∗π0holds iff there is a path π00such that ππ00= π0. Inequality 6= is simply inequality of nodes; disjointness π⊥π0holds iff neither π/∗π0 nor π0/∗π.3.2Basic constraintsNow we define the constraint language for λ- structures (CLLS) to talk about these relations. X,Y,Z are variables that will denote nodes of a λ-structure.ϕ::=X/∗Y | X6=Y | X⊥Y | ϕ ∧ ϕ0 |X:f(X1,.,Xn)(ar(f) = n) |λ(X)=Y | λ−1(X0)={X1,.,Xn}A constraint ϕ is a conjunction of literals (for dominance, labeling, etc). We use the abbrevi- ations X/+Y for X/∗Y ∧ X 6= Y and X = Y for X/∗Y ∧ Y /∗X.The λ-binding literal λ(X)=Y expresses that Y denotes a node which the binding function maps to X.The inverse λ-binding literal λ−1(X0)={X1,.,Xn} states that X1,.,Xndenote the entire set of vari- able nodes bound by X0. A pair (τ,σ) of a λ-structure τ and a variable assignment σ satisfies aλ-structure iff it satisfies each literal, in the obvi- ous way.lamvarvarXX1X2Figure4:Theconstraintgraphof λ−1(X)={X1,X2} ∧ X/∗X1∧ X/∗X2We draw constraints as graphs (Fig. 4) in which nodes represent variables. Labels and solid lines indicate labeling literals, while dotted lines repre- sent dominance. Dashed arrows indicate the bind- ing relation; disjointness and inequality literals are not represented. 
The informal diagrams from Section 2 can thus be read as constraint graphs, which gives them a precise formal meaning.

3.3 Segments and Correspondences

Finally, we define segments of λ-structures and correspondences between segments. This allows us to define parallelism and β-reduction constraints. A segment is a contiguous part of a λ-structure that is delineated by several nodes of the structure. Intuitively, it is a tree from which some subtrees have been cut out, leaving behind holes.

Definition 2 (Segments). A segment α of a λ-structure (θ, λ) is a tuple π0/π1, …, πn of nodes in Dθ such that π0◁*πi and πi⊥πj hold in θ for 1 ≤ i ≠ j ≤ n. The root r(α) is π0, and hs(α) = π1, …, πn is its (possibly empty) sequence of holes. The set b(α) of nodes of α is

    b(α) = {π ∈ Dθ | r(α)◁*π, and not πi◁⁺π for all 1 ≤ i ≤ n}

To exempt the holes of the segment, we define b⁻(α) = b(α) − hs(α). If hs(α) is a singleton sequence, then we write h(α) for the unique hole of α, i.e. the unique node with h(α) ∈ hs(α).

For instance, α = π1/π2, π3 is a segment in Fig. 5; its root is π1, its holes are π2 and π3, and it contains the nodes b(α) = {π1, π5, π2, π3}. Two tree segments α, β overlap properly iff b⁻(α) ∩ b⁻(β) ≠ ∅. The syntactic equivalent of a segment is a segment term X0/X1, …, Xn. We use the letters A, B, C, D for segment terms and extend r(A), hs(A), and h(A) correspondingly.

A correspondence function is intuitively an isomorphism between segments, mapping holes to holes and roots to roots and respecting the structures of the trees:

Definition 3. A correspondence function between the segments α, β is a bijective mapping c : b(α) → b(β) such that c maps the i-th hole of α to the i-th hole of β for each i, and for every π ∈ b⁻(α) and every label f,

    π:f(π1, …, πn) ⇔ c(π):f(c(π1), …, c(πn)).

There is at most one correspondence function between any two given segments.
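Definition 2 can be paraphrased computationally: b(α) collects the nodes below the root that are not proper descendants of any hole, and the unique candidate correspondence between two segments simply swaps root prefixes. A sketch, with nodes encoded as paths and with function names of our own choosing:

```python
def dominates(pi, pi2):
    """pi dominates pi2 iff pi is a prefix of the path pi2."""
    return pi2[: len(pi)] == pi

def segment_nodes(all_nodes, root, holes):
    """b(alpha): nodes below the root, excluding proper descendants of holes
    (the holes themselves stay in, as in Definition 2)."""
    return {pi for pi in all_nodes
            if dominates(root, pi)
            and not any(dominates(h, pi) and h != pi for h in holes)}

def correspondence(root_a, root_b):
    """The only candidate correspondence between segments rooted at root_a
    and root_b: replace one root prefix by the other. (Definition 3 further
    requires holes to map to holes in order and labels to be respected;
    those checks are omitted in this sketch.)"""
    def c(pi):
        assert dominates(root_a, pi)
        return root_b + pi[len(root_a):]
    return c
```

On a toy tree with nodes {(), (0,), (0,0), (0,1), (1,)}, the segment with root (0,) and hole (0,0) contains exactly (0,), (0,0), and (0,1).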
The correspondence literal co(C, D)(X)=Y expresses that a correspondence function c between the segments denoted by C and D exists, that X and Y denote nodes within these segments, and that these nodes are related by c.

Together, these constructs allow us to define parallelism, which was originally introduced for the analysis of ellipsis (Egg et al., 2001). The parallelism relation α ∼ β holds iff there is a correspondence function between α and β that satisfies some natural conditions on λ-binding which we cannot go into here. To model parallelism in the presence of global λ-binders relating multiple parallel segments, Bodirsky et al. (2001) generalize parallelism to group parallelism. Group parallelism (α1, …, αn) ∼ (β1, …, βn) is

[email protected]@

[email protected]π0π00 π1π5π01π2π4π3π03Figure 5: f ((λx.z(x))(a)) →βf (z(a))by the conjunction ∧ni=1αi∼ βiof ordinary par- allelisms, but imposes slightly weaker restrictions on λ-binding. By way of example, consider the λ- structure in Fig. 5, where (π0/π1,π2/π4,π3/) ∼ (π00/π01,π01/π04,π04/) holds. Onthesyntacticside,CLLSprovides groupparallelismliterals(A1,.,An)∼ (B1,.,Bn) to talk about (group) parallelism.4Beta reduction constraintsCorrespondences are also used in the definition of β-reduction constraints (Bodirsky et al., 2001). A β-reduction constraint describes a single β- reduction step between two λ-terms; it enforces correct reduction even if the two terms are only partially known. Standard β-reduction has the formC((λx.B) A) →βC(B[x/A])x free for A.The reducing λ-term consists of context C which contains a redex (λx.B)A. The redex itself is an occurrence of an application of a λ-abstraction λx.B with body B to argument A. β-reduction then replaces all occurrences of the bound vari- able x in the body by the argument while preserv- ing the context. We can partition both redex and reduct into ar- gument, body, and context segments. Consider Fig. 5. The λ-structure contains the reducing λ- termf ((λx.z(x))(a))startingatπ0. Thereduced term can be found at π00. Writing γ,γ0for the context, β,β0for the body and α,α0for the ar- gument tree segments of the reducing and the re-duced term, respectively, we find γ = π0/π1β = π2/π4α = π3/ γ0= π00/π01β0= π01/π03α0= π03/Because we have both the reducing term and the reduced term as parts of the same λ-structure, we can express the fact that the structure below π00 can be obtained by β-reducing the structure be- low π0by requiring that α corresponds to α0, β to β0, and γ to γ0, again modulo binding. This is indeed true in the given λ-structure, as we have seen above.More generally, we define the β-reduction re- lation(γ,β,α)β− → (γ0,β0,α01,.,α0n)for a body β with n holes (for the variables bound in the redex). 
The β-reduction relation holds iff two conditions are met: (γ, β, α) must form a reducing term, and the structural equalities that we have noted above must hold between the tree segments. The latter can be stated by the following group parallelism relation, which also represents the correct binding behaviour:

    (γ, β, α, …, α) ∼ (γ′, β′, α1′, …, αn′)

Note that any λ-structure satisfying this relation must contain both the reducing and the reduced term as substructures. Incidentally, this allows us to accommodate global variables in λ-terms; Fig. 5 shows this for the global variable z. We now extend CLLS with β-reduction constraints

    (C, B, A) →β (C′, B′, A1′, …, An′),

which are interpreted by the β-reduction relation. The reduction steps in Section 2 can all be represented correctly by β-reduction constraints. Consider e.g. the first step in Fig. 1. This is represented by the constraint (R1/Y0, Y2/Y3, Y4/) →β (R2/Z0, Z0/Z3, Z3/). The entire middle constraint in Fig. 1 is entailed by the β-reduction literal. If