G61.1830-01 31508
Introduction to Programing for Linguists
Prof. Ray C. Dougherty
LECTURE SCHEDULE
Tentative syllabus. The final syllabus will be available by September 1996.
Syllabus

Reading List:
     Chomsky, Noam. 1966. Language and Mind. New York: Harcourt Brace
     Dougherty, Ray C. 1994. Natural Language Computing: An English Generative Grammar in  
          Prolog. Hillsdale, N.J.: Lawrence Erlbaum Inc.
     Gazdar, Gerald and Mellish, Chris. 1989. Natural Language Processing in Prolog: An Introduction   
           to  Computational Linguistics New York: Addison Wesley Inc.

All supplemental readings, and all Internet materials, are on reserve in Bobst Library and in the Linguistics Department Library, 719 Broadway, Fifth Floor. Weekly readings are assigned in Dougherty and Gazdar & Mellish.

Check the Academic Computing Facility for lab hours

Each student will complete the course project, which will be given out at the first class meeting and also is described in some detail on the page:

The class will be held in a computerized classroom. Many of the lectures will include computer screens that will be projected onto an overhead screen so students can see programs being executed.
Introduction to Minimalist Grammar and Logical Programming

  1. Introduction to Programming Syntax and Semantics in Prolog. Overview of the course, materials, the world wide web, HTML, Prolog, linguistics, and levels in a minimalist framework. We will discuss the project and why it is relevant to explicate concepts I in the mininimalist framework.
    	
              (a)  This course 
              (b)  The NYU Academic Computing Facility 
              (c ) Dougherty, Preface, Introduction: What is Computational Linguistics? How to Use this  
                    Book, pp. I-xlvi.
              (d)  Dougherty, Ch. 1, Natural Intelligence Linguistics and Prolog, pp. 1-23
    
  2. How to Read and Write in Prolog: Representing Linguistic Notations and Processes. Introduction to Prolog as a notation for representing linguistic data, processes, and constraints. Description and illustration of various Prolog tools. We discuss (and illustrate with the overhead projection system) how to run Prolog on the Unix, IBM PC, and Macintosh systems.
          
               (a)  http://www.nyu.edu/pages/linguistics/ling.html
              (b)  Dougherty, Ch2., How to Read and Write in Prolog, pp. 24-51
              (c)    Dougherty, Ch3., How to Load, Run, and Edit a Prolog Program, pp. 52-82.
              (d)  Gazdar and Mellish, Introduction, pp. 1-20
    
  3. Morphology: A Lexicon and Constraints on Combination. We will discuss Prolog programs that relate singulars and plurals, present and past tenses of verbs, form compound nouns like American history teacher, and complex words like undecomposability.
          
              (a)  http://www.nyu.edu/pages/linguistics/anlcbk.html
              (b)   http://www.nyu.edu/pages/linguistics/parsers.html
              (c)    Dougherty, 7.1. The Levels of Human Language Structure, pp. 161-179
              (d)  Dougherty, 7.2. Morphological Parsers, pp. 180-194.
    
    Part 1 of Project: (PF,LF) Pairs as Primary Data
    Turn in your paper at the fourth class meeting
  4. Merge: X-Bar, Phrase Structure, and Prolog Constraints. We discuss simple context free phrase structure grammars, context sensitive phrase structure grammars, X-bar grammars, lexical structure, and the constraints on representations and processes. We present X-bar type grammars in Prolog that define structures as projections of lexical items.
              (a)  http://www.nyu.edu/pages/linguistics/marc/parsshk.html
              (b)  Dougherty, 7.3. Recursion: Affixes on the Affixes, pp. 195-215
              (c)    Dougherty, 7.4. Regular and Irregular Morphology, pp. 216-228
              (d)  Dougherty, 7.5. The Minimalist Framework, pp. 229-238
    
  5. Merge: Generating The Structure of Simple Sentence and Phrases. Chomsky offers a simple mechanism, called merge, that defines a structural relation between two nodes and defines a third node that is a label for the two related nodes. We show how to interpret Chomsky's ideas about constituency and node relations in terms of Prolog operations.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/german.html 
              (b) http://www.nyu.edu/pages/linguistics/parsers.html
              (c)    Dougherty, 8.1. Syntax: Representations and Parsers, pp. 239-253
              (d)  Dougherty, 8.2. Rule Governed Creativity: Derivations, pp. 254-265
              (e)  Gazdar and Mellish, 5.1. A Simple Parsing Problem, pp. 143-144
              (f)  Gazdar and Mellish, 5.2. Bottom-up Parsing, pp. 144-151
              (g)  Gazdar and Mellish, 5.3. Top-down Parsing, pp. 152-156
    
    Make an Appointment with the Proffessor to Discuss the Project
    Office Hours or By Appointment:
  6. Features, Feature Structures, and Verb Selection Constraints. Up until now we have discussed only constituent and relational structure among nodes. Now we start to express inter-constituent selections and cooccurrence restrictions. We discuss verbs that are intransitive, that are transitive and require an NP complement, a sentence complement, oranother type of complement. Examples of coordinate restrictions will be discussed: both Mary and Sue, *both Mary or Sue.
     
             (a)  http://www.nyu.edu/pages/linguistics/ling.html
              (b)  Dougherty, 8.3. Parsers Assign Structure to an Ordered String, pp. 266-275
              (c)    Dougherty, 8.4. Top Down and Bottom Up Parsing, pp. 276-302
              (d)  Gazdar and Mellish, 5.5. Comparing Strategies, pp. 165-166
              (e)  Gazdar and Mellish, 5.6. Breadth-first and Depth-first Search, pp. 166-168
              (f)  Gazdar and Mellish, 5.7. Storing Intermediate Results, pp. 168-169
              (g)  Gazdar and Mellish, 5.8. Ambiguity, pp. 169-173
              (h)  Gazdar and Mellish, 5.9. Determinism and Lookahead, pp. 174-179
    
  7. Prepositional Phrases: In a house on a hill near a river on public land in Jersey. The time adverbial phrases that form the substance of your term project will include prepositional phrases such as: at six o'clock in the evening on the third of July in 1966. How can a Prolog program recognize or generate embedded or coordinated or appositive prepositional phrases?
    (a)   http://www.nyu.edu/pages/linguistics/courses/g611830/lecture7.html 
              (b)  Look up 'Recursion' in Dougherty and read the references.
              (c)    Look up 'Recursion' in Gazdar and Mellish and read the references.
    
    Part 2 of Project: Possible Notations at the Level of PF
    Turn in your paper at the eighth class meeting
  8. The Calendar and Time Expressions: Semantics, Pragmatics, Syntax. We will place a calendar on the computer as a database: calendar([1,1,january,monday,1944]). calendar([32,1,february,wednesday,1944). And so on. We will see how to relate the information in such a database to phrases like: on the thirty second day of nineteen forty four, Wednesday, February the first.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/lecture8.html
              (b)  Dougherty, Ch 4. Tables of Data as Prolog Facts and Relations, pp. 83-110
              (c)    Peirce, C.S. How to Make Our Ideas Clear, on reserve in the library.
    

    At this point, you have learned (or at least been exposed to) enough Prolog tools to complete the course project. Dougherty, Natural Language Computing, does not discuss how one might optimally represent features and complex node types in Prolog. Essentially, all feature structures (and node labels) are treated as list structures. For those who want to see alternatives, we will start to examine some of the techniques and procedures presented in Gazdar and Mellish, Natural Language Processing in Prolog. You should regard G&M as a cookbook of techniques, sort of like an encyclopedia of Prolog processes and data structures, that can more or less match the structures one finds in human languages.

  9. Compositional Semantics: Nodes as Features of Nodes Below Them. The data and the month information, which are combined into a single item at one level may be spread over two prepositional phrases in the orthographic/phonetic form. On the thirteenth day of August in the year of nineteen sixty six contains the 13 in the phrase on the thirteenth day, 8 in the phrase of August, and 1966 in the phrase in the year of nineteen sixty six. These three numbers can be spread across the PF/OF in different orders: in nineteen sixty six on August the thirteenth, in sixty six on August thirteenth, and so on.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/lecture9.html
              (b)  Dougherty, 8.5. Horizontal Appends: Complement Structures, pp. 303-318
              (c)    Dougherty, 8.6, Vertical Appends: Selection Restrictions, pp. 319-327
              (d)  Gazdar and Mellish, 4.1. Grammar as Knowledge Representation, pp. 100-103
              (e)  Gazdar and Mellish, 4.2. Words, Rules and structures, pp. 104-109
              (f)  Gazdar and Mellish, 4.3. Representing Simple Grammars in Prolog, pp. 110-114  
    
  10. Feature Percolation: Passing Features from Node to Node. In order to obtain a Numeric Form such as [_,_,8,13,1966] for a variety of orthographic/phonetic forms, the parser must be able to accept information in a variety of different linear and hierarchical orders. The parser must be able to input information and move it from node to node as it "composes" the Numerical Form for the time expression.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/lecture10.html
              (b)  Dougherty, Ch. 5. How Prolog Backtracks in Searches, pp. 111-140
              (c)    Gazdar and Mellish, 4.4. Subcategorization and the Use of Features, pp. 115-126
              (d)  Gazdar and Mellish, 4.5. Definite Clause Grammars, pp.127-131
              (e)  Gazdar and Mellish, 4.6. Classes of Grammars and Languages, pp. 132-142
     
    Part 3 of Project: Definig the Pairing for the Calendar and PF Notations
    Turn in your paper at the fourth class meeting
  11. The Clock and Time Expressions: Notations Containing Operators. The clock time expressions are more complicated than those relating to the calendar because of the inherent ambiguity in a twelve hour clock and the idiosyncratic ways in which the minutes are related to the hours in time expressions. *three quarters past six, *forty minutes till seven, and so on. Ten minutes till seven will be represented as [6,50,_,_,_], a representation that contains neither a 7 nor 10.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/lecture11.html 
              (b)  Gazdar and Mellish, 8.1. Semantics: Compositionality, pp. 280-281
              (c)    Gazdar and Mellish, 8.2. Meaning as Reference, pp. 283-287
              (d)  Gazdar and Mellish, 8.3. Translation to a Meaning Representation Language, pp. 288-292
              (e)  Gazdar and Mellish, 8.4. A Database Query Language, pp. 290-293
    
  12. Calculations on Features Percolated Among Nodes. Some of the feature percolation mechanisms must perform calculations on the lower features and pass the computed features up the tree, or maybe sometimes down the tree, depending on your assumptions about constituent structure.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/lecture12.html
              (b)  Gazdar and Mellish, 8.5. Computational Semantics as Feature Instantiation, pp. 293-294
              (c)    Gazdar and Mellish, 8.6. Transitive Verbs and Quantification, pp. 295-301
              (d)  Gazdar and Mellish, 8.7. Ambiguity, Preferences, and Timing, pp. 301-302
              (e)  Gazdar and Mellish, 8.8. Building Semantic Checking in the Grammar, pp. 303-308
    
    Part 4 of Project: Defining the Pairing for the Clock and PF Notations
    Turn in your paper at the thirteenth class meeting
  13. Presentation of a Rude Time Parser for Clock and Calendar. We will present and discuss a rather primitive attempt to solve the project. The main function of this class is to show the format in which student projects should be submitted in their final form. All projects must include documentation, a discussion of the problem, a bibliography, and so on in addition to the computer program and scriptfile samples of executions. The most elegant formulation of our project would be in terms of a feature structure grammar.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/lecture13.html
              (b)  Gazdar and Mellish, 7.1. Feature-Theoretic Syntax , pp. 218-220
              (c)    Gazdar and Mellish, 7.2. Feature Structures as Graphs, pp. 221-227
              (d)  Gazdar and Mellish, 7.3. Feature Structures in Prolog, pp. 228-229
              (e)  Gazdar and Mellish, 7.4. Subsumption and Unification, pp. 230-237
              (f)  Gazdar and Mellish, 7.5. The Status of Rules, pp. 238-239
    
  14. Encoding Your Project as a Set of HTML Pages on the World Wide Web. It is possible to present your project as a set of HTML pages and place it on the NYU Linguistics Department site. You will see the problems involved and learn how to do this. We will discuss some of the grammars (HPSG, LFG, and so on) implemented on the WWW in HTML. Some of them exhibit the type of programming discussed by Gazdar and Mellish.
              (a)  http://www.nyu.edu/pages/linguistics/courses/g611830/lecture14.html
              (b)  Gazdar and Mellish, 7.6.  Implementing PATR in Prolog, pp. 240-247
              (c)    Gazdar and Mellish, 7.7. Chart Parsing with Feature-Based Grammars, pp. 248-255
              (d)  Gazdar and Mellish, 7.8. Representation of Lexical Knowledge, pp.256-269
              (e)  Gazdar and Mellish, 7.9. Implementing a Lexicon in Prolog, pp. 270-272
              (f)  Gazdar and Mellish, 7.10. DAG's Versus Terms, pp. 273-275
    
    
  15. Part 5of Project: Turn in the Final Polog Program


    Intro | Part 1| Part 2 | Part 3 | Part 4 |Part 5