Computational Principles of Sentence Construction
V61.0024 - 001 73128


Prof. Ray C. Dougherty
New York University | Linguistics Department

This course description is available in postscript and in zipped postscript.

HELP || LING || COURSES || Basics | Overview | Syllabus | Project | Students

Preliminary Version
Final Version Will be Given Out In Class

The course lectures are organized into five topics. At the end of each set of lectures, there is a student project due. The project may be given to me on an IBM compatible disk or placed into your WWW directory.

(RES) means the article in on reserve in Bobst and at 719 Broadway, fifth floor. (HARD) means this article is difficult and is for reference. (SCRIPT) is a scriptfile handed out in class. (WORKBOOK) is an except from the Introductory Workbook in Computational Linguistics.

I. What is Natural Language Computing?

1. Introduction to Computational Linguistics and Sentence Processing
In Part I, students will run existing programs provided in class, from the disk provided with the book Natural Language Computing, and from the WWW. In Parts II-V students will modify existing programs and write their own.

2. Diagramming Sentences: Representations and Derivations.
We will begin the course by following Chomsky's ideas about phrase markers, constituent structure, and derivations. Later we will develop a bottom-up parser that projects all structures from lexical items.

3. Animal, Human, and Machine Intelligence.
Crucial in Chomsky's theory is the idea of level. Each level is defined by a lexicon of elements and principles of combination that define the possible adjunctions and merges of elements. This course focuses on implementing the permutation (displacement) and combination (merge, adjunction) operations that function in human language into Prolog relations.

4. The Role of Intelligent Machines in the Information Society.
The readings provide resource materials for the course final project. We discuss Chomsky's idea that an I-language (grammar) is like a switch. We show how a parser links a written (orthographic) string with a logical form (meaning).

5. Working Examples of Artificial Intelligence Machines.
We discuss websites that provide rather complicated examples of parsers, translation machines, pronouncing dictionaries, and so on.


For this project, you have been given a set of sentences in a file on the NYU web sever. You are required to run these through the parsers discussed in class, output the results to a file, and print the file. You are expected to turn in the printout resulting from parsing the sentences.

II. Chomsky's Grammar, Prolog, and the WWW

6. How to Read and Write in Prolog.

7. How to Load, Run, and Edit a Prolog Program.

8. Prolog as a Tool for Linguistic Research.

9. Tables of Data as Prolog Facts and Relations.

10. German and French Agreement: Tense and Case.
We focus on agreement phenomena such as: (English) I am/*are, you *am/are; (French) je suis/*est, elle *suis/est; and (German) das alte Buch, ein altes Buch, *eine altes Buch, *dem alter Buch. We examine agreement of person, number, gender, case, tense, and so on between NP and VP and internal to NP and VP.


You are expected to write a short Prolog program in which the quantifiers (each, all, etc.), the coordinating conjunctions (and, or, nor), and some distributional adverbs (simultaneously, together, etc.) are presented as facts (simple or complex) in a Prolog database. You should also define some relations that indicate selection and cooccurrence restrictions between the quantifiers, coordinations, and adverbs.

III. Basic Tools for Computational Linguistics

11. How Prolog Backtracks in Searches.
We discuss the computational, linguistic, and psycho-linguistic significance of 'garden path sentences' such as: 'The horse raced past the barn died,' and 'The artist sold the painting admired it.'

12. Pretty Printers: Formatting a Bibliography.

13. Irregular Verb Morphology in English.

14. Prolog and Logical Constraint Based Grammars.


You will write a short Prolog grammar (database/lexicon) and principles of combination that defines strings of quantifiers, coordinations, and adverbs that cooccur. It will recognize: (each, and, independently), as in John and Mary each will leave independently, but not *(each,and,simultaneously), as in *John and Mary each will leave simultaneously. It will recognize (all, and), but not *(all, or). I am mainly concerned that you realize how to use the unfication properties of Prolog to insure cooccurrence and selection. If you feel ambitious, you may write a program to parse sentences and phrases incorporating these quantifiers, coordinations, and adverbs.

IV. Computing the Data Structures of Human Language.

15. Computational Tools for Language Processing.

16. Morphological Parsers: Affixes, Suffixes, Stems, and Roots.

17. Recursion: The Analysis and Decomposition of Words

18. The Lexicon, Exceptions, and Productive Principles.

19. Chomsky's Minimalist Program: Morphology and the Lexicon.


You will produce a Prolog program that parses coordinations of noun phrases and plural noun phrase constructions that may incorporate quantifiers, coordinations, plurals, and adverbs. Your parser will recognize: (john,and,mary,each,independently) but not *(john,or,mary,each,simultaneously). If you wish, you may include a parser that provides pretty printer to indicate the scope of the quantifiers and adverbs. Such pretty printers have been discussed in class and are readily available on the web, but you may write your own if you wish.

V. Computing Human Sentence Structures.

20. Computational Tools for Sentence Processing.

21. Lexically Driven Syntactic and Semantic Parsers.

22. Top-Down Versus Bottom-Up Parsing.

23. Lexically Specified Complements and Selection.

24. Computational Linguistics on the WWW.

The final project is due the day, hour, and minute that the registrar has scheduled the final exam. You may turn it in earlier than this, but not later.

You will write a Prolog program (lexicon and principles of combination) that parses embedded noun phrase coordinations such as: (mary,and,sue,or,jane), which might have a logical form like ((mary and sue) or jane) or (mary and (sue or jane)). The input to your parser will be strings of nouns, the coordinations (and, or, nor), and quantifiers (each, all, etc.), such as: (either,tess,or,tracy,and,sean), (neither,tess,and,tracy,nor,sean), and so on. Your parser will assign each of these the correct logical form(s), indicating ambiguities where relevant. It will fail to link strings like this to any logical form: *(and,or,tess,tracy,neither,sean). Use the pretty printers to output a readable logical form for the input sentences.