Computational Principles of Sentence Construction |
|---|
The course lectures are organized into five topics. At the end of each set of lectures, there is a student project due. The project may be given to me on an IBM compatible disk or placed into your WWW directory.
(RES) means the article in on reserve in Bobst and at 719 Broadway, fifth floor. (HARD) means this article is difficult and is for reference. (SCRIPT) is a scriptfile handed out in class. (WORKBOOK) is an except from the Introductory Workbook in Computational Linguistics.
1. Introduction to Computational Linguistics and Sentence
Processing
In Part I, students will run existing programs
provided in class, from the disk provided with the book Natural Language
Computing, and from the WWW. In Parts II-V students will modify existing
programs and write their own.
2. Diagramming Sentences: Representations and
Derivations.
We will begin the course by following
Chomsky's ideas about phrase markers, constituent structure, and derivations.
Later we will develop a bottom-up parser that projects all structures from
lexical items.
3. Animal, Human, and Machine Intelligence.
Crucial in Chomsky's theory is the idea of level. Each level is defined by a
lexicon of elements and principles of combination that define the possible
adjunctions and merges of elements. This course focuses on implementing the
permutation (displacement) and combination (merge, adjunction) operations that
function in human language into Prolog relations.
4. The Role of Intelligent Machines in the Information
Society.
The readings provide resource materials for the
course final project. We discuss Chomsky's idea that an I-language (grammar) is
like a switch. We show how a parser links a written (orthographic) string with a
logical form (meaning).
5. Working Examples of Artificial Intelligence Machines.
We
discuss websites that provide rather complicated examples of parsers,
translation machines, pronouncing dictionaries, and so on.
For this project, you have been given a set of sentences in a file on the NYU web sever. You are required to run these through the parsers discussed in class, output the results to a file, and print the file. You are expected to turn in the printout resulting from parsing the sentences.
6. How to Read and Write in Prolog.
7. How to Load, Run, and Edit a Prolog Program.
8. Prolog as a Tool for Linguistic Research.
9. Tables of Data as Prolog Facts and Relations.
10. German and French Agreement: Tense and Case.
We
focus on agreement phenomena such as: (English) I am/*are, you *am/are;
(French) je suis/*est, elle *suis/est; and (German) das alte Buch, ein altes
Buch, *eine altes Buch, *dem alter Buch. We examine agreement of person, number,
gender, case, tense, and so on between NP and VP and internal to NP and VP.
You are expected to write a short Prolog program in which the quantifiers (each, all, etc.), the coordinating conjunctions (and, or, nor), and some distributional adverbs (simultaneously, together, etc.) are presented as facts (simple or complex) in a Prolog database. You should also define some relations that indicate selection and cooccurrence restrictions between the quantifiers, coordinations, and adverbs.
11. How Prolog Backtracks in Searches.
We
discuss the computational, linguistic, and psycho-linguistic significance of
'garden path sentences' such as: 'The horse raced past the barn died,' and 'The
artist sold the painting admired it.'
12. Pretty Printers: Formatting a Bibliography.
13. Irregular Verb Morphology in English.
14. Prolog and Logical Constraint Based Grammars.
You will write a short Prolog grammar (database/lexicon) and principles of combination that defines strings of quantifiers, coordinations, and adverbs that cooccur. It will recognize: (each, and, independently), as in John and Mary each will leave independently, but not *(each,and,simultaneously), as in *John and Mary each will leave simultaneously. It will recognize (all, and), but not *(all, or). I am mainly concerned that you realize how to use the unfication properties of Prolog to insure cooccurrence and selection. If you feel ambitious, you may write a program to parse sentences and phrases incorporating these quantifiers, coordinations, and adverbs.
15. Computational Tools for Language Processing.
16. Morphological Parsers: Affixes, Suffixes, Stems, and
Roots.
17. Recursion: The Analysis and Decomposition of Words
18. The Lexicon, Exceptions, and Productive Principles.
19. Chomsky's Minimalist Program: Morphology and the
Lexicon.
You will produce a Prolog program that parses coordinations of noun phrases and plural noun phrase constructions that may incorporate quantifiers, coordinations, plurals, and adverbs. Your parser will recognize: (john,and,mary,each,independently) but not *(john,or,mary,each,simultaneously). If you wish, you may include a parser that provides pretty printer to indicate the scope of the quantifiers and adverbs. Such pretty printers have been discussed in class and are readily available on the web, but you may write your own if you wish.
20. Computational Tools for Sentence Processing.
21. Lexically Driven Syntactic and Semantic Parsers.
22. Top-Down Versus Bottom-Up Parsing.
23. Lexically Specified Complements and Selection.
24. Computational Linguistics on the WWW.
You will write a Prolog program (lexicon and principles of combination) that parses embedded noun phrase coordinations such as: (mary,and,sue,or,jane), which might have a logical form like ((mary and sue) or jane) or (mary and (sue or jane)). The input to your parser will be strings of nouns, the coordinations (and, or, nor), and quantifiers (each, all, etc.), such as: (either,tess,or,tracy,and,sean), (neither,tess,and,tracy,nor,sean), and so on. Your parser will assign each of these the correct logical form(s), indicating ambiguities where relevant. It will fail to link strings like this to any logical form: *(and,or,tess,tracy,neither,sean). Use the pretty printers to output a readable logical form for the input sentences.