G61.830-001 Introduction to Programming for Linguists
Prof. Ray C. Dougherty
COURSE FINAL PROJECT
This assignment may be in any human language: English, Spanish, Hebrew, etc.
|1.0 Basic Problem Statement|
In the Minimalist Program (Chomsky 1995), there are essentially only two levels of linguistic description: a Logical Form (LF) corresponding to aspects of meaning, and a Phonetic Form (PF) corresponding to aspects of sound. There is no internal structure (phrase structure) to the PF. The derivational mechanisms that link the PF with the LF may or may not define phrase structure (or some other structure). To illustrate the types of problems that arise in a Minimalist Parser, we will produce a Prolog program that relates a 'semantic representation', defined as a clock and a calendar, with a 'phonetic representation', or better an 'orthographic representation': a string of words (in orthographic notation) that can be paired with the clock and calendar.
|Semantic Representation||Numeric Form||Phonetic/Orthographic Form|
|Clock, Calendar||18:45 4/29/96 [18,45,4,29,1996]||at six forty five P.M. on April twenty ninth in nineteen hundred and ninety six|
| || ||at quarter of seven at night on the twenty ninth of April in nineteen ninety six|
| || ||at forty five minutes past/after six on the evening of April the twenty ninth in ninety six|
One might try to link the Phonetic Form to a graphical display of a clock face and a calendar with a day marked off, that is, link the Phonetic Form directly to a Semantic Representation. But a simpler project is to assume that the Semantic Representation correlates with a Numerical Form in which the time and date information is presented as a sequence of numbers. Some countries write the numbers in a different order: April 18, 1996 might be 18/4/96 in some places. But it is 4/18/96 in New York City, so we will use this order.
Notice that in the chart above, the semantic representation does not match the Numeric Form. The NF requires the big hand on the 9 and the little hand between the 6 and the 7. The correct NF for the semantic representation given is either [10,10,4,29,1996] or [22,10,4,29,1996]. The Numeric Form does match the Orthographic Form. The main focus of this study is to pair a Numeric Form with an Orthographic Form.
|1.1 A Notation in Which to Represent Information|
|1.2 The Program You will Write: datetime(NF,PF)|
The program datetime(NF,PF), which will be the final project, pairs a Numerical Form with a Phonetic Form:
datetime([12,0,1,1,2001],at_noon_on_New_Year's_day_at_the_millennium).
datetime([18,45,4,15,1996],at_quarter_of_seven_on_April_fifteenth_nineteen_hundred_ninety_six).
datetime([18,45,4,15,1996],at_six_forty_five_on_income_tax_day_in_ninety_six).
Such a program is neutral between generation (given an NF, it produces a PF) and recognition (given a PF, it produces an NF).
Generation links an NF to a PF (in general, a one-to-many pairing):
datetime([12,0,1,1,2000],X).
X = on_the_first_of_January_at_12_o'clock_high_noon_in_two_thousand ;
X = on_the_first_of_January_at_high_noon_in_the_year_two_thousand ;
and so on.
Recognition links a PF to an NF (in general, a one-to-one or one-to-two [24hr/AM/PM] pairing):
datetime(X,at_noon_on_the_first_of_January_in_two_thousand).
X = [12,0,1,1,2000].
|1.3 The Program will Be Written in Five Stages|
There are essentially no problems involved in defining the possible notations at the level of NF. Each object at NF is a list of five numbers: [Hour,Minute,Month,Day,Year], where all numbers are integers. Hour is between 0 and 23, Minute between 0 and 59, Month between 1 and 12, and Day between 1 and 31. Year is any integer. From this information one can calculate the day of the week (e.g., Monday, Tuesday), so day names are not in the list.
|Possible Values for the Variables at the Level of Numerical Form|
|Variable||Hour||Minute||Month||Day||Year||Day name (calculated)|
|Values||0, 1, 2... 23||0, 1, 2... 59||1, 2... 12||1, 2... 31||any integer||sunday...|
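The range constraints and the day-name calculation can be sketched in a few lines of Prolog. This is only an illustration under stated assumptions: the predicate names valid_nf/1 and weekday/2 are hypothetical, the ranges are taken as Hour 0 to 23 and Minute 0 to 59, the code targets SWI-Prolog, and the weekday arithmetic is a table-based method usually attributed to Tomohiko Sakamoto, valid for positive Gregorian years. Per-month day limits and leap years are not checked.

```prolog
:- use_module(library(lists)).   % nth0/3, nth1/3

% valid_nf(+NF): NF is a well-formed [Hour,Minute,Month,Day,Year] list.
% Assumption: only the simple ranges are checked, not days per month.
valid_nf([Hour, Minute, Month, Day, Year]) :-
    integer(Hour),   Hour   >= 0, Hour   =< 23,
    integer(Minute), Minute >= 0, Minute =< 59,
    integer(Month),  Month  >= 1, Month  =< 12,
    integer(Day),    Day    >= 1, Day    =< 31,
    integer(Year).

% weekday(+NF, -Name): day name calculated from Month, Day and Year.
% Assumption: Year is a positive Gregorian year.
weekday([_, _, Month, Day, Year], Name) :-
    nth1(Month, [0,3,2,5,0,3,5,1,4,6,2,4], Offset),
    (   Month < 3 -> Y is Year - 1 ; Y = Year ),
    Index is (Y + Y // 4 - Y // 100 + Y // 400 + Offset + Day) mod 7,
    nth0(Index,
         [sunday, monday, tuesday, wednesday, thursday, friday, saturday],
         Name).
```

For example, the query weekday([18,45,4,29,1996], D) binds D to monday, and valid_nf([24,0,1,1,1996]) fails because 24 is outside the Hour range.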
|1.3.1 The First Part of the Project: (PF,NF) Pairs as Primary Data|
PF is for us an orthographic string of words, where the words are selected from a finite fixed list (the lexicon). The words can be marked for category, e.g., prep([on]), prep([at]), noun([noon]), adj([high]), noun_proper([august]). Notice that in Prolog a name beginning with a capital letter is a variable, so proper nouns such as August are written in lower case.
There are constraints governing the order of elements at PF: at noon on August (the) fourth, at noon on the fourth of August, at noon on fourth of August, and so on. There are also idiomatic expressions to be included: at quarter past six, at half past six, at three quarters past six, at a quarter after six, at a half after six, and so on.
The first part of the project is to compile a listing of relevant forms that our program must analyze. Compile a list of pairs, where each pair will be in this form:
on the third of April in ninety five at six o'clock in the afternoon [18,0,4,3,1995]
at nine forty five AM on October twelfth of ninety six [9,45,10,12,1996]
in ninety three on the third of June about ten minutes after noon [12,10,6,3,1993]
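One possible way to store such pairs so that later stages of the program can test against them is as Prolog facts. The sketch below is an assumption, not a required format: the predicate names pair/2 and vocabulary/1 are hypothetical, each PF is written as a list of lower-case word atoms, and o'clock is spelled oclock to avoid quoting. October is month 10.

```prolog
:- use_module(library(lists)).   % member/2

% pair(?PF, ?NF): one item of primary data.
% PF is a list of lower-case word atoms; NF is [Hour,Minute,Month,Day,Year].
pair([on,the,third,of,april,in,ninety,five,at,six,oclock,in,the,afternoon],
     [18,0,4,3,1995]).
pair([at,nine,forty,five,am,on,october,twelfth,of,ninety,six],
     [9,45,10,12,1996]).
pair([in,ninety,three,on,the,third,of,june,about,ten,minutes,after,noon],
     [12,10,6,3,1993]).

% vocabulary(-Words): sorted, duplicate-free list of every word in the data.
% Useful as a first draft of the lexicon.
vocabulary(Words) :-
    setof(W, PF^NF^(pair(PF, NF), member(W, PF)), Words).
```

The query pair(PF, [18,0,4,3,1995]) retrieves the word string stored for that NF, and vocabulary(Ws) lists the words the lexicon must cover.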
|1.3.2 The Second Part of the Project: The Possible Notations at the Level of PF|
The second part of the project is to write a grammar in Prolog that defines the possible strings at the level of PF. This will involve (a) constructing a lexicon of elements and (b) defining a set of distributional constraints (X-bar relations, phrase structure rules, logical constraints, merge, and so on) to specify how the elements can be combined to yield larger elements, e.g., how prep([at]), adj([high]), and noun([noon]) can combine to yield a prepositional phrase with the order of elements at high noon.
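As a minimal sketch of how a lexicon and distributional constraints might fit together, the fragment below uses Prolog's definite clause grammar (DCG) notation. The lexical entries follow the prep([at]) style used above; the nonterminal names p, a, n, np, and pp are illustrative choices, and the rules are far too permissive for a real solution (they also accept in high noon).

```prolog
% Lexicon: each word is marked for category.
prep([at]).  prep([on]).  prep([in]).
adj([high]).
noun([noon]).

% Lexical insertion: a category nonterminal consumes one word of that category.
p --> [W], { prep([W]) }.
a --> [W], { adj([W]) }.
n --> [W], { noun([W]) }.

% Phrase structure: a noun phrase is a noun, optionally preceded by an adjective;
% a prepositional phrase is a preposition followed by a noun phrase.
np --> n.
np --> a, n.
pp --> p, np.
```

The query phrase(pp, [at,high,noon]) succeeds, while phrase(pp, [high,at,noon]) fails because the word order violates the rules. Calling phrase(pp, X) with X unbound enumerates every string the grammar allows, which is a quick way to spot overgeneration.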
The grammar of English time expressions, which will be encoded into Prolog, must incorporate selection restrictions, cooccurrence constraints, word order constraints, and so on. Phrases like these need attention: at six A.M. in the afternoon, at seven thirty o'clock, ?in the morning in nineteen ninety six at seven o'clock on July fourth.
|1.3.3 The Third Part of the Project: Defining the Pairing for the Calendar and Date Expressions|
The pairing operation for the NF expression [_,_,Month,Date,Year] should be developed to link the Month, Date, and Year variables to the calendar expressions.
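A hedged sketch of such a pairing for Month and Date only is given below; the Year is left out to keep the example short. All predicate and nonterminal names (month_name/2, ord_unit/2, ord_teen/2, ord_tens/3, day_word//1, month_word//1, date_phrase//2, calendar/2) are hypothetical. Because the arithmetic goal comes after the words are consumed or produced, the same rules work for both recognition and generation.

```prolog
month_name(1, january).   month_name(2, february).  month_name(3, march).
month_name(4, april).     month_name(5, may).       month_name(6, june).
month_name(7, july).      month_name(8, august).    month_name(9, september).
month_name(10, october).  month_name(11, november). month_name(12, december).

ord_unit(1, first).  ord_unit(2, second).  ord_unit(3, third).
ord_unit(4, fourth). ord_unit(5, fifth).   ord_unit(6, sixth).
ord_unit(7, seventh). ord_unit(8, eighth). ord_unit(9, ninth).

ord_teen(10, tenth).      ord_teen(11, eleventh).   ord_teen(12, twelfth).
ord_teen(13, thirteenth). ord_teen(14, fourteenth). ord_teen(15, fifteenth).
ord_teen(16, sixteenth).  ord_teen(17, seventeenth).
ord_teen(18, eighteenth). ord_teen(19, nineteenth).

% ord_tens(Value, OrdinalWord, CardinalWord)
ord_tens(20, twentieth, twenty).
ord_tens(30, thirtieth, thirty).

% day_word(?Day): one or two words naming a day of the month.
day_word(D) --> [W], { ord_unit(D, W) }.
day_word(D) --> [W], { ord_teen(D, W) }.
day_word(D) --> [W], { ord_tens(D, W, _) }.
day_word(D) --> [T, W],
    { ord_tens(Tens, _, T), ord_unit(U, W), D is Tens + U, D =< 31 }.

month_word(M) --> [W], { month_name(M, W) }.

% Two of the word orders: "on april twenty ninth", "on the twenty ninth of april".
date_phrase(M, D) --> [on], month_word(M), day_word(D).
date_phrase(M, D) --> [on, the], day_word(D), [of], month_word(M).

% calendar(?NF, ?Words): pairs the Month and Date slots of an NF with a phrase.
calendar([_, _, M, D, _], Words) :- phrase(date_phrase(M, D), Words).
```

The query calendar([_,_,M,D,_], [on,the,twenty,ninth,of,april]) binds M to 4 and D to 29. In the other direction, calendar([_,_,4,29,_], W) yields [on,april,twenty,ninth] and then [on,the,twenty,ninth,of,april] on backtracking.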
|1.3.4 The Fourth Part of the Project: Defining the Pairing for the Time Expressions|
The pairing operation for the time expressions is much harder than that for the calendar because the numbers in the time expressions can change. For instance at ten after seven yields a 10 and a 7 as in [7,10,_,_,_], but at ten before seven yields a 50 and a 6 as in [6,50,_,_,_]. At six fifty translates directly: [6,50,_,_,_].
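The three cases above can be sketched as DCG rules in which the before form performs the arithmetic. This is an illustration under assumptions, not a full solution: the names hour_word/2, offset_word/2, direct_word/2, clock_time//2, and time_pf/2 are hypothetical, only a handful of minute words are listed, and the AM/PM distinction is ignored, so hours are simply those named by the words.

```prolog
hour_word(1, one).   hour_word(2, two).     hour_word(3, three).
hour_word(4, four).  hour_word(5, five).    hour_word(6, six).
hour_word(7, seven). hour_word(8, eight).   hour_word(9, nine).
hour_word(10, ten).  hour_word(11, eleven). hour_word(12, twelve).

% Minute counts that may precede "after" or "before" (a small sample).
offset_word(5, five).  offset_word(10, ten).
offset_word(15, quarter).  offset_word(20, twenty).

% Minute values that may directly follow the hour, as in "six fifty".
direct_word(10, ten).    direct_word(20, twenty). direct_word(30, thirty).
direct_word(40, forty).  direct_word(50, fifty).

hour(H)   --> [W], { hour_word(H, W) }.
offset(N) --> [W], { offset_word(N, W) }.
direct(M) --> [W], { direct_word(M, W) }.

% "at ten after seven": the numbers carry over unchanged.
clock_time(H, M) --> [at], offset(M), [after], hour(H).
% "at ten before seven": subtract from 60 and step back one hour.
% The arithmetic follows the words, so it also acts as a check in generation.
clock_time(H, M) --> [at], offset(N), [before], hour(H0),
    { M is 60 - N, H is H0 - 1 }.
% "at six fifty": direct translation.
clock_time(H, M) --> [at], hour(H), direct(M).

% time_pf(?NF, ?Words): pairs the Hour and Minute slots of an NF with a phrase.
time_pf([H, M | _], Words) :- phrase(clock_time(H, M), Words).
```

The query time_pf([H,M|_], [at,ten,before,seven]) binds H to 6 and M to 50, while time_pf([6,50,_,_,_], W) produces [at,ten,before,seven] and then [at,six,fifty]. Placing the arithmetic after the words is a generate-and-test design: it is simple and reversible, at the cost of trying every word combination during generation.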
|1.3.5 The Fifth Part of the Project: Write up Documentation, Give Scriptfiles, Turn in the Project|
|1.4 The Format of Your Final Paper|
1. Title Page: Title, Your name, Course Number, Date. Number all pages sequentially.
2. Abstract: A one paragraph description of your project.
3. Discussion of the problem and its linguistic significance.
4. Discussion of your program and how it solves the problem. Include script files which illustrate the execution of your Prolog code.
5. Listing of your Prolog code, in Courier monospace type at 10 cpi. Include comment lines if needed.
(a) lexicon
(b) definitions of phrases and relations
6. Your Prolog code. Send it to me by e-mail or give me a disk containing the code.
Your code must run on my computing machine: the ACF4 Unix or the IBM PC or Mac.