New York University
Computer Science Department
Courant Institute of Mathematical Sciences
Course Title: Application Servers Course Number: g22.3033-011
Instructor: Jean-Claude Franchitti Session: 2
1. Ongoing Project Background
This homework follows the ongoing course project approach described in the homework #1 specification. You should keep enriching your framework-based enterprise application design, and take advantage of the application server platforms used to implement/deploy your application. Your application should be developed in such a way as to be shielded as much as possible from the underlying software infrastructure. For that reason, it is a good idea to build your applications around a portable application framework. As portability issues arise, you will learn how to improve the design of your applications to make them more portable across application server platforms.
2. Introduction to Page-Based Tag-Oriented Application Server Technology
Page-based application server technology sought to
overcome the limitations of traditional dynamic HTML approaches based on
first-generation application servers. The quest for a flexible and extensible
approach led to extensions of HTML that support special tags processed by the
web/application server. Two categories of tags were developed: custom HTML tags
with predefined semantics, and tags that embed user-defined scripts.
ColdFusion 5.0/MX 6.1 is an example of a tag-oriented page-based application server.
ColdFusion provides many features, and includes in particular search
capabilities based on the Verity search engine. The goal of this second
homework is to deploy the framework-based application developed as part of
homework #1 on top of ColdFusion, and extend it to make use of ColdFusion’s
search capabilities.
2.1. Software Infrastructure Provided:
1. Environment used for homework #1, as needed
2. ColdFusion 5.0/MX 6.1 server
3. Other development tools best suited for page-based application server development environments (as identified in homework #2a)

Code samples are provided as part of the sample program description in section 3 below.

Additional Sample Applications:
1. No additional sample applications are provided for this homework.
2.2. Installing and Running the Homework Software Infrastructure:
1. Install the ColdFusion 5.0/MX 6.1 server, and any other development tool judged useful
2. Deploy the sample application described in section 3 below
3. ColdFusion Sample Application
As an increasing number of companies converted their print material to electronic formats, search engines became much more important. Without them, sifting through large volumes of documents can be very time consuming and frustrating. As search engines become more advanced, search processes will become more efficient. In the near future, you may see natural-language processing elements trickle down into existing search applications. Natural-language processing lets the user phrase queries in standard English (for example, "Where can I find newsletters on e-commerce?").
Historically, search engines have been labor intensive (such as Microsoft's Index Server and wide area information servers, or WAIS), expensive, or both. Unfortunately, the free search engines, such as Excite's EWS, have not been very flexible. Macromedia’s ColdFusion provides a middle-ground solution that lets you create a very robust search engine using the integrated Verity Search97/K2 software. Although the bundled Search97/K2 is not the full-featured Verity search engine, this combination is extremely flexible, letting you create an internal search feature (for your Web pages, news articles, and other content) with relatively little effort and configuration. The integrated search capability supports several file formats, including HTML, PDF, and Microsoft Word. ColdFusion and Verity can also be used to search SQL databases.
This section will guide you through the creation of a basic search engine and a more advanced example combining lexical and META tag searches. These applications could be written with many other tools: Microsoft's Index Server with Active Server Pages, Perl and WAIS, or the AltaVista Search Developer's Kit. In general, the Perl/WAIS solution is the least expensive in terms of software costs (it is free), while AltaVista's solution is on the expensive side. ASP could be a good solution for those of you who are familiar with Visual Basic or VB-centric applications, but it requires more lines of code than the ColdFusion/Verity solution. The big advantage of the latter is that it is fairly easy to set up and requires little maintenance.
Note that the important steps to follow are labeled "Step 1" through "Step 5" in the subsections below. There are five steps altogether. Additional information on optimizing Verity collections in ColdFusion can be obtained from the ColdFusion documentation available via the Macromedia web site.
3.1. Basic Search Capability
The basic flow and structure for a ColdFusion search
is as follows:
Before you write any search forms, you'll have to
specify which documents you want to search. Verity's search engine technology
uses collections, which are Verity's way of naming and tracking a group of
documents that can be searched. For example, if you want to search all the news
items on your Web site, you can create a collection of news documents by using
the ColdFusion Administrator. Once you specify the collection, you'll use the
Administrator to create an associated index. Indexing the collection is the act
of adding documents and creating a searchable index.
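To make the difference between a collection and its index concrete, here is a minimal Python sketch of what indexing a collection amounts to. It is not how Verity stores its indexes. The tokenizer and the `build_index` name are assumptions made for illustration. The collection is modeled as a mapping from document name to text, and indexing produces an inverted index that maps each word to the documents containing it, along with how often it occurs in each.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def build_index(collection):
    """Build an inverted index: word -> {document name: occurrence count}."""
    index = defaultdict(dict)
    for doc_name, text in collection.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_name] = count
    return index


# A hypothetical collection of three newsletter documents
collection = {
    "realities.htm": "The realities of e-commerce: sales tax and e-commerce growth.",
    "etrends.htm": "E-commerce trends for 2000 and beyond.",
    "statetax.htm": "State taxes and online sales.",
}
index = build_index(collection)
print(index["e-commerce"])
```

Searching then consists of looking words up in `index` rather than rescanning every document. This is why the index must be rebuilt whenever documents are added to the collection.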
Step 1: To experiment with the search interface outlined here, you will need to create a few sample news-related HTML documents that include the text illustrated in Figure 2. Post these documents on your Web site and create a collection for them. Then index this collection and create a primary search page that will act as the user interface for visitors to your site, as illustrated in Figure 1 below.
Figure 1 - The Search Interface Page
Listing 1 below contains basic HTML for a generic search form. Notice that the action of the form is a ColdFusion document—results.cfm. If you have ever used Hypertext Preprocessor (PHP) or Active Server Pages (ASP), using an action to point to a Web page (and not a CGI script) may seem familiar.
Listing 1 - Basic HTML for a Generic Search Form
Please search our newsletter archives:<br>
<FORM action="results.cfm" method="post">
  <INPUT type="text" name="keyword" size="42"><br>
  <INPUT type="submit" value="Get Results">
  <INPUT type="reset" value="Reset">
</FORM>
ColdFusion is slightly different from PHP and ASP in the sense that the pages do not contain much scripting. Instead, they are sprinkled with ColdFusion tags, as in Listing 2 below.
Listing 2 - The results.cfm Page
<!--- Perform the search --->
<CFSEARCH name="Archive" collection="OurNewsletters" criteria="#Form.keyword#">
<!--- Output the results --->
Search Results:<BR>
<TABLE border="0">
  <TR>
    <TD align="center" width="25%"><B>Relevancy</B></TD>
    <TD><B>Rank</B></TD>
    <TD align="left"><B>Title</B></TD>
  </TR>
  <CFOUTPUT query="Archive">
  <TR>
    <TD align="center">#Archive.SCORE#</TD>
    <TD>#Archive.CurrentRow#</TD>
    <TD><A href="#Archive.URL#">#Archive.TITLE#</A></TD>
  </TR>
  </CFOUTPUT>
</TABLE>
The first thing you'll notice in Listing 1 and Listing 2 is the simplicity of the code. Listing 1 sets up a standard HTML form with a text INPUT field named keyword, which holds the user's search criteria. Submitting this form sends the criteria to the results.cfm page (Listing 2), where most of the real work occurs. This page runs the Verity search by means of the CFSEARCH tag. Its name attribute, Archive, is the name of the query that will hold the search results. The collection attribute is set to OurNewsletters, the name of the collection we want to search. The criteria attribute uses ColdFusion's pound-sign notation (#Form.keyword#) to insert the user's keywords from the original search form.
Once the search is complete, the results page displays the matches using the CFOUTPUT tag. ColdFusion treats a Verity search (CFSEARCH) just like any other query, so you can use CFOUTPUT's query attribute to loop through the result set. Each row corresponds to one matching document. For example, searching with the word "e-commerce" would yield the results shown in Figure 2.
Clicking on any of the titles will bring up the associated document. The ColdFusion Administrator does all the hard work and shields you from the search's complexity.
3.2. Advanced Search Capability
The search engine outlined above returns basic information based on a lexical
search. In other words, Verity just looks at how frequently your keywords occur
in a document and displays the top results. But what if you want to provide a
little more information, such as the date field from an HTML document? Moreover,
suppose you want users to search the contents of document META tags? Initially,
this problem does not seem too difficult. Further examination reveals that the
CFSEARCH tag provides no built-in feature for searches on specific portions of
the HTML file (also known as fielded searches). But there is a way to get around this.
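The frequency-based ranking described above can be sketched in Python as follows. This is a deliberately simple stand-in, not Verity's actual relevance formula. Scaling the scores so that the best match gets 1.0 is an assumption, chosen to mirror the 0-to-1 range of the score column that CFSEARCH returns. Note that the sketch only sees body text, which is exactly the limitation that META-tag keywords are meant to address.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def lexical_search(collection, criteria, max_rows=10):
    """Rank documents by how often the criteria words occur in them.

    Returns (score, name) pairs, best first, with scores scaled so that
    the best-matching document gets 1.0 (a simplification of relevance).
    """
    terms = set(tokenize(criteria))
    hits = {}
    for name, text in collection.items():
        counts = Counter(tokenize(text))
        total = sum(counts[t] for t in terms)
        if total:
            hits[name] = total
    if not hits:
        return []
    best = max(hits.values())
    # Sort by descending count, then by name for a stable order
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [(round(count / best, 4), name) for name, count in ranked[:max_rows]]


collection = {
    "realities.htm": "The realities of e-commerce: sales tax and e-commerce growth.",
    "etrends.htm": "E-commerce trends for 2000 and beyond.",
    "statetax.htm": "State taxes and online sales.",
}
print(lexical_search(collection, "e-commerce"))
```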
Step 2: Suppose you want your site visitors to be able to search your newsletters either lexically or on keywords supplied in the META tags of the documents. You may need to look up the exact syntax of META tags. Listing 3 below expects tags of the form <META name="KEYWORDS" content="..."> and, optionally, <META name="PUBDATE" content="...">. In that case, you will first need to go through all your documents, add META tags, and compile the keywords into some sort of table.
If you use a consistent header format (by using META
tags), it is possible to write a script to parse the filename and keywords from
every newsletter on your site. The suggested script is written in Perl, but if
you're fairly comfortable with ColdFusion (or any other language, for that
matter), you could probably use it to do this as well. You should run the script
manually on your server so that you end up with an output file (call it the
keys file) that looks like Example 1. (The Perl script used is in Listing 3.)
Example 1 - The "keys" file
"file","title","keys"
"realities.htm","09/08/99 - The Realities of ecommerce ","Realities, ecommerce, amazon, sales tax"
"etrends.htm","10/08/99 - Ecommerce Trends for 2000 and beyond ","ecommerce, Y2K"
"opensource.htm","11/08/99 - E-Commerce Tools Go Open Source","Open Source, ecommerce"
"e-christmas.htm","12/08/99 - Tune Your Ecommerce Site for the Christmas Rush"," ecommerce, christmas, online sales"
"statetax.htm","07/08/99 - State Taxes and Ecommerce","Sales tax, ecommerce, California"
Listing 3 - Script to Parse "filename" and "keywords" from META Tags
# key.pl
#
# This Perl program processes files containing HTML that
# follow a standard format for keywords and dates.
# It prints out a line for every file it receives as input.
# This output can be redirected.
# The files that it receives as input can contain wildcards.
# The first argument to the program is the directory path.
# This path will be prepended to the file name.
# The output file format is CSV, with quotations.
# Double quotes are duplicated.
# Example invocation (quote the wildcard so that the script expands it):
# perl key.pl path "*.html" > keys.csv
# CODE BEGINS
use strict;
use warnings;

# get path and remove a trailing \ or /
my $path = shift;
$path =~ s/[\\\/]$//;

# Print out column headers
print "\"file\",\"title\",\"keys\",\"date\"\n";

# Turn a value into a quoted CSV field: trim surrounding whitespace,
# double any embedded quotes, and wrap the result in quotes
sub csv_field {
    my ($value) = @_;
    $value =~ s/^\s+//;
    $value =~ s/\s+$//;
    $value =~ s/"/""/g;
    return "\"$value\"";
}

# command line arguments are:
# path-to-files file1 file2 file3 ...
# i.e. C:\InetPub\wwwroot\ file1.html file2.html
# (each file may itself be a wildcard)
# step through all of the command line input files
foreach my $argfile (@ARGV) {
    # if the argument is a wildcard, step through each matching file
    foreach my $file (glob("$path/$argfile")) {
        # open the file for input, ending execution if this fails
        open(my $in, '<', $file) or die "Couldn't open $file: $!\n";

        # key variables must be reset at the start of each file
        my ($title, $keys, $date) = ("", "", "");

        # @lines holds the lines of the header; each line is cleaned
        # (leading and trailing whitespace, including the newline, removed)
        # before it is appended
        my @lines;
        while (my $text = <$in>) {
            $text =~ s/^\s+//;
            $text =~ s/\s+$//;
            push @lines, $text;
            # after reaching the end of the header, there is no need
            # to read more lines from this file
            last if $text =~ m/<\/head>/i;
        }
        close $in;
        my $line = join(" ", @lines);

        # Now pull out title, keywords, and date
        if ($line =~ m/< *title *>(.*?)<\/title>/i) {
            $title = csv_field($1);
        }
        if ($line =~ m/< *meta *name *= *"KEYWORDS" +content *= *"([^"]+)"/i) {
            $keys = csv_field($1);
        }
        if ($line =~ m/< *meta *name *= *"PUBDATE" +content *= *"([^"]+)"/i) {
            $date = csv_field($1);
        }

        # Print out the line of data, but only if the file
        # actually contained something to process
        if (@lines) {
            print csv_field($file), ",$title,$keys,$date\n";
        }
    }
}
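If you are more comfortable in another language, the same header-parsing approach can be sketched in Python using only the standard library. This is an illustrative port, not part of the original assignment material. The `extract_keys` and `write_keys_file` names are hypothetical. The `csv` module's `QUOTE_ALL` mode quotes every field and, by default, doubles embedded quotes, which matches the output format described in the script's comments.

```python
import csv
import io
import re

# Patterns modeled on the regular expressions in Listing 3
TITLE_RE = re.compile(r"<\s*title\s*>(.*?)</title>", re.IGNORECASE)
META_TEMPLATE = r'<\s*meta\s+name\s*=\s*"{name}"\s+content\s*=\s*"([^"]+)"'


def extract_keys(html):
    """Return (title, keywords, pubdate) found in the <head> of an HTML page."""
    # Keep only the header, trimming each line and joining the lines with spaces
    head = re.split(r"</head>", html, maxsplit=1, flags=re.IGNORECASE)[0]
    head = " ".join(line.strip() for line in head.splitlines())

    def meta(name):
        match = re.search(META_TEMPLATE.format(name=name), head, re.IGNORECASE)
        return match.group(1).strip() if match else ""

    title_match = TITLE_RE.search(head)
    title = title_match.group(1).strip() if title_match else ""
    return title, meta("KEYWORDS"), meta("PUBDATE")


def write_keys_file(pages, out):
    """Write a CSV header plus one row per (file name, html) pair to out."""
    # QUOTE_ALL quotes every field; embedded quotes are doubled by default
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["file", "title", "keys", "date"])
    for file_name, html in pages:
        writer.writerow([file_name, *extract_keys(html)])


page = """<html><head>
<title> 09/08/99 - The "Realities" of ecommerce </title>
<meta name="KEYWORDS" content="Realities, ecommerce, amazon, sales tax">
</head><body>...</body></html>"""

buffer = io.StringIO()
write_keys_file([("realities.htm", page)], buffer)
print(buffer.getvalue())
```

In practice you would read each file from disk (for example with `pathlib.Path.glob`) and write to a real `keys.csv` instead of an in-memory buffer.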
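Step 3: Once the keys file is generated, a second collection must be created
just for it. Again, the index for this collection can be created using the
ColdFusion Administrator. Call your new collection KeysCollection. Listing 4
below treats each line of the keys file as a separate search result with its
own key and title. You may therefore need to populate this collection as a
custom index rather than as a plain file index, for example by reading the file
into a query and calling CFINDEX with type="custom". The primary search form will
pass the search type (body, keyword, or both) as a form variable called
search_type, along with the Criteria, MaxRows, and StartRow fields used in
Listing 4. This lets the user search the body of the document, the META tags, or both.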
Step 4: At this point you may be tempted to include both collections in the CFSEARCH tag on the results page, as in Example 2 below. After all, the CFSEARCH tag can handle multiple collections. However, searching both collections at once will return duplicates whenever a newsletter matches both on its keywords (KeysCollection) and in its body (OurNewsletters). To avoid this problem, you must run the two searches separately and combine their results with a few ColdFusion query functions, namely QueryNew, QueryAddRow, and QuerySetCell.
Example 2 - CFSEARCH Tag to Insert in the "results.cfm" Page
<CFSEARCH name="Archive" collection="OurNewsletters,KeysCollection" criteria="#Form.Criteria#">
Listing 4 below shows the code needed to perform the search and combine the results from the two collections. Line 0 sets up the query that will contain the final result set. Line 1 checks whether the user has chosen to search by META keywords, either alone or together with the body. In that case, lines 2 through 7 search the KeysCollection. Remember, KeysCollection is simply an index of the keys.csv file shown in Example 1.
Listing 4 - Modified "results.cfm" Page
0: <CFSET AllGetResults = QueryNew("score,file,title")> <CFSET exclude = ""> <CFSET rec_count = 0>
1: <CFIF Form.search_type IS "keyword" OR Form.search_type IS "both">
2:   <CFSEARCH
3:     name = "KeyGetResults"
4:     collection = "KeysCollection"
5:     criteria = "#Form.Criteria#"
6:     maxRows = "#Form.MaxRows + 1#"
7:     startRow = "#Form.StartRow#">
8:
9:   <!--- Add rows to new query for each row of KeyGetResults.
10:        Scores for keyword results are always reported as 100 --->
11:
12:   <CFOUTPUT query="KeyGetResults" maxRows="#Form.MaxRows#">
13:     <CFSET exclude = ListAppend(exclude, GetFileFromPath(KeyGetResults.key))>
14:     <CFSET temp = QueryAddRow(AllGetResults)>
15:     <CFSET temp = QuerySetCell(AllGetResults, "score", 100)>
16:     <CFSET temp = QuerySetCell(AllGetResults, "file", KeyGetResults.key)>
17:     <CFSET temp = QuerySetCell(AllGetResults, "title", KeyGetResults.title)>
18:   </CFOUTPUT>
19:   <CFSET rec_count = AllGetResults.RecordCount>
20: </CFIF>
21:
22: <CFIF Form.search_type IS "body" OR Form.search_type IS "both">
23:   <CFSEARCH
24:     name = "BodyGetResults"
25:     collection = "OurNewsletters"
26:     criteria = "#Form.Criteria#"
27:     maxRows = "#Form.MaxRows + 1#"
28:     startRow = "#Form.StartRow#">
29:   <!--- Add rows to new query for each row of BodyGetResults, skipping files
30:        already found by keyword. Scores are Verity relevance scaled to 0-100 --->
31:
32:   <CFOUTPUT query="BodyGetResults" maxRows="#Form.MaxRows#">
33:     <CFIF ListFindNoCase(exclude, GetFileFromPath(BodyGetResults.key)) EQ 0>
34:       <CFSET temp = QueryAddRow(AllGetResults)>
35:       <CFSET temp = QuerySetCell(AllGetResults, "score", Round(100 * BodyGetResults.score))>
36:       <CFSET temp = QuerySetCell(AllGetResults, "file", BodyGetResults.url)>
37:       <CFSET temp = QuerySetCell(AllGetResults, "title", BodyGetResults.title)>
38:       <CFSET rec_count = rec_count + 1>
39:     </CFIF>
40:   </CFOUTPUT>
41: </CFIF>
The search executes in lines 2 through 7 of Listing 4. Each row of the
resulting KeyGetResults query has a key column, one of the columns ColdFusion
returns for every CFSEARCH result. Here it identifies the matching entry, that
is, the file name from the keys file. Line 13 adds this value to the exclude
list, which will help prune any duplicates. Line 14 adds a row to the
AllGetResults query, and lines 15 through 17 fill in its score, file name, and
title. Line 19 keeps track of how many rows are present.
Lines 32 through 40 add results from the body (OurNewsletters) search, with a
few differences. First, line 33 checks whether this file is already on the
exclude list. If it is, the result is skipped so that the file is not added a
second time, which prevents duplicates from showing up in the final output.
Second, line 35 scales Verity's relevancy score (a value between 0 and 1) to a
0-100 range, whereas in the keys section all scores are set to 100. This is a
necessary trade-off when combining search results: a keyword match is likely
to be the best match anyway, so it deserves a top ranking.
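The merge performed by Listing 4 can be sketched in Python to make the deduplication rule explicit. Each input is a list of dictionaries standing in for the KeyGetResults and BodyGetResults queries. The field names mirror the CFSEARCH columns used in the listing (key, title, url, score). Identifying the same newsletter in both collections by its bare file name is an assumption, and `merge_results` is a hypothetical name.

```python
from posixpath import basename


def merge_results(keyword_rows, body_rows, search_type="both", max_rows=10):
    """Combine keyword and body hits into one result list without duplicates.

    Keyword hits always score 100. Body hits use the 0-1 relevance score
    scaled to 0-100, and are skipped when the same file name was already
    matched by keyword.
    """
    merged, exclude = [], set()
    if search_type in ("keyword", "both"):
        for row in keyword_rows[:max_rows]:
            exclude.add(basename(row["key"]).lower())
            merged.append({"score": 100, "file": row["key"], "title": row["title"]})
    if search_type in ("body", "both"):
        for row in body_rows[:max_rows]:
            if basename(row["url"]).lower() in exclude:
                continue  # already listed from the keyword search
            # round to the nearest integer, halves rounded up
            score = int(100 * row["score"] + 0.5)
            merged.append({"score": score, "file": row["url"], "title": row["title"]})
    return merged


# Hypothetical rows standing in for KeyGetResults and BodyGetResults
keyword_rows = [{"key": "realities.htm", "title": "09/08/99 - The Realities of ecommerce"}]
body_rows = [
    {"url": "/news/realities.htm", "title": "The Realities of ecommerce", "score": 0.92},
    {"url": "/news/etrends.htm", "title": "Ecommerce Trends for 2000 and beyond", "score": 0.61},
]
for row in merge_results(keyword_rows, body_rows):
    print(row["score"], row["file"])
```

Using a set for `exclude` gives exact, case-insensitive name matches. A plain substring test would wrongly treat "e-christmas.htm" as containing "christmas.htm".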
At this point, your ColdFusion page is finished
processing and can display the results. Listing 5 provides a more advanced way
of displaying the search results.
Listing 5 - Advanced Display of Search Results
<cfset line_number = Form.StartRow>
<cfoutput>
<h2>#rec_count# Matching Files for Search:<br>"#Form.Criteria#"</h2>
</cfoutput>

<TABLE cellspacing="0" cellpadding="2" border="0">
  <!--- table header --->
  <TR bgcolor="cccccc">
    <TD><B>Number</B></TD>
    <TD><B>Score</B></TD>
    <TD><B>Title</B></TD>
  </TR>
  <CFOUTPUT query="AllGetResults" maxRows="#Form.MaxRows#">
  <TR bgcolor="#IIf(line_number Mod 2, DE('ffffff'), DE('ffffcf'))#">
    <!--- current row number --->
    <TD>#line_number#</TD>
    <cfset line_number = line_number + 1>
    <!--- score --->
    <TD>#AllGetResults.score#</TD>
    <TD> - <A href="#AllGetResults.file#">#AllGetResults.title#</A></TD>
  </TR>
  </CFOUTPUT>
</TABLE>
You can see the page in Figure 3. As usual, clicking
on any of the titles will display the full newsletter.
Figure 3 - New Search Result Page
Although the full-blown Verity search engine is not included with ColdFusion Server, you can use a variety of standard and custom ColdFusion tags to build most of the missing functionality yourself.
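Listings 4 and 5 page through the results using Form.StartRow and Form.MaxRows, and Listing 4 asks CFSEARCH for MaxRows + 1 rows. A common reason for fetching one extra row is to find out whether a "next page" link is needed without counting every match. The original pages do not use the extra row, so treat this as a suggested extension. The Python sketch below shows the idea, together with the alternating row colors of Listing 5. `page_of_results` and `row_color` are hypothetical names.

```python
def page_of_results(rows, start_row, max_rows):
    """Return (rows for this page, whether another page follows).

    start_row is 1-based, as in Form.StartRow. One extra row beyond
    max_rows is fetched only to decide whether a next page exists.
    """
    first = start_row - 1
    window = rows[first:first + max_rows + 1]
    return window[:max_rows], len(window) > max_rows


def row_color(line_number):
    """Alternate row colors as in Listing 5: odd rows white, even rows pale yellow."""
    return "ffffff" if line_number % 2 else "ffffcf"


results = [f"doc{n}.htm" for n in range(1, 8)]  # seven hypothetical matches
page, has_next = page_of_results(results, start_row=1, max_rows=5)
for line_number, name in enumerate(page, start=1):
    print(line_number, row_color(line_number), name)
print("next page:", has_next)
```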
Step 5 (optional): Another feature you could easily add to the search engine
application is the ability to sort the results in ascending or descending order,
by score or by date. To do this, you would need to locate and download a freeware
custom tag that implements the sorting (e.g., CF_QUERYSORT downloadable from ).
You would then pass your AllGetResults query to this tag, after adding a date
column to it (for example, from the PUBDATE META tag collected in the keys file).
If you store the date as a number, use the format yyyymmdd (for example,
20000110 = January 10, 2000) so that the tag's built-in numeric sort orders the
date column chronologically. The format mmddyy (for example, 011000) sorts
correctly only for dates within the same year.
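Whether a numeric sort puts dates in chronological order depends on how the dates are encoded. The Python sketch below does not use CF_QUERYSORT, whose interface is not reproduced here. It compares the numeric order of mmddyy and yyyymmdd encodings for two newsletter dates from Example 1 plus one date from the following year. The helper names are hypothetical.

```python
from datetime import date


def mmddyy(d):
    """Encode a date as the number mmddyy (January 10, 2000 -> 011000 -> 11000)."""
    return int(d.strftime("%m%d%y"))


def yyyymmdd(d):
    """Encode a date as the number yyyymmdd (January 10, 2000 -> 20000110)."""
    return int(d.strftime("%Y%m%d"))


# Two newsletter dates from Example 1 plus one in the following year
dates = [date(1999, 12, 8), date(2000, 1, 10), date(1999, 7, 8)]
by_mmddyy = sorted(dates, key=mmddyy)
by_yyyymmdd = sorted(dates, key=yyyymmdd)
print([d.isoformat() for d in by_mmddyy])
print([d.isoformat() for d in by_yyyymmdd])
```

With mmddyy, January 10, 2000 (11000) sorts before July 8, 1999 (70899). With yyyymmdd, the numeric order always matches the calendar order.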
4. Questions
1. Preparation phase:
2. Prepare a short report documenting your refined framework-based enterprise application (using software engineering standards), and explaining its motivation.
3. Prepare a short report including functional diagrams and screenshots (as needed) to demonstrate your understanding of the infrastructure software and search capability. Explain the infrastructure software differences between the application you developed in homework #1 and the one you are developing for this homework.
4. Develop and deploy your framework-based enterprise application on top of ColdFusion as needed to make use of its search capabilities. Document the benefits and deficiencies of the approach on which ColdFusion is based, and explain (as needed) how it limits your ability to develop the various application components you have envisioned for your enterprise application. Note that you do not need to provide a complete implementation of your application in this homework. You should restrict yourself to what you feel is feasible based on time and the level of support provided by the infrastructure software. Your application should be tuned for efficiency as allowed by the underlying infrastructure software, and you should document your performance engineering approach. You should conclude your report by suggesting, and implementing (as time allows), an improved Application Server model.
5. Explain how you would refine the “analyzer” tool you started designing in homework #1 to capture information about the new version of your application deployed on top of ColdFusion, and redeploy it as an XML-based web application such as the “spyweb” application provided as support material under demo programs on the course website. Note that the target application should again maintain a strict separation between content, style, and logic. As for homework #1, your analyzer should strive to extract and represent a generic model of your application using a suitable markup language.
6. Extra Credit: Implement a prototype of the analyzer tool described in question 5.
Deliverables
Grading
All assignments are graded on a maximum scale of 10 points. Your grade will be based equally on:
a. The overall quality of your documentation.
b. The understanding and appropriate use of application server related technologies.
c. Your ability to submit working and well-commented code.
d. Extra credit may be granted for solutions that are particularly creative.
Please let the TA know as
soon as possible about teaming arrangements. You will need to stay with the
same team for the duration of the course. You should only submit one
report/archive per team for each assignment. To balance things out, the final
grading in the course will take into account the fact that you are working as a
team instead of individually, so you should feel free to work individually as
well. Note that the final take home examination will require individual work.