Connect Summer 1998  Statistics and Social Sciences


TextSmart 1.0
First Release of Qualitative Analysis Software from SPSS

Frank LoPresti

Research methodology may be classified as quantitative or qualitative. For the humanist or social scientist, collecting a large body of text and then reducing it is meant to lead to theory construction and confirmation. Typically, qualitative research is done in the library stacks. Field interviews, where opinions are solicited using open-ended questions such as "What do you like about pizza?" also create material for qualitative analysis.

On the other hand, data for quantitative research is generated by closed-coded questions such as "How often do you eat pizza?" in which the responses are restricted to a predetermined list of possible answers: "1) never; 2) monthly; 3) weekly; 4) daily." Quantitative research methods involve statistical analysis of survey, census, economic or other data sets.

In the Summer 1997 issue of Connect, I described NUD*IST, a qualitative analysis program from Sage Publications that ACF distributes to the University research community at an academic discount. NUD*IST organizes the qualitative researcher's voluminous textual materials. Once the textual data is stored and organized on the researcher's personal computer, NUD*IST offers many functions, relying on standard SQL (Structured Query Language), that search the data collection and highlight relationships between diverse text segments.

SPSS's TextSmart is less ambitious. It has one main function, the statistical analysis of responses to open-ended questions, and so it bridges qualitative and quantitative data analysis. TextSmart tames qualitative open-ended responses, creating data sets that can then be analyzed using quantitative methods.

Survey research relies on closed-coded questions such as "Which style of pizza do you like? 1) Deep pan; 2) Thin crust," or "Which pizza topping do you like best? 1) Only cheese; 2) Pepperoni; 3) Vegetarian; 4) Anchovy; 5) Sausage." Questions with a closed set of responses are easily coded onto quantitative statistical spreadsheets. The first question about pizza type becomes a single variable and each person in the survey is coded "1" or "2" depending on whether they answered "Deep pan" or "Thin crust," or their answer is coded as "Missing" if they don't respond at all. The second pizza question only allows responses "1" through "5," or "Missing" if they don't respond.
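The coding scheme just described can be sketched in a few lines of Python. This is only an illustration: the code book dictionaries, the function name code_response and the sample respondents are hypothetical, and using None for "Missing" is an assumption of the sketch rather than a description of any particular statistics package.

```python
# Hypothetical code books for the two closed-coded pizza questions.
CRUST_CODES = {"Deep pan": 1, "Thin crust": 2}
TOPPING_CODES = {
    "Only cheese": 1,
    "Pepperoni": 2,
    "Vegetarian": 3,
    "Anchovy": 4,
    "Sausage": 5,
}
MISSING = None  # assumption: None stands in for the "Missing" code


def code_response(answer, codebook):
    """Return the numeric code for an answer, or MISSING if there is none."""
    if answer is None:
        return MISSING
    # An answer that is not on the predetermined list also ends up MISSING.
    return codebook.get(answer.strip(), MISSING)


# Invented sample data: one dictionary per person in the survey.
respondents = [
    {"id": 1, "crust": "Deep pan", "topping": "Anchovy"},
    {"id": 2, "crust": "Thin crust", "topping": None},
    {"id": 3, "crust": None, "topping": "Pepperoni"},
]

# Each question becomes a single variable (column) for every respondent.
coded = [
    {
        "id": r["id"],
        "crust": code_response(r["crust"], CRUST_CODES),
        "topping": code_response(r["topping"], TOPPING_CODES),
    }
    for r in respondents
]
```

In this sketch an answer that falls outside the predetermined list has nowhere to go except "Missing," which is the limitation of closed-coded questions in miniature.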

In the newly released TextSmart User's Guide, Professor Norman H. Nie (University of Chicago, Political Science, Senior Scientist at Gallup and co-founder of SPSS) discusses the danger of using closed-coded questions like the examples above. For example, the response to the question on crust style would be misleading if the interviewee really wanted to answer that she didn't eat the crust because she was on a high-protein diet, but that choice wasn't offered. Unfortunately, most survey analysis is closed-coded. Therefore, it is limited to the statistical description and analysis of the researcher's list of responses, or to the relationships among the researcher-determined responses to several questions.

Professor Nie states, "There have been two fundamental rationales for asking and recording open-ended responses and one reason for their infrequent use by survey researchers. First, open-ended questions are asked because they are a source of varied and textured information about what respondents think, believe, or know about a subject area, presented in their own words. Second, open-ended questions are asked because if researchers rely exclusively on closed-ended questions, they are forced to frame not only the question but also the alternative responses. Thus, researchers end up constructing and interpreting the respondents' social reality."

The reason for the infrequent use of open-ended questions is the difficulty encountered in coding them. One problem is seen if we ask "What is your favorite pizza?" We might simply choose descriptive key words used in the answers, code single variables like "Anchovy" or "Deep-dish," and use "yes/no" as the values for those variables. If the person's answer mentions anchovy, the variable "Anchovy" has the value "yes." But isn't the answer "Romano, mozzarella and asiago cheeses" substantially different from three separate "yes" responses to the categorical variables "Romano," "Mozzarella" and "Asiago"? Of course it is, because it is the combination of the three cheeses on one pizza that describes the three-cheese pizza.

Which takes us to TextSmart. In this program's survey file, answers to questions are entered along with each respondent's ID number. Another file of excluded terms is used to filter out words like "the," "and" or "my" from the survey file. A good excluded-terms file makes a TextSmart session run more smoothly.
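The filtering step can be pictured with a short Python sketch. It is not TextSmart's own procedure or file format: the excluded-terms set, the tokenizing rule and the two sample answers are all invented for illustration.

```python
import re

# Hypothetical excluded-terms list; a real one would be much longer.
EXCLUDED_TERMS = {
    "the", "and", "my", "i", "a", "is", "on", "with", "like", "favorite",
}


def tokenize(text):
    """Lower-case the answer and split it into words, keeping hyphenated terms."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def filter_terms(text, excluded=EXCLUDED_TERMS):
    """Return the terms of an answer that survive the excluded-terms filter."""
    return [term for term in tokenize(text) if term not in excluded]


# Invented survey file: respondent ID number -> open-ended answer.
survey = {
    101: "I like the deep-dish with garlic and anchovy",
    102: "My favorite is thin-crust and pepperoni",
}

for respondent_id, answer in survey.items():
    print(respondent_id, filter_terms(answer))
# 101 ['deep-dish', 'garlic', 'anchovy']
# 102 ['thin-crust', 'pepperoni']
```

The sketch also suggests why a good excluded-terms file matters: every filler word that slips through becomes a candidate term that has to be dealt with later.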

After applying the excluded-terms file, the only terms remaining are words like "Anchovy," "Romano" and "Tomato." Another file, the alias file, deals with synonyms. For negative ideas, we create "not-words" so that "not-Anchovy" conveys the idea that the respondent doesn't like anchovies on their favorite pizza. Then, using techniques similar to those used in cluster analysis and using logical operators such as AND, OR and NOT, categories are automatically created. Single terms that appear frequently might become a category, such as "Garlic." Two terms which appear in many responses together might become another type of category: "Deep-dish AND Garlic." Two terms which each appear frequently, but seldom in the same response, could become a third type of category: "Onion OR Pepper." These categories become "yes/no" (dichotomous) variables. A person's response might have the variable "Garlic" checked "yes," as well as the variable "Thin-crust OR Crispy."
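To make the three ideas above concrete (aliases, not-words and logical categories), here is a rough Python sketch. It differs from TextSmart in one important respect: the category rules below are written by hand, whereas the program proposes them automatically. The alias table, the list of negating words and all function names are assumptions made for the example.

```python
# Hypothetical alias table: synonym -> preferred term.
ALIASES = {"anchovies": "anchovy", "crunchy": "crispy", "deep-pan": "deep-dish"}

# Assumption: these words negate the term that follows them.
NEGATORS = {"no", "not", "without"}


def normalize(tokens):
    """Apply aliases and fold a negator into the next term as a not-word."""
    terms = set()
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        term = ALIASES.get(token, token)
        terms.add("not-" + term if negate else term)
        negate = False
    return terms


# Hand-written category rules built from AND and OR over a set of terms.
CATEGORIES = {
    "Garlic": lambda t: "garlic" in t,
    "Deep-dish AND Garlic": lambda t: "deep-dish" in t and "garlic" in t,
    "Onion OR Pepper": lambda t: "onion" in t or "pepper" in t,
}


def categorize(terms):
    """Return a yes/no (True/False) value for every category."""
    return {name: rule(terms) for name, rule in CATEGORIES.items()}


# Terms left over from one answer after the excluded-terms filter.
terms = normalize(["deep-pan", "garlic", "no", "anchovies"])
result = categorize(terms)
```

Here terms holds "deep-dish," "garlic" and "not-anchovy," so the first two categories come out "yes" and "Onion OR Pepper" comes out "no."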

The researcher manages the categories. She eliminates categories which aren't interesting to her research and creates categories which may not have occurred frequently enough to be created automatically by TextSmart. The output is an SPSS data set. The first question may have ten categories and the second may have 15. Each category variable is a column. Each row in the data set describes one person's responses to the open-ended survey questions -- whether a respondent's answer can be said to satisfy the category's criterion. If the person's answer mentions "Thin-crust" or "Crispy," the "Thin-crust OR Crispy" variable has the value "yes"; if not, it is coded "no."
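The shape of the resulting data set, one row per respondent and one yes/no column per category, can be sketched as follows. The sketch writes comma-separated text as a stand-in, since it makes no attempt to reproduce the SPSS file format; the two categories, the respondent IDs and the term sets are invented for the example.

```python
import csv
import io

# Hypothetical categories, each a rule over the set of terms in one answer.
CATEGORIES = {
    "Garlic": lambda t: "garlic" in t,
    "Thin-crust OR Crispy": lambda t: "thin-crust" in t or "crispy" in t,
}

# Invented input: respondent ID -> terms found in that person's answer.
responses = {
    101: {"deep-dish", "garlic"},
    102: {"crispy", "pepperoni"},
    103: set(),  # no usable terms in the answer
}


def build_rows(responses, categories):
    """Build one row per respondent with a yes/no value per category."""
    rows = []
    for respondent_id, terms in sorted(responses.items()):
        row = {"id": respondent_id}
        for name, rule in categories.items():
            row[name] = "yes" if rule(terms) else "no"
        rows.append(row)
    return rows


rows = build_rows(responses, CATEGORIES)

# Write the rows as comma-separated text: one column per category variable.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", *CATEGORIES])
writer.writeheader()
writer.writerows(rows)
table = buffer.getvalue()
```

Once the answers are in this form, each category column can be treated like any other dichotomous variable in a statistical analysis.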

TextSmart's function -- to create spreadsheet data sets for statistical analysis from textual responses -- is easily grasped by students and researchers. Its operation is relatively simple, and the manual is fewer than 100 pages long.

There are many other more powerful textual analysis tools available. (A detailed list may be found in The Resources Guide, Oxford University Computing Services, co-edited by ACF's Assistant Director Lorna Hughes.) But those quantitative researchers who only want to analyze interviews and other open-ended survey text will be overjoyed with SPSS's introduction of this new tool.


frank.lopresti@nyu.edu

Posted May 18, 1998