Connect Summer 1997:  Statistics & Social Sciences


Exploring NUD-IST for Qualitative Analysis

Frank LoPresti

[Ed: Links to web pages which have become inactive since the publication of this article have been enclosed in curly brackets { }. Replacement links have been provided where possible.]

Developing theories from a wide collection of interviews, notes, research, and inspiration is equal parts art and science. Managing such qualitative, non-numeric data can be made easier with the assistance of qualitative data analysis (QDA) software. If you have boxloads of note-covered index cards, or piles of articles with densely annotated margins; if your field notes are stored in word-processor documents organized in directories within directories, their inter-relationships slowly fading from your memory; if your reading material is stored on the shelf but your reading notes are kept on your computer, you should consider using some sort of QDA software. QDA software, such as QSR NUD-IST, indexes and annotates both on-line and off-line documents, helping you evolve and confirm theories based on widely varied and unstructured sources.

This article briefly introduces QDA methodologies and some of the software packages that have proved useful to social scientists and humanists in the past. It then offers a description and review of QSR NUD-IST (Non-numerical Unstructured Data Indexing Searching and Theorizing), a QDA software package that is gaining popularity among qualitative researchers.

Methodology in Qualitative Data Analysis

Qualitative studies involve the storage and organization of textual and graphical data, with the goal of developing and testing theories based on that data. As an example, a researcher may collect texts and interviews, developing notes on the individual parts of the research as it is accumulated. Portions of the text may then be coded with keywords which indicate relationships among the segments. By drawing related text segments together based on the keywords, theories can be developed about the underlying causes and effects, and conclusions can be drawn.

As the quantity of text increases, researchers may turn to computer programs which provide automated search and retrieval of text. Relatively simple content analyses such as word frequency counts and relative position measures may be used to characterize some of the data. Advanced analyses might call on graphics technology to display relationships among the data and to build theories and evaluate hypotheses.
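To make the idea of simple content analysis concrete, here is a minimal Python sketch of a word frequency count and a relative position measure. The function names and the sample sentence are illustrative assumptions, and the tokenizing rule (lowercase letters and apostrophes only) is a simplification, not the method of any particular package.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))

def relative_positions(text, word):
    """Locate each occurrence of `word` as a fraction of the way
    through the text (0.0 = first token, 1.0 = last token)."""
    tokens = tokenize(text)
    last = max(len(tokens) - 1, 1)  # avoid dividing by zero on one-word texts
    return [i / last for i, tok in enumerate(tokens) if tok == word.lower()]

# Hypothetical sample text, invented for this illustration.
sample = "Red wine with food. White wine alone. Wine again."
freq = word_frequencies(sample)
positions = relative_positions(sample, "wine")

print(freq["wine"])   # 3
print(positions)      # [0.125, 0.625, 0.875]
```

A word that clusters near 0.0 or 1.0 appears mostly at the start or end of a text, which is the kind of rough characterization these measures provide.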

Methodologies for non-numeric research are as diverse as the range of academic disciplines. For example, Huberman and Miles describe three linked subprocesses following data collection: reduction, display, and theory construction and verification.

Reduction -- categorizing a body of text by keyword -- is used by all researchers. In essence, you reduce data whenever you label a file folder or store similar documents in a directory. Display, the second subprocess, is illustrated by the use of keywords to focus attention on critical relationships and categories. With some types of data, such as interviews, case clustering methods aid in the display process. Theory construction in non-numeric research is supported by coding and retrieval methods. The ability to examine and review similarly coded cases can be critical to understanding causal relationships.

Software Applications in Qualitative Analysis

Qualitative researchers have found innovative ways to suit common software products to their purposes. Word processors, text search packages, relational databases, and hypertext applications such as HyperCard have all served a role in organizing textual and graphical data. From these roots special purpose programs for QDA have developed, including code-and-retrieve software. Since the functionality of the older types of software is probably familiar, comparisons with their capabilities will help clarify what a QDA package like NUD-IST can add to qualitative analysis.

Word processors

Word processors have the ability to handle multiple documents on-screen, to perform pattern searches, to annotate text, and to include charts and images within documents. The linking and publish-and-subscribe features of some word processors enable the marking of content for inclusion within passages of a text, allowing marked text to be automatically updated in the document being developed. Linking features eliminate the need to cut-and-paste, or to hunt for a given section of text each time it needs updating.

While these are useful features, word processors do not perform other tasks needed by qualitative researchers. Word processors do not effectively manage lists of keywords attached to documents; nor do they manage notes which may be attached to keywords as investigations proceed. They do not automate the retrieval of sections of separate documents with related keywords or themes.

Even so, most researchers will probably continue to rely on word processors for maintaining source documents even after adopting a QDA package. Word processors provide advanced editing features not available in the simple editors generally provided with QDA software.

Text search software

Text search packages enable the study of occurrences of themes in large bodies of text. Some packages are used to study word frequency and position; others, such as Gofer and ZyINDEX, enable the use of complex Boolean searches. The UNIX command grep, which matches regular expressions rather than fixed strings alone, is often considered more powerful than most text search packages. QDA software includes varying levels of search functionality.
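A Boolean search of the kind these packages offer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `boolean_search`, its parameter names, and the three sample documents are all hypothetical, and matching is on whole lowercase words rather than on the richer patterns that grep or a commercial package would support.

```python
import re

def words_in(text):
    """Return the set of distinct lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def boolean_search(documents, all_of=(), any_of=(), none_of=()):
    """Return names of documents that contain every term in `all_of`,
    at least one term in `any_of` (if given), and no term in `none_of`."""
    hits = []
    for name, text in documents.items():
        words = words_in(text)
        if not all(term in words for term in all_of):
            continue  # fails the AND condition
        if any_of and not any(term in words for term in any_of):
            continue  # fails the OR condition
        if any(term in words for term in none_of):
            continue  # fails the NOT condition
        hits.append(name)
    return sorted(hits)

# Hypothetical documents, invented for this illustration.
documents = {
    "interview1.txt": "She felt guilt and worry about the job.",
    "interview2.txt": "He reported worry but no guilt.",
    "notes.txt": "The job was described as rewarding.",
}

# worry AND NOT job
print(boolean_search(documents, all_of=("worry",), none_of=("job",)))
# ['interview2.txt']

# job AND (guilt OR rewarding)
print(boolean_search(documents, all_of=("job",), any_of=("guilt", "rewarding")))
# ['interview1.txt', 'notes.txt']
```

Note that the second interview matches "guilt" even though the respondent denied feeling it, a reminder of why plain text search cannot replace coding by a human reader.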

Database and statistical software

Relational databases and statistical packages like SPSS and SAS manage data in two-dimensional arrays. These arrays are conceptualized as tables in which each row is an observation and each column is a variable. Consider a database in which each row is an interview. A researcher could code a case with dichotomous variables like "guilt" and "worry," giving them a value "yes" if those feelings are noted. Some databases handle text data and have the functionality to do text searches.

Sorting and statistical procedures like cluster analysis are also of interest in qualitative studies. In SPSS, an Autorecode function can be applied to open-ended string data to collect all string responses and create a new numeric variable, allowing an interviewer to code open-ended questions with keywords. The researcher does not need to list the entire range of answers before coding begins. For example, if an interviewee has been asked to report a favorite activity, the researcher enters the response as a word or phrase and Autorecode assigns each distinct activity a numeric value in the new variable: "1" for the alphabetically first answer, "2" for the second, and so on.
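The recoding just described can be imitated in Python. The sketch below is not SPSS code and makes no claim about how SPSS implements the function; the name `autorecode` and the sample responses are assumptions chosen for illustration. It shows only the core idea: distinct strings are sorted alphabetically and numbered from 1.

```python
def autorecode(responses):
    """Map each distinct string to an integer code, numbered from 1
    in alphabetical order, and return the coded list plus the codebook."""
    labels = sorted(set(responses))
    codebook = {label: code for code, label in enumerate(labels, start=1)}
    return [codebook[r] for r in responses], codebook

# Hypothetical answers to "What is your favorite activity?"
activities = ["swimming", "chess", "reading", "chess"]
coded, codebook = autorecode(activities)

print(coded)     # [3, 1, 2, 1]
print(codebook)  # {'chess': 1, 'reading': 2, 'swimming': 3}
```

Python sorts strings by character code, so in this sketch capitalized answers would sort ahead of lowercase ones; responses would need to be normalized first if that matters.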

These packages work well with structured, discrete data such as questionnaires. If relational database structuring is adhered to, tables from several studies may be joined at key fields and useful complex analysis can be done. Where these tools fail, though, is in the study of varied, unstructured content which does not lend itself to strict question-response categorization.
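As a rough illustration of joining tables at a key field, the following Python sketch uses the standard sqlite3 module with an in-memory database. The table names, column names, and rows are all invented for the example; the "guilt" and "worry" variables simply echo the interview coding described earlier.

```python
import sqlite3

# Build two small tables that share the key field respondent_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interviews (
        respondent_id INTEGER PRIMARY KEY,
        guilt TEXT,
        worry TEXT
    );
    CREATE TABLE survey (
        respondent_id INTEGER PRIMARY KEY,
        job_type TEXT
    );
    INSERT INTO interviews VALUES (1, 'yes', 'no'), (2, 'no', 'yes'), (3, 'yes', 'yes');
    INSERT INTO survey VALUES (1, 'clerical'), (2, 'manual'), (3, 'clerical');
""")

# Join the two studies on the key and count respondents who
# reported guilt, broken down by job type.
rows = conn.execute("""
    SELECT s.job_type, COUNT(*)
    FROM interviews AS i
    JOIN survey AS s ON s.respondent_id = i.respondent_id
    WHERE i.guilt = 'yes'
    GROUP BY s.job_type
    ORDER BY s.job_type
""").fetchall()

print(rows)  # [('clerical', 2)]
conn.close()
```

The join works only because both tables were built around the same identifier, which is exactly the discipline that unstructured field notes tend to lack.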

HyperCard

A hypertext application, HyperCard organizes information in what can be conceptualized as stacks of cards. The researcher can link cards in any fashion. A stack might be linked at common words, so that clicking on a keyword would bring you to the next card containing that keyword. HyperCard is good for storing and searching text and for attaching memos. Features added to HyperCard software since its first appearance include the ability to generate reports of the results of keyword searches.

Qualitative Data Analysis Software

QDA software emerged from the functionalities that qualitative researchers found useful in the assortment of tools described above. Techniques for storing, processing, and retrieving knowledge have led to the development of several types of specialized QDA software which incorporate methods for developing and testing theories. Among these are code-and-retrieve programs such as Ethnograph, which replicate the manual code-and-retrieve process. Specified text segments are coded into a database, from which relevant text can later be retrieved based on the coding information. Studying these codes enables the researcher to relate concepts to text. Ethnograph also handles memos, text searches, sub-setting, and the generation of statistics about text searches.
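The code-and-retrieve process can be sketched as a small Python class. This is an illustration of the general technique only, not of how Ethnograph or any other package stores its data; the class name `CodeBook`, its methods, and the sample interview segments are assumptions made for the example.

```python
from collections import defaultdict

class CodeBook:
    """Minimal code-and-retrieve store: text segments are filed under
    one or more codes and can later be pulled back by code."""

    def __init__(self):
        # Maps each code to a list of (document, segment) pairs.
        self._segments = defaultdict(list)

    def code(self, document, segment, *codes):
        """Attach a text segment from a document to each given code."""
        for c in codes:
            self._segments[c].append((document, segment))

    def retrieve(self, code):
        """Return every (document, segment) pair filed under a code."""
        return list(self._segments.get(code, []))

# Hypothetical interview segments, invented for this illustration.
book = CodeBook()
book.code("interview1", "I lie awake thinking about the layoffs.", "worry", "bad_job")
book.code("interview2", "I feel I let my family down.", "guilt")
book.code("interview3", "The hours are long and the pay is poor.", "bad_job")

for document, segment in book.retrieve("bad_job"):
    print(document, "->", segment)
```

Retrieving "bad_job" brings together segments from two different interviews, which is the step that lets a researcher compare similarly coded passages side by side.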

QSR NUD-IST

NUD-IST is based on code-and-retrieve techniques, but incorporates features of index-based software. At its most elementary level, NUD-IST keeps on-line and off-line text organized and portable. Work is done within a "project" that you create and name. As on-line text is introduced to a project, NUD-IST copies the text to the project directory. For this review, I created a project called "wine." Every document I introduced about wine is stored by NUD-IST in the wine directory. I included text from documents taken from CDs, from my old computer files, and from the World-Wide Web. NUD-IST provides a facility for keeping track of off-line documents, so I was able to include pointers to parts of my library. If I copy the wine directory to another disk, all the documents come along automatically in data subdirectories, as does any other work done within NUD-IST on my wine project. This organizing feature was enough to sell me on using the package for my own work.

When documents are introduced to the project, the researcher decides on the text "units." A unit can be a word, a line, a paragraph, or an entire document. The unit that you select will affect how NUD-IST deals with your data -- for example, the measurement of position and the study of distances between pairs of words within the text. NUD-IST also uses units in indexing and searching. By default, the program assumes that units are marked off by hard returns -- so, as most documents are formatted, the software will take paragraphs to be the units. This is easily altered with a word processor when preparing the text. For example, to make lines the unit, you can save the document as DOS Text with hard returns on every line. Inserting a hard return after every word will make words the units.
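The effect of hard returns on text units can be shown with a short Python sketch. It assumes, for illustration only, that every non-blank line ending in a hard return is one unit; how NUD-IST itself treats blank lines or other edge cases is not covered here, and both function names are hypothetical.

```python
def split_units(text):
    """Treat each hard return as a unit boundary, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def one_word_per_unit(text):
    """Re-break the text so that every word sits on its own line."""
    return "\n".join(text.split())

# Hypothetical two-paragraph document, one paragraph per line.
document = "First paragraph about wine.\nSecond paragraph about food.\n"

paragraph_units = split_units(document)
word_units = split_units(one_word_per_unit(document))

print(len(paragraph_units))  # 2
print(len(word_units))       # 8
```

The same text yields two units or eight depending only on where the hard returns fall, so a distance of "five units" means something quite different under each choice.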

Indexing with NUD-IST

Project information is indexed by manually entering codes, or by performing a text search and then using the autoindex function. An index entry, or node, can refer to an entire document or to a part of it. You might attach a full interview to the index node "female" if the interviewee was female; or you could attach some smaller unit -- say, a response to the question about her job -- to node "job_type." A range of text units including her responses about her job could be attached to another node, "bad_job" or "good_job." The same document or document part may be indexed by multiple nodes.

Taken together, the index nodes of a project form a tree. Notes can be attached to the nodes themselves and separately to documents, allowing you to record your thoughts on both concepts and content. NUD-IST automatically adds log entries to node memos as you manipulate the tree structure.
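One way to picture such an index tree is as a simple data structure. The Python sketch below is a conceptual model only and does not reflect NUD-IST's internal storage or command names; the `Node` class, its attributes, the intermediate "gender" node, and the unit numbers are all assumptions made for illustration.

```python
class Node:
    """One index node: a named point in the tree that can hold
    references to ranges of text units and a free-text memo."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.refs = []   # (document, first_unit, last_unit) triples
        self.memo = []   # notes about the concept this node stands for
        if parent is not None:
            parent.children[name] = self

    def add_child(self, name):
        """Create a child node beneath this one."""
        return Node(name, self)

    def index(self, document, first_unit, last_unit):
        """Attach a range of text units from a document to this node."""
        self.refs.append((document, first_unit, last_unit))

    def path(self):
        """Return the node's position in the tree, e.g. 'root/gender/female'."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def walk(self):
        """Yield this node and then every node in its subtree."""
        yield self
        for child in self.children.values():
            yield from child.walk()

# Hypothetical index system for the interview example.
root = Node("root")
female = root.add_child("gender").add_child("female")
job_type = root.add_child("job_type")

female.index("interview1", 1, 40)      # the whole interview
job_type.index("interview1", 12, 12)   # a single response
job_type.memo.append("Check whether job type varies by gender.")

paths = [node.path() for node in root.walk()]
print(paths)  # ['root', 'root/gender', 'root/gender/female', 'root/job_type']
```

Because the memo lives on the node and the references point into the documents, thoughts about a concept stay separate from the text it indexes, as the paragraph above describes.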

Text searches include standard features, such as the use of wild cards, integers, and Boolean operators in search strings. You can limit the range of documents searched, or search the entire project. Search results are stored in a "node clipboard" which can be saved or used to create new nodes and branches.

For index searches, NUD-IST offers eighteen separate operators for specifying relationships between indices -- including co-occurrence and proximity, as well as all the standard Boolean operators. For example, the overlap operator finds text coded with at least two occurrences from a list of codes that you provide, and the near operator allows you to further refine the search by specifying an acceptable distance between two co-occurring codes. Thus, "NEAR 5 `red wine' `food'" would gather text containing the two strings "red wine" and "food" where they fall within five text units of each other.
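The logic of a proximity search can be sketched in Python. This is an assumption-laden illustration, not NUD-IST's command syntax or its exact semantics: the function `near`, the representation of each code as a mapping from document name to unit numbers, and the sample data are all hypothetical.

```python
def near(units_a, units_b, distance):
    """Find places where two codes occur within `distance` text units
    of each other in the same document.

    Each argument maps a document name to the set of unit numbers
    carrying that code. Returns (document, unit_a, unit_b) triples."""
    hits = []
    for doc in sorted(units_a.keys() & units_b.keys()):
        for a in sorted(units_a[doc]):
            for b in sorted(units_b[doc]):
                if abs(a - b) <= distance:
                    hits.append((doc, a, b))
    return hits

# Hypothetical coding of two wine reviews.
red_wine = {"review1": {3, 40}, "review2": {10}}
food = {"review1": {7, 90}, "review2": {30}}

matches = near(red_wine, food, 5)
print(matches)  # [('review1', 3, 7)]
```

Only the pair at units 3 and 7 qualifies; the other occurrences share a document but lie too far apart, which is how a proximity operator narrows a plain co-occurrence search.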

Exploring structure with index-based techniques

With the command make-node-report you can display a report on a node and its subtree. Command parameters allow you to control output details like node names, attached documents, and their text, memos, and headers. The display-tree command also opens a window on the index system, centered on the node of your choice.

These tree windows provide point-and-click functionality. By clicking in a window, you can manipulate a tree without having to resort to menu commands or having to write a command file. NUD-IST also supports a command language that parallels the point-and-click actions. Results of tree manipulations appear in separate windows which can be saved and edited into command files. Unfortunately, full syntax rules are not included in the minimal on-line help provided in the package -- so the manual is a must.

The nodes form a dual layer to the text collection. That is, the nodes may be thought of as text and concepts, and may be investigated in a parallel fashion to the textual data. In this sense, the index system in NUD-IST can be used to extend the code-and-retrieve technique. The user explores the text, the nodes, and the relationships provided by the coding. Such meta-study techniques can then be used in theory development.

(Author's Note: NUD-IST Version 3 for Microsoft Windows was reviewed for this article; Version 4 is soon to be released. NUD-IST is distributed by Qualitative Solutions and Research Pty Ltd., at La Trobe University, Victoria, Australia {nudist@qsr.com.au} and in the U.S. by SCOLARI at Sage Publications (nudist@sagepub.com). The cost to academics is $333.

In writing this article I relied heavily on the User's Guide for QSR NUD-IST, published by QSR, and on Computer Programs for Qualitative Data Analysis, by Eben A. Weitzman & Matthew B. Miles (Sage Publication Software Sourcebook, California: 1995). Additionally, I referred to the Handbook of Qualitative Research, edited by Norman Denzin & Yvonna Lincoln (Sage Publications, California: 1994), in particular the chapters "Data Management" (Huberman and Miles) and "Computers in Qualitative Research" (Richards).)


Frank LoPresti heads the ACF Social Sciences Group
frank.lopresti@nyu.edu

Posted 28 April 1997