XII. Assessment of Projects by User Evaluation

 

Introduction

There are two main reasons for carrying out evaluations: to ensure that digitization initiatives produce materials that meet user needs, and to assess whether the original objectives of the project were achieved. Throughout this Guide we have stressed the substantial and long-term financial investment that digitization requires, and have drawn attention to the demands it places on limited organizational resources. It makes good sense, wherever possible, to assess the results methodically and make the benefits of this investment evident.

Evaluation provides a mechanism to measure objectively what has been achieved through digitization. Funding bodies increasingly require that digitization projects deliver an evaluation and impact assessment report as part of their final report. In addition to helping projects to be more accountable to the user community, evaluation provides an opportunity for projects to discover who is using the digital collections and how. We make many assumptions about our users and their needs, often basing our expectations of digital usage patterns on existing usage of analog resources. However, at best, these can only be educated guesses, since access and usage can change completely in the networked environment. For instance, use of postcard collections has always been limited, but when made available in digital form their use rises dramatically; ease of access is the key. Only by carrying out evaluation with our users can we find out how the digital resources we create are actually being used. Despite the spreading use of digitization in the cultural sector, we know surprisingly little about its effect on the various user groups we aim to serve and the way digital collections influence their teaching, research, life-long learning, personal development, or entertainment. Research and evaluation will help indicate how we can enhance the functionality and usability of the digital resources we are creating.

At a more practical level, giving the target user group the chance to evaluate and test the delivery system, and in particular the interface, at an early prototype stage should enable the development program to take advantage of feedback and suggestions for improvements. This approach makes most sense when projects and programs have built in the resources needed for iterative prototyping, so it is a good idea to ensure that you have staff and time to implement revisions if you get early feedback from potential users. Commercial and cultural institutions report that this approach allows them to be more responsive to a broader spectrum of users with a variety of technical experience and familiarity with the subject. User evaluation and research can provide answers to these (and other) questions:

The questions you ask will depend on the aims of the program and the particular focus of your interests. For example, if you are interested in educational uses of the digital surrogates of a museum’s collections, the main issues to explore might be the kinds of understanding and learning the digital resources can enable, and how you can ensure that visitors who use them also engage with the original objects on display. If the digital resources are to be made accessible in a museum gallery, then evaluation activity might examine how computer interactives affect the visitors’ experience, and how they affect the social context of group visits and interaction between participants in the group. In an academic library setting, evaluation might assess whether digitizing a given collection is likely to address local curricular needs, and what kinds of improved pedagogical outcomes might result. Evaluation can also help discover how effectively digital surrogates represent the original. Do visitors/users prefer to see the actual object or do the digital surrogates have the same impact? Questioning the user base can reveal this and any evaluation activity should approach this issue.

In making these assessments, remember also that many resources will be used by very different user groups: users from a variety of age groups, educational levels, backgrounds, and interests. Their needs and expectations will also differ considerably. An essential starting point before beginning any evaluation is thus to categorize the user community to make the assessment more informative. For example it might be useful to group users by:

You can further refine this categorization by considering the potential user communities with reference to the project’s defined context and priorities. For instance, it may be important to ascertain whether specialists actually use all the extra search facilities they say they want, or whether they use digitized resources more or less as the general public do. You may discover that different communities actually use the resources in very similar ways. On the other hand, certain users might have particular requirements, for example in relation to image quality or searching facilities.

 

Types of evaluation

Evaluation is classified into three types according to the point during the lifecycle of the project at which it is carried out: front-end, formative, and summative.

Front-end analysis is carried out before a program or application is developed. This type of evaluation can gauge potential users’ reactions to the subject matter; assist in the selection of content areas and themes; provide feedback about the type of service and functionality that would be suitable; and enable the project to make a general assessment of the attitudes that stakeholders and users have towards the proposed development. Involving both groups early on helps ensure that the final product or service conforms more closely to their expectations and needs. User needs assessment often plays an important part in front-end analysis. This technique investigates the requirements of target users (search tools, image or sound quality, depth of metadata) using methods such as focus group discussions or interviews, described below. The front-end evaluation process should also examine whether digitization is the best way to achieve the aims of the project or whether simpler and more affordable solutions might be more appropriate.

Formative evaluation takes place during the development phase and its results help refine and improve the design and delivery of the resource. It can be used to test the appropriateness and intuitiveness of the user interface and pinpoint problematic areas and programming errors. This is a vital step in the design and development of all digital resources. Even if the final product is not perfect, it will be better than if no user testing were carried out at all. It is never too early to test and involve future users in the design process. Even mockup screen views drawn on paper, sketchy web pages, or crude prototypes can provide valuable feedback and suggest changes before too much time and effort have been expended. Where a large user sample is too difficult to administer, even a brief survey of a small sample of users will, in most cases, offer useful information at this stage. Formative evaluation also provides an opportunity to assess users’ perceptions of the project’s content. Different user groups might provide crucial feedback about the kinds and quantity of contextual information, metadata and tools for using the collection that they would find useful. The project will quite possibly discover at this stage that different groups of users have different expectations and may find that there is a need to narrow the focus of the project rather than attempt to provide the definitive resource to all potential users. Since in many cases digitization is a continuing effort for cultural institutions, formative evaluation should also be an ongoing activity, integrated into the digitization chain and implemented as part of the working processes.

Summative evaluation measures the effect and impact of the completed program or a distinctive stage of its development. It is good practice to define particular stages at which to conduct summative evaluation—for example, after a particular phase of activity or once a particular deliverable has been completed—especially where digitization is an ongoing activity. Summative evaluation is often more thorough and yields more determinate results, involving real users as opposed to targeted potential ones. Since it takes place after all the selected materials have been digitized and associated interface and tools designed, it offers a more accurate picture of how these are perceived and used than that provided by formative evaluation. When the whole range of resources is made available, interesting observations can be made, for example, about the relationships between them, the most and least popular materials, and the associations users make between different materials. It may also be possible to find out why users find some materials of greater interest than others.

Often the summative evaluation is the first time that evaluators can measure in depth the effectiveness of interpretative exhibits and gallery kiosks, in relation to the surrounding space, and in the context of the exhibition itself. This is also the opportunity to explore the dynamics between real and surrogate objects, visitors, and computer interactives. The approach and tools chosen for summative evaluation will naturally depend on the aims of the survey and the reasons for carrying it out.

 

Evaluation methods and tools

Measuring and recording the effect of digital resources in cultural and educational settings, as well as the personal meanings that people derive from them, is a complex and difficult undertaking. Gay and Rieger (1999) have argued convincingly that “[m]edia- and technology-rich environments, such as cultural Web sites, demand equally rich data collection and analysis tools that are capable of examining human-computer interactions.” There is no single golden method for evaluating digital programs and measuring their effectiveness and impact. Experience has shown that it is better to combine several methods (known as ‘triangulation’) in order to verify and combine data, relating quantitative with qualitative results. The dichotomy between the quantitative and qualitative approaches is usually artificial, since they work best in complementary ways, illuminating different aspects of a complex phenomenon.

Several useful pointers have emerged from experimental work in evaluation:

Some of the most commonly used methods in traditional evaluation work can be applied to digital programs, and these are briefly described below.

 

Computer logging of user interaction

Automated logging of user interaction with a digital resource provides a reliable way of recording users’ choices and the path they selected through the website or program. There are numerous programs for recording web usage statistics, several of which are freeware or shareware.[1] Most of these offer possibilities for graphic displays, diagrams and categorization of results according to various parameters (e.g., requests by day, month, year). They usually record the number of requests made, the folders and files requested, and a list of countries or types of sectors from which the users come, often based on an analysis of the IP address of their computer. By measuring the length of time between selected links, researchers can estimate approximately the amount of time spent on individual pages. Although web statistics are notoriously unreliable and difficult to interpret (for example, estimating the number of individual users from the number of requests), they can still be very useful in a comparative way, even if the absolute figures should be treated with caution. In general, web statistics programs offer insufficient information to build a profile of the users beyond an estimation of where their computer is based. More detailed and customized information can be recorded, using specialized tools developed in JavaScript, Java, Visual Basic, or C. Research into scripting and computer logging is being carried out at Virginia Tech (http://www.cs.vt.edu/research).
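The kind of analysis described above can be illustrated with a minimal Python sketch. It assumes the server writes its access log in the widely used Common Log Format; the sample lines, the addresses, and the function names are invented for illustration and do not come from any of the programs cited in this section. It counts requests by day and estimates time spent on a page as the gap before the same host's next request.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Matches the start of a Common Log Format line: host, timestamp, request path.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|POST|HEAD) (?P<path>\S+)'
)

def parse_line(line):
    """Return (host, timestamp, path) for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), when, match.group("path")

def requests_per_day(records):
    """Count requests for each calendar day."""
    return Counter(when.date().isoformat() for _, when, _ in records)

def estimated_seconds_per_page(records):
    """Estimate time on each page as the gap until the same host's next request."""
    by_host = defaultdict(list)
    for host, when, path in records:
        by_host[host].append((when, path))
    gaps_by_path = defaultdict(list)
    for visits in by_host.values():
        visits.sort()
        # Pair each request with the one that follows it from the same host.
        for (start, path), (end, _) in zip(visits, visits[1:]):
            gaps_by_path[path].append((end - start).total_seconds())
    return {path: sum(gaps) / len(gaps) for path, gaps in gaps_by_path.items()}

# Invented sample data (documentation-only IP addresses).
SAMPLE_LOG = [
    '192.0.2.10 - - [03/Mar/2003:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.10 - - [03/Mar/2003:10:00:40 +0000] "GET /postcards/1.html HTTP/1.0" 200 2310',
    '192.0.2.10 - - [03/Mar/2003:10:02:10 +0000] "GET /postcards/2.html HTTP/1.0" 200 2290',
    '198.51.100.7 - - [04/Mar/2003:09:15:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
]

records = [r for r in (parse_line(line) for line in SAMPLE_LOG) if r is not None]
daily = requests_per_day(records)
dwell = estimated_seconds_per_page(records)
```

The sketch treats each host address as one user, which is exactly the simplification that makes such figures unreliable in absolute terms: several people may share one address, and the last page of a visit receives no estimate at all because no later request follows it.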

Once the scripting has been set up, computer interaction logging is generally an easy and objective way of obtaining a large set of data, which can be analyzed statistically. One problem is that when this method is used for programs in public access areas, it is sometimes difficult to differentiate between the interactions of different users. Another is that although it reveals the most popular user choices, it does not explain why they were chosen. The results are not very meaningful on their own, but can be useful when combined with interviews, focus group discussions and observation.
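One common heuristic for separating users at a shared terminal is to start a new session whenever the log shows a long pause. The Python sketch below illustrates the idea; the two-minute threshold, the event names, and the function name are assumptions chosen for the example, and a real installation would need to tune the threshold against observation.

```python
from datetime import datetime, timedelta

def split_into_sessions(events, idle_gap=timedelta(minutes=2)):
    """Group (timestamp, action) events into sessions.

    A new session starts whenever the pause since the previous event
    exceeds idle_gap. The threshold is an assumption to tune per installation.
    """
    sessions = []
    for when, action in sorted(events):
        if sessions and when - sessions[-1][-1][0] <= idle_gap:
            # Short pause: treat as the same visitor continuing.
            sessions[-1].append((when, action))
        else:
            # Long pause (or first event): treat as a new visitor.
            sessions.append([(when, action)])
    return sessions

# Invented kiosk log: three interactions, a long pause, then two more.
kiosk_events = [
    (datetime(2003, 3, 3, 10, 0, 0), "open:home"),
    (datetime(2003, 3, 3, 10, 0, 30), "open:map"),
    (datetime(2003, 3, 3, 10, 1, 15), "open:object-12"),
    (datetime(2003, 3, 3, 10, 9, 0), "open:home"),
    (datetime(2003, 3, 3, 10, 9, 20), "open:timeline"),
]

sessions = split_into_sessions(kiosk_events)
```

This only approximates individual users: a visitor who takes over the terminal immediately after another will be merged into the same session, which is one reason to pair logging with observation.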

Sites that require user registration offer a richer range of possibilities for evaluation and contact with the users, although here evaluators need to take into account such issues as privacy and data protection (see Evaluation and Privacy Issues below). Registered users log in to the site with the same user name and password on each visit. This provides more accurate information and can help develop trend data: for instance, to what degree use by a particular category of patrons may be rising or declining over time, whether particular types of material tend to be accessed by specific user groups, or whether certain groups explore the resource in particular ways. Although registration may deter some users, it helps if the registration page is carefully designed, with clear information about the purpose of registration and assurances as to how the data collected will be handled.
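As a rough illustration of such trend data, the following Python sketch counts logins per month for each user category. The category labels, user names, and record layout are hypothetical; a real registration system would supply its own fields.

```python
from collections import Counter

def monthly_use_by_category(logins, categories):
    """Count logins per (month, user category).

    logins: iterable of (user_name, "YYYY-MM-DD") pairs.
    categories: dict mapping user_name to a category label.
    """
    counts = Counter()
    for user_name, date in logins:
        month = date[:7]  # keep only "YYYY-MM"
        counts[(month, categories.get(user_name, "unknown"))] += 1
    return counts

# Hypothetical registration data and login records.
categories = {"u1": "teacher", "u2": "researcher", "u3": "teacher"}
logins = [
    ("u1", "2003-01-14"), ("u2", "2003-01-20"),
    ("u1", "2003-02-03"), ("u3", "2003-02-11"), ("u3", "2003-02-25"),
]

trend = monthly_use_by_category(logins, categories)
```

Comparing `trend[("2003-01", "teacher")]` with `trend[("2003-02", "teacher")]` shows whether use by that group is rising or falling from one month to the next.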

Techniques similar to web usage logging — and often more accurate — can also be used with stand-alone multimedia programs or CD-ROMs.

 

Electronic questionnaires

Although the results generally pose problems for valid statistical analysis, as the sample is self-selected, electronic questionnaires provide an easy way to obtain feedback from end users. They work by encouraging users to answer questions about the resource and their use of it, often by typing or clicking on multiple choice answers. Common practice is to require the user to make an active choice to complete the questionnaire, but more recently institutions, eager to better understand their users, have implemented short evaluation questionnaires that appear automatically a few minutes after the user has entered the site. Although more intrusive, this usually generates a high number of responses. The results of the questionnaires can be automatically loaded into server-side databases for subsequent analysis.
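The loading of responses into a database can be sketched in a few lines of Python using the standard `sqlite3` module. The table layout, question identifiers, and answers below are invented for the example, and an in-memory database stands in for whatever server-side database a project actually uses.

```python
import sqlite3

def store_responses(connection, responses):
    """Save (question, answer) pairs in a simple responses table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS responses (question TEXT, answer TEXT)"
    )
    connection.executemany(
        "INSERT INTO responses (question, answer) VALUES (?, ?)", responses
    )
    connection.commit()

def tally(connection, question):
    """Return a dict mapping each answer to the number of times it was given."""
    rows = connection.execute(
        "SELECT answer, COUNT(*) FROM responses WHERE question = ? "
        "GROUP BY answer ORDER BY answer",
        (question,),
    )
    return dict(rows.fetchall())

# In-memory database and invented multiple-choice answers.
connection = sqlite3.connect(":memory:")
store_responses(connection, [
    ("purpose_of_visit", "teaching"),
    ("purpose_of_visit", "research"),
    ("purpose_of_visit", "teaching"),
    ("found_what_needed", "yes"),
])

purpose_counts = tally(connection, "purpose_of_visit")
```

The counts describe only those users who chose to respond, so they are best read alongside the other methods described in this section.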

Questionnaires can also be sent by email or mailed to users whose email and postal addresses are known or who have provided this information. This allows for more flexibility and customization compared with the standard electronic questionnaire approach. Attractive and clearly laid out printed questionnaires placed next to the computer terminals can encourage local users to leave their impressions and comments about the program. Providing enough pens and visible boxes or assigned locations for returning the questionnaire can help increase the number of responses. Again, this method is not statistically valid, as it records answers from a self-selected and not necessarily representative sample. It might be a useful option for recording information about local users, as long as it is not the only method of evaluation.

 

Observation and tracking

Observing how people use digital collections in the classroom, gallery exhibition, reading room or public space can be very illuminating. It provides an opportunity to collect information about the physical, social, and intellectual contexts that affect the use of the digital resources, indicating, for example, the relationship with real objects in the gallery, the interaction between groups of users, or the ways students make connections between primary sources and the school’s curriculum. Observations can be recorded on data collection sheets, with circulation paths or with checklists of specific behavior categories, together with personal notes. Video recording and web cameras offer alternatives and can produce a wealth of data, although these often take longer to analyze, thus raising the cost. Observation raises privacy issues, which we examine below.

 

Interviewing and focus group discussions

Interviews and discussions with a small number of targeted users provide an effective method of evaluation and offer the opportunity for both structured and open-ended data collection. Discussions with focus groups are often held during front-end analysis, as well as at other stages of evaluation. These are participatory sessions with small groups of people, from the real audience or a particular targeted subset, who are encouraged to express their opinions about the digitized resources and how they are using or would like to use them. Interviews and focus group discussions can be open-ended, with the interviewer or discussion moderator interacting freely with users, or follow a pre-defined set of questions. Open-ended interviewing can be particularly useful for front-end analysis, to test how and what the targeted audience thinks about a topic before beginning program development. If an application is intended for specific groups (e.g., a researcher’s resource or a schoolchildren’s outreach program), discussions with focus groups can be very useful during the planning and development stages. These preliminary discussions often help outline a list of questions for more formal interviewing. Interviews usually provide a great deal of useful and meaningful data but are time-consuming to administer, demanding on the interviewer, and difficult to analyze and categorize. Projects should consider using ethnographic software for annotating transcripts of interviews with users. This software permits the researcher to aggregate comments with a given annotation label.
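The aggregation step that such software performs can be illustrated with a short Python sketch. It is not modeled on any particular product; the labels, interview identifiers, and excerpts are invented, and it assumes the transcripts have already been annotated by hand.

```python
from collections import defaultdict

def group_by_label(annotations):
    """Collect annotated excerpts under their annotation label.

    annotations: iterable of (interview_id, label, excerpt) triples.
    """
    grouped = defaultdict(list)
    for interview_id, label, excerpt in annotations:
        grouped[label].append((interview_id, excerpt))
    return dict(grouped)

# Invented annotations from two hypothetical interviews.
annotations = [
    ("interview-01", "navigation", "I could not find my way back to the search page."),
    ("interview-01", "image quality", "The zoom was detailed enough for my class."),
    ("interview-02", "navigation", "The menu labels were confusing."),
]

by_label = group_by_label(annotations)
```

Reading all the excerpts under `by_label["navigation"]` side by side makes it easier to see whether a complaint recurs across interviews or is peculiar to one user.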

When testing a prototype, interviewing and observation can take two forms (often described as ‘cued’ and ‘uncued’ testing). Cued testing involves explaining to users what the program is about and asking them to perform specific tasks or to answer questions. Another possibility is engaging users in conversation and encouraging them to ‘think aloud’ as they go through the program, while recording their responses. With uncued testing, users are observed unobtrusively as they use the program and are then asked questions about their experience.

 

Checklist Box:

Checklist of criteria

Identifying appropriate criteria is vital for every evaluation. These depend on the scope and purpose of the study, the aims of the digitization program, and the time and funds available. This checklist outlines some of the basic aspects of digital resources that can be assessed. Each project must develop criteria that provide appropriate measures reflecting the goals of their own program and its users.

User interface/delivery

Structure/navigation

Programming

Content

Overall impressions

 

Link Box:

There are a number of resources providing guidance in the planning and design of evaluation strategies. Some of these are listed here:

 

Evaluation and privacy issues

Projects should be aware of the IRB (Institutional Review Board) protocol, designed to safeguard the rights and welfare of human research subjects. The main points to consider are:

Full guides can be found at http://ohsr.od.nih.gov/info/einfo_5.php3.

Some of the evaluation methods proposed may be seen as an invasion of privacy. In the case of observation, however, if you inform all the users and seek their permission in advance, experiments show that their behavior is affected and the results skewed. The evaluation team will have to address this issue and decide on the most appropriate way of notifying users or addressing the problem in general. Having policies in place to ensure that data collected are used sensitively and anonymously can also help to address some of the ethical concerns. In addition, evaluators need to be aware of the privacy protections afforded by federal and state laws.
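One technical measure that can support such a policy is to replace identifying values, such as IP addresses or user names, with pseudonyms before the data reach the analysts. The Python sketch below uses a keyed hash (HMAC with SHA-256) from the standard library; the key, the sample rows, and the 16-character truncation are assumptions made for the example. This is pseudonymization rather than full anonymization, and it does not by itself establish compliance with any law or IRB requirement.

```python
import hashlib
import hmac

def pseudonymize(identifier, secret_key):
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; the length is an arbitrary choice for this sketch.
    return digest.hexdigest()[:16]

# Hypothetical key; in practice it would be stored apart from the analysis data.
SECRET_KEY = b"replace-with-a-key-kept-off-the-analysis-machine"

# Invented log rows (documentation-only IP addresses).
log_rows = [
    ("192.0.2.10", "/index.html"),
    ("192.0.2.10", "/postcards/1.html"),
    ("198.51.100.7", "/index.html"),
]

anonymized = [(pseudonymize(host, SECRET_KEY), path) for host, path in log_rows]
```

Because the same address always maps to the same pseudonym, analysts can still follow a visitor's path through the resource without seeing the address itself.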

Evaluation can help us realize the full potential of the technology to create powerful and attractive applications that assist users to understand digital resources in meaningful and relevant ways.

 

Who should conduct evaluations?

A final question is who should carry out the evaluation work itself. Evaluation work requires time, resources, and skilled staff. It can either be conducted in-house or contracted to external professional survey or marketing companies, as is often the case with quantitative work, or to evaluation experts. Each option has different implications, advantages and disadvantages.

Carrying out the evaluation in-house can help to reduce costs, ensure that the work is carried out by staff members who are familiar with the mission of the organization, control the design of the evaluation and the way it is conducted, and provide for continuity between the evaluation and the way the organization takes advantage of the results. Before beginning, it is essential to ensure that staff have adequate training and expertise in designing and carrying out evaluation work and analyzing the data that are collected. For example, quantitative evaluation might require training in sample design and statistical analysis, while training in moderating focus group discussions might be useful for qualitative work, and system design skills may be essential for developing computer-based evaluation systems. Libraries, archives and museums may collaborate with specialists in information studies departments or with assessment professionals at area universities or colleges who can provide them with access to the necessary skills. One difficulty is that evaluation work is very time-consuming, and using in-house staff to carry it out will inevitably divert them from other activities. If you lack the skills in-house, it may prove more cost-effective to outsource the activity. For large-scale studies, that may entail drafting an RFP (Request for Proposals). The RFP should scope the problem that you hope to address through evaluation and ask the respondents to outline the methods they would use to conduct the evaluation work and how they would propose presenting the results.

 

Conclusion

Evaluation, like project documentation, is often viewed as ancillary to the main project goals, a separate task which takes a back seat to the work of creating the digital resource itself and disseminating it to users. However, as this section has suggested, evaluation not only offers tangible benefits in improved design and better responsiveness to user needs, but also may help avoid disastrous missteps in the planning stages, while change is still possible. Evaluation also enables you to document your project’s effectiveness and make a persuasive case for new or continued funding. Careful planning will ensure that evaluation does not get in the way of efficient project development, or consume a disproportionate share of project resources. Conducted in this way, evaluation is an extremely valuable component of digitization work.

 


[1]     For example, see http://www.ics.uci.edu/pub/websoft/wwwstat/, http://www.extreme-dm.com/tracking/, or http://www.analog.cx/

 

 

abp~03/03