The analysis of the results reveals some gaffes we committed, along with some interesting facts about how users approach the system that bear on its redesign.
Note: Most of the bar charts on this page show averages of responses on a six-point Likert scale. We have normalized these plots so that the midpoint of the scale, which stands in for a neutral response, is at zero. Thus, if respondents disagreed with a statement on average, the bar appears below the x-axis.
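A minimal sketch of this normalization in Python follows. It assumes responses are coded as integers from 1 (strongly disagree) to 6 (strongly agree). The coding, the constant names, and the function `normalized_mean` are illustrative assumptions, not the scripts we actually used. A six-point scale has no middle option, so the zero line falls at 3.5, between the two mildest responses.

```python
from statistics import mean

# Assumed coding: 1 = strongly disagree ... 6 = strongly agree.
SCALE_MIN = 1
SCALE_MAX = 6
# A six-point scale has no neutral option; its midpoint (3.5) is used as zero.
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2


def normalized_mean(responses):
    """Average Likert responses and shift them so the scale midpoint sits at zero.

    Negative results mean respondents disagreed on average (bar below the
    x-axis); positive results mean they agreed on average.
    """
    if not responses:
        raise ValueError("need at least one response")
    for r in responses:
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"response {r} is outside {SCALE_MIN}-{SCALE_MAX}")
    return mean(responses) - MIDPOINT
```

For example, `normalized_mean([4, 5, 6])` is 1.5, so that statement's bar would rise 1.5 units above the x-axis.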
Overall Ratings
Overall, the system was found to be both usable and useful.
Handheld
We found that the handheld had no learnability whatsoever. However, once they had learned it, users thought the handheld was very easy to use, and most liked using it or thought they would in a real speed-dating scenario. Overall, they strongly agreed that using the handheld was enjoyable.
Prompts & Categories
The average responses to our first three handheld prompts ("Looks?", "Smarts?", and "Intrigue?") were approximately the same. However, only half as many participants responded affirmatively to the "Chemistry?" prompt. This may reflect the experimental design, in which our participants rated fictional characters absent any real dating experience. It may also reflect the vagueness of the prompt; some participants commented that they would like a greater variety of characteristics to rate. Other observations:
Users mistook the responses to the four prompts for a rating scale, and thought the +/- buttons increased or decreased compatibility
Users disliked that the rating schema was binary ('compatible' or 'not compatible')
Users were unable to navigate through the prompts, and were confused by the current cycling method of navigation
At least two users wanted the '+' and '-' buttons swapped in position
Most users looked at the coasters when using the handheld, but not always
Voice Recording
There was a marked gender difference in the use of voice memos. Every male participant recorded multiple voice memos, while no female participant recorded any. The survey responses also revealed this gender difference in attitudes toward voice recording (one female participant did not answer the voice recording questions because, she noted, she had not used the function). This may be an attribute effect, in that female-gendered attitudes toward relationships may inhibit voicing evaluations in a semi-public context. It may also be a selection effect, given our small experimental population. Finally, it may be a researcher effect: two male investigators were present and in close proximity to the participant while voice memos were being recorded. Our presence may have made female participants uncomfortable making compatibility statements about even fictional characters, while male participants may have been encouraged by the male atmosphere. Other lessons learned were:
One user associated the voice memos with the categories
One user committed a slip while recording memos, pressing the +/- buttons instead of the record button. This suggests that the record button needs to give more feedback
Environmental Elements
Participants found these features useful, only marginally distracting, and somewhat enjoyable, as the chart below illustrates.
Conversation Lights
There was a mixed response to this feature. Overall, users thought it was not distracting, but were divided as to its utility.
Some users disregarded the conversation lights completely
Some users experimented with the lights to figure out how they worked
Our analysis of the data from the conversation lights was inconclusive. The plot at left shows the graphs generated from participants' use of the conversation lights during the experiment. Although participants gave both strongly positive and strongly negative responses about the lights, correlating their conversation patterns with their reactions revealed no systematic covariance. However, we did discover that our timing of the lights was inappropriate for a conversation pattern approximating a speed date (limited in duration, open-ended, and social in nature). The timing was fine-grained enough, but it produced large plots that could not reasonably be shrunk for a computer display without losing critical details. Further experimentation is needed to find the right balance between keeping an appropriate granularity of data and optimizing for display.
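One way to explore that balance is to store the samples at full granularity and reduce them only when drawing the plot. The Python sketch below assumes, hypothetically, that each participant's conversation-light record is a list of numeric samples taken at a fixed interval (for example, the share of each interval in which that participant was speaking). The function `downsample` and the bin-averaging approach are illustrations, not what our system did.

```python
from statistics import mean


def downsample(samples, max_points):
    """Shrink a fixed-interval series to at most max_points values for display.

    Contiguous runs of samples are averaged into equal-width bins (the last
    bin may be shorter). The raw series is left untouched, so full
    granularity remains available for analysis.
    """
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(samples)
    if n <= max_points:
        return list(samples)  # already small enough to draw as-is
    bin_size = -(-n // max_points)  # ceiling division keeps the bin count <= max_points
    return [mean(samples[i:i + bin_size]) for i in range(0, n, bin_size)]
```

If `max_points` is fixed to fit the display, longer conversations get proportionally wider bins, so short bursts of activity may be averaged away. That loss of detail is exactly the trade-off further experimentation would need to weigh.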
Lava lamp
Users felt that the transitions were not smooth enough
Some users were confused about what the end state was
Most thought that it was a good way to keep track of the time
Coasters
The coasters were a hit! People liked the association between the coasters and the dates. They also made other observations:
Users in general thought the coasters would be useful for sorting and ranking dates before using the website
Some users thought that the dots on the coasters should be on the front, for sorting purposes. We agree, for the sake of task effectiveness
Some users wanted more personalization of the coasters
Website
Layout, navigation, etc.
Users wanted more help comparing the dates, for instance by viewing all the graphs or photos at the same time, or by viewing only those dates that were rated highly enough. Otherwise, they found the website's layout and organization of information satisfactory, and felt confident making decisions based on the information presented.
Conversation graphs
Many users intuited that the conversation graphs 'meeting in the middle' meant that the date was a good experience, which is what we intended. They also inferred from the graph of one of the fictional dates that this person dominated the conversation, which matched the narrative we had prepared for that date.
However, some thought that the graphs meeting in the middle meant that the dates were talking over each other
Some users associated the graph with the photo-morphs, or with the categories
Photo morphs
The photo morphs were not successful at all, for the following reasons:
Users thought that they contained redundant information
Users also found that the morphs could be easily misinterpreted
The photo-morphs were often overlooked, but users thought that having the original photos on the coasters as a reference might be useful
Environment vs. Website
This figure compares our participants' ratings of the conversation plots on the website with their ratings of the real-time conversation feedback in the environment. The yellow bars represent their evaluation of the lights' utility, while the yellow lines represent their evaluation of the website plots' utility. The green bars represent their enjoyment / engagement with the lights in the environment, while the green lines represent the same for the website plots.
Note that the gap between the green and yellow bars tends, on average, to match the gap between the green and yellow lines. In other words, the difference between our users' ratings of the lights in the environment and their ratings of the website plots stays roughly constant, on average, across both utility and engagement. Several explanations are possible. First, when users encounter the lights, we may be failing to help them connect the real-time feedback with the visualizations generated from it. Second, real-time feedback may simply be less useful than the conversational record, in which case we should try omitting it to see whether that makes an appreciable difference in the usability of the website plots. Third, there may be an ordering effect in the experiment: users' impression of the website plots may be colored by the "Aha!" moment they have when they recognize the connection to the conversation lights. Fourth, there may be a halo effect in the survey, where participants' responses to an engagement question are colored by their responses to the corresponding utility question.
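The step from "the bar gap matches the line gap" to "the environment-versus-website difference is constant" is a simple rearrangement. The symbols here are our own shorthand. Let $E_U$ and $E_G$ be the mean normalized ratings of the lights' utility and engagement (the yellow and green bars). Let $W_U$ and $W_G$ be the corresponding ratings of the website plots (the yellow and green lines). Then, approximately and on average:

$$E_U - E_G = W_U - W_G \iff E_U - W_U = E_G - W_G$$

The left side states that the bar gap equals the line gap. The right side states that the environment-minus-website difference is the same for utility as for engagement.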