Conducting a usability study with eye-tracking to determine how users make decisions on Yelp

Eye-tracking setup and results

The problem

It’s unclear which features of a business’s Yelp page influence consumer decisions most.

Our results

Photos, average rating, and number of ratings are the most influential features.

brief

Context

Yelp is a local-search service powered by crowd-sourced reviews.

This was a project for CSC 486 Human-Computer Interaction Theory and Design.

Team

Takahiro Shimokobe
Marianne Miranda
Paula Zitnick

Role

My major contributions were in the design of the main study experiment, administration of the experiment, and analysis of main study data.

Tools

SMI RED250 eye tracking hardware & accompanying software

Deliverable

Report

Time

January – April 2019

preliminary study

Objective

To determine what features to study with eye-tracking, we ran a preliminary study to surface features that were statistically significant.

Method

Results

We found that the most important features are:

  • average rating (number of stars)

  • available full menu

  • number of reviews

  • photos

133 respondents

90% of respondents used Yelp at least once a month

process

Method

6 participants were presented with 3 sets of 2 businesses' Yelp pages and asked to choose one to patronize.

We tracked their gaze while they compared the pages.

After they made a choice, we asked them which features had informed it.

Design considerations

Bias

Businesses chosen for comparison were far away from the testing location to minimize the likelihood that participants had existing opinions about them.

Confounding variables

Each pair of businesses had similar values for each feature, except for the feature being tested.

Experiment design

Test cases

We tested the 3 most important features identified in our preliminary study.

Each pair of businesses had similar values for each feature, except for the feature being tested.

Case 1. Number of reviews

Option 1. 120 reviews; 4.5 stars; menu available

Option 2. 47 reviews; 4.5 stars; menu available

Case 2. Average rating (number of stars)

Option 1. 5 stars; 209 reviews; menu available

Option 2. 4 stars; 209 reviews; menu available

Case 3. Menu availability

Option 1. menu available; 4 stars; 137 reviews

Option 2. menu not available; 4 stars; 137 reviews
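The controlled-pair design can be checked mechanically. The sketch below is illustrative only: it encodes the three pairs listed above using field names that are assumptions (reviews, stars, menu), and confirms that each pair differs only in the feature being tested. It covers just these three listed values; other page content such as photos or opening hours is not modeled.

```python
# The three test cases listed above. Field names are assumptions for illustration.
TEST_CASES = {
    "number of reviews": (
        {"reviews": 120, "stars": 4.5, "menu": True},
        {"reviews": 47, "stars": 4.5, "menu": True},
    ),
    "average rating": (
        {"reviews": 209, "stars": 5.0, "menu": True},
        {"reviews": 209, "stars": 4.0, "menu": True},
    ),
    "menu availability": (
        {"reviews": 137, "stars": 4.0, "menu": True},
        {"reviews": 137, "stars": 4.0, "menu": False},
    ),
}


def differing_features(option_1, option_2):
    """Return the sorted names of features whose values differ between two options."""
    return sorted(key for key in option_1 if option_1[key] != option_2[key])


# Each pair should differ in exactly one feature: the one under test.
for tested, (option_1, option_2) in TEST_CASES.items():
    print(tested, "->", differing_features(option_1, option_2))
```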

Hypothesis

We predicted that Option 1 would be chosen in each case, since it has the more favorable value within each pair.

That is, businesses with more reviews, a higher average rating, and an available menu would be chosen.

Analysis

We compared each participant's choices with (1) the features they reported as influential and (2) their eye-tracking data. We interpreted these results for each of the 3 cases.
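One way to organize this three-way comparison is to record each trial as a small record and count the combinations. This is a rough sketch, not the analysis we actually ran: the Trial fields are hypothetical names, and the sample records are made up for illustration rather than taken from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    participant: str
    chose_predicted: bool    # picked Option 1
    looked_at_feature: bool  # gaze landed on the tested feature on both pages
    reported_feature: bool   # named the tested feature as influential


def summarize(trials):
    """Count trials by (chose_predicted, looked_at_feature, reported_feature)."""
    return Counter(
        (t.chose_predicted, t.looked_at_feature, t.reported_feature)
        for t in trials
    )


# Made-up records for illustration only, not the study's data.
trials = [
    Trial("P1", True, True, True),
    Trial("P2", True, True, False),
    Trial("P3", False, True, False),
]
summary = summarize(trials)
for combination, count in summary.items():
    print(combination, count)
```

A combination such as (True, True, False), where a participant looked at the feature and chose the predicted option but did not report the feature, is the kind of trial worth following up on qualitatively.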

Faulty approach

Initially, we had hoped to simply measure which features participants looked at (to gauge each feature's influence). This was ultimately ineffective because no significance was found (i.e. most participants looked at most features).

Silver linings

This made our collection of participants' self-reported influences useful in identifying correlations between their eye-tracking data and choices.

Alternate approach exploration

We also explored using the proportion of time each participant looked at a feature, relative to the rest of the page. However, this was also not a good measure due to variation in detail (e.g. it takes longer to read a review than to read the number of stars).
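The proportion-of-time measure can be sketched as follows. This assumes fixations have already been exported and labeled with an area of interest (AOI). The (aoi, duration in ms) tuple format and the sample values are assumptions for illustration, not the eye-tracking software's actual export format or our recorded data.

```python
from collections import defaultdict


def dwell_proportions(fixations):
    """Map each AOI to its share of the total fixation time.

    `fixations` is an iterable of (aoi_name, duration_ms) pairs.
    """
    totals = defaultdict(float)
    for aoi, duration_ms in fixations:
        totals[aoi] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {aoi: total / grand_total for aoi, total in totals.items()}


# Made-up fixations for illustration only.
fixations = [
    ("reviews", 500),
    ("photos", 300),
    ("stars", 100),
    ("reviews", 100),
]
proportions = dwell_proportions(fixations)
for aoi, share in sorted(proportions.items()):
    print(f"{aoi}: {share:.0%}")
```

In this toy input, the review text takes a large share simply because it takes longer to read, which is the weakness described above: dwell share reflects how much detail a feature carries, not necessarily how much it influenced the choice.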

Results

Case 1. Number of reviews

All participants looked at both review counts.

5 of 6 participants chose the predicted option.

The other participant chose the non-predicted option because its photos showed more authentic-looking food.

Case 2. Average rating (number of stars)

All participants looked at both average ratings.

4 of 6 participants chose the predicted option and reported average rating as an influential factor.

The other 2 participants chose the non-predicted option, citing opening hours and the quality of the business interior.

Case 3. Menu availability

3 of 5 participants looked at the menu availability on both pages and did not choose the predicted option.

1 of 5 participants chose the predicted option, but did not look at the menu availability. They cited wanting the food.

0 participants reported menu availability as an influential factor.

findings

Discoveries

We discovered that users reported looking at pictures (27%), ratings (19%), reviews (19%), and number of reviews (15%).

Users spent the most time looking at reviews (27%) and photos (22%).

There was only one case where a participant looked at the feature being tested, chose the predicted option, and then did not report the feature as influential. They cited the quality of the business interior instead.

Iterating on the preliminary study

While photos were ranked fourth most important in the preliminary study, we found that users spend the most time looking at them.

The average rating and number of reviews features, found to be important in the preliminary study, were confirmed by the main study.

The importance of menu availability was not confirmed, possibly because of ambiguous wording. Survey respondents may have misunderstood the menu feature as photos of the menu.

Conclusions

We found no correlation between duration of gaze and the importance of a feature, due to variation in detail (e.g. it takes longer to read a review than to read the number of stars).

Future work might incorporate more qualitative methods of assessment during eye-tracking experiments to better understand users' thought processes while making decisions in the Yelp interface.