
Do People Seek Information Like Animals Forage for Food? An Introduction to Information Foraging Theory

Much of my research at Oregon State University examines debugging using a lens called Information Foraging Theory. I’ve written a few posts on this topic but I haven’t really given a good overview of what Information Foraging Theory is and what it provides for software engineering.

The theory, in a nutshell, is a theory of human behaviour that describes how people forage for information. People are theorized to forage in a way that provides maximum benefit for minimum cost, and to make decisions based on input from the environment that affects this cost/benefit ratio. This theory is applicable to software engineering because software engineering is a very information-seeking-intensive activity. People spend a lot of time looking for things, from “What does this variable do?” to “Where can I start investigating this problem?”

Another reason this theory is valuable in software engineering is that software engineering research is often built on ideas but not necessarily on underlying theories. Information foraging theory provides a theoretical framework that can help consolidate previous results, explain why previous tools and findings have worked, and make predictions about how people may behave in the future.

Now that we have an idea of what it is and why it’s relevant to software engineering, let’s dive into what information foraging is. Much of this post is adapted from material in An Information Foraging Theory Perspective on Tools for Debugging, Refactoring, and Reuse Tasks, which appears in the ACM Transactions on Software Engineering and Methodology (TOSEM), 2013. In another post, we’ll talk about how it relates to software engineering research.

Information Foraging Theory: What it is

Information Foraging Theory was originally proposed by Peter Pirolli and Stuart Card at what was then Xerox PARC to explain how individuals search the web for information. The idea was inspired by ecology’s Optimal Foraging Theory which is the idea that foraging animals attempt to maximize their energy intake (by finding food) over the time required to find that food.

Constructs and Theory

In Information Foraging Theory, the human, called a predator, is looking for information in an environment, like the web. A predator can seek information from an information source, called an information patch. Many patches make up an information topology. Patches are connected to each other through links, and each link has a certain cost to go from one patch to another. Within each patch, there are information features. These features might be words or sentences on a screen, graphics and pictures, icons, even colours and shapes.

A rounded, shaded rectangle contains hexagons with numbers inside them. Some of these hexagons are associated with outgoing links to other shaded rectangles that each have their own hexagons with numbers in them. The links have a number on top of them representing the cost of traversing the link.

Information patches (shaded boxes) in an information topology. In each information patch, there are features (hexagons) with a numerical value. Some of these features are attached to links (dashed line). Each link navigates to a different patch and has a cost.
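To make the picture concrete, here is a minimal sketch of how a topology like the one in the figure could be represented as data. The class names, the example patches, and all of the numbers are my own assumptions for illustration; they do not come from the TOSEM article or from any existing tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Feature:
    """An information feature (a hexagon in the figure) with a numerical value."""
    name: str
    value: float


@dataclass
class Link:
    """A traversable arc (a dashed line in the figure) attached to a feature."""
    cue: Feature   # the feature the link is attached to
    target: str    # name of the patch on the other side of the link
    cost: float    # cost of traversing the link (units are hypothetical)


@dataclass
class Patch:
    """An information patch (a shaded box in the figure)."""
    name: str
    features: List[Feature] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)


def build_topology(patches: List[Patch]) -> Dict[str, Patch]:
    """Index patches by name; together they form the information topology."""
    return {patch.name: patch for patch in patches}


def outgoing(topology: Dict[str, Patch], patch_name: str) -> List[Tuple[str, float]]:
    """List (target patch, traversal cost) for every link leaving a patch."""
    return [(link.target, link.cost) for link in topology[patch_name].links]


# Hypothetical example: an editor patch with one link to a file I/O patch.
call_site = Feature("call to saveFile()", 3.0)
editor = Patch(
    "editor",
    features=[Feature("comment", 1.0), call_site],
    links=[Link(call_site, "file_io", 2.0)],
)
file_io = Patch("file_io", features=[Feature("saveFile body", 5.0)])

topology = build_topology([editor, file_io])
print(outgoing(topology, "editor"))
```

Storing the cue on the link mirrors the figure, where only some features have a dashed line leaving them.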

The predator has an information goal in mind and wants to seek information that satisfies that goal. This predator forages through the information topology seeking prey, which are information features that are related to the predator’s goal.

The activity of getting at information has a cost (usually time) but consuming information from a source also has an associated value (how relevant or important the information is). After consuming some amount of information (which is called prey), the predator may decide that it’s no longer worth the predator’s time to continue processing that patch and the predator navigates away from the patch to a new one that is considered more valuable.

Some information features are connected to links. In web pages, links are usually located in particular places, are coloured differently, and are sometimes underlined when you mouse over them. These features are called cues. A predator can use these cues to try to predict the value of the information on the other side of a link.

Three-panel representation of a developer looking at a screen of information. In the first panel, the developer is staring at a panel at the top of the screen. In the second panel, the developer is choosing to move to a new part of the same screen. In the third panel, the developer has chosen an alternate route of changing the view to look at an entirely new screen.

A developer decides whether to continue foraging in the same screen of information or to refresh a view (which has a cost) and get new information.

So, a developer who is foraging for information has to decide whether to stay within the current patch and continue processing the information in it, or to access a different patch and process information from there. To make the optimal decision, the developer wants the highest-value information for the lowest cost!

If we decide to use math to represent this relationship, it looks like this:

A mathematical formula: Predator's desired choice equals max(V over C).

The predator wants to maximize value V of processing information and minimize the cost C of travelling to find information.
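As a rough sketch of that formula, the snippet below picks the option with the largest ratio of value V to cost C. The option names and numbers are invented for illustration, and a real predator only has estimates of V and C, so treat this as a toy model rather than the theory's full machinery.

```python
def best_choice(options):
    """Return the option with the highest value-to-cost ratio, max(V / C).

    Each option is a dict with hypothetical keys "name", "value", and "cost".
    Costs are assumed to be strictly positive.
    """
    return max(options, key=lambda option: option["value"] / option["cost"])


# Hypothetical choices facing a developer; the numbers are made up.
options = [
    {"name": "keep reading the current patch", "value": 2.0, "cost": 1.0},
    {"name": "follow the link to the method definition", "value": 9.0, "cost": 3.0},
    {"name": "run a code search", "value": 4.0, "cost": 4.0},
]

print(best_choice(options)["name"])
```

With these numbers, following the link wins even though it costs more than staying put, because its value per unit of cost (3.0) is higher than that of the other two options (2.0 and 1.0).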

This is pretty basic so far—everyone wants to maximize their value and get the lowest cost! What is really interesting about this theory is what people’s perceptions of high value and low cost are.

Perceptions and Scents

Even though a predator wants to maximize value and get low cost, one of the main issues is that predators don’t know everything. They only know what they can see currently. Thus, predators perceive an expected value and an expected cost whenever they are processing information features from a patch, including the cues that indicate if a patch is worth leaving.

Since most patches have multiple cues, the predator has to make a number of estimations, based on each cue (and possibly other factors), about whether following that cue’s link is worth leaving the patch. This estimation is called information scent. Scent is often represented in practice by measures of textual similarity. Scent is also influenced by the amount of attention a cue receives, which depends on, for example, the cue’s visual size or its position.
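Here is one way textual similarity could stand in for scent: a bag-of-words cosine similarity between the text of a cue and the text of the predator's goal. Cosine similarity is my choice for this sketch, not necessarily the measure used in any particular study, and the goal and cue strings are made up. Attention (size, position) is not modelled here.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def scent(cue_text, goal_text):
    """Approximate scent as the cosine similarity of two word-count vectors.

    Returns a number between 0.0 (no shared words) and 1.0 (same word mix).
    """
    cue, goal = word_counts(cue_text), word_counts(goal_text)
    dot = sum(count * goal[word] for word, count in cue.items())
    norm = math.sqrt(sum(c * c for c in cue.values())) * math.sqrt(
        sum(c * c for c in goal.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical goal and cues for a developer chasing a bug.
goal = "null pointer when saving file"
cues = ["save file handler", "render toolbar icon"]

# Rank the cues from strongest to weakest scent.
for cue in sorted(cues, key=lambda c: scent(c, goal), reverse=True):
    print(f"{scent(cue, goal):.2f}  {cue}")
```

In this toy example, only the first cue shares a word ("file") with the goal, so it is the only one with any scent. A more realistic measure would also need to handle word variants such as "save" and "saving".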

Summary of Information Foraging Theory Constructs

That’s a lot of constructs. Fortunately, Fleming et al. (in an article that I helped write) built a pretty handy table to remind everyone what all of these concepts are.

| Construct | Description |
| --- | --- |
| Topology | Collection of information patches and links between those patches within a particular information environment |
| Information patch | Region in the topology that contains information features |
| Links L | Traversable arcs between patches |
| Information features | Elements of the environment that the predator can process to gain knowledge |
| Cues | Set of information features associated with a particular link |
| Predator | Person in search of information |
| Information goal | Set of information features that the predator wants to find |
| Prey | An individual feature in the goal set |
| Information scent | Given a link with an associated cue, the predator’s estimation of the probability that traversing the link will lead to prey |
| Attention | Amount of attention that a predator pays to a particular cue |
| Information value V | Benefit of processed information to the predator |
| Interaction cost C | Cost to the predator of getting at information (usually time), for example by traversing a link |
| Expected value E(V) | Value that the predator anticipates gaining through a particular course of action (e.g., following a particular link) |
| Expected cost E(C) | Cost that the predator anticipates incurring in following a particular course of action |

IFT’s Key Constructs, adapted from Fleming et al. 2013, An Information Foraging Theory Perspective on Tools for Debugging, Refactoring, and Reuse Tasks, ACM Transactions on Software Engineering and Methodology.

Predictions and Validations

A lot of scientific work has designed mathematical models of information foraging theory in the web domain. Pirolli and Card (1999) investigated models to predict how people surf the web; this work was further augmented by incorporating scent (Chi et al. 2000, Chi et al. 2001).

Information foraging theory has also since been used to investigate collaborative search on the web, as well as social media tagging.

Next time: Information Foraging Theory in Software Engineering

Now that we have an idea of what information foraging theory is, I will present an overview next time about how this theory’s been applied in software engineering. So far, information foraging theory has been applied primarily to debugging tasks. Margaret Burnett has been leading the charge in this direction, but the concept is beginning to take hold in other areas of software engineering. Nan Niu, for instance, recently published at ICSE a requirements engineering paper on traceability using constructs from information foraging theory.

Stay tuned for the next part in this series!


Paper accepted to the International Computing Education Research Workshop

Our research paper was accepted to the International Computing Education Research Workshop (ICER)! ICER this year had a 33% acceptance rate. This is one of the works on Gidget and the first one about the “newer” version of Gidget I’ve been contributing to research-wise and implementation-wise.

In-Game Assessments Increase Novice Programmers’ Engagement and Learning Efficiency

M. Lee, A. Ko, and I. Kwan. In-Game Assessments Increase Novice Programmers’ Engagement and Learning Efficiency, The Ninth International Computing Education Research Workshop (ICER), San Diego, USA, 2013.

Abstract—Assessments have been shown to have positive effects on learning in compulsory educational settings. However, much less is known about their effects in discretionary learning settings, especially in computing education and educational games. We hypothesized that adding assessments to an educational computing game would provide extra opportunities for players to practice and correct misconceptions, thereby affecting their performance on subsequent levels and their motivation to continue playing. To test this, we designed a game called Gidget, in which players help a robot find and fix defects in programs that follow a mastery learning paradigm. Across two studies, we manipulated the inclusion of multiple choice and self-explanation assessment levels in the game, measuring their impact on engagement and learning efficiency. In our first study, we found that including assessments caused learners to voluntarily play longer and complete more levels, suggesting increased engagement; in our second study, we found that including assessments caused learners to complete levels faster, suggesting increased learning efficiency. These findings suggest that including assessments in a discretionary computing education game may be a key design strategy for improving informal learning of computing concepts.

Papers Accepted to IEEE Visual Languages/Human-Centric Computing (VL/HCC)

Good news! We received notification today about two papers accepted to VL/HCC later this year. Here are the paper titles and abstracts. When the camera-ready preprints are ready, I’ll be sure to post those as well.

Helping End Users Help Themselves with Idea Gardening

J. Cao, I. Kwan, F. Bahmani, M. Burnett, J. Jordahl, A. Horvath, S. Fleming and S. Yang. End-User Programmers in Trouble: Can the Idea Garden Help Them to Help Themselves? to appear in the IEEE Conference on Visual Languages and Human-Centric Computing (VL/HCC), San Jose, USA, 2013

Abstract—End user programmers often get stuck because they do not know how to overcome their barriers. We have previously presented an approach called the Idea Garden, which makes minimalist, on-demand problem-solving support available to end user programmers in trouble. Its goal is to encourage end users to help themselves learn how to overcome programming difficulties as they encounter them. In this paper, we investigate whether the Idea Garden approach helps end-user programmers problem-solve their programs on their own. We ran a statistical experiment with 123 end-user programmers. The experiment’s results showed that, even when the Idea Garden was no longer available, participants with little knowledge of programming who previously used the Idea Garden were able to produce higher-quality programs than those who had not used the Idea Garden.

Keywords—Idea Garden; end-user programming; problem solving; barriers; mashups; quantitative empirical evaluation

User Interface Explanations in Intelligent Agents

T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan and W.-K. Wong. Too Much, Too Little, or Just Right? Ways Explanations Impact End Users’ Mental Models, to appear in the IEEE Conference on Visual Languages and Human-Centric Computing (VL/HCC), San Jose, USA, 2013

Abstract—Research is emerging on how end users can correct mistakes their intelligent agents make, but before users can correctly “debug” an intelligent agent, they need some degree of understanding of how it works. In this paper we consider ways intelligent agents should explain themselves to end users, especially focusing on how the soundness and completeness of the explanations impacts the fidelity of end users’ mental models. Our findings suggest that completeness is more important than soundness: increasing completeness via certain information types helped participants’ mental models and, surprisingly, their perception of the cost/benefit tradeoff of attending to the explanations. We also found that oversimplification, as per many commercial agents, can be a problem: when soundness was very low, participants experienced more mental demand and lost trust in the explanations, thereby reducing the likelihood that users will pay attention to such explanations at all.

Keywords—mental models; explanations; end-user debugging; recommender systems; intelligent agents

Our paper, “The Role of Domain Knowledge and Cross-Functional Communication in Socio-Technical Coordination”, to be presented this coming week at ICSE2013!

So, our paper at the International Conference on Software Engineering, titled The Role of Domain Knowledge and Cross-Functional Communication in Socio-Technical Coordination, will be presented this coming week in San Francisco. Daniela is going to be presenting this paper on Thursday, May 23 at 1:30 PM in Grand Ballroom B!

The preprint of this paper appears on my blog. The main story is that, using a case study method, we examine how diverse roles in two teams in Brazil, working on requirements and their related artifacts, coordinated along task dependencies, and we report on how knowledge and work dependencies affect their work.

There are a number of other great papers that are appearing in the same session, including two co-authored by Prem Devanbu. It’s a good session to be at, in my opinion.

Hope to see you there!

The Whats and Hows of Programmers’ Foraging Diets: What Types of Information are Programmers Looking for?

Information seeking is one of the most important activities in human-computer interaction! One of the most influential theories in understanding, modelling, and predicting information seeking is information foraging theory. In our research, we want to understand programmers’ diets, that is, the types of information goals programmers seek while debugging. By investigating the information diets of professional programmers from an information foraging theory perspective, our work aims to help bridge the gap between results from software engineering research and Information Foraging Theory foundations as well as results from human-computer interaction research.

A pork chop taken by johnnystilletto on Flickr

Is this tasty?

A head of broccoli by Jim Mead

Is this tasty?

My co-author, David Piorkowski, is travelling soon to Paris to present our latest work: “The Whats and Hows of Programmers’ Foraging Diets”. It’s a great time to expand on this paper. Here’s the PDF Preprint!

Our Method

We had two coders examine video of nine professional programmers to identify what exactly they were looking for when trying to fix a bug in an unfamiliar open-source program. We tried to identify their overall diet by identifying if they asked questions (and received answers) belonging to one of four categories: (1) finding a place to start in code, (2) expanding on that initial starting point, (3) understanding a group of code, or (4) understanding groups of groups of code.

What is a programmer’s diet while debugging?

Overall, we found that programmers spend 50% of their debugging time foraging for information.

Surprisingly, even though all participants were pursuing the same overall goal (the bug), they sought highly diverse diets. For example, Participant 2 asked mostly about groups of groups, Participant 3 asked about finding a place to start, Participant 5 didn’t really ask about anything at all, and Participant 6 also looked for a place to start. This suggests a need for debugging tools to support “long tail” demand curves of programmer information.

How did a programmer consume these diets?

How exactly did programmers go about finding what they wanted to consume?

Again, participants used a diverse mix of strategies. Participants spent only 24% of their time following between-patch foraging strategies (such as code inspection or simply reading the package explorer straight up-and-down), but between-patch foraging (such as doing data flow or control flow) has received most of the research attention.

Surprisingly, search was not a very popular strategy: it accounted for less than 15% of participants’ information foraging and was not used at all by 4 of our 9 participants. This suggests that tool support is still critical for non-search strategies in debugging!

Whats Meets Hows

Participants stubbornly pursued particular information in the face of high costs and meager returns. Some participants followed a single pattern over and over again, using the same strategy. For example, in the cases that involved a programmer looking for Type 1-initial goals, participants used code search and spatial strategies extensively but not particularly fruitfully. This emphasizes a key difference between software development and other foraging domains: the highly selective nature of programmers’ dietary needs!

Takeaways

Thus, we considered what programmers want in their diets and how they forage to fulfill each of their dietary needs. Our results suggest that the diet perspective can help reveal when programming tools help to reduce this net demand—and when they do not—during the 50% of debugging time programmers spend foraging.

References and Links

Are you going to be at CHI 2013? Where and when is David’s talk? It’s on Thursday, May 2, at 11:00 in Room Blue… be there!

D. Piorkowski, S. D. Fleming, I. Kwan, M. Burnett, C. Scaffidi, R. Bellamy, J. Jordahl. The Whats and Hows of Programmers’ Foraging Diets, to appear in ACM Conference on Human Factors in Computing Systems (CHI), Paris, France, 2013. PDF Preprint

Our paper on the CHI 2013 web site

And… in case you haven’t seen it yet, the video preview!

Picture of tasty pork chop by Johnny Stilleto. Picture of tasty broccoli by Jim Mead.

The Role of Domain Knowledge and Cross-Functional Communication in Socio-Technical Coordination

Our recent ICSE paper has been accepted and I’ve made it available online here as a pre-print version: ICSE2013-DomainKnowledge-Paper38.pdf.

The paper discusses an investigation into the spread of domain knowledge, as well as specific cross-functional knowledge, across two different global software teams. Essentially, there are two kinds of “structures” internally that may guide project communication. First, there’s the cross-functional communication structure, where people within the same roles are allowed to communicate but people of different roles need to communicate via certain team members (usually team leaders) to avoid misunderstandings. Second, there’s communication across task assignments.

One team had relatively experienced team members and a dense communication structure whereas the other team had inexperienced team members and a siloed communication structure. We identified that people with domain knowledge were more often involved in communication. We also identified brokers in both teams who mediated knowledge from person to person – these brokers spanned multiple application domains in our case studies. Surprisingly, team members followed the cross-functional communication structure, but they did not always follow the expected task assignments. We hope these results can help facilitate knowledge sharing and knowledge management in these types of teams.

Abstract: Software projects involve diverse roles and artifacts that have dependencies to requirements. Project team members in different roles need to coordinate but their coordination is affected by the availability of domain knowledge, which is distributed among different project members, and organizational structures that control cross-functional communication. Our study examines how information flowed between different roles in two software projects that had contrasting distributions of domain knowledge and different communication structures. Using observations, interviews, and surveys, we examined how diverse roles working on requirements and their related artifacts coordinated along task dependencies. We found that communication only partially matched task dependencies and that team members that are boundary spanners have extensive domain knowledge and hold key positions in the control structure. These findings have implications on how organizational structures interfere with task assignments and influence communication in the project, suggesting how practitioners can adjust team configuration and communication structures.

Daniela Damian, Remko Helms, Irwin Kwan, Sabrina Marczak, and Benjamin Koelewijn. “The Role of Domain Knowledge and Cross-Functional Communication in Socio-Technical Coordination”, to appear in the International Conference on Software Engineering (ICSE), May 18-26, 2013, San Francisco, USA.

Download ICSE2013-DomainKnowledge-Paper38.pdf.

The hidden experts in software-engineering communication (NIER track)

This article isn’t a new publication but I thought I’d provide some information about it here. I did this work by analyzing email communication between team members within a large, multinational organization: almost 5000 emails in all, sent all across the organization.

We found that many email discussions involved people who were included in the discussion thread only after the first email was sent! This was surprising because I thought, initially, that if you emailed people about a topic you’d put all of them in the To/CC of the first message. Instead, in this organization, in 57% of the threads someone added a new recipient to the To/CC list as the thread went on.
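To illustrate what “added a new recipient as the thread went on” means, here is a small sketch that flags threads where a later message is addressed to someone who was neither the sender nor a recipient of the first message. This is not the analysis code from the paper; the data layout (a list of messages with "sender" and "recipients" keys) and the example threads are assumptions I made up for this post.

```python
def has_emergent_recipient(thread):
    """Return True if a later message adds someone not on the first message.

    `thread` is a list of messages in the order they were sent. Each message
    is a dict with a "sender" string and a "recipients" collection (To/CC).
    """
    if not thread:
        return False
    first = thread[0]
    known = {first["sender"]} | set(first["recipients"])
    return any(set(message["recipients"]) - known for message in thread[1:])


def emergent_fraction(threads):
    """Fraction of threads in which at least one recipient was added later."""
    if not threads:
        return 0.0
    return sum(has_emergent_recipient(t) for t in threads) / len(threads)


# Two hypothetical threads: in the second, "dana" is CCed on the reply.
threads = [
    [
        {"sender": "alex", "recipients": ["blake", "casey"]},
        {"sender": "blake", "recipients": ["alex", "casey"]},
    ],
    [
        {"sender": "alex", "recipients": ["blake"]},
        {"sender": "blake", "recipients": ["alex", "dana"]},
    ],
]

print(emergent_fraction(threads))
```

The original sender is counted as a known participant so that a plain reply to the sender does not look like a newly added recipient.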

In addition, I examined the messages and identified four main situations in which new recipients emerged partway through a thread:

  • Crisis: There was a big crisis situation, and the message was being passed to as many people as possible so that someone, anyone, might have information that will help.
  • Explicit requests: In the discussion, there was a specific request that a person who was not initially included in the message be involved or take on a task. This is quite common for expertise-seeking; some people would realise that they couldn’t solve a problem and CC a third party for help.
  • Announcements: Announcements were large-scale announcements of some sort, and had to reach large numbers of people.
  • Following-up: After a particular event, a message would be sent following up on the event. If there were people involved in the event who were not initially invited, they were included on follow-up emails.

There were a number of takeaways that affect my email habits even now – I try to ensure that people are CCed right from the start, and if someone asks me to recommend someone they should talk to, rather than simply telling them that they should speak with Person X, I actually CC Person X as part of my reply.

The hidden experts in software-engineering communication (NIER track)

Irwin Kwan, Daniela Damian
ICSE ’11 Proceedings of the 33rd International Conference on Software Engineering, 2011