Much of my research at Oregon State University examines debugging using a lens called Information Foraging Theory. I’ve written a few posts on this topic but I haven’t really given a good overview of what Information Foraging Theory is and what it provides for software engineering.
The theory, in a nutshell, is a theory of human behaviour that describes how people forage for information. They are theorized to forage in a way to provide maximum benefit for minimum value and to make decisions based on input from the environment that affects this cost/benefit ratio. This theory is applicable to software engineering because software engineering is a very information-seeking intensive activity. People spend a lot of time looking for things—whether it’s “What does this variable do?” down to, “Where can I start investigating this problem?”
Another reason why this theory is valuable in software engineering is because software engineering research often is built on ideas but not necessarily on underlying theories. Information foraging theory provides a theoretical framework that can help consolidate previous results and provide not only an explanation for why previous tools and findings have worked in the past, but also can make predictions for how people may behave in the future.
Now that we have an idea of what it is and why it’s relevant to software engineering, let’s dive into what information foraging is. Much of this post is adapted from material that appears in An Information Foraging Theory Perspective on Tools for Debugging, Refactoring, and Reuse Tasks that appears in the ACM Transactions on Software Engineering and Methodology (TOSEM), 2013. In another post, we’ll talk about how it relates to software engineering research.
Information Foraging Theory: What it is
Information Foraging Theory was originally proposed by Peter Pirolli and Stuart Card at what was then Xerox PARC to explain how individuals search the web for information. The idea was inspired by ecology’s Optimal Foraging Theory which is the idea that foraging animals attempt to maximize their energy intake (by finding food) over the time required to find that food.
Constructs and Theory
In Information Foraging Theory, the human, called a predator, is looking for information in an environment, like the web. A predator can seek information from an information source, called an information patch, and a topology is made up of many patches. Many patches make up an information topology. Patches are connected to each other through links—each link requires a certain cost to go from one patch to another. Within each patch, there are information features. These features might be words or sentences on a screen, graphics and pictures, icons, even colours and shapes.
The predator has an information goal in mind and want to seek information that satisfies that goal. This predator forages through the information topology seeking prey, which are information features that are related to the predator’s goal.
The activity of getting at information has a cost (usually time) but consuming information from a source also has an associated value (how relevant or important the information is). After consuming some amount of information (which is called prey), the predator may decide that it’s no longer worth the predator’s time to continue processing that patch and the predator navigates away from the patch to a new one that is considered more valuable.
Some information features are connected to links. In web pages, links are usually located in particular places, are coloured differently, and are sometimes underlined when you mouse over them. These features are called cues. A predator can use these cues to try to predict the value of the information on the other side of a link.
So, a developer who is foraging for information has to make a decision whether to stay within the current patch and continue processing the information in it or to access a different patch and process information from there. To make the optimal decision, the developer wants highest value information for the lowest cost!
If we decide to use math to represent this relationship, it looks like this:
This is pretty basic so far—everyone wants to maximize their value and get the lowest cost! What is really interesting about this theory is what people’s perceptions of high value and low cost are.
Perceptions and Scents
Even though a predator wants to maximize value and get low cost, one of the main issues is that predators don’t know everything. They only know what they can see currently. Thus, predators perceive an expected value and an expected cost whenever they are processing information features from a patch, including the cues that indicate if a patch is worth leaving.
Since most patches have multiple cues, this means that the predator has to make a number of estimations, based on the cue (and possibly other factors) about whether to leave the patch. This is called information scent. Scent is often represented in practice by measures of textual similarity. Scent is also influenced by the amount of attention—for example, how big the cue’s visual size is, or the position of the cue.
Summary of Information Foraging Theory Constructs
That’s a lot of constructs. Fortunately, Fleming et al. (in an article that I helped write) built a pretty handy table to remind everyone what all of these concepts are.
|Topology||Collection of information patches and links between those patches within a particular information environment|
|Information patch||Region in the topology that contain information features|
|Links L||Traversable arcs between patches|
|Information features||Elements of the environment that the predator can process to gain knowledge|
|Cues||Set of information features associated with a particular link|
|Predator||Person in search of information|
|Information goal||Set of information features that the predator wants to find|
|Prey||An individual feature in the goal set|
|Information scent||Given a link with an associated cue, the predatorâ€™s estimation of the probability that traversing the link will lead to prey|
|Attention||Amount of attention that a predator pays to a particular cue|
|Information value V||Benefit of processed information to the predator|
|Interaction cost C||Value that the predator anticipates gaining through a particular course of action (e.g., following a particular link)|
|Expected value E(V)||Value that the predator anticipates gaining through a particular course of action (e.g., following a particular link)|
|Expected cost E(C)||Cost that the predator anticipates incurring in following of a particular course of action|
IFT’s Key Constructs, adapted from Fleming et al. 2013, An Information Foraging Theory Perspective on Tools for Debugging, Refactoring, and Reuse Tasks, ACM Transactions on Software Engineering and Methodology.
Predictions and Validations
There’s a lot of scientific work that has designed mathematical models of information foraging theory in the web domain. Pirolli and Card, 1999 investigated models to predict how people surf the web; this work was further augmented by incorporationg scent Chi et al. 2000, Chi et al. 2001.
Information foraging theory has also since been used to investigate collaborative search on the web, as well as social media tagging.
Next time: Information Foraging Theory in Software Engineering
Now that we have an idea of what information foraging theory is, I will present an overview next time about how this theory’s been applied in software engineering. So far, information foraging theory has been applied primarily to debugging tasks. Margaret Burnett has been leading the charge in this direction, but the concept is beginning to take hold in other areas of software engineering. Nan Niu, for instance, recently published at ICSE a requirements engineering paper on traceability using constructs from information foraging theory.
Stay tuned for the next part in this series!