Chapter 4: Data Analytics

Warning: This chapter is full of statistical methods.


Properties have been slow to move past analysis of demographic and past behavioral data, primarily because most do not have the training or personnel resources to spend on much more than that. Professional teams may have three or fewer employees in charge of database management and CRM. Currently, the supply of well-trained database personnel with good analytical skills and an understanding of fan attitudes and behavior is very limited. Consequently, salaries are high and may not seem to fit within personnel budgets.

The graph below demonstrates the growth in demand for positions requiring CRM skills.[1] Similarly, the average data analyst salary in the US is $77,000 and has increased 12% since 2009.[2] Of course, you could just outsource all of your data analysis to India, given that they are paid in rupees, which apparently aren’t worth much.

CRM Hiring Trend

Despite high salary levels, one must ask if hiring such an individual could at least recover the salary given the increased effectiveness and efficiency of the team’s marketing and sales efforts. The most aggressive sports and entertainment organizations believe so. For instance, Madison Square Garden employs over ten people in their Audience Insights & Planning division.  Others, such as the Chicago Bears, employ specialists in areas such as sponsorship development analysis—to provide sponsorship ROI analytics to support the sponsorship sales staff.  Others take advantage of consulting agencies, such as Turnkey Sports, for periodic projects and analyses. The point is that someone needs to be able to crunch the numbers, and it may as well be you.  Or, at least you should have a good working understanding of data analytics in case it comes up in casual conversation.

Data Analytics

In the context of sports and entertainment marketing, data analytics is the science of examining data using statistical methods and models to confirm, explain, or predict attitudes and behaviors of fans.  Sports & entertainment marketers are typically interested in predicting which fans are most likely to buy  tickets, attend events, consume media, support sponsors, and buy merchandise.

The model below illustrates the relationships between the types of data we collect. We will discuss attitudes and behavior before turning to the topics of data, methods, and models.


Our attitudes determine our behavior.  Our attitudes about a property are made up of our beliefs or perceptions (what we think) about the team, what we think about our interactions with others and what we believe others in our social sphere believe about the team (what others think), our overall affective response (how we feel) toward the team, and our behavioral intentions (what we do) regarding the team. Of course, individual differences influence how we think and respond to the team.

As the analytical people that we are, we are interested in understanding the relationships among the data that represent these dimensions that confirm, explain, or predict fan behavior.  Teams have typically done well to collect demographics and past behavior, with less emphasis on the other individual factors and attitudinal dimensions. What kinds of data might we collect? Let’s examine Individual Factors which may influence attitudes while explaining the different types of data we collect.

Types of Data

Individual Factors

Sports and entertainment organizations routinely gather demographic data such as gender, income, education, address/zip code, date of birth, and marital status. Past behaviors are also frequently recorded in terms of previous attendance, when games were attended, and other purchase behavior.  Related behaviors, such as participation in fantasy leagues, may also be found to increase media consumption and attendance.[3]

Much of the demographic and some behavioral data are categorical (nominal) data. Gender may be coded as male (1) and female (2) in the data. The numbers represent the category, but nothing else. Organizations sometimes collect ordinal data (e.g., “Rank your top three…”), which is less useful for analysis because one can only tally the results. We should point out now that we do not recommend the use of ranking data. We’ll explain why later, just to try to add some suspense to this chapter. For now, entertain yourself by reading the chart[4] below with stair steps depicting the four types of data.


Organizations may gather psychographic data regarding an individual’s enduring traits or values that might influence perceptions, feelings, and behaviors.  For instance, research shows that individuals expressing a high need for thrill, adventure, curiosity, arousal, and social standing are good candidates for participating in and watching high risk sports, such as those featured in ESPN’s X-Games. [5] While this finding may not be particularly shocking to you, you might be surprised to know that 55% of Australian Rules Football fans don’t know that players’ shorts have remained the same length as when the sport was first introduced in 1859 in Melbourne, Australia.[6] In any case, these kinds of psychographics are measured with interval data that allow us to determine correlations between such things as fan traits and fan perceptions.

Situations or conditions may influence how we perceive the team and its players. The rivalry between the home team and the opponent is a situational element that influences the likelihood of attendance.[7] Similarly, quality of the teams playing, TV coverage, day of the week, time of day, and month in the season are variables that influence sport consumption (namely, gambling).[8] In the university setting, you may not have been a huge fan of the university team before you arrived at school. However, now that your situation has changed, you are a fan of the team. Similarly, your parents might have graduated from a different institution, but now that you are enrolled here, they might attend games with you. Once you graduate, the odds are low that they will closely identify with the team and attend games. Hence, the situation influenced their attitudes and behaviors. In such a case, it would be important for the team to know the affiliation of family members so as to strengthen the relationship with the organization to continue after you graduate.[9] If you graduate.

The simplest form of data collection is nominal data, as when we ask you to identify your gender. For most people this is a fairly straightforward dichotomous response.  In other cases, however, a frequent mistake in data collection is to ask for nominal (categorical) data when the same information can be collected with continuous (interval or ratio) data. Interval data assumes equal distances between intervals. Consider these questions:

Passion Questions

Avid Fan Questions

In the second format, we can still count the number of extremely avid fans marking 4’s and/or 5’s. We can also count those indifferent (3’s) and those less (2’s) or not at all avid (1’s).  We can make two groups of low (1’s and 2’s) and high (4’s and 5’s) if we want.  If we wanted people to decide one way or the other, we could use an even-numbered interval scale (the 3rd question above) so that we could divide all fans into low (1-3) and high (4-6) groups. More importantly, however, with interval data we can determine if a linear relationship exists between the variable (avidity) and other variables (such as attendance).
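The last point can be sketched in a few lines of Python. This is only an illustration: the avidity ratings and attendance counts are made-up values rather than survey data, and `pearson_r` is a hypothetical helper name. The sketch shows that the same interval responses can be regrouped into low and high groups and also used to check for a linear relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical responses: avidity on a 1-5 interval scale, games attended.
avidity = [1, 2, 2, 3, 4, 4, 5, 5]
games = [0, 1, 2, 3, 5, 6, 8, 10]

# Interval data can still be regrouped into low (1-2) and high (4-5) avidity.
low = [g for a, g in zip(avidity, games) if a <= 2]
high = [g for a, g in zip(avidity, games) if a >= 4]

# And, unlike categories, it supports a test for a linear relationship.
r = pearson_r(avidity, games)
print(f"low n={len(low)}, high n={len(high)}, r={r:.2f}")
```

Had we collected only a "low/high" checkbox, the grouping would still be possible but the correlation would not.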

Ratio data has an origin (zero) and equal distances between values. For example, instead of asking individuals to check an age category (nominal), ratio data can be collected by asking for year of birth (by which age is calculated from zero to infinity). In case you’re wondering, you’ll get just as many or more responses to the year-of-birth question as you will for broad age categories (e.g., 18-24, 25-34, etc.). Again, you can always regroup the data into broad categories if you want. But, you cannot break down the broad age categories into more specific age groups. In the same way, instead of asking fans to check a box for the number of games attended in categories (e.g., 0, 1-5, 6-10, 11-15, etc.), just ask them how many of the home games they have attended or will attend (e.g., 0-41 NBA games). They’ll be at least as accurate as checking the broad categories, and the ratio data allows us to conduct more robust statistical analyses. And, we’re all about being robust.
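A minimal sketch of the year-of-birth idea follows. The survey year, the birth years, and the "under 18" and "35+" bands are assumptions made for illustration (the text names only 18-24 and 25-34), and age is approximated by ignoring whether the birthday has passed.

```python
def age_group(age):
    """Map an exact age to broad categories like those mentioned in the text."""
    if age < 18:
        return "under 18"   # hypothetical band
    if age <= 24:
        return "18-24"
    if age <= 34:
        return "25-34"
    return "35+"            # hypothetical catch-all band

survey_year = 2024                              # hypothetical fielding year
birth_years = [1999, 1985, 2004, 1990, 1962]    # hypothetical responses

# Ratio data: exact ages support means and other arithmetic.
ages = [survey_year - year for year in birth_years]
mean_age = sum(ages) / len(ages)

# The same data can be regrouped into broad categories at any time.
groups = [age_group(a) for a in ages]
print(ages, mean_age, groups)
```

The reverse trip is impossible: a respondent who only checked "25-34" can never be placed at age 25 or 34.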


We are interested in any personal perceptions and social perceptions related to what people may think that might influence behavior relevant to revenue (viz., meeting, media, and merchandise). Perceptions vary from person to person regarding any particular stimulus or input. Some people look at the picture of Lady Gaga and see one of the greatest performers of our time. What do you think about when you see Lady Gaga?

Gaga cover

We can measure personal perceptions (what people think) about Lady Gaga on a variety of dimensions. We can measure social perceptions regarding what they believe others think. How do you think fans in Europe and America perceive Lady Gaga? If you believe that the average scores on her public image are significantly different between fans in these two places, then you are probably thinking of an analysis of variance (ANOVA) statistical test.  That’s great—because that means we are on the same page going forward in this chapter.

With respect to social perceptions, we include what we think about interacting with others and also what we believe others think about aspects related to the experience. These include any of the variables in our identification model (e.g., social prestige), as well as other elements known to influence behavior. Beyond what we already know about the prominent role of identification and passion influencing behavior (Chapters 1 and 2), the table below provides an overview of recent research on other personal perceptions, social perceptions, and affective responses found to influence attendance at events.

Influences on Fan Consumption Behavior*

Personal Perceptions | Social Perceptions | Affective Response
Player image and skills | Community pride | Excitement
Wholesome environment | Socialization | Pleasure
Cause support | Bonding | Arousal
Drama | Perceived crowding | Boredom
Service quality | Social dysfunction | Displeasure
Sportscape environment | Social aggression/violence | Suspense
Variety seeking | Team social status | Stress
Ticket and promotion value | Social well-being | Enjoyment
Destination image | Social integration | Adoration
Outcome uncertainty | Camaraderie | Vicarious achievement
Leisure alternatives | Socially-connected (isolated) | Liking
Escape | Celebrity/player worship | Satisfaction
Fantasy & flow | Familial participation | Moods (romantic)
Website quality & theme | Gender identity | Hope
*Don’t worry. Fans are not consumed in any tangible form. Also, please see the end of the chapter for the research references from which this table was generated.

This table illustrates the breadth and depth of perceptions and feelings that can influence fan behavior. While not exhaustive, this table provides a good starting point for understanding that different psychological constructs can be identified and measured.

What is a construct? A construct represents an unobservable psychological trait or state that can be measured indirectly with a collection of related behaviors or opinions that are associated in a meaningful way. Constructs have clear boundaries that differentiate the concept from other constructs. Excitement is a construct that represents an emotional response to a stimulus that can be described in affective terms such as exciting, sensational, stimulating, and thrilling. Excitement is clearly different from boredom, but may be related to other positive emotions such as pleasure. Now would be a good time to go back to look at the Circumplex of Emotions illustration in Chapter 2. Each of the emotions in the circumplex is a different construct. Each can be measured with a collection of closely related items that represent the construct.

Sometimes clearly defined constructs can be measured with single items. For instance, satisfaction is often measured with a single item similar to the first item below. However, more complex constructs require multiple items to make sure we have valid measures. Validity means we are measuring what we say we are measuring. If we want reliable results, we often use more than one item to make certain that we obtain a consistent pattern of results each time we measure the construct. What might happen if we used the second item below to ask fans their perceptions of a concert? Depending on the concert, fans might say that the performance was extraordinary: extraordinarily bad or extraordinarily good. And, an “ordinary” concert for Coldplay is excellent, whereas an “ordinary” concert for someone else may be terrible. Hence, great care must be taken in selecting the items we use to measure constructs.

Dissatisfied         1              2              3              4              5              Satisfied

Ordinary              1              2              3              4              5              Extraordinary

Perceptions and Affective Response

In most of the studies referenced in the table above, the researchers examined the effects of personal and/or social perceptions on affective responses of fans, which in turn influenced consumption behaviors. This think–>feel–>do  relationship is in keeping with classic consumer behavior models.

We can often measure specific perceptions (what we think) about ourselves and others.  These perceptions may be interrelated. For example, fans who think the team performs well are likely to have high perceived value of tickets and promotions and to believe others think the same.

Emotions tend to be more holistic or global in nature and function as a consequence of related cognitions. For example, a variety of perceptions regarding the venue, the fans, the show, and the performers may lead to feelings of excitement or pleasure at an event. For reasons of consistency, if you think positive things about a property, you will experience positive feelings about the property, and will be willing to spend resources to enjoy experiences related to the property. In summary, what you think, feel, and do makes up your attitude toward the property.

Stats are Fun

We now have some notion of the types of data we can collect and how the data represents what might be going on in the minds and hearts of fans. In order to analyze data, we must be familiar with the basic statistical methods that we were supposed to have learned during required statistical courses but have long since purged from memory. However, since you have become engrossed in this chapter, you are already thinking about how we might measure student perceptions of your statistics courses. This is, in fact, the intended purpose of teaching evaluation forms. And, as you know, those forms would be a great concept if anyone ever actually looked at the results.[10]

Sorry for the distraction. Now back to the fun of statistics. Let us first establish the fact that statistics are fun by asking you to refer to the following:

Illustration #1

Illustration #2

Illustration #3 (available through a link in the original)

Strictly speaking, the third illustration is about math not statistics. But, statistics seem to include a lot of math, so either way it’s funny.

Sorry again for the distraction. We are now going to get directly to the point. We must use statistics in order to analyze data. What kind of statistics? First, we assume that you recall the basic principles of central tendency.[11] Along with central tendency, the following are our favorite types of statistical concepts and methods:

Statistical Method | You’re looking for: | Variable types | X–>Y example
Crosstabs (Chi-square) | Category differences | X = nominal, Y = nominal | Season subscribers (Y/N) differ by gender (M/F)
ANOVA (F-test) | Differences in means between groups | X = nominal, Y = interval/ratio | Events attended (0-maximum) differs by gender (M/F)
Correlation (t-value) | Relationships between two variables | X = interval/ratio, Y = interval/ratio | People who are more passionate fans are likely to attend more frequently
Multiple Regression (t-value) | Predicting one variable based on two+ variables | X = interval/ratio or dummy (0,1), Y = interval/ratio | Multiple personal traits (passion, involvement, etc.) predict attendance
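The table reads naturally as a lookup from variable types to a method. The sketch below encodes that lookup; `choose_method` and its argument names are hypothetical, "continuous" stands in for interval/ratio data, and the dummy-variable case for regression is left out to keep it short.

```python
def choose_method(x_type, y_type, n_predictors=1):
    """Suggest a method from the table given X and Y variable types.

    x_type and y_type are 'nominal' or 'continuous' (interval/ratio).
    """
    if x_type == "nominal" and y_type == "nominal":
        return "Crosstabs (Chi-square)"
    if x_type == "nominal" and y_type == "continuous":
        return "ANOVA (F-test)"
    if x_type == "continuous" and y_type == "continuous":
        if n_predictors >= 2:
            return "Multiple Regression (t-value)"
        return "Correlation (t-value)"
    raise ValueError("combination not covered by the table")

# Season subscriber (Y/N) by gender (M/F): both nominal.
print(choose_method("nominal", "nominal"))
# Passion and involvement predicting attendance: two continuous predictors.
print(choose_method("continuous", "continuous", n_predictors=2))
```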

In the next section, we will demonstrate how to use these statistical methods by examining the effects of fan passion on key fan behaviors. The good news is you don’t have to remember how to hand-crank all of the formulas for analysis of variance (ANOVA) or regression. Nobody does that anymore.

The statistical package we will use to work this magic will be SPSS, which stands for Statistical Package for the Social Sciences. Most upstanding business schools have this software on the PCs in the business school lab.[12] All you have to know is how to press the correct buttons and to interpret the data. Accordingly, we have this handy table below to assist in this matter.


For all statistical analyses, click on Analyze

Statistical Analysis | Then click | Then click | In box click
Crosstabs (Chi-square) | Descriptive Statistics | Crosstabs | Statistics: Chi-square; Cells: Row, Column, Total
ANOVA (comparing means of groups) | Compare Means | One-Way ANOVA | Options: Descriptives; Factor: insert categorical data; Dependent: insert interval data
Correlation | Correlate | Bivariate | None
Multiple Regression | Regression | Linear | None; Independent(s): X-variables; Dependent: Y-variable

Assuming we are looking for differences, it’s good to remember when running these analyses that we generally like to find probabilities less than or equal to .05 (p ≤ .05). Failing that, we’ll just tell it like it is. If the significance is .06, we’ll call it .06. What does that mean? It means that if there were truly no relationship between the variables, we would expect to see a result at least this strong only about 6 times out of 100 by chance alone. That is slightly weaker evidence than p = .05, and a p-value by itself says nothing about how strong the relationship is. Still, it makes no sense to say that p = .05 is worthy of our attention, but that we will totally ignore p = .06. That said, we will get pretty geeked up if we get p ≤ .01.

The Passion of the Fan

When we collect data for research or database purposes, we are intentional about collecting data that we already know or hypothesize will explain or predict buying behavior. Each question or item we ask customers to complete should have a useful purpose. Two of the most common data analytics goals are to produce (a) lead scoring models, which predict which suspects in the database are the best prospects for future sales, and (b) retention models, which explain what customer characteristics and customer touch points are associated with loyal customers compared to those who defect.

Data mining approaches search for significant statistical relationships in data that has already been collected. A more proactive approach takes advantage of data mining, but seeks to build models based on solid theory and practice. As you already know by now, fan passion is a powerful tool in predicting fan behaviors.

Fan passion is the degree to which one devotes one’s heart, soul, mind, and time to the object of the passion. We measure the construct of fan passion with the four items displayed below. The first and fourth items measure a fan’s cognition about abstract feelings (i.e., what he thinks about how he feels). The second and third items measure what the fan reports regarding how he spends mental and physical resources following the property (team).

Survey Items for Passion Construct

Survey software tools, such as Qualtrics, enable the measurement of each of these items with a sliding scale ranging from zero to 100. While these are perhaps not true ratio scales, we can treat them as such since we have an origin of zero that means something: zero passion. Because each item runs from zero to 100, the average score for an individual is also a percentage of the maximum score. We can compute a summed mean score by adding the responses to each of the items and dividing by the number of items:

(item1 + item2 + item3 + item4)/4 = X/100

Pick a property and compute your passion score using the four items. For instance, if you were to ask me about the Texas Rangers, my scores would be 90, 60, 70, and 40. I know that last score looks low compared to the others. I love the Rangers, but my life would probably still be okay without Rangers baseball. However, I know some people who are obsessed with teams like the Dallas Cowboys and they consider ending their lives after important losses. But, enough about their problems. My summed mean score would be 65. If we knew your passion score for a particular team, compared to other fans’ scores, what would we be able to explain or predict about your behaviors?
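The summed mean score is simple enough to compute by hand, but a short sketch makes the arithmetic explicit. The function name `passion_score` is hypothetical; the four item scores are the Texas Rangers example from the text.

```python
def passion_score(items):
    """Mean of the four passion items, each on a 0-100 sliding scale."""
    if len(items) != 4:
        raise ValueError("expected four item scores")
    if any(not 0 <= score <= 100 for score in items):
        raise ValueError("each score must be between 0 and 100")
    return sum(items) / len(items)

# Texas Rangers example from the text: (90 + 60 + 70 + 40) / 4
rangers = passion_score([90, 60, 70, 40])
print(rangers)  # 65.0
```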

To demonstrate relationships between individual factors, perceptions, feelings, and behaviors, we collected data from over 1,600 residents representative of the population in terms of gender, age, ethnic heritage, and marital status in the Atlanta (GA) and Miami (FL) areas. A second study of 1,200 residents representative of the population in the Dallas-Ft. Worth area is also used to illustrate our methods of data analysis in the sports context.

The following sections provide examples of descriptive analyses, followed by (1) relationships between nominal variables (Crosstabs), (2) relationships between nominal data and interval data (ANOVA), and (3) relationships between interval data (correlations, MANOVA, and multiple regression). We examine relationships between independent and dependent variables. The dependent variables are typically behaviors, since we expect that what people do is a consequence of, or dependent upon, who they are (demographics) or what they think or feel (e.g., passion). An independent variable (IV) is expected to influence or determine the dependent variable (DV). Independent variables are symbolically referred to as X’s and dependent variables are designated as Y’s. For example, fan passion (IV) should influence our five DVs (attendance; TV, radio, news, and web consumption).

Descriptive Analysis

To measure consumption behaviors, we asked fans a variety of questions regarding their attendance at events related to music, causes, and sports, including:

  • How many music concerts of professional artists did you attend in the past 12 months?
  • How many professional sporting events did you attend in the past 12 months?
  • How many cause-related events did you attend in the past 12 months?

Additional questions asked about attendance at specific venues in the area. All of our variables are continuous variables (interval or ratio data), which allow us to compute means and to analyze linear relationships between variables. We also used our four-item passion scale to measure fan passion for music, sports, and causes.

The most basic reports, and the ones most frequently used by managers, include summary statistics reflecting the central tendencies of fans. Based on this representative sample from these markets, what can you learn about fans? Below is selected SPSS output from running Descriptives.


At the most basic level, we learn that on average people in these two cities attend one to two (1.68) professional concerts a year, attend about three cause-related events (3.17) and attend four pro sporting events in these cities (4.03). Looking at the passion scores, it looks like people in these cities are most passionate about music, compared to causes and sports.

If we wanted to re-teach the basics of central tendency, we would spend a few pages discussing histograms, normal curves, standard distributions, and the like. But, since we want to get done with this chapter and on with our favorite statistical methods, we’ll just summarize by saying that there’s a lot of variance going on. If that doesn’t suffice, please go to the Social Research Methods page for a review.

Mean scores are more meaningful if we have some point of comparison. A fan passion score means more if you know the average fan passion score for other alternatives in the same category in the same market. Thankfully, we thought of that when we did another survey by asking the single-item passion measure (the first one of our four passion items) for all of the teams in the Fort Worth-Dallas area. Comparing the single passion item for all teams we can determine the relative strength of the fan followings in the market:

Fan Passion in the DFW Market

  1. Cowboys 64.03
  2. Mavericks 50.39
  3. Rangers 47.83
  4. Stars 34.97
  5. TCU 29.49
  6. SMU 23.02
  7. FC Dallas 17.85

Some readers may still be in suspense from earlier in the chapter when we noted that we do not recommend the use of rank-order (ordinal) data. As you can see, we were able to rank-order the favorite teams in the DFW area by listing the mean scores in descending order. In so doing, we can determine the relative distance in fan passion between any two teams. The data is more accurate and allows us to conduct other analyses since we have continuous (not rank-order) data.
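As a quick illustration of why mean scores beat ranks, the sketch below sorts the DFW passion means listed above and computes the distance between neighboring teams. The scores come from the list in this section; the variable names are our own.

```python
# Mean single-item passion scores from the DFW list above.
dfw_passion = {
    "Cowboys": 64.03, "Mavericks": 50.39, "Rangers": 47.83, "Stars": 34.97,
    "TCU": 29.49, "SMU": 23.02, "FC Dallas": 17.85,
}

# Rank order falls out of the continuous data for free.
ranked = sorted(dfw_passion.items(), key=lambda kv: kv[1], reverse=True)

# Distances between adjacent ranks, which pure rank-order data cannot provide.
gaps = {}
for (team_a, score_a), (team_b, score_b) in zip(ranked, ranked[1:]):
    gaps[(team_a, team_b)] = round(score_a - score_b, 2)

for pair, gap in gaps.items():
    print(pair, gap)
```

The Mavericks and Rangers sit in adjacent ranks but are less than three points apart, while the Cowboys lead the Mavericks by more than thirteen. A rank-order question would report both gaps as "one place."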


We can regroup continuous data into categories if we want to report according to group size. Suppose we want to compare the percentage of passionate fans of music with that of sports. In SPSS, we can transform the data by recoding into different variables so that we have one group of highly passionate fans (passion ≥ 50) and one group of less passionate fans (passion < 50). For our purposes, we will name these new categorical variables Passion_SportsHighLow and Passion_MusicHIghLow, to denote we have split them into two categories of low and high passion.
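Outside of SPSS, the same recode takes one small function. This is a sketch under stated assumptions: the respondent records are invented, `high_low` is a hypothetical helper, and only the cutoff of 50 and the two variable names come from the text.

```python
def high_low(passion, cutoff=50):
    """Recode a 0-100 passion score: 'High' if passion >= cutoff, else 'Low'."""
    return "High" if passion >= cutoff else "Low"

# Hypothetical respondents with continuous passion scores.
respondents = [
    {"sports": 72.0, "music": 55.0},
    {"sports": 30.0, "music": 81.0},
    {"sports": 50.0, "music": 12.0},
    {"sports": 49.9, "music": 64.0},
]

# Recode into new variables, leaving the original scores untouched.
for person in respondents:
    person["Passion_SportsHighLow"] = high_low(person["sports"])
    person["Passion_MusicHIghLow"] = high_low(person["music"])

share_sports_high = sum(
    p["Passion_SportsHighLow"] == "High" for p in respondents
) / len(respondents)
print(share_sports_high)
```

Keeping the original continuous columns matters, because a different cutoff can always be tried later.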

An interesting question would be: What is the overlap between the number of passionate sports fans and passionate music fans? We can calculate this overlap by performing a crosstabs. The command interface in SPSS appears below.

crosstabs commands

Crosstabs Results

In this case, we see that 55.8% (643/1152) of passionate music fans are also passionate sports fans. Among passionate sports fans, 80.5% (643/799) are passionate music fans. So, one could say that if you are a passionate sports fan, you are most likely also a passionate music fan, but not quite so much vice-versa. Statistically speaking, there is a significant association between being a passionate music fan and being a passionate sports fan (Chi-Square = 65.188, p < .001). The crosstabs results also show us that 49.4% of the market are passionate sports fans (799 respondents) and 50.6% are not, while passionate music fans (1,152 respondents) make up a considerably larger share of the market, about 71%. So, as you can see, crosstabs are a pretty handy statistical tool that provides relatively straightforward statistics easily understood by managers when you put them into nice PowerPoint slides with fewer numbers and words and more pictures.
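For readers who want to see what the chi-square statistic is doing, here is a hand-rolled sketch. The counts 643, 799, and 1,152 come from the text. The sample total of 1,616 is an assumption: it is not reported here, but it is consistent with the 49.4% figure and reproduces the reported chi-square. `chi_square_2x2` is a hypothetical helper that computes the Pearson statistic without a continuity correction.

```python
def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

both = 643          # passionate about sports and music (from the text)
sports_high = 799   # passionate sports fans (from the text)
music_high = 1152   # passionate music fans (from the text)
total = 1616        # ASSUMPTION: inferred sample size, not reported in the text

table = [
    # music high, music low
    [both, sports_high - both],                                     # sports high
    [music_high - both, total - sports_high - music_high + both],   # sports low
]

chi2 = chi_square_2x2(table)
pct_music_also_sports = 100 * both / music_high
pct_sports_also_music = 100 * both / sports_high
print(round(chi2, 3), round(pct_music_also_sports, 1), round(pct_sports_also_music, 1))
```

The statistic grows as the observed cell counts move away from what we would expect if music passion and sports passion were unrelated.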

Analysis of Variance (ANOVA)

We are often interested in determining if perceptions, feelings, or behaviors differ among different types of fans. Do frequent movie-goers have different perceptions of customer service than infrequent movie-goers? Do members of a specific ethnic background attend more or fewer events than others? Do women think there are enough restroom facilities compared to men? The last question is of such great importance that Congress is taking action to secure “potty parity” in federal buildings. We prefer to refer to this issue as “porcelain proportionality,” and hope that the legislators can find a bi-partisan solution that not only ensures equality of opportunity, but equality of outcome.[13]

Analysis of variance (ANOVA) determines the effect of categorical variables on continuous variables. The key thing to remember is that the independent variable is nominal data. You are comparing the mean scores from continuous variables between groups of people. Statisticians refer to the categorical variables as factors in ANOVA. From our DFW data, we might wonder if males or females are more likely to be passionate fans of the sports teams in this market. Gender is the factor and fan passion scores (a one-item score from 0-10) for the teams are the dependent variables.[2]

Fan Passion ANOVA Results

For which of the DFW teams is gender a significant influence on fan passion? Examining the significance levels, we see significant differences for fans of the Cowboys (p = .019) and the Rangers (p = .014). Comparing the mean scores for fan passion for the Cowboys, for instance, we find that males (6.65) are more passionate fans than females (6.16). However, no significant difference exists between male and female fans of the Mavericks (p = .128) or FC Dallas (p = .438), and the difference is only marginally significant for the Stars (p = .068). The mean scores of males are slightly higher for these three teams, but the differences are not significant at the .05 level. Hence, we see that males are generally more likely to be passionate fans than females, although not reliably so for every team. In many ways, this is good news, suggesting these three teams do a good job of attracting both males and females. Note the word “suggest.” We don’t ever “prove” anything with statistics; we just provide evidence or support confirming or explaining relationships.
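Although nobody hand-cranks ANOVA anymore, a short sketch shows what the F-value compares: variance between group means against variance within groups. The male and female scores below are invented 0-10 ratings, not the DFW data, and `one_way_anova_f` is a hypothetical helper. Turning F into a p-value requires the F distribution, which is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total
    means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical single-item passion scores (0-10) by gender.
males = [7, 8, 6, 9, 5, 7]
females = [6, 5, 7, 6, 4, 8]

f_value, df_between, df_within = one_way_anova_f([males, females])
print(f_value, df_between, df_within)
```

With only six people per group and a one-point difference in means, the F-value is small, which is why real studies use samples in the hundreds or thousands.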

Multivariate Analysis of Variance (MANOVA)

Sometimes we are interested in the interaction effects of more than one factor on one or more dependent variables. For example, we might want to determine if gender and marital status interact to influence fan passion. Marital status in this data includes single, domestic partner, married, separated, divorced, and widowed. Would you expect males or females in any of those categories to be significantly more passionate about the Cowboys? Does getting married infringe upon being a passionate fan? Interesting that you ask.

Dallas Cowboys Average Fan Passion

Marital Status (F = 2.65, p = .02) | Male | Female
Single | 70.97 | 64.73
Domestic Partner | 50.00 | 68.11
Married | 67.18 | 60.67
Separated | 81.67 | 41.43
Divorced | 61.79 | 58.70
Widowed | 59.00 | 57.22
OVERALL | 66.4 | 61.6

The F-test indicates that a two-way interaction between gender and marital status occurs (F = 2.65, p = .02), driven by large differences in two of the marital status categories. Overall, the pattern of results suggests what we already know: that males are more passionate about the Cowboys than females. However, the results offer interesting implications for those living with domestic partners and those recently separated. What do you think these implications are? Why are females living with a domestic partner much more passionate about the Cowboys than males in the same situation? Conversely, why are males going through a separation the most passionate fans of all (and 40 points higher than separated females)? For personal safety reasons, we are not going to answer those questions.
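An interaction simply means the gender gap is not the same in every marital status category. The sketch below computes the male-minus-female gap from the cell means in the table above; it describes the pattern only and does not reproduce the F-test, which needs the individual responses.

```python
# Cell means from the Dallas Cowboys table above: (male, female).
cowboys_passion = {
    "Single": (70.97, 64.73),
    "Domestic Partner": (50.00, 68.11),
    "Married": (67.18, 60.67),
    "Separated": (81.67, 41.43),
    "Divorced": (61.79, 58.70),
    "Widowed": (59.00, 57.22),
}

# Male minus female passion within each marital status.
gender_gaps = {
    status: round(male - female, 2)
    for status, (male, female) in cowboys_passion.items()
}

largest = max(gender_gaps, key=lambda status: abs(gender_gaps[status]))
reversed_statuses = [s for s, gap in gender_gaps.items() if gap < 0]
print(gender_gaps, largest, reversed_statuses)
```

If there were no interaction, the gaps would all be roughly equal. Instead they range from about 40 points among the separated to a reversed gap among domestic partners.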

Now, let’s consider a model regarding the effect of fan passion on attendance and other consumption behaviors related to the Dallas Mavericks. Instead of grouping into only low and high passion fans, let’s group fans based on their passion scores and label them as they appear in the table below. In this case, we have only one factor (the six levels of fan passion) and five dependent variables. To account for multiple probabilities with multiple dependent variables, we use MANOVA to calculate whether or not the mean scores for each of the dependent variables are significantly different across the six fan types. We interpret the results the same way we do in ANOVA by examining the multivariate F-value based on Wilks’ Lambda (F = 90.41, p < .001) and then each of the individual F-values for the dependent variables to determine significance. The F-values for TV (523.23), radio (131.93), news (291.36), website (237.46), and attendance (67.34) in the output (not shown) are all significant (p < .001). In case you are wondering, we controlled for individual factors such as income, age, and gender.

Consumption by fan type (TV, Radio, News, and Website: % of 82 games; Attendance: games out of 41)

Passion   Fan Type     TV     Radio   News   Website   Attendance
0         Non-fan      0%     0%      0%     0%        0
1-20      Inactive     7%     2%      13%    1%        0
20-39     TV Fan       32%    11%     39%    12%       0
40-59     Active       56%    20%     63%    24%       2
60-79     Game         73%    31%     75%    48%       4
80-100    Passionate   83%    51%     82%    69%       7

These examples illustrate that with the MANOVA method we can determine the effects of multiple factors on multiple dependent variables. MANOVA is also useful in that we can control for other explanatory variables, such as income, age, or any other continuous variable. While it is beyond the scope of this chapter to explain these methods in detail, the hope is that the reader recognizes the capabilities available to data analysts. From a practical perspective, the point is that marketers are interested in designing marketing and sales programs that more effectively target buyers. Examining buyer behaviors with existing and newly designed data capture may reveal specific, sizeable segments that can be selected from the database for a particular campaign. Sweet.
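To make the idea of selecting a segment from the database concrete, here is a minimal Python sketch that labels fans using the passion bands from the Mavericks table. The function names, the fan IDs, and the scores are hypothetical. Because the printed bands "1-20" and "20-39" overlap, the sketch assumes a score of exactly 20 counts as a TV Fan.

```python
# Upper bounds (exclusive) for each band, taken from the fan-type table.
# Assumption: a score of exactly 20 is treated as "TV Fan", because the
# printed bands "1-20" and "20-39" overlap at 20.
BANDS = [
    (20, "Inactive"),
    (40, "TV Fan"),
    (60, "Active"),
    (80, "Game"),
]


def fan_type(score):
    """Return the fan-type label for a passion score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("passion score must be between 0 and 100")
    if score == 0:
        return "Non-fan"
    for upper, label in BANDS:
        if score < upper:
            return label
    return "Passionate"


def select_segment(fans, wanted):
    """Return the IDs of fans whose passion score falls in the wanted fan type."""
    return [fan_id for fan_id, score in fans.items() if fan_type(score) == wanted]


# Hypothetical database extract: fan ID -> passion score.
fans = {"F001": 85, "F002": 10, "F003": 92, "F004": 47}
print(select_segment(fans, "Passionate"))
```

A campaign aimed at Passionate fans would then pull only the IDs returned by select_segment, rather than mailing the whole database.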


Experimental Design

Analysis of variance is also an important tool in experimental design. Experimental design manipulates the factors (IVs) and controls for other variables (covariates) that might influence the dependent variable (DV). With experimental design, the goal is to control for all of the other possible explanatory variables so that we can determine the effect that is only due to the change in the manipulated factor.

Managers are often interested in determining whether a new promotional tool is more effective than another. Is one weekend promotion more effective than another? Does changing the design of the electronic newsletter result in higher open and click-through rates? Many other things can influence why people respond to promotions or newsletters, besides any changes we might make to improve them. How do we know the results aren’t due to random chance? We design experiments.

We conducted an experiment to test the effects of using an in-person avatar on a website to promote offers to visitors to the site. The presence or absence of the avatar was manipulated across identical websites. For an example of what we mean, see the Washington Capitals avatar. Using a representative sample of online shoppers, half of the sample visited the website with no in-person avatar and half visited the same website with the avatar. Nothing else was changed between the two websites. We controlled for personal differences in terms of demographics and shopping habits. The results showed that the use of the avatar significantly increased shoppers’ feelings of the “socialness” of the website, which in turn made them more likely to visit the website and make purchases. The experimental design allowed us to conclude that the effects were solely due to the avatar.[14]

In the same way, database managers working with sales managers may run experiments to determine if a particular sales offer is more effective than another. For example, following directions from management, salespeople could alternate two offers and record responses into the CRM system. Given that the team has access to other information about the customers (e.g., demographics), analysis of variance can reveal which offer is more productive.
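As a rough illustration of the two-offer experiment, the following Python sketch computes a one-way ANOVA F-statistic by hand. The response figures are invented, the function name is ours, and the sketch ignores covariates such as demographics, so it is a simplified stand-in for the analysis a statistics package would run rather than the full procedure.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual responses around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical data: tickets sold per sales call under each offer.
offer_a = [2, 3, 1, 4, 2, 3]
offer_b = [4, 5, 3, 6, 4, 5]

f_value, df_b, df_w = one_way_anova_f([offer_a, offer_b])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

With these made-up numbers, F is about 10.9 on 1 and 10 degrees of freedom, which exceeds the .05 critical value of roughly 4.96, so we would conclude that the two offers differ. A real analysis would also report the p-value and include the covariates.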


Correlation

We use correlations to determine if a change in one variable is associated with a change in another variable. Each of the variables must be continuous data. Correlation coefficients (denoted as “r”) range from -1 to +1. Values near zero suggest little correlation, while numbers closer to +/- 1 indicate stronger correlations.

Consider the table below regarding the attendance of professional music concerts, fan passion (our 4-item measure), fan passion (single-item measure), household income, age, household size, marital status, and gender. Marital status (single vs. married/partnered) and gender (male/female) are dummy variables (coded 1 and 2, respectively). We included both the summed average score for the 4-item passion scale and the single-item measure to show that the 4-item scale explains more of the variance in attendance, supporting our earlier claim regarding construct measurement with multiple items. But, if you were limited in space while collecting data, the single-item measure is still useful.

SPSS Correlation Commands

Correlation Results

Which variables have the strongest correlations with attending concerts? If you said fan passion (.389) and income (.129),  then you are able to tell the difference between bigger numbers and smaller ones.

The size of the coefficients indicates the extent to which a change in one variable corresponds with a change in another variable. The significant (**) correlations indicate that as income (.129) and passion (.389) increase, so does attendance at concerts. Conversely, age has a negative correlation with attendance (-.08), suggesting that as people age they are less likely to attend concerts (or that younger people are more likely to attend). The effects of marital status (-.051) and gender (-.056) indicate that married/partnered (singles) and females (males) attend less (more) frequently.
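For readers who want to see what sits behind a correlation coefficient, here is a small Python sketch that computes Pearson's r from scratch. The passion scores and concert counts are invented for illustration and will not reproduce the coefficients reported above.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations for each variable.
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Hypothetical data: passion scores and concerts attended for five fans.
passion = [10, 35, 50, 65, 90]
concerts = [0, 1, 1, 3, 4]
print(round(pearson_r(passion, concerts), 3))
```

The sign of r gives the direction of the relationship and its size gives the strength, which is how the table above is read.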

What inquisitive minds want to know is whether the individual demographic factors matter at all if we know how passionate fans are about music. Given the strong correlation with passion for music and concert attendance, maybe the demographics don’t add much explanatory value. We can answer this question with our last favorite statistical method.

Multiple Regression

Around the turn of the 20th century, Karl “Carl” Pearson was sitting around thinking about establishing the discipline of mathematical statistics. He’d already come up with plenty of great statistical ideas, such as the correlation coefficient, continuous curves, chi-square, p-values, and the like. Then, suddenly, in 1898, it dawned on him that what with all of these correlations, maybe he should create multiple regression. He announced his finding to his wife, Maria, who pointed out that computers weren’t going to be invented for another 40 years, so he may as well go spend his time down at his other great idea, the Men and Women’s Club.

Nonetheless, today, when we have more than one continuous independent variable and one continuous dependent variable, we can conduct multiple regression analyses with relative ease. Referring back to the previous correlation table, we can see that some of the variables are correlated with each other. If attendance at concerts is our dependent variable, what we want to know is the extent to which any of these variables independently explain or predict attendance. We would expect that fan passion will predict attendance, but will knowing income levels, household size, or other demographic information also add explanatory value? The table below displays the results of the relevant multiple regression.

SPSS Regression Commands

Passion Regression Results

The model explains 21.4% of the variance in attendance at professional concerts in Atlanta and Miami, as denoted by the R² figure. Which of the independent variables (IVs) explain a significant amount of variance in attendance? We examine those variables that are significant (p < .05) and the associated standardized beta coefficients. The beta coefficient signifies the weight (or level of influence) the variable has on the dependent variable relative to the other variables. As we can see, fan passion (.431) has the strongest level of influence, followed by income (.233). Gender (-.048, p = .043), used as a dummy variable (1,2), indicates males (females) are more (less) likely to attend. Household size (-.032) is significant (p < .0001), suggesting families (larger households) are less likely to attend. In this equation, age (p = .206) and marital status (p = .210) are not significant.
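To show how standardized betas differ from simple correlations, here is a hedged Python sketch for the special case of only two predictors, where the betas can be computed directly from three correlations. The correlations of passion (.389) and income (.129) with attendance come from the correlation results; the correlation between passion and income is a hypothetical value, and the actual model has six predictors, so the output will not match the reported betas.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized betas and R-squared for a regression with two predictors.

    r_y1, r_y2: correlation of each predictor with the dependent variable.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


# r_12 values below are hypothetical, chosen only to show the pattern.
for r_12 in (0.0, 0.3):
    b_passion, b_income, r2 = standardized_betas(0.389, 0.129, r_12)
    print(f"r_12={r_12:.1f}: passion={b_passion:.3f}, "
          f"income={b_income:.3f}, R2={r2:.3f}")
```

When the predictors are uncorrelated, each beta simply equals its correlation with attendance. As the predictors overlap more, each one gets credit only for the variance it explains on its own, which is why a variable can correlate with attendance yet add little in the regression.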

Does the fact that age and marital status are not significant mean these variables are not important? No, not necessarily. Recall from the correlation results that these are significantly correlated with attendance. What the regression results reveal is that these two variables do not add any explanatory or predictive power beyond the other four significant variables. As a note, the VIF column lets us know if some of the variables are highly correlated with other variables in the model that are supposed to be independent (not dependent on each other) variables. The larger the VIF scores, the more likely a problem with multicollinearity exists.
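The VIF itself follows a simple formula: for each independent variable, regress it on all of the other independent variables, take the resulting R², and compute 1 / (1 - R²). The Python sketch below applies that formula to a few hypothetical R² values; the function name is ours and is not part of SPSS.

```python
def vif(r_squared_j):
    """Variance inflation factor for one predictor.

    r_squared_j is the R-squared obtained by regressing that predictor
    on all of the other predictors in the model.
    """
    if not 0 <= r_squared_j < 1:
        raise ValueError("R-squared must be at least 0 and less than 1")
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical R-squared values for three predictors.
for r2 in (0.0, 0.5, 0.9):
    print(f"R2 = {r2:.1f} -> VIF = {vif(r2):.1f}")
```

A predictor that is unrelated to the others has a VIF of 1. Analysts often treat values above roughly 5 or 10 as a warning sign, though the exact cutoff is a rule of thumb rather than a fixed standard.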

Do you understand why it is important to collect data on things other than demographics? What if all we had were age, income, gender, marital status, and household size? These are the typical data teams collect on fans. How much variance do those variables explain in attendance? A grand total of 4.7%.


So, to sum up this chapter of fun with statistics, what did we learn? We learned that to conduct data analytics we must use statistics, and that statistics are fun. Or at least have the potential for fun if you hang around with people like Karl “Carl” Pearson.


References for the Table: Influences on Fan Consumption Behavior

Please read all of the following articles from which this table was derived.

  1. Pritchard, Mark P. and Daniel C. Funk (2010), “The formation and effect of attitude importance in professional sport,” European Journal of Marketing, 44(7/8), 1017-1036.
  2. Wakefield, Kirk L. and Jeffrey G. Blodgett (1994), “The importance of servicescapes in leisure service settings,” Journal of Services Marketing, 8, 66-76.
  3. Wakefield and Blodgett (1999), “Customer responses to intangible and tangible service factors”, Psychology & Marketing, 16, 51-68.
  4. Wakefield and Hugh J. Sloan (1995), “The effects of team loyalty and selected stadium factors on spectator attendance”, Journal of Sport Management, 9, 153-72.
  5. Wakefield and Victoria D. Bush (1998), “Promoting leisure services: economic and emotional aspects of consumer response,” Journal of Services Marketing, 12 (3), 209.
  6. Wakefield and James H. Barnes (1996), “Retailing hedonic consumption: A model of sales promotion of a leisure service,” Journal of Retailing, 72 (4), 409-427.
  7. Mohan, Leon J. (2010), “Effect of destination image on attendance at team sporting events,” Tourism and Hospitality Research, 10 (3), 157-170.
  8. Alavy, Kevin, Alison Gaskell, Stephanie Leach, and Stefan Szymanski (2010), “On the edge of your seat: Demand for football on television and uncertainty of outcome hypothesis,” International Journal of Sport Finance, 5, 75-95.
  9. Yo Kyoum Kim and Galen Trail (2010), “Constraints and motivators: A new model to explain sport consumer behavior,” Journal of Sport Management, 24 (2), 190-210.
  10. Wakefield and Daniel L Wann (2006), “An examination of dysfunctional sport fans: method of classification and relationships with problem behaviors,” Journal of Leisure Research, 38, 168-186.
  11. Andrew, Damon PS, (2009), “The relationship between spectator motivations and media and merchandise consumption at a professional martial arts event,” Sport Marketing Quarterly, 18, 199-209.
  12. Wann, Daniel L. and Stephen Weaver (2009), “Understanding the relationship between sport team identification and dimensions of social well-being,” North American Journal of Psychology, 11(2), 219-230.
  13. End, Christian M., (2009), “Sports and relationships: The influence of game outcome on romantic relationships,” North American Journal of Psychology, 11 (1), 37-48.
  14. Woo, Boyun (2009), “Testing models of motives and points of attachment among spectators in college football,” Sport Marketing Quarterly, 18, 38-53.
  15. Pritchard, Mark P., Daniel C. Funk, and Kostas Alexandris (2009), “Barriers to repeat patronage: The impact of spectator constraints,” European Journal of Marketing, 43 (1/2), 169-187.
  16. Filo, Kevin, Daniel C. Funk, and Glen Hornby (2009), “The role of web site content on motive and attitude change for sport events,” Journal of Sport Management, 23 (1), 21-41.
  17. O’Reilly, Norm, (2008), “If you can’t win, why should I buy a ticket: Hope, fan welfare, and competitive balance,” International Journal of Sport Finance, 3 (2), 106-118.
  18. Stevens, Matthew and Martin Young (2010), “Independent correlates of reported gambling among indigenous Australians,” Social Indicators Research, 98 (1), 147-166.
  19. Hyman, Michael R. and Jeremy J. Sierra (2010), “Idolizing sport celebrities: A gateway to psychopathology,” Young Consumers, 11 (3), 226.
  20. Romano, Maria Clelia and Dario Bruzzese (2007), “Fathers’ participation in the domestic activities of everyday life,” Social Indicators Research, 84, 97-116.
  21. McCabe, Catherine (2007), “Spectators’ attitudes toward basketball: An application of multifactorial gender identity,” North American Journal of Psychology, 9 (2), 211-228.



[3] Nesbit, Todd M. and Kerry A. King (2010), “The impact of fantasy football participation on NFL attendance,” Atlantic Economic Journal, 38 (1), 95-108.

[4] Adapted from:

[5] Shoham, Aviv, Gregory M. Rose, and Lynn R. Kahle (1998), “Marketing of risky sports: From intention to action,” Journal of the Academy of Marketing Science, 26 (Fall), 307-321.

[6] Actually, research shows that 55% of “footy” fans attend games primarily as “Theatre Goers” who are there for the entertainment of a good match—not necessarily to see the team win. See Tapp, Alan and Jeff Clowes (2002), “From ‘carefree casuals’ to ‘professional wanderers’: Segmentation possibilities for football supporters,” European Journal of Marketing, 36 (11/12), 1248-1269.

[7] Luellen, Tara B. and Daniel L. Wann (2010), “Rival salience and sport team identification,” Sport Marketing Quarterly, 19, 97-106.

[8] Paul, Rodney J. and Andrew P. Weinbach (2010), “The determinants of betting volume for sports in North America: Evidence of sports betting as consumption in the NBA and NHL,” International Journal of Sport Finance, 5, 128-140.

[9] See Pritchard, Mark P., Jeffrey Stinson, and Elizabeth Patton (2010), “Affinity and affiliation: The dual-carriage way to team identification,” Sport Marketing Quarterly, 19 (2), 67-77.

[10] In case you are wondering, your professor does look at the results and deeply cares for you as an individual.

[11] Oh, so you don’t? Well, then click on these links to bring it back to memory: mean, median, mode, standard deviation, standard error, and confidence intervals. You may also Google other helpful pages.

[12] SPSS is on the Baylor Hankamer Business School labs and in the Curb Room (C105).

[13] It may interest you to know that women, on average, take about twice as long to use facilities as men. For more on this story, please see:

[14] See Wang, Liz, Julie Baker, Judy Wagner, and Kirk Wakefield (2007), “Can a retail website be social?” Journal of Marketing, 71, 143-157; Wakefield, Robin, Kirk Wakefield, Julie Baker, and Liz Wang (2010), “How website socialness leads to website use,” European Journal of Information Systems, August.
