Setting 1: Yoplait Yogurt Advertising
For this problem, you will need to download and install the package Ecdat if you have not done so already. If you do not have Ecdat installed, run the following command and follow the instructions:

    install.packages("Ecdat", dependencies = TRUE)

Once you have it installed, run the following commands to load the data for this problem into R:

    library(Ecdat)
    data(Yogurt)

Once you run these commands, R will contain a data frame object called Yogurt. These data are from a study by Jain, Vilcassim and Chintagunta [1] that uses more sophisticated methods than we will employ in this question. For us, the data set is an interesting setting in which to apply multiple linear regression.
1. After loading the data into R, produce summary statistics of the data (mean, standard deviation, minimum and maximum for each numerical variable; counts for each categorical variable). Produce a nice table to display these summary statistics. Comment briefly on any striking patterns.
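A minimal base-R sketch of the summary table. To stay self-contained it builds a made-up stand-in for the Yogurt data frame, assuming the column names id, feat.*, price.* and choice that the question uses; the simulated values are hypothetical and say nothing about the real study. With the real data, delete the simulation and run library(Ecdat); data(Yogurt) instead.

```r
# Made-up stand-in for Ecdat's Yogurt (hypothetical values, assumed column names).
# With the real data, replace this block with: library(Ecdat); data(Yogurt)
set.seed(1)
n_id <- 100
sizes <- sample(10:40, n_id, replace = TRUE)  # unbalanced panel of repeat purchases
n <- sum(sizes)
Yogurt <- data.frame(
  id = rep(seq_len(n_id), times = sizes),
  feat.yoplait = rbinom(n, 1, 0.05), feat.dannon = rbinom(n, 1, 0.04),
  feat.hiland = rbinom(n, 1, 0.04), feat.weight = rbinom(n, 1, 0.04),
  price.yoplait = runif(n, 0.07, 0.11), price.dannon = runif(n, 0.06, 0.10),
  price.hiland = runif(n, 0.03, 0.07), price.weight = runif(n, 0.05, 0.09),
  choice = factor(sample(c("dannon", "hiland", "weight", "yoplait"), n, replace = TRUE))
)

# Numeric columns, excluding the identifier id.
num_vars <- sapply(Yogurt, is.numeric)
num_vars["id"] <- FALSE

# One row per variable: mean, standard deviation, minimum, maximum.
stats <- t(sapply(Yogurt[num_vars], function(v)
  c(mean = mean(v), sd = sd(v), min = min(v), max = max(v))))
print(round(stats, 3))

# Counts for the categorical variable.
counts <- table(Yogurt$choice)
print(counts)
```

The rounded matrix can be handed to a table formatter such as knitr::kable when writing up the report.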
2. Compute the summary statistics for each numerical variable for each level of the categorical variable choice. Comment briefly on any systematic differences in these variables across the levels of this categorical variable.
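One way to get per-level statistics is base R's aggregate() with the formula `. ~ choice`, which applies a function to every other column within each level of choice. The sketch reuses a made-up stand-in for Yogurt (hypothetical values, assumed column names); swap in library(Ecdat); data(Yogurt) for the real data.

```r
# Made-up stand-in for Ecdat's Yogurt (hypothetical values, assumed column names).
set.seed(1)
n_id <- 100
sizes <- sample(10:40, n_id, replace = TRUE)
n <- sum(sizes)
Yogurt <- data.frame(
  id = rep(seq_len(n_id), times = sizes),
  feat.yoplait = rbinom(n, 1, 0.05), feat.dannon = rbinom(n, 1, 0.04),
  feat.hiland = rbinom(n, 1, 0.04), feat.weight = rbinom(n, 1, 0.04),
  price.yoplait = runif(n, 0.07, 0.11), price.dannon = runif(n, 0.06, 0.10),
  price.hiland = runif(n, 0.03, 0.07), price.weight = runif(n, 0.05, 0.09),
  choice = factor(sample(c("dannon", "hiland", "weight", "yoplait"), n, replace = TRUE))
)

# Drop the identifier, then summarize every remaining numeric column by choice.
yog_noid <- Yogurt[setdiff(names(Yogurt), "id")]
means_by_choice <- aggregate(. ~ choice, data = yog_noid, FUN = mean)
sds_by_choice <- aggregate(. ~ choice, data = yog_noid, FUN = sd)
print(means_by_choice, digits = 3)
print(sds_by_choice, digits = 3)
```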
3. Consider a multiple regression model of the form
$$Y = X'\beta + P'\gamma + U \qquad (1)$$
where X is a vector that contains a constant and a full set of the featured advertisement dummy variables, P is a vector that contains the prices for each of the brands of yogurt and Y is a dummy variable for whether the individual bought Yoplait.
(a) Without looking at the data, do you expect there to be heteroskedasticity in this regression model? Explain precisely why or why not.
(b) Does including every featured advertisement dummy variable in the regression lead to multicollinearity? Explain why or why not, and articulate the importance of avoiding perfect multicollinearity.
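Whether a constant plus a full set of dummies is perfectly collinear depends on whether the dummies add up to one in every row. The toy matrices below (made-up rows, not the Yogurt data) check column rank with base R's qr(): exhaustive category dummies lose a rank, while feature dummies that can all be zero at once do not.

```r
# Toy case 1: brand-choice dummies are exhaustive (exactly one is 1 per row).
brand <- factor(c("dannon", "hiland", "weight", "yoplait", "dannon", "yoplait"))
D <- sapply(levels(brand), function(b) as.numeric(brand == b))
X_trap <- cbind(const = 1, D)
rank_trap <- qr(X_trap)$rank   # 4, not 5: const equals the sum of the dummies

# Toy case 2: feature dummies can all be 0 in a row (no brand featured).
feat <- rbind(c(0, 0, 0, 0), c(1, 0, 0, 0), c(0, 1, 0, 0),
              c(0, 0, 1, 0), c(0, 0, 0, 1), c(1, 1, 0, 0))
X_feat <- cbind(const = 1, feat)
rank_feat <- qr(X_feat)$rank   # 5: full column rank, no perfect collinearity

print(c(trap = rank_trap, feat = rank_feat))
```

The same check can be applied to any fitted lm object by comparing qr(model.matrix(fit))$rank with ncol(model.matrix(fit)); lm itself signals a rank deficit by reporting NA coefficients.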
(c) Use R to produce multiple regression estimates for this regression model. Report the estimates (and standard errors in parentheses) in the first column of a nicely formatted table of results.
(d) Suppose that we are willing to assume that the conditional expectation $E(Y \mid X, P)$ is linear. What does this imply about $E(U \mid X, P)$? What does it imply about (?)?
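For part (c), a sketch that fits model (1) by OLS as a linear probability model with lm(). Because part (a) raises heteroskedasticity, it also computes HC1 heteroskedasticity-robust standard errors by hand in base R; if the sandwich package is installed, sandwich::vcovHC(fit1, type = "HC1") should give the same matrix. The data frame is a made-up stand-in for Yogurt (hypothetical values, assumed column names).

```r
# Made-up stand-in for Ecdat's Yogurt (hypothetical values, assumed column names).
set.seed(1)
n_id <- 100
sizes <- sample(10:40, n_id, replace = TRUE)
n <- sum(sizes)
Yogurt <- data.frame(
  id = rep(seq_len(n_id), times = sizes),
  feat.yoplait = rbinom(n, 1, 0.05), feat.dannon = rbinom(n, 1, 0.04),
  feat.hiland = rbinom(n, 1, 0.04), feat.weight = rbinom(n, 1, 0.04),
  price.yoplait = runif(n, 0.07, 0.11), price.dannon = runif(n, 0.06, 0.10),
  price.hiland = runif(n, 0.03, 0.07), price.weight = runif(n, 0.05, 0.09),
  choice = factor(sample(c("dannon", "hiland", "weight", "yoplait"), n, replace = TRUE))
)

# Y = 1 if the purchase was Yoplait; model (1) with a constant, 4 feature dummies, 4 prices.
Yogurt$yoplait <- as.numeric(Yogurt$choice == "yoplait")
fit1 <- lm(yoplait ~ feat.yoplait + feat.dannon + feat.hiland + feat.weight +
             price.yoplait + price.dannon + price.hiland + price.weight,
           data = Yogurt)

# HC1 robust covariance: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1 * n / (n - k).
X <- model.matrix(fit1)
e <- resid(fit1)
k <- ncol(X)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)          # row i of X scaled by e_i
V_hc1 <- bread %*% meat %*% bread * nrow(X) / (nrow(X) - k)

results <- cbind(estimate = coef(fit1),
                 se_classical = sqrt(diag(vcov(fit1))),
                 se_hc1 = sqrt(diag(V_hc1)))
print(round(results, 4))
```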
(e) Examine the coefficient estimate on the dummy variable feat.yoplait, $\beta_{f.y}$.
i. Give an interpretation for this coefficient estimate. Be precise and give context.
ii. Is feat.yoplait statistically significant at the 5 percent level? Provide a formal hypothesis test.
iii. Use the summary output in R to construct a 95 percent confidence interval for $\beta_{f.y}$. Give a precise interpretation of this confidence interval that uses context.
(f) Estimate the regression model without the regressor feat.weight. Report the estimates from this reduced regression model in the second column of your table of results. Relative to the estimates you produced in part (c), how do the coefficient estimates change? Explain why the coefficient estimates change in the context of multiple regression.
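A sketch for parts (e) and (f): it tests $H_0: \beta_{f.y} = 0$ against $H_1: \beta_{f.y} \neq 0$ with the t statistic from summary(), builds the 95 percent interval as estimate $\pm t_{0.975,\,n-k} \times$ SE (checked against confint()), and refits without feat.weight. It uses the classical standard errors that summary() reports; substitute robust ones if you adopt them in (c). The data are a made-up stand-in (hypothetical values, assumed column names).

```r
# Made-up stand-in for Ecdat's Yogurt (hypothetical values, assumed column names).
set.seed(1)
n_id <- 100
sizes <- sample(10:40, n_id, replace = TRUE)
n <- sum(sizes)
Yogurt <- data.frame(
  id = rep(seq_len(n_id), times = sizes),
  feat.yoplait = rbinom(n, 1, 0.05), feat.dannon = rbinom(n, 1, 0.04),
  feat.hiland = rbinom(n, 1, 0.04), feat.weight = rbinom(n, 1, 0.04),
  price.yoplait = runif(n, 0.07, 0.11), price.dannon = runif(n, 0.06, 0.10),
  price.hiland = runif(n, 0.03, 0.07), price.weight = runif(n, 0.05, 0.09),
  choice = factor(sample(c("dannon", "hiland", "weight", "yoplait"), n, replace = TRUE))
)
Yogurt$yoplait <- as.numeric(Yogurt$choice == "yoplait")
fit1 <- lm(yoplait ~ feat.yoplait + feat.dannon + feat.hiland + feat.weight +
             price.yoplait + price.dannon + price.hiland + price.weight,
           data = Yogurt)

# (e) ii: two-sided t-test at the 5% level, read off the summary table.
ct <- coef(summary(fit1))["feat.yoplait", ]
reject_5pct <- unname(ct["Pr(>|t|)"] < 0.05)

# (e) iii: 95% confidence interval by hand and with confint().
ci_manual <- ct["Estimate"] + c(-1, 1) * qt(0.975, df.residual(fit1)) * ct["Std. Error"]
ci_builtin <- confint(fit1, "feat.yoplait", level = 0.95)

# (f): drop feat.weight and line up the coefficients (NA where a term was removed).
fit_f <- update(fit1, . ~ . - feat.weight)
compare <- cbind(full = coef(fit1), reduced = coef(fit_f)[names(coef(fit1))])
print(round(compare, 4))
```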
[1] Dipak C. Jain, Naufel J. Vilcassim and Pradeep K. Chintagunta, "A Random-Coefficients Logit Brand-Choice Model Applied to Panel Data," Journal of Business & Economic Statistics, Vol. 12, No. 3 (Jul., 1994), pp. 317-328.
(g) Does the existence of a featured advertisement of another brand of yogurt relate to the probability that an individual purchases Yoplait brand yogurt?
i. Within the context of the regression model in (1), formally state the null and alternative hypotheses that are appropriate for answering this question.
ii. What is an appropriate test statistic to use to conduct this hypothesis test? Before computing this test statistic, state your decision rule.
iii. Compute the test statistic and the p-value for this test using both Stata and R. What can you conclude from this hypothesis test?
(h) The variable id is an individual identifier. There are 100 individuals in the data set and repeated observations for each individual. This may or may not be a problem.
i. Plot the residuals versus id. Do you notice any patterns?
ii. Compute the mean of the residuals for each id. Hint: In R, the command tapply(a, b, mean) computes the mean of a for each value of b. Why is the mean of these means not zero?
iii. Produce a histogram of these 100 mean residuals by id. Based on this histogram and the context of the problem, what can you conclude? Are any assumptions of the multiple regression model violated?
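A sketch for parts (g) and (h). For (g) it compares the full model (1) with a restricted model that drops the Dannon, Hiland and Weight feature dummies, using anova() for the classical F-test of $H_0$: all three coefficients are zero (a robust Wald test would be an alternative under heteroskedasticity). For (h) it plots residuals by id and averages them per individual with tapply(). The stand-in data are made up, with deliberately unequal numbers of purchases per id.

```r
# Made-up stand-in for Ecdat's Yogurt (hypothetical values, assumed column names).
set.seed(1)
n_id <- 100
sizes <- sample(10:40, n_id, replace = TRUE)  # unbalanced panel
n <- sum(sizes)
Yogurt <- data.frame(
  id = rep(seq_len(n_id), times = sizes),
  feat.yoplait = rbinom(n, 1, 0.05), feat.dannon = rbinom(n, 1, 0.04),
  feat.hiland = rbinom(n, 1, 0.04), feat.weight = rbinom(n, 1, 0.04),
  price.yoplait = runif(n, 0.07, 0.11), price.dannon = runif(n, 0.06, 0.10),
  price.hiland = runif(n, 0.03, 0.07), price.weight = runif(n, 0.05, 0.09),
  choice = factor(sample(c("dannon", "hiland", "weight", "yoplait"), n, replace = TRUE))
)
Yogurt$yoplait <- as.numeric(Yogurt$choice == "yoplait")
fit1 <- lm(yoplait ~ feat.yoplait + feat.dannon + feat.hiland + feat.weight +
             price.yoplait + price.dannon + price.hiland + price.weight,
           data = Yogurt)

# (g) Restricted model without the other brands' feature dummies; F-test with 3 restrictions.
fit_r <- update(fit1, . ~ . - feat.dannon - feat.hiland - feat.weight)
ftest <- anova(fit_r, fit1)
print(ftest)

# (h) i: residuals against the individual identifier.
res <- resid(fit1)
plot(Yogurt$id, res, xlab = "id", ylab = "residual")

# (h) ii-iii: mean residual for each individual, then its histogram.
mean_by_id <- tapply(res, Yogurt$id, mean)
print(summary(as.vector(mean_by_id)))
hist(mean_by_id, main = "Mean residual by id", xlab = "mean residual")
```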
4. Now, consider a multiple regression model of the form
$$Y = \delta_i + X'\beta + P'\gamma + U \qquad (2)$$
where $\delta_i$ is an id-specific intercept, X is a vector that contains the full set of featured advertisement dummy variables, P is a vector that contains the prices for each of the brands of yogurt and Y is a dummy variable for whether the individual bought Yoplait.
(a) Before estimating the regression specification, do you expect to obtain the same estimates for $\beta$ and $\gamma$ as you did in part 3? Explain precisely why you expect what you expect.
(b) Estimate this regression model in R. To estimate a model with 100 id-specific intercepts, coerce id to be a factor using the as.factor() function and include the coerced variable as an explanatory variable (now a factor) in addition to the predictors you used before.
i. Report the coefficient estimates and standard errors for $(\beta, \gamma)$ in the fourth column of your table of results. Do not report the 100 intercept coefficient estimates. How do these estimates compare with the ones you produced in part 3?
ii. Produce a histogram of the 100 id-specific intercept estimates. How do these estimates compare with the histogram of mean residuals you computed in part 3? Should they be similar to one another? Should they be correlated? Why or why not?
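A sketch of model (2) with as.factor(id). It drops the global constant with `0 +` so that R reports all 100 id intercepts directly; with a constant, R would instead report 99 differences from the first id, and the slope estimates would be identical. The same code also runs the nested-model F-test (model (1) against model (2)) that part (c) asks about. The data are a made-up stand-in (hypothetical values, assumed column names).

```r
# Made-up stand-in for Ecdat's Yogurt (hypothetical values, assumed column names).
set.seed(1)
n_id <- 100
sizes <- sample(10:40, n_id, replace = TRUE)
n <- sum(sizes)
Yogurt <- data.frame(
  id = rep(seq_len(n_id), times = sizes),
  feat.yoplait = rbinom(n, 1, 0.05), feat.dannon = rbinom(n, 1, 0.04),
  feat.hiland = rbinom(n, 1, 0.04), feat.weight = rbinom(n, 1, 0.04),
  price.yoplait = runif(n, 0.07, 0.11), price.dannon = runif(n, 0.06, 0.10),
  price.hiland = runif(n, 0.03, 0.07), price.weight = runif(n, 0.05, 0.09),
  choice = factor(sample(c("dannon", "hiland", "weight", "yoplait"), n, replace = TRUE))
)
Yogurt$yoplait <- as.numeric(Yogurt$choice == "yoplait")
Yogurt$id_f <- as.factor(Yogurt$id)

# Model (1): one common intercept.
fit1 <- lm(yoplait ~ feat.yoplait + feat.dannon + feat.hiland + feat.weight +
             price.yoplait + price.dannon + price.hiland + price.weight, data = Yogurt)
# Model (2): one intercept per individual ("0 +" removes the global constant).
fit2 <- lm(yoplait ~ 0 + id_f + feat.yoplait + feat.dannon + feat.hiland + feat.weight +
             price.yoplait + price.dannon + price.hiland + price.weight, data = Yogurt)

slopes <- c("feat.yoplait", "feat.dannon", "feat.hiland", "feat.weight",
            "price.yoplait", "price.dannon", "price.hiland", "price.weight")
print(round(cbind(model1 = coef(fit1)[slopes], model2 = coef(fit2)[slopes]), 4))

# (b) ii: the 100 intercepts, compared with the per-id mean residuals from model (1).
delta_hat <- coef(fit2)[grep("^id_f", names(coef(fit2)))]
mean_res <- as.vector(tapply(resid(fit1), Yogurt$id, mean))
hist(delta_hat, main = "id-specific intercepts", xlab = "estimated intercept")
print(cor(delta_hat, mean_res))

# (c): H0 "all 100 intercepts are equal" is model (1) nested inside model (2).
ftest_id <- anova(fit1, fit2)
print(ftest_id)
```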
(c) Are there significant individual-specific factors that affect the probability of purchasing Yoplait yogurt?
i. In the context of the regression model in (2), state null and alternative hypotheses appropriate for answering this question.
ii. What is an appropriate test statistic?
iii. Carry out this test using R. What is the value of the test statistic? What is the p-value? What do you conclude?
(d) Imagine that after you explain the regression specification in (2) to a friend of yours who studies philosophy, your friend says, "Some people are more frugal than others when it comes to yogurt. Isn't this a problem for you?" Respond thoughtfully to this question in plain English.
5. Now, consider a multiple regression model of the form
$$Y = \delta_i + X'\beta + P_{-y}'\gamma + P_{yoplait}\, Q_i + U \qquad (3)$$
where $\delta_i$ is an id-specific intercept, $Q_i$ is an id-specific slope on the price of Yoplait, X is a vector that contains the full set of featured advertisement dummy variables, $P_{-y}$ is a vector that contains the prices for each of the brands of yogurt except for Yoplait and Y is a dummy variable for whether the individual bought Yoplait.
(a) In words, explain how to use statistical software to estimate the regression model in (3).
(b) Use R to estimate this regression model. You should discover that R drops some id × price.yoplait interactions due to "singularities." What does this mean?
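A sketch of model (3): id intercepts plus an `id_f:price.yoplait` interaction, which (without a price.yoplait main effect) gives one Yoplait price slope per individual. To make a singularity appear on purpose, the made-up stand-in data give individual 1 a single constant Yoplait price (a hypothetical scenario); that person's slope column is then an exact multiple of their intercept column, so lm() reports its coefficient as NA and summary() notes it as "not defined because of singularities". Whether this is the cause in the real data is an assumption to check.

```r
# Made-up stand-in for Ecdat's Yogurt (hypothetical values, assumed column names).
set.seed(1)
n_id <- 100
sizes <- sample(10:40, n_id, replace = TRUE)
n <- sum(sizes)
Yogurt <- data.frame(
  id = rep(seq_len(n_id), times = sizes),
  feat.yoplait = rbinom(n, 1, 0.05), feat.dannon = rbinom(n, 1, 0.04),
  feat.hiland = rbinom(n, 1, 0.04), feat.weight = rbinom(n, 1, 0.04),
  price.yoplait = runif(n, 0.07, 0.11), price.dannon = runif(n, 0.06, 0.10),
  price.hiland = runif(n, 0.03, 0.07), price.weight = runif(n, 0.05, 0.09),
  choice = factor(sample(c("dannon", "hiland", "weight", "yoplait"), n, replace = TRUE))
)
Yogurt$yoplait <- as.numeric(Yogurt$choice == "yoplait")
Yogurt$id_f <- as.factor(Yogurt$id)

# Hypothetical: individual 1 never sees Yoplait's price change.
Yogurt$price.yoplait[Yogurt$id == 1] <- 0.09

# Model (3): id intercepts, common feature and other-brand price effects,
# and an id-specific slope on the Yoplait price.
fit3 <- lm(yoplait ~ 0 + id_f + feat.yoplait + feat.dannon + feat.hiland + feat.weight +
             price.dannon + price.hiland + price.weight + id_f:price.yoplait,
           data = Yogurt)

# Coefficients lm() could not estimate because their columns are exact
# linear combinations of other columns in the design matrix.
dropped <- names(coef(fit3))[is.na(coef(fit3))]
print(dropped)
```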
Summarize your findings in a typed report.
Setting 2: YouTube Partner Advertising
The data set, adsensedata.csv, contains daily observations on a YouTube channel's daily earnings (Earnings), number of clicks on advertisements (Clicks) and number of page impressions for those advertisements (Impressions). In the data, we also have daily observations broken down by ad payment type: pay per impression (PPI) or pay per click (PPC). The data are separately broken down by ad format: Video, Image, Flash and Text. For this reason, there are seven sets of variables.
1. Summarize the data in a useful format. [~2 tables, ~2-3 figures] Using the tools we developed in the course, present means, standard deviations, five-number summaries, and useful plots to get a sense of the variation contained in the data. Remember to keep the focus on earnings and the determinants of earnings. The more your summary statistics are focused on this goal, the better.
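A base-R sketch of an earnings-focused summary table and a few plots. The column names (Earnings, EarnPPI, EarnPPC, Impressions, Clicks, ClicksPPI, ImprPPI, ClicksPPC, ImprPPC) are taken from the question text and are assumptions about the CSV headers; check names() after ads <- read.csv("adsensedata.csv"). To stay runnable, the sketch uses a made-up year of daily data instead.

```r
# Made-up stand-in for adsensedata.csv (hypothetical values, assumed column names).
# With the real file: ads <- read.csv("adsensedata.csv")
set.seed(2)
n <- 365
ads <- data.frame(ImprPPI = rpois(n, 5000), ClicksPPI = rpois(n, 15),
                  ImprPPC = rpois(n, 3000), ClicksPPC = rpois(n, 10))
ads$EarnPPI <- 0.002 * ads$ImprPPI + rnorm(n, sd = 0.5)   # made-up pay-per-impression rule
ads$EarnPPC <- 0.30 * ads$ClicksPPC + rnorm(n, sd = 0.5)  # made-up pay-per-click rule
ads$Earnings <- ads$EarnPPI + ads$EarnPPC
ads$Impressions <- ads$ImprPPI + ads$ImprPPC
ads$Clicks <- ads$ClicksPPI + ads$ClicksPPC

# Mean, standard deviation and five-number summary for earnings and its drivers.
vars <- c("Earnings", "EarnPPI", "EarnPPC", "Impressions", "Clicks")
summ <- t(sapply(ads[vars], function(v)
  c(mean(v), sd(v), quantile(v, c(0, 0.25, 0.5, 0.75, 1)))))
colnames(summ) <- c("mean", "sd", "min", "q1", "median", "q3", "max")
print(round(summ, 2))

# Distribution of earnings and its relationship to impressions and clicks.
hist(ads$Earnings, main = "Daily earnings", xlab = "Earnings")
plot(ads$Impressions, ads$Earnings, xlab = "Impressions", ylab = "Earnings")
plot(ads$Clicks, ads$Earnings, xlab = "Clicks", ylab = "Earnings")
```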
2. Single Regression Analysis [1 to several tables] Use a series of single linear regressions to explain what determines EarnPPI and EarnPPC. Does the number of clicks relate to pay-per-impression earnings? Interpret what this means in the context of the setting.
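A sketch that runs the single regressions as a named list of lm() fits and collects the slope, its standard error and its p-value into one table. Column names are assumed from the question text, and the data are a made-up stand-in; in this simulation ClicksPPI is unrelated to EarnPPI by construction, which need not hold in the real data.

```r
# Made-up stand-in for adsensedata.csv (hypothetical values, assumed column names).
set.seed(2)
n <- 365
ads <- data.frame(ImprPPI = rpois(n, 5000), ClicksPPI = rpois(n, 15),
                  ImprPPC = rpois(n, 3000), ClicksPPC = rpois(n, 10))
ads$EarnPPI <- 0.002 * ads$ImprPPI + rnorm(n, sd = 0.5)
ads$EarnPPC <- 0.30 * ads$ClicksPPC + rnorm(n, sd = 0.5)

# One single-regressor model per candidate determinant.
single <- list(
  PPI_on_impr   = lm(EarnPPI ~ ImprPPI, data = ads),
  PPI_on_clicks = lm(EarnPPI ~ ClicksPPI, data = ads),
  PPC_on_impr   = lm(EarnPPC ~ ImprPPC, data = ads),
  PPC_on_clicks = lm(EarnPPC ~ ClicksPPC, data = ads)
)

# Row 2 of each coefficient table is the slope.
slope_tab <- t(sapply(single, function(m)
  coef(summary(m))[2, c("Estimate", "Std. Error", "Pr(>|t|)")]))
print(signif(slope_tab, 3))
```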
3. Multiple Regression Analysis [several tables] Consider the following statistical model of pay-per-impression earnings:
$$EarnPPI = \beta_0 + \beta_1 ClicksPPI + \beta_2 ImprPPI + e \qquad (4)$$
Before estimating the multiple regression, ask what it means to introduce an additional explanatory variable:
(a) If the earnings are truly paid per impression and no other factors influence pay-per-impression advertisement revenue, what do you expect the coefficient on ClicksPPI to be?
(b) If clicks have no effect on pay-per-impression advertisements, what would be the effect of including ClicksPPI in the regression?
(c) Now that you have given it some thought, estimate this regression specification using OLS with the right standard errors. Is ClicksPPI statistically significant? Explain the implications of this statistical result.
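A sketch of part (c) that reads "the right standard errors" as heteroskedasticity-robust (HC1) errors, which is an assumption; with daily data, serially correlated errors might instead call for a HAC estimator such as sandwich::NeweyWest. The robust covariance is computed by hand in base R, and the data are a made-up stand-in with assumed column names.

```r
# Made-up stand-in for adsensedata.csv (hypothetical values, assumed column names).
set.seed(2)
n <- 365
ads <- data.frame(ImprPPI = rpois(n, 5000), ClicksPPI = rpois(n, 15),
                  ImprPPC = rpois(n, 3000), ClicksPPC = rpois(n, 10))
ads$EarnPPI <- 0.002 * ads$ImprPPI + rnorm(n, sd = 0.5)
ads$EarnPPC <- 0.30 * ads$ClicksPPC + rnorm(n, sd = 0.5)

# HC1 robust standard errors: sandwich covariance scaled by n / (n - k).
hc1_se <- function(fit) {
  X <- model.matrix(fit)
  e <- resid(fit)
  bread <- solve(crossprod(X))
  V <- bread %*% crossprod(X * e) %*% bread * nrow(X) / (nrow(X) - ncol(X))
  sqrt(diag(V))
}

# Model (4) with robust t statistics and two-sided p-values.
fit4 <- lm(EarnPPI ~ ClicksPPI + ImprPPI, data = ads)
se <- hc1_se(fit4)
t_stat <- coef(fit4) / se
p_val <- 2 * pt(abs(t_stat), df = df.residual(fit4), lower.tail = FALSE)
print(round(cbind(estimate = coef(fit4), robust_se = se, t = t_stat, p = p_val), 5))
```

Replacing the formula with EarnPPC ~ ClicksPPC + ImprPPC gives the corresponding pay-per-click fit.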
4. Repeat question 3 for the following statistical model of pay-per-click earnings:
$$EarnPPC = \beta_0 + \beta_1 ClicksPPC + \beta_2 ImprPPC + e \qquad (5)$$
5. In reality, PPI advertisements and PPC advertisements compete in an auction to determine which ads are placed on the YouTube channel. The advertisement with the highest bid in terms of cost per thousand impressions wins the auction. Given this fact, how would you interpret the following regression models for PPI and PPC earnings?
$$EarnPPI = \beta_0 + \beta_1 ClicksPPI + \beta_2 ImprPPI + \beta_3 ClicksPPC + \beta_4 ImprPPC + e \qquad (6)$$
$$EarnPPC = \beta_0 + \beta_1 ClicksPPI + \beta_2 ImprPPI + \beta_3 ClicksPPC + \beta_4 ImprPPC + e \qquad (7)$$
Estimate these two regression models and interpret the output in light of a reasonable economic model in which clicks and impressions determine YouTube Partner earnings.
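A sketch that fits models (6) and (7) on the same four regressors and places the coefficients side by side, so cross-effects (for example, PPC clicks in the PPI equation) are easy to read. Column names are assumed from the question text and the data are a made-up stand-in, so the simulated coefficients carry no economic meaning.

```r
# Made-up stand-in for adsensedata.csv (hypothetical values, assumed column names).
set.seed(2)
n <- 365
ads <- data.frame(ImprPPI = rpois(n, 5000), ClicksPPI = rpois(n, 15),
                  ImprPPC = rpois(n, 3000), ClicksPPC = rpois(n, 10))
ads$EarnPPI <- 0.002 * ads$ImprPPI + rnorm(n, sd = 0.5)
ads$EarnPPC <- 0.30 * ads$ClicksPPC + rnorm(n, sd = 0.5)

# Models (6) and (7): each earnings type on both ad types' clicks and impressions.
fit6 <- lm(EarnPPI ~ ClicksPPI + ImprPPI + ClicksPPC + ImprPPC, data = ads)
fit7 <- lm(EarnPPC ~ ClicksPPI + ImprPPI + ClicksPPC + ImprPPC, data = ads)

cmp <- cbind(EarnPPI = coef(fit6), EarnPPC = coef(fit7))
print(round(cmp, 5))
print(summary(fit6))
print(summary(fit7))
```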
6. The data also contain information on advertising format. Estimate the following statistical model of total earnings:
$$Earnings = \beta_0 + \beta_1 ImprVideo + \beta_2 ImprImage + \beta_3 ImprFlash + \beta_4 ImprText + \beta_5 ClicksVideo + \beta_6 ClicksImage + \beta_7 ClicksFlash + \beta_8 ClicksText + e \qquad (8)$$
using OLS, where Earnings is total channel earnings.
(a) What do you learn from these regression results? Which ad formats seem to pay the most, and for what?
(b) For comparison, estimate the statistical relationship of Earnings to (Impressions, Clicks) within advertising format type. That is, estimate four different regressions of the form
$$Earnings_{Type} = \beta_0 + \beta_1 Impr_{Type} + \beta_2 Clicks_{Type} + e$$
where $Type \in \{Video, Image, Flash, Text\}$. Report OLS estimates in a table for easy comparison. Use your estimates to discuss how the relationship between earnings, impressions and clicks differs across advertising types.
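A sketch of model (8) and the four within-format regressions. It assumes per-format columns named ImprVideo, ClicksVideo, EarningsVideo and so on (following the question's notation; the real CSV headers may differ) and builds each formula with reformulate() so one loop covers all four types. The data are made up.

```r
# Made-up stand-in for the per-format columns (hypothetical values, assumed names).
set.seed(3)
n <- 365
types <- c("Video", "Image", "Flash", "Text")
ads <- data.frame(day = seq_len(n))
for (ty in types) {
  impr <- rpois(n, 2000)
  clicks <- rpois(n, 8)
  ads[[paste0("Impr", ty)]] <- impr
  ads[[paste0("Clicks", ty)]] <- clicks
  # Made-up earnings rule, for illustration only.
  ads[[paste0("Earnings", ty)]] <- 0.001 * impr + 0.2 * clicks + rnorm(n, sd = 0.3)
}
ads$Earnings <- rowSums(ads[paste0("Earnings", types)])

# Model (8): total earnings on impressions and clicks of every format.
fit8 <- lm(Earnings ~ ImprVideo + ImprImage + ImprFlash + ImprText +
             ClicksVideo + ClicksImage + ClicksFlash + ClicksText, data = ads)
print(summary(fit8))

# (b): one regression per format, collected into a 4 x 3 comparison table.
by_type <- t(sapply(types, function(ty) {
  f <- reformulate(c(paste0("Impr", ty), paste0("Clicks", ty)),
                   response = paste0("Earnings", ty))
  coef(lm(f, data = ads))
}))
colnames(by_type) <- c("intercept", "impr", "clicks")
print(round(by_type, 4))
```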
7. Summarize the analysis. In particular, what conclusions would you offer to this YouTube partner? YouTube reports $\frac{\text{earnings}}{\text{impressions}}$ and $\frac{\text{earnings}}{\text{clicks}}$ as the payoff to the YouTube partner of impressions and clicks, respectively. How does the regression analysis of this partner's YouTube earnings data improve upon these simple statistics?