RESULTS

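library(ggplot2)

# Log-linear model used throughout the report (`car` is the collected data frame)
car.lm <- lm(log(Cost)~Miles, data=car)
b <- coef(car.lm)
# Untransformed model, used only for the Box-Cox check
mylm <- lm(Cost~Miles, data=car)
# Back-transform the interval bounds from the log scale to thousands of $
confintv <- exp(predict(car.lm, interval="confidence"))
predintv <- exp(predict(car.lm, interval="prediction"))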

buy_point <- data.frame(Miles=35.239, Cost=14.971, generation='2013-17')
sell_point <- data.frame(Miles=75, Cost=15.83134, generation='2013-17')

ggplot(car, aes(x=Miles, y=Cost, color=generation)) +
  geom_ribbon(aes(ymin=confintv[,2], ymax=confintv[,3]),
              alpha=0.1, fill='skyblue', color='skyblue3') +
  geom_ribbon(aes(ymin=predintv[,2], ymax=predintv[,3]),
              alpha=0.1, fill='firebrick3', color='firebrick') +
  geom_point() +
  # annotate() draws each label once; geom_text(aes(label=...)) would redraw it for every row
  annotate("text", label='Selling Point, $15,831',
           x=105, y=17, color='hotpink') +
  annotate("text", label='Buying Point, $14,971',
           x=22, y=16, color='skyblue') +
  geom_segment(data=buy_point, xend=75, yend=15.83134) +
  geom_point(data=buy_point, size=3, color='skyblue2') +
  stat_function(fun=function(x) exp(b[1]+b[2]*x), aes(color='log(Cost)')) +
  geom_point(data=sell_point, size=3, color='hotpink') +
  theme_bw() +
  labs(
    title="Honda Accord Offers on KSL.com",
    y="Sales Price (in thousands of $)",
    x="Miles (in thousands of mi)"
  )

What car to buy and when to sell?

There is currently an offer on KSL for a 2017 Honda Accord EX with Honda Sensing and 35,239 miles, listed at $14,971. Buying this car and selling it when it reaches 75,000 miles would let you drive it for 39,761 miles and gain \(\approx 2\) cents per mile on its selling price, since the model predicts a price of about $15,831 at that mileage. If you wished to sell it for about the same price you bought it for, you could drive it until about 85,000 miles. If you wanted to flip the car instead, you could buy it and sell it for around $19,900, the model's predicted price at its current mileage, for a profit of about $5,000. If you decided to drive it as long as you could without running into major problems, it would be best to sell it before it reaches 150,000 miles: it was very difficult to find Honda Accords for sale past that mileage, which suggests they tend to break down or cause problems beyond that point.
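The figures in this paragraph can be checked against the fitted curve. The sketch below is a rough check, not the original calculation: it uses the rounded coefficients from the regression summary (3.189 and -0.0057) rather than the full-precision fit, so its results agree with the text only to rounding. The helper name `predict_cost` is introduced here for illustration.

```r
# Rounded coefficients copied from the regression summary in this report
b0 <- 3.189     # intercept on the log scale
b1 <- -0.0057   # slope per 1,000 miles on the log scale

# Hypothetical helper: predicted price (thousands of $) at a mileage (thousands of mi)
predict_cost <- function(miles) exp(b0 + b1 * miles)

buy_miles  <- 35.239   # listing mileage
buy_cost   <- 14.971   # listing price
sell_miles <- 75

# Predicted price at 75,000 miles and the gain per mile driven, in cents
sell_cost           <- predict_cost(sell_miles)
miles_driven        <- (sell_miles - buy_miles) * 1000
gain_per_mile_cents <- (sell_cost - buy_cost) * 1000 / miles_driven * 100

# Mileage at which the predicted price falls back to the purchase price
break_even_miles <- (log(buy_cost) - b0) / b1

# Predicted price at the car's current mileage, and the margin over the listing
flip_price  <- predict_cost(buy_miles)
flip_profit <- flip_price - buy_cost

round(c(sell_cost = sell_cost, cents_per_mile = gain_per_mile_cents,
        break_even_miles = break_even_miles, flip_price = flip_price,
        flip_profit = flip_profit), 2)
```

Because the listing sits well below the fitted curve, the predicted price does not drop back to the purchase price until the car has been driven roughly another 50,000 miles.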

Regression Interpretation

Using our model, we see that the average Honda Accord’s sales price appears to decrease by 0.57% for each 1,000 miles driven, and the predicted price of a new (0-mile) Honda Accord is approximately $24,264.

DATA

Data was collected from KSL Cars by filtering searches for Honda Accords from model years 2013 through 2022 and pulling 5 cars from every interval of 10,000 miles (0-10, 10-20…). The 2013-2022 range was chosen because it contains 2 generations of Honda Accords (see cars.com); this allows us to filter for just one generation and see whether the trends differ between the two generations. The EX trim and its variants were prioritized in data collection; when 5 EX trims couldn’t be found in an interval, other trims were recorded. As a result, this model is most accurate when predicting prices for Honda Accords with the EX trim.

datatable(car, options=list(lengthMenu = c(5,10,30)), extensions="Responsive")

REGRESSION

Hypotheses

We believe there is a linear relationship between the log of the sales price of a Honda Accord and its mileage. The choice of the log transformation is supported by the Box-Cox suggestion for the untransformed model:

boxCox(mylm)
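For context, a Box-Cox plot profiles the log-likelihood of the model over a power \(\lambda\) applied to the response, and a maximum near \(\lambda = 0\) points to the log transformation. The sketch below illustrates that idea in base R on simulated stand-in data, not the KSL data; the helper `boxcox_loglik`, the simulated coefficients, and the \(\lambda\) grid are all assumptions made for illustration, and `boxCox()` itself may differ in its details.

```r
set.seed(1)

# Simulated stand-in data (NOT the KSL data): price decays exponentially with
# mileage and the noise is multiplicative, so the log is the natural transformation
n     <- 500
miles <- runif(n, 0, 150)
cost  <- exp(3.2 - 0.006 * miles + rnorm(n, sd = 0.2))

# Hypothetical helper: Box-Cox profile log-likelihood (up to a constant) for one lambda
boxcox_loglik <- function(lambda, y, x) {
  # lambda = 0 is defined as the log transformation
  y_t <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(resid(lm(y_t ~ x))^2)
  -length(y) / 2 * log(rss / length(y)) + (lambda - 1) * sum(log(y))
}

lambdas    <- seq(-2, 2, by = 0.05)
loglik     <- sapply(lambdas, boxcox_loglik, y = cost, x = miles)
lambda_hat <- lambdas[which.max(loglik)]

lambda_hat   # a value near 0 would support the log transformation
plot(lambdas, loglik, type = "l", xlab = "lambda", ylab = "profile log-likelihood")
```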

With that in mind, we believe our data will generally follow this relationship:

\[ \log(\underbrace{Y_i}_\text{Sales Price}) = \overbrace{\beta_0}^\text{y-int} + \overbrace{\beta_1}^\text{slope} \underbrace{X_i}_\text{Miles} + \epsilon_i \quad \text{where} \ \epsilon_i \sim N(0, \sigma^2) \]

This relationship can be simplified to:

\[ \underbrace{Y_i}_\text{Sales Price} = \overbrace{e^{\beta_0}}^\text{y-int} (\overbrace{e^{\beta_1}}^\text{slope})^{\overbrace{X_i}^\text{Miles}}e^{\epsilon_i} \quad \text{where} \ \epsilon_i \sim N(0, \sigma^2) \]
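The step between the two forms is to exponentiate both sides of the log-scale model and split the exponent term by term:

\[ Y_i = e^{\beta_0 + \beta_1 X_i + \epsilon_i} = e^{\beta_0} \, e^{\beta_1 X_i} \, e^{\epsilon_i} = e^{\beta_0} \, (e^{\beta_1})^{X_i} \, e^{\epsilon_i} \]

On the original scale the error \(e^{\epsilon_i}\) is therefore multiplicative rather than additive.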

Both \(e^{\beta_0}\) and \(e^{\beta_1}\) are of interest: \(e^{\beta_1}\) describes how Sales Price changes as Miles change, and the y-intercept \(e^{\beta_0}\) has a valid and interesting interpretation as the predicted cost of a new Honda Accord.

Hypotheses:

\[ H_0: \beta_0 = 0 \quad \text{vs.} \quad H_a: \beta_0 \neq 0 \\ H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0 \]

This analysis will have a significance level of \(\alpha = 0.05\).

Linear Regression

ggplot(car, aes(x=Miles, y=log(Cost))) +
  geom_point(aes(color=generation)) +
  geom_smooth(method='lm', formula=y~x, se=F) +
  theme_bw() +
  labs(title='Regression Viewed with the Log Transformation',
       x='Miles (in thousands of mi)',
       y='log( Sales Price (in thousands of $) )')

At first glance, using log(Sales Price) appears to give the data a fairly linear relationship. This will be checked with our diagnostics.

summary(car.lm) %>% 
  pander()
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   3.189      0.04241      75.2      2.234e-72
Miles         -0.0057    0.0004682    -12.17    1.947e-19

Fitting linear model: log(Cost) ~ Miles
Observations   Residual Std. Error   \(R^2\)   Adjusted \(R^2\)
77             0.1881                0.6639    0.6595

Our p-values for the slope and the intercept are both far below our significance level of 0.05, so we reject the null hypothesis for both coefficients and conclude that there is a relationship between Miles and the log of Sales Price.
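The t values and p-values in the summary can be reproduced from the estimates and standard errors. The sketch below is a minimal check that assumes the rounded values printed in the table and \(n - 2 = 75\) residual degrees of freedom (77 observations, 2 estimated coefficients); the variable names are chosen here for illustration.

```r
# Values copied from the coefficient table above (rounded as printed)
estimate  <- c(intercept = 3.189,   miles = -0.0057)
std_error <- c(intercept = 0.04241, miles = 0.0004682)

n_obs    <- 77
df_resid <- n_obs - 2   # two estimated coefficients

# t statistic for H0: coefficient = 0, and its two-sided p-value
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

alpha <- 0.05
data.frame(estimate, std_error, t_value, p_value, reject_h0 = p_value < alpha)
```

Small differences from the printed p-values are expected, because the inputs here are rounded.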

Interpretation

\[ \underbrace{\hat{Y}_i}_\text{Predicted Sales Price} = \overbrace{e^{3.189}}^\text{y-int} (\overbrace{e^{-0.0057}}^\text{slope})^{\overbrace{X_i}^\text{Miles}} \\ \underbrace{\hat{Y}_i}_\text{Predicted Sales Price} \approx 24.264 \ (0.994)^{\overbrace{X_i}^\text{Miles}} \]

As stated earlier, we will interpret our estimates of \(e^{\beta_0}\) and \(e^{\beta_1}\), which are \(e^{b_0}\) and \(e^{b_1}\).

\[ e^{b_0} = 24.26415 \\ e^{b_1} = 0.9943162 \]

\(e^{b_0}\) tells us that the average price of a new (0-mile) Honda Accord is approximately $24,264. \(e^{b_1}\) tells us that for each 1,000 miles driven in a Honda Accord, the average selling price decreases by 0.57%, since \(1 - 0.9943162 \approx 0.0057\).
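As a quick check of this interpretation, the sketch below converts the rounded coefficients from the summary table into a starting price and a percent change. It also shows that the percent change compounds, so the drop over 10,000 miles is slightly less than ten times the drop over 1,000 miles. The variable names are illustrative.

```r
# Rounded coefficients from the summary table
b0 <- 3.189
b1 <- -0.0057

# Predicted price at 0 miles, in thousands of $
new_price <- exp(b0)

# Percent change in predicted price per 1,000 miles and per 10,000 miles
pct_change_per_1k  <- (exp(b1) - 1) * 100
pct_change_per_10k <- (exp(10 * b1) - 1) * 100

round(c(new_price = new_price,
        pct_per_1k = pct_change_per_1k,
        pct_per_10k = pct_change_per_10k), 3)
```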

Diagnostics

par(mfrow=c(1,3))
plot(car.lm, which=1:2)
plot(car.lm$residuals)

Linear Regression Assumptions:

  • Linear Relationship: The residuals vs fitted plot shows a fairly linear relationship; while there is a minor bend in the red line, the points do not follow that bend closely.
  • Normal Errors: The points in our QQ-plot stay close to the line, so we can assume the errors are approximately normal.
  • Constant Variance: Looking at the residuals vs fitted plot, we don’t see a pattern or major change in variance.
  • Fixed x-values: We assume that the mileages listed on KSL.com are recorded accurately.
  • Independent Errors: We don’t see any trends in our residuals vs order plot, so the errors appear to be independent.
