Why does adjusted R-squared come out negative?

statistics - What is the difference between Multiple R-squared and Adjusted R-squared in a single-variate least squares regression? - Stack Overflow
Could someone explain to the statistically naive what the difference between Multiple R-squared and Adjusted R-squared is?
I am doing a single-variate regression analysis as follows:
v.lm <- lm(epm ~ n_days, data=v)
print(summary(v.lm))

Call:
lm(formula = epm ~ n_days, data = v)

Residuals:
-693.59 -325.79

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 410.1 on 28 degrees of freedom
Multiple R-squared: 0.1746,  Adjusted R-squared: 0.1451
F-statistic: 5.921 on 1 and 28 DF,  p-value: 0.0216
The "adjustment" in adjusted R-squared is related to the number of variables and the number of observations.
If you keep adding variables (predictors) to your model, R-squared will improve - that is, the predictors will appear to explain the variance - but some of that improvement may be due to chance alone.
So adjusted R-squared tries to correct for this: it rescales the unexplained fraction of variance by the ratio (N-1)/(N-k-1), giving adjusted R-squared = 1 - (1 - R-squared) * (N-1)/(N-k-1), where N is the number of observations and k is the number of variables (predictors).
It's probably not a concern in your case, since you have a single variate.
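As a quick numeric check (Python here, purely for illustration, since the question's own code is R), applying the (N-1)/(N-k-1) adjustment to the Multiple R-squared from the summary above reproduces the reported Adjusted R-squared, and the same formula shows why adjusted R-squared can come out negative when R-squared is small relative to the number of predictors:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: rescale the unexplained variance by (n-1)/(n-k-1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the regression summary above: R2 = 0.1746 with 28 residual df
# and k = 1 predictor, so n = 30 observations.
print(round(adjusted_r2(0.1746, n=30, k=1), 4))   # ~0.1451, matching summary()

# With a small R2 and several predictors, the adjustment pushes the value
# below zero -- which is how adjusted R-squared ends up negative.
print(round(adjusted_r2(0.05, n=20, k=5), 4))
```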
The Adjusted R-squared is close to, but different from, the value of R2. Instead of being based on the explained sum of squares SSR and the total sum of squares SSY, it is based on the overall variance (a quantity we do not typically calculate), s2T = SSY/(n - 1) and the error variance MSE (from the ANOVA table) and is worked out like this: adjusted R-squared = (s2T - MSE) / s2T.
This approach provides a better basis for judging the improvement in a fit due to adding an explanatory variable, but it does not have the simple summarizing interpretation that R2 has.
If I haven't made a mistake, you can verify the values of adjusted R-squared and R-squared as follows:

s2T <- sum(anova(v.lm)[[2]]) / sum(anova(v.lm)[[1]])  # SSY / (n - 1)
MSE <- anova(v.lm)[[3]][2]                            # residual mean square
adj.R2 <- (s2T - MSE) / s2T

R2, on the other hand, is SSR/SSY, where SSR = SSY - SSE:

SSE <- deviance(v.lm)                  # or SSE <- sum((v$epm - predict(v.lm))^2)
SSY <- deviance(lm(epm ~ 1, data=v))   # or SSY <- sum((v$epm - mean(v$epm))^2)
SSR <- SSY - SSE                       # or SSR <- sum((predict(v.lm) - mean(v$epm))^2)
R2 <- SSR / SSY
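The variance-based formula and the usual degrees-of-freedom formula agree exactly. A sketch in Python (with synthetic data, since the answer's `v` data frame isn't available here) confirms the identity numerically:

```python
import numpy as np

# Synthetic stand-in for the answer's data (epm ~ n_days is not available here)
rng = np.random.default_rng(0)
n, k = 30, 1
x = rng.normal(size=n)
y = 2 * x + rng.normal(scale=3, size=n)

# Ordinary least squares fit with an intercept
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

SSE = np.sum((y - X @ beta) ** 2)   # error sum of squares
SSY = np.sum((y - y.mean()) ** 2)   # total sum of squares
R2 = 1 - SSE / SSY

s2T = SSY / (n - 1)                 # overall variance
MSE = SSE / (n - k - 1)             # error variance from the ANOVA table

adj_from_variances = (s2T - MSE) / s2T
adj_from_ratio = 1 - (1 - R2) * (n - 1) / (n - k - 1)
assert abs(adj_from_variances - adj_from_ratio) < 1e-12
```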
R-squared does not depend on the number of variables in the model; adjusted R-squared does.
Adjusted R-squared penalizes adding variables that are uncorrelated with the variable you're trying to explain, so you can use it to judge whether a new variable is actually relevant to the thing you're trying to explain.
Adjusted R-squared is R-squared rescaled by degrees of freedom so that it depends on the number of variables in the model.
Note that, in addition to number of predictive variables, the Adjusted R-squared formula above also adjusts for sample size.
A small sample will give a deceptively large R-squared.
Ping Yin & Xitao Fan, J. of Experimental Education 69(2): 203-224, "Estimating R-squared shrinkage in multiple regression", compares different methods for adjusting r-squared and concludes that the commonly-used ones quoted above are not good.
They recommend the Olkin & Pratt formula.
However, I've seen some indication that sample size has a much larger effect than any of these formulas capture.
I am not convinced that any of them are good enough to let you compare regressions fitted on very different sample sizes (e.g., 2,000 vs. 200,000 observations; the standard formulas would make almost no sample-size-based adjustment).
I would do some cross-validation to check the r-squared on each sample.
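A minimal sketch of that cross-validation check (Python, with synthetic data, since the original samples aren't available): fit on the training folds, score the held-out fold, and pool the out-of-sample errors into an R-squared:

```python
import numpy as np

def cv_r2(x, y, n_folds=5, seed=0):
    """Out-of-sample R^2 for a simple linear fit, via k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    held_out_sse, held_out_ssy = 0.0, 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        slope, intercept = np.polyfit(x[train], y[train], 1)
        pred = slope * x[fold] + intercept
        held_out_sse += np.sum((y[fold] - pred) ** 2)          # held-out error
        held_out_ssy += np.sum((y[fold] - y[train].mean()) ** 2)
    return 1 - held_out_sse / held_out_ssy

# Synthetic example: a strong linear signal, so in- and out-of-sample
# R^2 should roughly agree.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)
print(round(cv_r2(x, y), 3))
```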
R-squared - Minitab
What is R-squared?
R2 is the percentage of response-variable variation that is explained by its relationship with one or more predictor variables. Usually, the higher the R2, the better the model fits your data. R2 is always between 0 and 100%. R-squared is also known as the coefficient of determination, or the coefficient of multiple determination in multiple linear regression.

Graphical illustration of R-squared
You can plot observed values by fitted values to graphically illustrate R2 values for regression models.
Plots of Observed Responses Versus Fitted Responses

The first regression model explains 85.5% of the variance, while the second explains 22.6%. The more variance the regression model explains, the closer the data points fall to the fitted regression line. Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values, and all the data points would fall on the fitted regression line.

What is R-squared adjusted?
R2 adjusted is the percentage of response-variable variation that is explained by its relationship with one or more predictor variables, adjusted for the number of predictors in the model. This adjustment is important because the R2 for any model always increases when a new term is added, so a model with more terms can appear to fit better purely because it has more terms.
Use R2 adjusted to determine how well the model fits your data when you want to adjust for the number of predictors in the model. The adjusted R2 value incorporates the number of predictors in the model to help you choose the correct model.
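A small sketch of the behavior described above (Python, synthetic data): adding a predictor that is pure noise still raises R2, while adjusted R2 is penalized for the extra term:

```python
import numpy as np

def r2_and_adjusted(X, y):
    """R^2 and adjusted R^2 for an OLS fit with an intercept."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    sse = np.sum((y - Xd @ beta) ** 2)
    ssy = np.sum((y - y.mean()) ** 2)
    r2 = 1 - sse / ssy
    return r2, 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
junk = rng.normal(size=n)           # predictor unrelated to the response
y = 2 * x1 + rng.normal(size=n)

r2_a, adj_a = r2_and_adjusted(x1[:, None], y)
r2_b, adj_b = r2_and_adjusted(np.column_stack([x1, junk]), y)

# R^2 can never decrease when a term is added ...
assert r2_b >= r2_a
# ... but adjusted R^2 is penalized for the extra term, so it will usually
# drop (it rises only if the new term helps more than chance alone would).
```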
Example of R-squared adjusted
For example, you work for a potato chip company that examines the factors which affect the percentage of crumbled potato chips per container. You include the percentage of potato relative to other ingredients, cooling rate, and cooking temperature as predictors in the regression model. You receive the following results as you add the predictors in a forward stepwise approach:
[Table: stepwise regression results as cooling rate and cooking temperature are added, with columns for R2, R2 adjusted, and the regression p-value; the numeric values did not survive in this copy.]
The first step yields a statistically significant regression model. Adding the second term, you see that the adjusted R2 has increased, indicating that cooling rate has improved the model. When you add the third term, cooking temperature, the R2 increases but the adjusted R2 does not. Because cooking temperature has not improved the model, you might consider removing it from the model.

What is predicted R-squared?
Use predicted R2 to determine how well the model predicts responses for new observations. Larger values of predicted R2 indicate models of greater predictive ability.
Predicted R2 is calculated by systematically removing each observation from the data set, estimating the regression equation, and determining how well the model predicts the removed observation. Predicted R2 ranges between 0 and 100% and is calculated from the PRESS statistic.
Predicted R2 can prevent over-fitting the model and can be more useful than adjusted R2 for comparing models because it is calculated using observations not included in model estimation. Over-fitting refers to models that seem to explain the relationship between the predictor and response variables for the data set used for model calculation but fail to provide valid predictions for new observations.
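For an ordinary linear model, the leave-one-out PRESS statistic has a closed form using the hat matrix, so a predicted R2 can be sketched without refitting once per observation (Python, synthetic data; this mirrors the PRESS-based definition above, not Minitab's implementation):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat (projection) matrix
resid = y - H @ y                          # ordinary residuals
loo_resid = resid / (1 - np.diag(H))       # leave-one-out (PRESS) residuals
PRESS = np.sum(loo_resid ** 2)

SSY = np.sum((y - y.mean()) ** 2)
predicted_r2 = 1 - PRESS / SSY             # cf. R^2 = 1 - SSE/SSY
print(round(predicted_r2, 3))
```

Because each leave-one-out residual is at least as large as the ordinary one, PRESS >= SSE, so predicted R2 never exceeds R2.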
Example of predicted R-squared
For example, you work for a financial consulting company and are developing a model to predict future market conditions. The model looks promising because it has an R2 of 87%. However, when you calculate the predicted R2, you see that it drops to 52%. This can indicate an over-fitted model and suggests that the model will not predict new observations as accurately as it fits the existing data.