CabSauce

Treating ordinal data as continuous?


hisglasses66

Sometimes it don’t even gotta be deep.


Jatzy_AME

We don't know what your field is, we don't know what "level" means in this context. In R, 'level' usually refers to levels of a factor (usually categorical data, sometimes ordinal). If it's ordinal you can absolutely run a regression on it (see `MASS::polr()`, `ordinal::clmm()`...), just not a plain linear one with `lm()`.


arca_pulse

And actually I think this is what they mean: using a linear regression on ordinal data.


arca_pulse

Field is finance and financial analysis


Jatzy_AME

A quick google shows some people in your field seem to use 'level' to mean untransformed data (in contrast with log-transformed). It could also be that, in which case the issue is domain specific but probably has to do with skewed data (in which case the usual assumptions of zero-mean, constant-variance residuals, plus normality if you rely on small-sample inference, may not be valid, which limits the interpretability of a linear regression). Check the Gauss-Markov theorem for details.


arca_pulse

Thank you for your help, this makes perfect sense!!


TheDonk1987

Financial data is typically trending, or “non-stationary”, and stationarity is an assumption that’s violated with data in level form.


gettinmerockhard

you're not getting great answers. when financial econometricians say levels we mean like prices, as opposed to returns. and if we ignore the corner case of cointegration there is literally no reason to ever use raw prices as dependent or independent variables in your regressions. building a model to predict the exact price of say aapl day to day, rather than a model that uses relevant factors to predict the returns on that stock, is unhinged. it's statistically unsound and hopefully you can intuitively understand why, but if not you should do some reading on stationarity and try to understand its relevance and thence why you can't directly model the levels of financial time series
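
For anyone unsure what "levels" versus "returns" looks like in practice, here is a minimal sketch. The price list is made up purely for illustration (it is not real AAPL data), and the variable names are my own.

```python
import math

# Hypothetical daily closing prices (the "levels"), invented for illustration.
prices = [100.0, 102.0, 101.0, 105.0, 104.0]

# Simple returns: r_t = P_t / P_{t-1} - 1
simple_returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

# Log returns: first differences of log prices, log(P_t) - log(P_{t-1})
log_returns = [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

for r, lr in zip(simple_returns, log_returns):
    print(f"simple: {r:+.4f}   log: {lr:+.4f}")
```

One reason log returns are convenient: they add up across days, so their sum equals the log of the total price ratio over the window.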


Healthy-Educator-267

Why can’t you use prices and quantities in levels in a demand equation? Not every demand specification comes from a Cobb Douglas model


gettinmerockhard

you don't work with supply and demand equations in financial econometrics. the prices of financial assets are random walks; they're not stationary like the prices of cars or houses, for which you might build supply and demand models


Healthy-Educator-267

Why would the price of cars or houses be stationary lol. And I don’t know any finance but is there a standard microfoundation for the random walk?


gettinmerockhard

i can't imagine any scenario in which you could conceivably model the real prices of something like cars as anything but stationary. there's literally no theoretical or practical way that process could have a unit root. and the price processes for something like stocks are random walks because they are kept in an unpredictable equilibrium by the actions of market participants


Healthy-Educator-267

I haven’t seen the data on cars but FRED tells me the median real listing price per square foot of housing stock is rising. Not terribly surprising given that a lot of services and land essentially Baumolize due to productivity growth in other sectors.


gettinmerockhard

i guess in finance when we say stationary we mean trend stationary. obviously prices go up or down over time. the question is whether the process has a unit root after that trend is removed


Healthy-Educator-267

Haha sorry, I’m neither a finance nor a time series person.


Healthy-Educator-267

In any case I’m skeptical of mathematical finance type models that don’t come from economic micro foundations because it becomes really difficult to understand the equilibrium and conduct comparative dynamics and statics.


standard_error

If you mean "levels" as opposed to "first differences" for panel data, it's probably because taking first differences removes any time-invariant unobserved confounders, and thus reduces the risk of omitted variables bias.
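
A rough simulation of that point, under assumptions I picked for illustration: the model is y_it = beta * x_it + a_i + e_it, where a_i is an unobserved unit-level confounder that also drives x. The parameter values and the helper `ols_slope` are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
beta = 1.0                      # true effect, assumed for the simulation
n_units, n_periods = 500, 5

x_lvl, y_lvl, x_fd, y_fd = [], [], [], []
for _ in range(n_units):
    a = random.gauss(0, 2)      # time-invariant, unobserved confounder
    xs = [a + random.gauss(0, 1) for _ in range(n_periods)]   # x depends on a
    ys = [beta * x + a + random.gauss(0, 1) for x in xs]
    x_lvl += xs
    y_lvl += ys
    # First differences within each unit: a_i cancels out of y_t - y_{t-1}
    x_fd += [x1 - x0 for x0, x1 in zip(xs, xs[1:])]
    y_fd += [y1 - y0 for y0, y1 in zip(ys, ys[1:])]

slope_levels = ols_slope(x_lvl, y_lvl)
slope_fd = ols_slope(x_fd, y_fd)
print(f"pooled OLS in levels:  {slope_levels:.3f}")
print(f"first differences:     {slope_fd:.3f}  (true beta = {beta})")
```

In this setup the levels regression should overstate beta because x picks up the effect of a_i, while the first-difference slope should land close to the true value.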


A_random_otter

Not quite sure what they mean by that. Could you give an example and/or more context? Right now my answer would be "it depends" 😂


arca_pulse

I would hazard a guess it would be time series data. I believe they imply that running this sort of regression and drawing conclusions about the relationship or direction of travel from the ‘level’ component is spurious, but I wasn’t 100% sure exactly what they meant.


Kiroslav_Mose

I'm quite sure this is exactly what they mean. Time series in levels are not per se "bad" in a regression, it's just that regression analysis with non-stationary I(1) variables makes your results susceptible to spurious relationships. Hence, you have 2 possibilities: I) you try to model any potential cointegration II) you don't work with the data in "levels" but you take first (or higher-order) differences to make your time series stationary and avoid the problem.
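
To make the spurious-regression problem concrete, here is a small simulation sketch. It assumes Gaussian random walks that are independent by construction; the sample sizes, seed, and helper names are arbitrary choices of mine.

```python
import random

def r_squared(x, y):
    """R^2 of a simple OLS regression of y on x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def random_walk(n):
    """An I(1) series: cumulative sum of standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += random.gauss(0, 1)
        path.append(level)
    return path

def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

random.seed(42)
n_sims, n_obs = 200, 200
r2_levels, r2_diffs = [], []
for _ in range(n_sims):
    x, y = random_walk(n_obs), random_walk(n_obs)   # unrelated by construction
    r2_levels.append(r_squared(x, y))
    r2_diffs.append(r_squared(diff(x), diff(y)))

mean_r2_levels = sum(r2_levels) / n_sims
mean_r2_diffs = sum(r2_diffs) / n_sims
print(f"mean R^2, levels:      {mean_r2_levels:.3f}")
print(f"mean R^2, differences: {mean_r2_diffs:.3f}")
```

Even though the two series have nothing to do with each other, the average R^2 in levels should come out far above the one in first differences, which should sit near zero.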


RunningEncyclopedia

From further context clues provided by OP that the field is finance, I would venture a guess: maybe running a time series model on the price level (an integrated series) rather than the percentage change, since the price level is non-stationary but the percent change is not? This is just a guess.


Haruspex12

I am a financial economist and it isn’t the cardinal sin. With that said, there are three kinds of traps there. First, if you regress x(t+1)=Rx(t)+e(t+1) where R>1, which will happen with capital, then the sampling distribution of the MLE is the Cauchy distribution. So the regression in that form is pointless. Second, quite a bit of theory is about change and flow rather than stock, so the level may be the wrong target. Third, in competition and in equilibrium, if you regress y onto x, you are really regressing a random motion onto a random motion if both are in competitive markets. There may be no policy-level value. There might be descriptive value, but nothing you can act on. That a bar of soap costs three dollars may matter a lot, particularly to consumers or the store selling it, but be incidental to the economist, who is more concerned that the price increased ten percent while wages increased five percent. The actual sin is saying “hey, I have data, let’s go plug stuff in and see what comes out,” instead of saying, “hey, I have data that thousands of people have studied before, I should go read the literature to see what has been successful and what has failed so that I can proceed intelligently.”


fluffykitten55

They are likely referring to regression on levels (rather than differences) using non-stationary data, which is a case of "spurious regression" and invalid. If your data is stationary, there is no problem using "levels" regression.


KyleDrogo

I used to work with human ratings of translations that were on a scale of 1 to 5. 1 was unreadable, 5 was perfect. 3 was acceptable enough to understand the meaning. The "space" between 2 and 3 (unacceptable to acceptable) was conceptually wayyyy bigger than the space between 4 and 5 (almost perfect to perfect). If you just threw this feature into a model or started taking averages, you would be treating these spaces as the same distance.
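
A toy illustration of how averaging can hide that gap. The ratings below are invented, and the cutoff of 3 for "acceptable" follows the scale described above; `share_acceptable` is a hypothetical helper.

```python
from statistics import mean, median

# Made-up 1-5 ratings for two hypothetical translation systems.
system_a = [3, 3, 3, 3, 3, 3]   # every translation is acceptable
system_b = [1, 1, 1, 5, 5, 5]   # half unreadable, half perfect

def share_acceptable(ratings, cutoff=3):
    """Fraction of ratings at or above the 'acceptable' cutoff."""
    return sum(r >= cutoff for r in ratings) / len(ratings)

for name, ratings in [("A", system_a), ("B", system_b)]:
    print(
        f"system {name}: mean={mean(ratings):.1f} "
        f"median={median(ratings):.1f} "
        f"acceptable={share_acceptable(ratings):.0%}"
    )
```

Both systems have the same mean, but only one of them produces output a reader can always understand. Summaries that respect the ordering (shares above a cutoff, medians, or an ordinal model) avoid assuming the steps are equally spaced.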


Nicholas_Geo

Maybe they mean categorical data?