Standardized Tests: Have we gone too far?

  • Thread starter micromass
In summary, standardized tests are important, but they should only be one part of a student's education.
  • #71
HomogenousCow said:
The simple truth is that most high school students do not care about the intrinsic value of whatever they are being taught (assuming there is any; I'm looking at you, Samuel Beckett). There seems to be some grand fantasy that students are inquisitive angels, oppressed by "The Man" and his weapon of choice, the SAT.

I think that a certain amount of drive to understand does exist in most (all?) people. Outside of the realm of academics, people are driven to understand how to win video games, they are driven to understand why Dumbledore told Snape to kill him, they are driven to understand why this batch of cookies turned out worse than the last batch. People really do have a drive to understand. The problem (for school purposes) is that the average person, if led by his own curiosity, would get around to understanding how to do calculus some time like never. What most people are naturally curious about isn't what schools want to teach.
 
  • #72
stevendaryl said:
I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant? <snip>

It goes to the most basic reason for tests: the purpose of a test is to measure "something" (more on this later). So you start with some average score and then try to figure out interventions that result in higher test scores. And this is the basic problem: trying to establish statistically significant results showing how some specific intervention (just-in-time teaching, flipped classroom, think-pair-share, problem-based learning, etc.) results in improved test scores. That's done (at best) by having the same instructor teach multiple sections, some of which are negative controls and some of which get the intervention. That assumes, for example, that every section is made of interchangeable students. And then the process has to be repeated to ensure reproducibility. And then the intervention has to be performed at different schools, using different teachers. In the end, the statistical error associated with the study results is typically as large as the effect.
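To see why the error ends up as large as the effect, a back-of-the-envelope power calculation helps: when the section-to-section noise is comparable to the hoped-for gain, the number of sections needed per arm grows quickly. A minimal sketch in Python, using the standard two-sample formula with purely illustrative numbers (the function name and inputs are hypothetical, not drawn from any real study):

```python
# Back-of-the-envelope sample size for a two-arm comparison of class
# sections: n per arm = 2 * ((z_alpha + z_power) * sd / effect)^2.
# Values below are illustrative, not taken from any real study.
from math import ceil

def sections_per_arm(effect, sd, alpha_z=1.96, power_z=0.84):
    """Sections per arm for 5% two-sided alpha and 80% power."""
    return ceil(2 * ((alpha_z + power_z) * sd / effect) ** 2)

# Effect about as large as the section-to-section noise:
print(sections_per_arm(effect=5, sd=5))    # 16 sections per arm
# Effect half the noise: roughly four times as many sections:
print(sections_per_arm(effect=2.5, sd=5))  # 63 sections per arm
```

Even in the optimistic case, that is far more sections than one instructor can teach in parallel, which is the point: the design requirements outrun what any single study can deliver.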

Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income, not 'aptitude'.

http://blogs.wsj.com/economics/2014...me-inequality-how-wealthier-kids-rank-higher/
 
  • #73
Andy Resnick said:
It goes to the most basic reason for tests: the purpose of a test is to measure "something" (more on this later). So you start with some average score and then try to figure out interventions that result in higher test scores. And this is the basic problem: trying to establish statistically significant results showing how some specific intervention (just-in-time teaching, flipped classroom, think-pair-share, problem-based learning, etc.) results in improved test scores. That's done (at best) by having the same instructor teach multiple sections, some of which are negative controls and some of which get the intervention. That assumes, for example, that every section is made of interchangeable students. And then the process has to be repeated to ensure reproducibility. And then the intervention has to be performed at different schools, using different teachers. In the end, the statistical error associated with the study results is typically as large as the effect.

Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income, not 'aptitude'.

I'm not going to argue about whether there even is such a thing as "aptitude", much less whether SAT measures it. However, it seems to me that if someone is tested on vocabulary (for instance), you can find out whether that person knows what a word means. If someone is tested on solving algebraic equations in one variable, you can find out whether they know how to do that. It certainly may be the case that finding out that someone doesn't know how to solve an equation in one variable doesn't necessarily say what should be done about it.

I see your first paragraph as simply about the difficulties of figuring out effective teaching methods. That certainly is a hard problem, but it seems orthogonal to the issue of testing.
 
  • #74
stevendaryl said:
I'm not going to argue about whether there even is such a thing as "aptitude", much less whether SAT measures it. However, it seems to me that if someone is tested on vocabulary (for instance), you can find out whether that person knows what a word means. If someone is tested on solving algebraic equations in one variable, you can find out whether they know how to do that. It certainly may be the case that finding out that someone doesn't know how to solve an equation in one variable doesn't necessarily say what should be done about it.

I see your first paragraph as simply about the difficulties of figuring out effective teaching methods. That certainly is a hard problem, but it seems orthogonal to the issue of testing.

Just to make it clear, I'm not in favor of tests as "measurements". I don't think that an overall numeric result, from 0 to 100 or 0 to 1500 (or whatever the range is for SATs) means much at all. But the fact that a student is able or unable to answer specific questions certainly is meaningful. I favor tests as diagnostics or assessment, not as measures of quality of the student.
 
  • #75
stevendaryl said:
<snip>I see your first paragraph as simply about the difficulties of figuring out effective teaching methods. That certainly is a hard problem, but it seems orthogonal to the issue of testing.

But you would hopefully agree that there should be a reason to test and that the test should evaluate how effectively students achieve some specified learning objective.

Learning involves much more than rote memorization. A good example of a (relatively) new standardized test is the Force Concept Inventory:

http://www.flaguide.org/tools/diagnostic/force_concept_inventory.php

This is the key innovation: "Each question offers only one correct Newtonian solution, with common-sense distractors (incorrect possible answers) that are based upon student's misconceptions about that topic, gained from interviews."

This exam is an attempt to directly measure student learning, not simple recall.
 
Last edited by a moderator:
  • #76
PWiz said:
As an HS student, I can't agree more. I can feel the immense pressure of the need to perform well in these standardized tests. It's like everyone around me is emphasizing how much of a passport high scores are to a good college, but I can't help but feel that the focus is less on learning and more on blind analysis of performance under stress. Not everyone is built out of the same wood, you know.

But this stress mimics real-life situations pretty well. Most jobs you get will test your decision-making skills under pressure.

So might as well embrace the testing stress & try to thrive & perform better under it. And I can tell you that the skill is learned to a large extent. The more tests you take, the less the stress will impact your scores adversely.

Yes, not everyone is built of the same wood, but that is partly what the tests are trying to discern.
 
  • #77
Andy Resnick said:
But you would hopefully agree that there should be a reason to test and that the test should evaluate how effectively students achieve some specified learning objective.

To the extent that there is a reason to teach something at all, there is a reason to see whether you've accomplished it. If you don't care whether a student learns arithmetic, why teach it?

Learning involves much more than rote memorization. A good example of a (relatively) new standardized test is the Force Concept Inventory:

http://www.flaguide.org/tools/diagnostic/force_concept_inventory.php

This is the key innovation: "Each question offers only one correct Newtonian solution, with common-sense distractors (incorrect possible answers) that are based upon student's misconceptions about that topic, gained from interviews."

This exam is an attempt to directly measure student learning, not simple recall.

Yeah, sure. Tests shouldn't test memorization. (Or at least shouldn't test ONLY memorization. It's possible that there is a benefit to committing some things to memory.)
 
Last edited by a moderator:
  • #78
stevendaryl said:
What most people are naturally curious about isn't what schools want to teach.

Should schools be teaching what kids are curious about or what is more likely to be useful to them to earn a living or to contribute to the skillset that society demands from them?

I think you are perfectly right that kids tend to be naturally driven & curious about certain things. But the whole point behind schooling & discipline is to teach people stuff they may not enjoy doing on their own but they ought to know. And funnily enough there are a range of activities that are not enjoyable in their initial learning curve that subsequently do become enjoyable & it takes something like school to take you through that initial uggh drudgery.

No one ever enjoyed learning multiplication tables. But having learned them we now find them pretty useful.

I think one of the under-appreciated functions of school is to get us to endure the boredom of things that we do not like to do.
 
  • #79
rollingstein said:
Should schools be teaching what kids are curious about or what is more likely to be useful to them to earn a living or to contribute to the skillset that society demands from them?

I'm not making a claim about that, I'm just pointing out that the fact that students don't naturally want to learn what is taught in schools does not mean that they aren't naturally curious and driven to understand things. Just that there is a mismatch between what they are curious about and what schools teach. That might be inevitable.
 
  • #80
stevendaryl said:
I'm not making a claim about that, I'm just pointing out that the fact that students don't naturally want to learn what is taught in schools does not mean that they aren't naturally curious and driven to understand things. Just that there is a mismatch between what they are curious about and what schools teach. That might be inevitable.

Ah ok. My bad then. I thought you were being critical of schools for teaching what they teach.
 
  • #81
stevendaryl said:
Just to make it clear, I'm not in favor of tests as "measurements". I don't think that an overall numeric result, from 0 to 100 or 0 to 1500 (or whatever the range is for SATs) means much at all. <snip>

But this is, in fact, precisely what the function of a standardized test is: to provide numerical comparisons across the student population.

The only test I am aware of that is not associated with a numerical score is the Rorschach test.
 
  • #82
Andy Resnick said:
But this is, in fact, precisely what the function of a standardized test is: to provide numerical comparisons across the student population.

The only test I am aware of that is not associated with a numerical score is the Rorschach test.

I accept your point. But part of the video and discussion in this topic was related not only to the volume/usefulness of these tests but also to their quality.
As far as I can tell a lot/some of these tests require signing an agreement stating that you will not discuss the questions.
Someone also mentioned the problem with quality from personal experience with plain wrong questions in there.

Another point I find interesting is the example in the video where a student was expected to get more than 100% (the goal was 486 points with only 483 points available, I believe).
How can such expectations be assessed? Clearly the method of measurement as a whole has some serious flaws (to use the language of physics).

At this point the discussion can take several directions, e.g.: are (well-designed) standardized tests useful? How can we ensure well-designed tests? ...
 
  • #83
JorisL said:
As far as I can tell a lot/some of these tests require signing an agreement stating that you will not discuss the questions.

Has anyone gotten sued for discussing questions from the SAT / ACT etc? Just curious. Otherwise I think we should just disregard those agreements as unenforceable BS boilerplate. I would like to see the legal precedent on this.

You can put what you want in an agreement but ultimately you've got to find a sympathetic court that will enforce it. I want to see a jury find a kid guilty for violating Pearson's test confidentiality.

JorisL said:
Someone also mentioned the problem with quality from personal experience with plain wrong questions in there.

What test doesn't have some wrong questions on it?!
 
  • #84
rollingstein said:
Has anyone gotten sued for discussing questions from the SAT / ACT etc? Just curious. Otherwise I think we should just disregard those agreements as unenforceable BS boilerplate. I would like to see the legal precedent on this.

You can put what you want in an agreement but ultimately you've got to find a sympathetic court that will enforce it. I want to see a jury find a kid guilty for violating Pearson's test confidentiality.

Well, do you see people discussing questions anywhere? The post I referenced is post #28 by Fredrik, check it out. They hide behind the agreement.
And it works; which kid thinks about this stuff? They probably just keep it going long enough for the other party to either get fed up or run out of money.

Point is, it's rotten, and if they feel their position is threatened, I'm certain something will happen.
Money makes things happen, remember?

rollingstein said:
What test doesn't have some wrong questions on it?!

And which teacher doesn't agree when you explain in detail why it is wrong? More importantly, how can you get a second opinion without risk of prosecution?

Even when parts of a question are vague, you either ask for clarification or get back to the teacher afterwards if that is somehow impossible.

Finally, these are standardised tests.
Isn't the written test you take when getting your driver's license standardised? Here it is, and I never heard of an error in those tests, even right after changes in the law (when it would be easier for errors to slip in, one would think).
Point is, if a lot of students take these tests, isn't it absolutely necessary to check, double-check, ... the tests?
Also I'm sure they have a lot of questions used for many years if not decades. Shouldn't those be 100% correct?
 
  • #85
I should perhaps clarify that my experience isn't with the tests made for school kids. It's with a couple of professional certification exams (made by one of the Pearson companies). What they can do if you violate the agreement is to kick you out of the certification program. Now you have spent at the very least a few hundred dollars (possibly many thousands, if you took classes or bought hardware to practice on), and you're no longer certified, even though you passed the exam. I don't know what they would do if a high school kid would violate the confidentiality agreement.

Even if you want to violate the confidentiality agreement, it's pretty difficult to do that, especially for the kind of tests I did. It's difficult to remember the questions exactly, and you have to hand in all the notes you've made at the end of the exam. You're not even allowed to erase them. You're also not allowed to look at a question again once the test is over, not even to provide feedback about possible issues with a question. And you don't really have time to try to memorize the questions. The exam I did was extremely difficult to complete on time. You basically had to cheat, or remember what you did last time you took the test.

What I found especially bizarre about my experience was that they were completely unwilling to discuss any specific points I had made. Instead of trying to refute my arguments, they just said that there's nothing wrong with the test, even though I know for sure that some of the questions were bad.

It certainly seems to me that (at least with these professional exams), the rules are in place to ensure that they don't have to make the tests good.
 
  • #86
Andy Resnick said:
But this is, in fact, precisely what the function of a standardized test is: to provide numerical comparisons across the student population.

The only test I am aware of that is not associated with a numerical score is the Rorschach test.

If we are talking about SAT or ACT, then I agree with you that the point is to get a numerical score for the purpose of comparison between students. But is the word "standardized test" limited to those sorts of measurements?

To give you a counter-example: You can go to a website such as: http://www.sheppardsoftware.com/African_Geography.htm to test your knowledge of the countries in the continent of Africa. Now, you might object that there is no reason to know the names, locations and capitals of the countries in Africa, but it's just a simple example of knowledge that can be tested through a standardized test. The point of such a test is certainly NOT to compare the student to other students. It is NOT to come up with a numerical score: 0 to 100 (what percentage of the countries in Africa can you name). The point of such a test is to see if you DO know the countries in Africa. If you get them all right, then you do. If you miss even one then you don't. You can retake the test as often as you like, until you get 100%. Then you know all the countries in Africa (well, at least until you forget them).

To me, the proper goal of a test is to assess how well a student understands a subject. That can be done, at least with some subjects, using standardized tests.

Now, the SAT has all these questions that are not actually about understanding a subject, but seem to be some kind of measurement of mental fitness. It's been a long time since I've taken it, but back in the day, there were questions along the lines of:

Here is a sequence of pictures. Based on the pattern, what is the next picture in the sequence?

There were questions along the lines of:

Mustard is to hot dog as pickles are to what?

These questions were sort of interesting to me, because it was a challenge to figure out what the test-creators had in mind. I was pretty good at that sort of thing, but I'm not convinced that there is a strong point in asking those types of questions.
 
  • #87
stevendaryl said:
<Snip>:

Here is a sequence of pictures. Based on the pattern, what is the next picture in the sequence?

There were questions along the lines of:

Mustard is to hot dog as pickles are to what?

These questions were sort of interesting to me, because it was a challenge to figure out what the test-creators had in mind. I was pretty good at that sort of thing, but I'm not convinced that there is a strong point in asking those types of questions.

Ah, yes, what is _the_ next figure. If you do not fit into their narrow world/experiential view, you are wrong. Same with sequences of numbers, other than obvious ones like 1,2,3,4,... I am remembering the phrase "limit your imagination, keep you where they must".
 
  • #88
Andy Resnick said:
Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income, not 'aptitude'.

http://blogs.wsj.com/economics/2014...me-inequality-how-wealthier-kids-rank-higher/

(The article that you linked just shows that kids of richer parents get better scores, without analysing any other factors like aptitude or heritability of IQ. I'm not saying that you are wrong, I'm merely pointing out that the source you linked does not prove your point.)

Fredrik said:
I should perhaps clarify that my experience isn't with the tests made for school kids. It's with a couple of professional certification exams (made by one of the Pearson companies). What they can do if you violate the agreement is to kick you out of the certification program. Now you have spent at the very least a few hundred dollars (possibly many thousands, if you took classes or bought hardware to practice on), and you're no longer certified, even though you passed the exam. I don't know what they would do if a high school kid would violate the confidentiality agreement.

Even if you want to violate the confidentiality agreement, it's pretty difficult to do that, especially for the kind of tests I did. It's difficult to remember the questions exactly, and you have to hand in all the notes you've made at the end of the exam. You're not even allowed to erase them. You're also not allowed to look at a question again once the test is over, not even to provide feedback about possible issues with a question. And you don't really have time to try to memorize the questions. The exam I did was extremely difficult to complete on time. You basically had to cheat, or remember what you did last time you took the test.

What I found especially bizarre about my experience was that they were completely unwilling to discuss any specific points I had made. Instead of trying to refute my arguments, they just said that there's nothing wrong with the test, even though I know for sure that some of the questions were bad.

It certainly seems to me that (at least with these professional exams), the rules are in place to ensure that they don't have to make the tests good.

They behave even better than the infallible beings that my government put on examination boards. Except that now such boards in my country are being challenged as unconstitutional.
 
  • #89
Andy Resnick said:
Now, what does a specific test actually measure? Much ink has been spilled on this topic. For example, the strongest correlation with SAT scores is family income, not 'aptitude'.

http://blogs.wsj.com/economics/2014...me-inequality-how-wealthier-kids-rank-higher/

The data in that article is (are?) interesting...

Certainly I would think writing would scale with aptitude or intelligence. If one ignores spelling or simple grammar mistakes, the quality of writing is going to scale with ability. And writing is surely something that can't be standardized; it needs someone to mark it. So it seems reasonable prima facie to be able to look at the writing scores and read off how ability trends with wealth.

And looking at the data, writing scores ramp to $100k, are flat to $200k, and ramp thereafter. It's reasonable to assume those >$200k households have children who were privately tutored or have parents who are doctors; they are the geniuses and score very well. But we see that most middle-class households score the same. I think it's reasonable to suppose that the sub-$100k households are predominantly in poorer neighborhoods and have schools that aren't as good, or that there could be a language bias in the writing scores for poorer households. So there's no real evidence from the writing scores that ability trends with wealth.

Reading is the same: a ramp to $100k, flat to $200k, ramping thereafter. It's clear the reading questions are sufficiently elementary that the same is true; any language bias only shows up in poorer households.

Math, however, is a consistent ramp, rising to the right. We know that this isn't measuring ability because it differs from the writing and reading scores. And math performance in general is contingent on the quality of teaching/schooling. So for me it says more about the quality of the schools than about any proportionality between ability and wealth.

So I see no evidence in the data that there is a trend between ability and wealth. The claim that the SAT is more a measure of affluence than ability would seem to be on point.

I apologize, the math score is not a good measure of ability but the reading and writing scores seem to be pretty good; a decent score on the reading section is quite a reliable indicator of ability. Perhaps this is more toward the concept of emotional IQ.
 
Last edited:
  • #90
Interesting, since one of the students in my friend's class got stressed during class (during a proof on logic or something like that) and vomited. But we only take at most two midterms and one final. I think the stress on a person depends on the student, rather than what the student is doing. What's the difference between standardized tests and regular chapter tests like the ones I did in high school?

Also, Common Core is fairly new. Of course the students that have to join the program abruptly will suffer, but students that grow up with the Common Core system may do better in things like math and science. At least that is the idea.
 
  • #91
stevendaryl said:
<snip>

To give you a counter-example: You can go to a website such as: http://www.sheppardsoftware.com/African_Geography.htm to test your knowledge of the countries in the continent of Africa. <snip> It is NOT to come up with a numerical score: 0 to 100 (what percentage of the countries in Africa can you name). <snip>

I don't understand your point: taking those 'tests' absolutely results in a numerical score. Your comments regarding the SAT underscore my point that there is only partial agreement about how 'learning outcomes' can be tested in the first place. How does one design a test to evaluate how well a student has learned to fashion a logical argument? To critically read an editorial column?

This thread is about 'standardized tests', not 'testing'.
 
  • #92
Andy Resnick said:
I don't understand your point- taking those 'tests' absolutely results in a numerical score.

Yes, but the numerical score is for the benefit of the test-taker. The point of those self-tests is to get 100%. The scores are not for comparison between students.
 
  • #93
jbunniii said:
Which shows that the correct answer is ##7 \times 8 = 8! / 6!##
You could also denote it as:

##\int_0^7 8 \, dx##
 
  • #94
stevendaryl said:
Yes, but the numerical score is for the benefit of the test-taker. The point of those self-tests is to get 100%. The scores are not for comparison between students.

I'm not sure what to say: standardized tests are called that ("standardized") because they are specifically designed to compare students. And compare their teachers. And compare their schools. And this comparison is used to determine the funding received by those schools.
 
  • #95
Andy Resnick said:
I'm not sure what to say: standardized tests are called that ("standardized") because they are specifically designed to compare students.

Being a standardized test means that the questions and answers are standardized. That's independent of whether it is used for self-assessment or for comparison between students, isn't it?
 
  • #96
stevendaryl said:
I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant? Certainly, tests can't accurately measure inherent ability, but that's only relevant if you're trying to use the test to decide a student's entire future. But if you're only trying to decide what courses the student should take next, and whether the student needs additional help in a subject, then I think a test can give you a lot of information about that. That's why I advocate lots of small, low-stakes tests. They would just be a snapshot of where the student is, academically, not some kind of Tarot reading of what they are capable of next year or 10 years from now.

Your point about external factors such as a home life that is not conducive to learning is very good, but I'm not sure how schools should address those kinds of inequalities, other than to give students lots of opportunities for extra help.

Intuitively, I think that there is too much variation to get meaningful statistics. The tests are typically given with a dual purpose: to assess the student performance and to assess the education system performance. As you point out, it is fairly reasonable to use the tests for student performance.

The larger problem is in assessing the education system. A single brilliant student raises the average and you look like a brilliant teacher. A few well prepared students from affluent homes make you look great. And with a large variation, it might take longer than we want to wait to actually measure the thing accurately. And if we determine a school is bad after 10 years ... there was an entire cohort damaged by that, and the school is unlikely to be the same, as there are always changes being implemented.

Currently there are a lot of problems with education in the US. Using data and measurements to inform us seems a good idea. I'm not sure it does anything other than move things around randomly.

I remember a story once about a hypothetical company that had everyone flip 3 coins, and ordered them to get 3 heads. Now a few succeeded and were promptly held up as the "star" flippers. The company then asked them to explain how they did it to the rest (I relax my arm ... so everyone: relax your arms). Then the next day they flip again. And maybe a few repeat and a few new ones are "stars". Meanwhile a few of the really bad ones (the guy who had 3 tails, TWICE) get fired.

It sounds like process control. It passes the ordinary management requirements for a data-driven process change, and quality metrics. But it is still just using garbage data. Relaxing the arm made no difference.
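That coin-flip story is easy to simulate. A minimal sketch in Python (purely illustrative numbers, not data from anywhere), showing how "star" flippers appear by chance and then mostly fail to repeat:

```python
# Simulate the coin-flip "star performer" story: everyone flips 3 fair
# coins; triple-heads flippers are crowned stars; the next day their
# "skill" mostly evaporates. Numbers are illustrative only.
import random

random.seed(1)
EMPLOYEES = 1000

def flip_day():
    """Heads count (0-3) for each employee's three flips."""
    return [sum(random.random() < 0.5 for _ in range(3))
            for _ in range(EMPLOYEES)]

day1 = flip_day()
stars = [i for i, heads in enumerate(day1) if heads == 3]  # ~1/8 of employees

day2 = flip_day()
repeat_stars = [i for i in stars if day2[i] == 3]  # ~1/8 of the stars

print(f"day-1 stars: {len(stars)} of {EMPLOYEES}")
print(f"repeat stars on day 2: {len(repeat_stars)} of {len(stars)}")
```

Relaxing your arm makes no difference: day-2 star status is independent of day 1, which is exactly the worry about reading system performance off noisy test scores.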

I'm not opposed to testing. But it should be sensible testing that actually is useful. If it helps assess a student, and determine what class they need to be in next year, that seems fine. If it truly does inform about system performance, that also is great. But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system. They are the equivalent of being the lucky triple-head flipper, or the unlucky triple-tail flipper.

I am doubtful that test scores really will show much about how education should be done. Student success will likely not correlate with system success all that strongly. There will be some improvements that can help, but a truly statistically significant system evaluation really is fairly complex, and needs a lot of data.
 
  • #97
votingmachine said:
Intuitively, I think that there is too much variation to get meaningful statistics. The tests are typically given with a dual purpose: to assess the student performance and to assess the education system performance. As you point out, it is fairly reasonable to use the tests for student performance.

The larger problem is in assessing the education system. A single brilliant student raises the average and you look like a brilliant teacher. A few well prepared students from affluent homes make you look great. And with a large variation, it might take longer than we want to wait to actually measure the thing accurately. And if we determine a school is bad after 10 years ... there was an entire cohort damaged by that, and the school is unlikely to be the same, as there are always changes being implemented.

Currently there are a lot of problems with education in the US. Using data and measurements to inform us seems a good idea. I'm not sure it does anything other than move things around randomly.

I remember a story once about a hypothetical company that had everyone flip 3 coins, and ordered them to get 3 heads. Now a few succeeded and were promptly held up as the "star" flippers. The company then asked them to explain how they did it to the rest (I relax my arm ... so everyone: relax your arms). Then the next day they flip again. And maybe a few repeat and a few new ones are "stars". Meanwhile a few of the really bad ones (the guy who had 3 tails, TWICE) get fired.

It sounds like process control. It passes the ordinary management requirements for a data-driven process change, and quality metrics. But it is still just using garbage data. Relaxing the arm made no difference.
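The coin story is easy to check with a quick simulation (a sketch; the headcount and the "3 heads = star" rule are just the story's setup, not real process-control practice):

```python
import random

random.seed(42)  # fixed seed for a reproducible run
N = 10_000       # hypothetical number of employees

def flips_three_heads():
    """One employee flips 3 fair coins; True if all heads (probability 1/8)."""
    return all(random.random() < 0.5 for _ in range(3))

# Day 1: everyone flips; the "stars" get 3 heads.
day1_stars = [i for i in range(N) if flips_three_heads()]

# Day 2: the stars flip again, after "teaching" their arm-relaxing technique.
day2_repeats = [i for i in day1_stars if flips_three_heads()]

print(f"Day-1 stars: {len(day1_stars)} of {N} (about 1/8 expected)")
print(f"Stars who repeated on day 2: {len(day2_repeats)} (about 1/8 of the stars)")
```

Roughly one in eight flippers looks like a star on any given day, and only about one in eight of those stays a star the next day, no matter what technique they teach. That is the garbage-data trap: the metric is real, the "best performers" are real, but the ranking is pure noise.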

I'm not opposed to testing. But it should be sensible testing that actually is useful. If it helps assess a student, and determine what class they need to be in next year, that seems fine. If it truly does inform about system performance, that also is great. But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system. They are the equivalent of being the lucky triple-head flipper, or the unlucky triple-tail flipper.

I am doubtful that test scores really will show much about how education should be done. Student success will likely not correlate with system success all that strongly. There will be some improvements that can help, but a truly statistically significant system evaluation really is fairly complex, and needs a lot of data.

That's an unreasonable comparison; the quality of the teachers does impact the test results of the students.
 
  • #98
votingmachine said:
But the general sense of teachers and schools is that the test results are largely not representative of the performance of the educational system
Most people have the opinion that whatever metric is currently used to measure their performance is not representative of their performance.
 
  • Like
Likes PWiz and HomogenousCow
  • #99
There is another thing to consider: whatever metric is chosen skews the results, as people try to optimize their score for their performance appraisal, thus invalidating the metric.

Dr. Deming often said that a metric shouldn't be tied to an individual's performance for that very reason.

Instead, it should be used to discover those teachers who are naturally better at teaching, so that you can learn from them and train other teachers to do the same.
 
  • #100
I agree that teacher quality matters and does impact the test results of students. What I said was INTUITIVELY, I think the data has too much randomness to allow easy statistical conclusions. The apocryphal story was clearly an exaggeration to show why we don't want to base process changes on bad data.

I also agree that people often think that whatever metric they are measured by misses some elusive qualities that make them special. But then again, some metrics DO miss the important thing. Adding testing is the frequent attempt to get a meaningful metric.

To draw conclusions from data, you need good data and an understanding of the thing you are measuring. I might be wrong, and it may only take a dozen test scores and a single year to discover that a teacher needs more instruction in the craft of teaching. My perception is that it will take more scores and more time. But that is an intuitive perception, based on being a parent, seeing student variation, seeing the occasional sick kid tested, and having my own sense of populations and variations. There were comments here about how easy the tests were. Those comments did not really endorse the teachers, yet those students' high test results would be credited to whatever the education system did.

I think tests can be a valuable part of measuring student performance and measuring system performance. But I think that bringing the tests into the system side needs to be done carefully. I thought that about the initial comment I was reading:

"I certainly agree that things like home life affect a student's performance on tests, but why does that make the test results not statistically significant?"

The answer is that anything that increases the variance makes it harder to draw statistical conclusions. If one teacher has a class with test scores averaging 50, with a sigma of 20, and another has a class averaging 60, with a sigma of 23, then the variation from other environmental factors makes comparing the two teaching styles difficult. One might be wildly better. Or it might be small datasets like:

50, 40, 30, 50, 60, 70, 80, 20
average=50, std=20

50, 40, 30, 50, 60, 70, 80, 100
average=60, std=23

I took out the worst student score and stuck in a bright score. Or maybe the 20 was a kid who was sick on testing day (and there are generally no excused absences).

I don't know if that is realistic. But I think that for results to be statistically significant, it will take some good data. And strictly INTUITIVELY, I think that is difficult to get quickly and easily.
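The two made-up class lists above can be run through a two-sample (Welch) t-test using only the Python standard library; the ~2.1 threshold in the comment is the usual critical t for roughly 14 degrees of freedom at p < 0.05:

```python
import math
import statistics

class_a = [50, 40, 30, 50, 60, 70, 80, 20]   # weakest score is a 20
class_b = [50, 40, 30, 50, 60, 70, 80, 100]  # same class, 20 swapped for a 100

mean_a, mean_b = statistics.mean(class_a), statistics.mean(class_b)
var_a, var_b = statistics.variance(class_a), statistics.variance(class_b)
n = len(class_a)

# Welch's t statistic: difference in means over the pooled standard error
t = (mean_b - mean_a) / math.sqrt(var_a / n + var_b / n)

print(mean_a, round(statistics.stdev(class_a)))  # 50 20, as in the post
print(mean_b, round(statistics.stdev(class_b)))  # 60 23, as in the post
print(round(t, 2))  # 0.94 — well below the ~2.1 needed for significance
```

With 8 students per class, a 10-point difference in averages is statistically indistinguishable from noise: one swapped score moves the mean a full half-sigma.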
 
  • Like
Likes Silicon Waffle
  • #101
Statistical methods for dealing with those kinds of issues are well known. As long as the analysts are competent (not a given) I don't see any of those as being a real problem. Furthermore, available statistical methods are quite good at testing the data itself to determine if these issues are even problems for a given data set.
 
  • #102
That is the big thing. Unfortunately you can't assign kids randomly to placebo groups that get no instruction and different treatment groups. Bugger that whole problem of human test subjects.

Good statisticians are important. I wish I was one, but I'm strictly low level. I agree that the analysts have to be competent, and they have to be alert to some crazy potential confounding variables. Home life, which was mentioned, would have to be looked at.

Do you think my "intuition" is wildly wrong or on the mark about the variation issue in student populations? I recall my brother relating the story of how his rural school tried very hard to convince him to find a way to keep his two very bright kids in their system. He was aware that they were aware of the tumble in their metrics from the transfer of two excellent students (and high test results). Likewise, there is a whole bunch of news around how schools try to transfer out any student that has poor test scores. If you have that pregnant teen ... you know what will happen.

I see now the following post was made already:
"There is another thing to consider that whatever metric is chosen skews the results in a certain way as people try to optimize their score for their performance appraisal this invalidating the metric." (Jedishrfu)

That is what I was getting at.
 
  • #103
votingmachine said:
Unfortunately you can't assign kids randomly to placebo groups that get no instruction and different treatment groups.
But you don't need to if you use the right methods.

votingmachine said:
Do you think my "intuition" is wildly wrong or on the mark about the variation issue in student populations?
I think your intuition is wildly wrong about the importance of the issue. It is simply something that the statisticians need to account for in their methodology, not something that fundamentally precludes analysis.
 
  • #104
DaleSpam said:
I think your intuition is wildly wrong about the importance of the issue. It is simply something that the statisticians need to account for in their methodology, not something that fundamentally precludes analysis.
Cool. Although I guess what I am saying is not that you can't do the analysis. As you point out, the analysis tells you if there is a significant difference. I was getting at whether the randomness matters so much that the data set has to be extraordinarily large to draw conclusions. A grade-school teacher might have 25 students and teach several subjects to them, every year. If it ends up that you need 10 years of data to get to statistical significance, then that would accord with my first intuition that variation ends up making too much difference. 10 years is 250 kids, and that might let you start to control for socioeconomic and other confounding factors.

I recognize the analysis is its own thing, and it tells you what it tells you. My intuition was that the annual data sets would tell you that there was not a lot to conclude about individual teachers. And then schools systems need to have the teacher factor removed, and control for socioeconomic factors again. And of course everyone will try to game the system for better metrics as they move along.
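The "how many years of data" question can be roughed out with the standard two-sample sample-size formula (a sketch; the 10-point effect and sigma of 20 are taken from the earlier example in this thread, and 1.96 / 0.8416 are the usual z-values for 5% significance and 80% power):

```python
import math

def students_per_group(sigma, delta, z_alpha=1.96, z_beta=0.8416):
    """Normal-approximation sample size per group to detect a mean
    difference of delta given a standard deviation of sigma."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

n = students_per_group(sigma=20, delta=10)
print(n)                  # 63 students per group
print(math.ceil(n / 25))  # about 3 years of a 25-student class
```

So even under favorable assumptions (a large 10-point true effect, no confounders), a single year of one class is underpowered, and halving the effect size to 5 points roughly quadruples the requirement. That is consistent with the intuition that year-to-year comparisons of individual teachers are mostly noise.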
 
