- #1
btb4198
- 572
- 10
I had a Microsoft Technical interview this past Friday, the question I was asked was this : How do you find the middle value for a dataset that is too big to fit in RAM?
I was not able to figure this out during the interview, but I have been look in this all weekend and I read something online that said it can be done at O(N) using something called the counting sort histogram algorithm ( I did not learn that in my advanced data structures and algorithms class). I have watched some youtube videos on it, but I still do not get how this could be using to find the location of the middle number in a dataset too big for RAM. I read something that said once you have all the values counted up in your dictionary, you can scan it to find the index of the first non-zero element and that is where your middle value will be but I really do not get that.
For example, these 3 sets:
set 1: {6, 2, 7, 3, 4, 3, 5, 4, 1}
set 2: {6, 4, 2, 5, 8, 1, 7, 3, 9}
set 3: {4, 2, 9, 8, 7, 6, 3, 5, 1}
your dictionary would be :
I read somewhere that the index of the first non-zero element is your center. In this case, it would be 1.
so you do
(9 - 1) / 2 = 4. ( I think this is the equation for the median but could be wrong)
The 9 comes from the fact that there are 9 elements in the all 3 sets. So, if we subtract the index of the first non-zero element 1 from the total number of elements 9, we get 8. Then, we divide that by 2 to get our 4.
and yes 4 is the Middle value, but I do not get how what I just did somehow find the middle value. I am very lost here on the math behind this.
can someone please example ? also did I even do this right ? again I never learned this in school, So this whole problem is completely new to me.
I was not able to figure this out during the interview, but I have been look in this all weekend and I read something online that said it can be done at O(N) using something called the counting sort histogram algorithm ( I did not learn that in my advanced data structures and algorithms class). I have watched some youtube videos on it, but I still do not get how this could be using to find the location of the middle number in a dataset too big for RAM. I read something that said once you have all the values counted up in your dictionary, you can scan it to find the index of the first non-zero element and that is where your middle value will be but I really do not get that.
For example, these 3 sets:
set 1: {6, 2, 7, 3, 4, 3, 5, 4, 1}
set 2: {6, 4, 2, 5, 8, 1, 7, 3, 9}
set 3: {4, 2, 9, 8, 7, 6, 3, 5, 1}
your dictionary would be :
index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
count | 3 | 3 | 4 | 4 | 3 | 3 | 3 | 2 | 2 |
I read somewhere that the index of the first non-zero element is your center. In this case, it would be 1.
so you do
(9 - 1) / 2 = 4. ( I think this is the equation for the median but could be wrong)
The 9 comes from the fact that there are 9 elements in the all 3 sets. So, if we subtract the index of the first non-zero element 1 from the total number of elements 9, we get 8. Then, we divide that by 2 to get our 4.
and yes 4 is the Middle value, but I do not get how what I just did somehow find the middle value. I am very lost here on the math behind this.
can someone please example ? also did I even do this right ? again I never learned this in school, So this whole problem is completely new to me.