- finding the median of a list without sorting
- Posted by pereges on June 25th, 2008
I have seen some code which can find median of an odd numbered list
but how to do it for even numbered list ? please give me some
suggestions
int median( const intList * l ) {
for( int posn = 0; posn < l.size; posn++ )
int smaller = 0;
int larger = 0;
for( int p = 0; p < l.size; p++ )
if( l.items[p] < l.items[posn] )
smaller++;
else if( l.items[p] < l.items[posn] )
larger++;
if( smaller < l.size/2 && larger < l.size/2 )
return l.items[posn];
}
- Posted by Willem on June 25th, 2008
pereges wrote:
) I have seen some code which can find median of an odd numbered list
) but how to do it for even numbered list ? please give me some
) suggestions
First, try to figure out exactly what 'the median of an even sized list'
means.
What is the median of this list: 1 2 3 4
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
- Posted by pete on June 25th, 2008
pereges wrote:
What does it look like?
Copies of code are preferable to impressionistic drawings.
--
pete
- Posted by Ben Bacarisse on June 25th, 2008
pereges <Broli00@gmail.com> writes:
This is probably re-typed. If the probable typos are corrected, we
get this:
int median(int n, const int *list)
{
for (int posn = 0; posn < n; posn++) {
int smaller = 0;
int larger = 0;
for (int p = 0; p < n; p++) {
if (list[p] < list[posn])
smaller++;
else if (list[p] > list[posn])
larger++;
}
if (smaller <= n/2 && larger <= n/2)
return list[posn];
}
}
This gives *an* answer, even for even-length arrays. Is it not the
answer you want? I say this because the usual mathematical definition
of median makes the answer rational (rather than integral) in some
cases which is often more trouble than it is worth. It depends
entirely on what you want the median for. Very often, the "middle
element" rather than the true median is what you really want.
On a more theoretical level, this is a quadratic solution to a linear
problem. In practical terms, the linear solutions are rather fiddly
and it is often better to opt for a O(n log(n)) solution like sorting
or a selection algorithm modelled on a sort (like quickselect).
--
Ben.
- Posted by CBFalconer on June 25th, 2008
pereges wrote:
list#1 = 1, 2, 3, 4
list#2 = 4, 5, 6, 7
I fail to see any difference in the median calculations for these.
This may be considered a foolish answer, but it is intended to
point out the inaccuracy in the original question.
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
- Posted by Richard Harter on June 25th, 2008
Suppose we have an array and that we want to find a partitioning
element that divides the elements in the array as evenly as
possible, and that we want to do this as efficiently as possible
without moving elements.
In other words we have an array A[i], i=1,...,n. We want to find
the index of a middle element without messing up the array. If n
is odd we want the median; if n is even we want either of the
middle two.
"Efficiently as possible" has two dimensions, space and time.
Ideally we want an algorithm that is O(1) in space and O(n) in
both average time and worst case time. If there is no such
animal (and I don't think there is) what is the best that we can
do for various tradeoffs.
If we allow O(n) space we can create an array of indices and use
any of the standard algorithms for O(n) time. The question is,
can we do better than that? Thus:
If we allow O(log n) space can it be done in O(n) time?
If we allow O(1) space it be done in O(n*log n) time or better?
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
- Posted by tchow@lsa.umich.edu on June 25th, 2008
In article <48626d52.3543796@news.sbtc.net>,
Richard Harter <cri@tiac.net> wrote:
I don't know the exact answer to your question offhand, but the search term
"median-finding algorithm" should point you to the right body of literature.
I do know that there are some surprisingly efficient algorithms for this
problem.
--
Tim Chow tchow-at-alum-dot-mit-dot-edu
The range of our projectiles---even ... the artillery---however great, will
never exceed four of those miles of which as many thousand separate us from
the center of the earth. ---Galileo, Dialogues Concerning Two New Sciences
- Posted by Richard Harter on June 26th, 2008
On 25 Jun 2008 21:21:05 GMT, tchow@lsa.umich.edu wrote:
The problem is that most efficient algorithms, e.g., Hoare's
partition algorithm and the BFPRT algorithm, presuppose moving
data. The situation is this: These algorithms require O(log n)
passes; however in each pass the amount of data that must be
examined decreases geometrically. This is achieved by data moves
that concentrate the relevant data. The question at hand is what
can we do without moving the data. The best that I can see is
something that I call the butterfly algorithm which requires
O(log n) space and O(n) time (average, but worst case O(n^2)). I
suspect that there is a worst case bounded O(log n) space and
O(n) algorithm but I haven't come p with it.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
- Posted by Keith Thompson on June 26th, 2008
CBFalconer <cbfalconer@yahoo.com> writes:
It seemed obvious to me that an "even numbered list" is simply a list
with an even number of elements. I suppose it could have been stated
more precisely, but I thought it was clear enough as it was.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
- Posted by David Wagner on June 26th, 2008
Richard Harter wrote:
This is probably too trivial to be of interest to you, but in the special
case where all array elements are at most poly(n) in absolute value, it
looks to me like the answer is yes. If you have a guess at the median,
you can count how many elements are smaller than your guess in O(n)
time and decide whether to increase or decrease your guess according
to whether this count is less than or greater than n/2, respectively.
Now use binary search. (Or did I overlook something?)
- Posted by Richard Harter on June 26th, 2008
On Thu, 26 Jun 2008 06:37:26 +0000 (UTC),
daw@taverner.cs.berkeley.edu (David Wagner) wrote:
In this case you are right - the time would be O(n*log n) [log n
passes, scan the entire array each time]. For numeric elements
you can do better than that by computing the low order moments
and using them to get a small range of values. If the keys are
arbitrary length keys it doesn't work.
Thanks for the suggestion.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
- Posted by Malcolm McLean on June 26th, 2008
"David Wagner" <daw@taverner.cs.berkeley.edu> wrote in message
below the first element.
The using a heuristic iterpolate the answer. As you pass through for the
second time, pick up the element closest to your guess, number above the
guess, and number below the guess. If the guess is wrong you have bounds,
and try again.
It's O( N log N), but the logarithmic base of N is bigger than two. It takes
virtually no memory space.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
- Posted by user923005 on June 26th, 2008
On Jun 24, 10:27*pm, pereges <Brol...@gmail.com> wrote:
Maybe:
#include <stdlib.h>
#include <float.h>
#include <math.h>
typedef double Etype;
extern Etype RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t RandRange(size_t a, size_t b);
extern size_t RandomPartition(Etype * A, size_t p, size_t r);
extern size_t Partition(Etype * A, size_t p, size_t r);
/*
**
** In the following code, every reference to CLR means:
**
** "Introduction to Algorithms"
** By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
** ISBN 0-07-013143-0
*/
/*
** CLR, page 187
*/
Etype RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
size_t q,
k;
if (p == r)
return A[p];
q = RandomPartition(A, p, r);
k = q - p + 1;
if (i <= k)
return RandomSelect(A, p, q, i);
else
return RandomSelect(A, q + 1, r, i - k);
}
size_t RandRange(size_t a, size_t b)
{
size_t c = (size_t) ((double) rand() / ((double) RAND_MAX
+ 1) * (b - a));
return c + a;
}
/*
** CLR, page 162
*/
size_t RandomPartition(Etype A[], size_t p, size_t r)
{
size_t i = RandRange(p, r);
Etype Temp;
Temp = A[p];
A[p] = A[i];
A[i] = Temp;
return Partition(A, p, r);
}
/*
** CLR, page 154
*/
size_t Partition(Etype A[], size_t p, size_t r)
{
Etype x,
temp;
size_t i,
j;
x = A[p];
i = p - 1;
j = r + 1;
for (;
{
do {
j--;
} while (!(A[j] <= x));
do {
i++;
} while (!(A[i] >= x));
if (i < j) {
temp = A[i];
A[i] = A[j];
A[j] = temp;
} else
return j;
}
}
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
static const double CLOCK_INV = 1.0 / CLOCKS_PER_SEC;
int compare(const void *p, const void *q)
{
register double p0 = *(double *) p;
register double q0 = *(double *) q;
if (p0 > q0)
return 1;
if (p0 < q0)
return -1;
return 0;
}
void quit()
{
fprintf(stderr, "memory exhausted\n");
exit(1);
}
int main(int argc, char *argv[])
{
int res = 100000;
char *file = argv[1];
double median = 0;
double mean = 0;
int n = 0;
double d;
clock_t start,
end;
double *buf = (double *) malloc(sizeof(double) * res);
FILE *fin = fopen(file, "r");
if (buf == 0)
quit();
start = clock();
while (fscanf(fin, "%lg", &d) == 1) {
if (n == res) {
res += res;
buf = (double *) realloc(buf, sizeof(double) * res);
if (buf == 0)
quit();
}
buf[n++] = d;
mean = (n == 1) ? d : mean + (d - mean) / n;
}
/*
if n is odd, then we use (n+1)/2
If n is even, we use [(n/2)th term + (n/2+1)th term] / 2
*/
if (n) {
int mid = n / 2;
median = (n % 2) ? RandomSelect(buf, 0, n - 1, mid) :
(RandomSelect(buf, 0, n - 1, mid-1) + RandomSelect(buf, 0, n - 1,
mid))/2;
}
/*
qsort(buf, n, sizeof(double), compare);
if (n) {
int mid = n / 2;
median = (n % 2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
*/
end = clock();
printf("number of elements = %d, median = %g, mean = %g in %f
seconds\n", n, median, mean, (end - start) * CLOCK_INV);
free(buf);
return 0;
}
/*
It would be a lot faster for odd numbered data sets:
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in
8.000000 seconds
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in
7.375000 seconds
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in
7.437000 seconds
*/
- Posted by Gene on June 28th, 2008
On Jun 25, 2:26*am, Willem <wil...@stack.nl> wrote:
Every prob and stats book I've ever seen would say 2.5, i.e. the
arithmetic mean of the two middle elements.
- Posted by Ben Bacarisse on June 28th, 2008
Gene <gene.ressler@gmail.com> writes:
There are some that don't, although every one I've seen eventually
suggests picking the mean of the two as a "practical" solution. All
the technical definitions that I can find say that median of that
sample is not unique.
For algorithms it is often *not* practical to take the mean of the two
middle values: the mean may be of different type to the sample data
(e.g. a rational rather than an integer) or, worse, the mean may not
be well-defined. The median string of the set {"a", "b", "c"} is "b",
but what would we choose for the median of {"a", "b", "c", "d"} using
the conventional "mean of the two middle values" answer[1]?
Since the technical definitions all include both endpoints of the range
bounded by the two middle elements we can choose either end. This is
the practical answer in CS.
[1] Yes, one can come with a string the is mid-way between two others
(lexicographically) but that answer is overly complex for this problem.
--
Ben.