- How to implement this idea
- Posted by sticker.ji@gmail.com on October 9th, 2006
I have a dataset which contains records. The record can be treated as a
set, which contains elements with no duplication.
The query is a pair of elements. The system should return all the
records containning both of the elements.
How to implement this function quickly?
Example:
dataset:
record 1: {A B D F G}
record 2: {A C T Y}
record 3: {B C D Q L}
query {B C} returns 1, 3.
query {A F} return 1.
query {A H} return null.
The query will always be a pair of elements. No other format of query
exists.
Thank you very much for your idea.
- Posted by makc.the.great@gmail.com on October 9th, 2006
sticker.ji@gmail.com wrote:
assuming it was actually record 1: {A B D F C}, here's a script (save
as html file to run):
<SCRiPT>
record = [];
record[0] = ["A", "B", "D", "F", "C"];
record[1] = ["A", "C", "T", "Y"];
record[2] = ["B", "C", "D", "Q", "L"];
function query(x, y) {
var res = [];
for(i=0; i<record.length; i++) {
var found_x = false;
var found_y = false;
for(j=0; j<record[i].length; j++) {
if(record[i][j] == x) found_x = true;
if(record[i][j] == y) found_y = true;
if(found_x && found_y) {
res[res.length] = i;
break;
}
}
}
return res;
}
alert(query("B", "C"));
alert(query("A", "F"));
alert(query("A", "H"));
</SCRiPT>
- Posted by rossum on October 9th, 2006
On 9 Oct 2006 02:48:34 -0700, sticker.ji@gmail.com wrote:
record 1: {A B D F C}
Set up an inverted index:
element A: {r1 r2}
element B: {r1 r3}
element C: {r1 r2 r3}
element D: {r1 r3}
element E: {}
element F: {r1}
Retrieve the entries for the given two elements and return the
intersection of the two returned sets.
rossum
- Posted by makc.the.great@gmail.com on October 9th, 2006
rossum wrote:
that's surely fastest way, but it is also a hungriest. such index will
occupy much more space than dataset itself.
- Posted by CBFalconer on October 9th, 2006
sticker.ji@gmail.com wrote:
Formulate it in Pascal, where the fundamental data type "SET OF
element" exists, and there is a boolean operator "element IN set".
The existance of limits on set size will probably limit the
operation, but you can get all the ideas straight and then create
something that expands the effective set size.
--
"The mere formulation of a problem is far more often essential
than its solution, which may be merely a matter of mathematical
or experimental skill. To raise new questions, new possibilities,
to regard old problems from a new angle requires creative
imagination and and marks real advances in science."
-- Albert Einstein
- Posted by rossum on October 9th, 2006
On 9 Oct 2006 05:06:33 -0700, makc.the.great@gmail.com wrote:
just rearranged, so it will approximately double the size of the
dataset.
Alternatively, look at data compression methods - can you use a bitset
to represent either the set of elements or the inverted set of
records?
rossum
- Posted by makc.the.great@gmail.com on October 10th, 2006
rossum wrote:
no it wasnt me
consider simple record: r1 { A } for this record you will need A { r1
}, B {}, C {} ... Z {} or whenever this list ends (unless you will not
store empty pointers, but in your post you did
)
then if you will not store empty pointers, you will face more problems
like searching through index itself, and you will probably need "second
level" index for that, etc.
- Posted by rossum on October 10th, 2006
On 10 Oct 2006 01:22:15 -0700, "makc.the.great@gmail.com"
<makc.the.great@gmail.com> wrote:
most of the slots are filled then empty pointers will simplify
processing.
cope with a return of "entry not found" from the dataset.
rossum