I was saying you generate the graph from the points by computing all the pairwise distances up front, and just take the O(n^2) hit at setup. Adding a single new point to the graph is only O(n). (This does sound a lot like AESA, but I'm responding off the cuff here so I haven't checked, I'm afraid.)
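A rough Python sketch of what I mean, where `dist` is just a stand-in for whatever expensive comparison you actually have:

```python
# Precomputed distance graph: O(n^2) setup, O(n) per added point.
import math

def dist(a, b):
    # Placeholder metric: Euclidean distance on small feature tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class DistanceGraph:
    def __init__(self, points):
        self.points = list(points)
        n = len(self.points)
        # O(n^2) setup: compute and store every pairwise distance.
        self.d = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dij = dist(self.points[i], self.points[j])
                self.d[i][j] = self.d[j][i] = dij

    def add(self, p):
        # O(n) insertion: one new row and column of distances.
        new_row = [dist(p, q) for q in self.points]
        for row, dij in zip(self.d, new_row):
            row.append(dij)
        self.points.append(p)
        self.d.append(new_row + [0.0])

g = DistanceGraph([(0, 0), (3, 4), (6, 8)])
g.add((1, 1))
print(g.d[3][1])  # distance from the new point to point 1: sqrt(13)
```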
For your million photo example I'm guessing this would still be your best bet if you have enough hardware to throw at it and your distance metric isn't woefully expensive to calculate.
Another option for that example would be to come up with a quick 'candidate match' algorithm - maybe based on a small set of numbers you generate cheaply for each photo individually. When you want to match a new photo, you find a largish number of candidate matches using these numbers, then run your expensive picture-picture matching algorithm on just those results. (I'd also want to fill in the edges of the graph at that point by storing the distances, but that's still O(n^2) storage, and you'll struggle with six billion photos on any near-future kit.) Of course this depends on coming up with some cunning 'candidate match' algorithm that actually works.
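Something along these lines, say - `cheap_signature` and `expensive_distance` are hypothetical stand-ins for the few numbers you can afford per photo and the real picture-picture match respectively:

```python
# Two-stage matching: a cheap signature filter, then the expensive
# metric on the surviving candidates only.
import heapq

def cheap_signature(photo):
    # Hypothetical cheap summary; here photos are just lists of numbers.
    return (sum(photo) / len(photo), max(photo) - min(photo))

def cheap_distance(s1, s2):
    return sum(abs(a - b) for a, b in zip(s1, s2))

def expensive_distance(p1, p2):
    # Stand-in for the real, slow comparison.
    return sum((a - b) ** 2 for a, b in zip(p1, p2))

def best_matches(query, photos, signatures, n_candidates=100, n_results=5):
    sq = cheap_signature(query)
    # Stage 1: cheap linear pass over precomputed signatures to pick
    # a largish candidate set.
    candidates = heapq.nsmallest(
        n_candidates, range(len(photos)),
        key=lambda i: cheap_distance(sq, signatures[i]))
    # Stage 2: run the expensive metric on the candidates only.
    return heapq.nsmallest(
        n_results, candidates,
        key=lambda i: expensive_distance(query, photos[i]))

photos = [[i, i * 2, i % 7] for i in range(1000)]
signatures = [cheap_signature(p) for p in photos]  # computed once, up front
print(best_matches([10, 20, 3], photos, signatures))  # indices of best matches
```

The expensive metric only ever runs on the candidate set, so the whole thing lives or dies on how well the cheap signatures predict the real distances.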
If your requirement is to find the closest matching photo(s), I don't see how you can avoid the O(n^2) hit at some point (and throughout, in storage terms), unless there are some strong properties of your match algorithm that don't leap out instantly?
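One strong property you do get from any true metric is the triangle inequality, which is what AESA-type schemes lean on: once you've measured d(query, p) and stored d(p, x), you know d(query, x) >= |d(query, p) - d(p, x)|, so some points can be thrown out without ever running the expensive comparison. It doesn't dodge the O(n^2) storage, but it can cut the number of expensive calls per query. A simplified sketch (not real AESA, which picks the next candidate more cleverly; names are mine):

```python
# AESA-flavoured nearest-neighbour scan over the precomputed graph.
# pairwise[i][j] holds the stored d(points[i], points[j]).
def nearest(query, points, pairwise, dist):
    best_i, best_d = 0, dist(query, points[0])
    known = {0: best_d}  # distances from the query measured so far
    for i in range(1, len(points)):
        # Triangle inequality: d(query, x) >= |d(query, p) - d(p, x)|
        # for every pivot p we've already measured against the query.
        lower_bound = max(abs(dq - pairwise[p][i]) for p, dq in known.items())
        if lower_bound >= best_d:
            continue  # ruled out without calling the expensive metric
        d_i = dist(query, points[i])
        known[i] = d_i
        if d_i < best_d:
            best_i, best_d = i, d_i
    return best_i, best_d
```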
My guess is that the known search algorithms are crap because the problem is bloody hard. It's bad enough doing NP optimisation stuff where evaluating candidate solutions is cheap - if evaluating solutions is expensive as well, you're stuffed unless you know something about how your data is structured or tends to cluster.
All of this is dodging the problem as you stated it - solving the metric space search problem elegantly - but that's because my instinct is that if a problem is well known but not well solved I'm unlikely to find a solution myself in polynomial time :-) (This is also probably why you have some distinction as an inventor of algorithms and I have none!)