- Suffix trees versus prefix arrays
- Posted by almurph@altavista.com on July 4th, 2008
Hi everyone
Why do suffix trees/arrays _appear_ to be more commonly used then
prefix trees/arrays? Is it due to some property of language (like
script is read from the left to the right)? Or does it make no
difference at all? Is there some sort of advantage?
I would appreciate any comments that you may have on this.
Thanks,
Al.
- Posted by 4N on July 4th, 2008
<almurph@altavista.com> ha scritto nel messaggio
news:12d19b6a-331b-40f2-abad-bbff88309238@l42g2000hsc.googlegroups.com...
From Wikipedia:
A trie, or prefix tree, is an ordered tree data structure that is used to
store an associative array where the keys are usually strings.
A suffix tree (also called suffix trie, PAT tree or, in an earlier form,
position tree) is a data structure that presents the suffixes of a given
string in a way that allows for a particularly fast implementation of many
important string operations, for instance locating a substring in S,
locating a substring if a certain number of mistakes are allowed, locating
matches for a regular expression pattern etc. Suffix trees also provided one
of the first linear-time solutions for the longest common substring problem.
These speedups come at a cost: storing a string's suffix tree typically
requires significantly more space than storing the string itself.
- Posted by almurph@altavista.com on July 4th, 2008
It appears to be just a matter of definition.
4N wrote:
- Posted by 4N on July 5th, 2008
<almurph@altavista.com> ha scritto nel messaggio
news:e93b5e3c-81ab-4c00-bbbf-f7f18e2149fe@l42g2000hsc.googlegroups.com...
No!
A trie, or prefix tree, is a special type of bynary tree done to reduce the
number of bit compared when performing operations and it's always the same
even if you change the order of the object inserted in the tree.
A suffix tree is a tree where you store all the suffixes of a string, thus
using more ram, and it's used to perform special operations that a regular
trie or a binary tree cannot perform, eg. locating a substring in S,
locating a substring if a certain number of mistakes are allowed, locating
matches for a regular expression pattern etc. Suffix trees also provided one
of the first linear-time solutions for the longest common substring problem.
Is it more clear like this?