jInfer
public interface Clusterer<T>
Interface for clustering algorithm implementations. Implementors will probably use AbstractNode as the type argument of Clusterer and distinguish between the kinds of nodes submitted for clustering at runtime.
The purpose of clustering is to group elements into clusters based on some criterion, generally an identical name. Sometimes elements with the same name appear in documents with different semantics, and sometimes misspelled element names cause semantically identical elements to have different names.
A Clusterer has to deal with these issues.
The method getRepresentantForItem is used to get one representative of an element, simple data node, or attribute when adding steps to the automaton (the representatives of nodes A and B have to satisfy A.equals(B) when A and B are in the same cluster). For that reason the clusterer has to process the right sides of elements, perhaps just by doing:
for (Node x : queue) {
    if (x.isElement) {
        this.addAll(((Element) x).getSubnodes().getTokens());
    }
}
When the automaton is created, getRepresentantForItem() is called for everything on the right side of an element's rule. So the Clusterer has to deal with SimpleData (one cluster for all SimpleData nodes). Attributes are omitted during automaton creation, so they can be omitted in the clusterer as well. Those who wish to write a simplifier that handles attributes should take a look at the ClustererWithAttributes interface.
Each item has to be in exactly one cluster (that's what clustering is all about).
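The requirements above (process the right sides of elements, keep all simple data in one cluster, place each item in exactly one cluster) can be sketched roughly as follows. `SketchNode` and `RightSideGrouping` are hypothetical stand-ins and not jInfer's node classes, and grouping elements by exact name is only an assumed criterion. Unlike the one-level loop above, this sketch also follows nested right sides and uses a separate work queue so that nothing is added to a collection while it is being iterated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a node; NOT jInfer's AbstractNode.
class SketchNode {
    final String name;
    final boolean element;            // false means a simple data node
    final List<SketchNode> rightSide; // tokens on the right side of the element's rule

    SketchNode(String name, boolean element, SketchNode... rightSide) {
        this.name = name;
        this.element = element;
        this.rightSide = Arrays.asList(rightSide);
    }
}

class RightSideGrouping {
    // Assumed criterion: all simple data shares one key, elements are keyed by name.
    static String keyOf(SketchNode n) {
        return n.element ? "element:" + n.name : "#simpledata";
    }

    // Visits the given nodes and everything reachable through right sides,
    // placing each node object in exactly one group.
    static Map<String, List<SketchNode>> group(List<SketchNode> roots) {
        Map<String, List<SketchNode>> groups = new LinkedHashMap<>();
        Set<SketchNode> seen =
                Collections.newSetFromMap(new IdentityHashMap<SketchNode, Boolean>());
        Deque<SketchNode> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            SketchNode n = work.poll();
            if (!seen.add(n)) {
                continue; // this node object was already placed in a group
            }
            groups.computeIfAbsent(keyOf(n), k -> new ArrayList<>()).add(n);
            if (n.element) {
                work.addAll(n.rightSide); // enqueue the tokens of the right side
            }
        }
        return groups;
    }
}
```

The identity-based `seen` set is one possible way to guarantee that a node reachable from two places is still clustered only once.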
Method Summary | |
---|---|
void | add(T item): Adds the item to the clusterer and enqueues it for processing. |
void | addAll(Collection<T> items): Adds the whole collection to the queue for clustering. |
void | cluster(): Does the main job, clustering the enqueued items into clusters. |
List<Cluster<T>> | getClusters(): Returns the result of the last cluster() call without clustering again. |
T | getRepresentantForItem(T item): Returns the representative of the item's cluster. |
Method Detail |
---|
void add(T item)
Parameters: item - the item to add
void addAll(Collection<T> items)
Parameters: items - the elements to add
void cluster() throws InterruptedException
Example: after add(x), add(y), add(xx), add(yx), the enqueued items are x, y, xx, yx. Calling cluster() creates clusters based on, for example, the starting letter, which gives two clusters: (x, xx) | (y, yx)
Now let the user call add(xd), add(zz). Calling cluster() again has to result in (x, xx, xd) | (y, yx) | (zz)
Of course, if the clustering criterion is not as stable as the first letter, items x, xx, y, yx can move to different clusters. The point is that they do not disappear: once an item is added, the clusterer has to hold it for future cluster() calls.
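The example can be sketched in code as follows. `FirstLetterClusterer` is a hypothetical class working on non-empty strings; it does not implement jInfer's Clusterer interface and only illustrates that every item ever added takes part in each later cluster() call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; assumes all items are non-empty strings.
class FirstLetterClusterer {
    private final List<String> items = new ArrayList<>();    // everything ever added
    private List<List<String>> clusters = new ArrayList<>(); // result of the last cluster() call

    void add(String item) {
        items.add(item);
    }

    // Rebuilds the clusters from ALL items added so far, not only the new ones.
    void cluster() {
        Map<Character, List<String>> byLetter = new LinkedHashMap<>();
        for (String item : items) {
            byLetter.computeIfAbsent(item.charAt(0), k -> new ArrayList<>()).add(item);
        }
        clusters = new ArrayList<>(byLetter.values());
    }

    // Returns the stored result without clustering again.
    List<List<String>> getClusters() {
        return clusters;
    }
}
```

With this sketch, adding x, y, xx, yx and calling cluster() yields `[[x, xx], [y, yx]]`; adding xd and zz and calling cluster() again yields `[[x, xx, xd], [y, yx], [zz]]`.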
The cluster method has to check whether its thread has been interrupted by using:
if (Thread.interrupted()) {
throw new InterruptedException();
}
in its main loop.
Throws: InterruptedException - if the thread is interrupted while clustering
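As a minimal sketch of where such a check could sit, the hypothetical `InterruptibleLoop.process` helper below polls the interrupt flag once per processed item. The class, the method name and the per-item "work" are illustrative assumptions, not part of jInfer.

```java
import java.util.ArrayList;
import java.util.List;

class InterruptibleLoop {
    // Hypothetical helper: handles items one by one and checks for
    // interruption at the start of every iteration of the main loop.
    static List<String> process(List<String> queued) throws InterruptedException {
        List<String> done = new ArrayList<>();
        for (String item : queued) {
            if (Thread.interrupted()) {
                throw new InterruptedException();
            }
            done.add(item); // stand-in for the real per-item clustering work
        }
        return done;
    }
}
```

Note that Thread.interrupted() clears the thread's interrupt status when it returns true, so a caller that catches the InterruptedException and wants to keep the status has to restore it with Thread.currentThread().interrupt().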
T getRepresentantForItem(T item)
Parameters: item - the item whose cluster representative is requested
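One possible way to satisfy this contract is to pick a fixed member of each cluster as its representative. The sketch below assumes non-empty clusters of strings, uses the first member, and throws for items that were never clustered; `RepresentativeLookup` and both of these choices are hypothetical, not jInfer's behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the representative of a cluster is its first member.
class RepresentativeLookup {
    private final Map<String, String> representativeOf = new HashMap<>();

    // Assumes every cluster is non-empty and every item is in exactly one cluster.
    RepresentativeLookup(List<List<String>> clusters) {
        for (List<String> cluster : clusters) {
            String representative = cluster.get(0);
            for (String member : cluster) {
                representativeOf.put(member, representative);
            }
        }
    }

    String getRepresentantForItem(String item) {
        String representative = representativeOf.get(item);
        if (representative == null) {
            // Assumed behavior for unknown items; the source does not specify it.
            throw new IllegalArgumentException("Item was never clustered: " + item);
        }
        return representative;
    }
}
```

Because all members of a cluster map to the same object, the representatives of any two items from one cluster are equal, which is what the automaton construction relies on.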
List<Cluster<T>> getClusters()
Returns: the result of the last cluster() call, without clustering again