jInfer
public interface ClustererWithAttributes<T,S>
Extends the Clusterer interface with attributes in mind.
Interface for implementations of clustering algorithms. Implementors will
probably use AbstractNode as the generic type for Clusterer and distinguish
the kinds of nodes submitted for clustering at runtime.
The purpose of clustering is to group elements into clusters by some criterion, generally by sharing the same name. Sometimes elements with the same name appear in documents with different semantics; sometimes misspelled element names cause semantically identical elements to have different names.
A Clusterer has to deal with these issues.
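As a rough illustration of name-based clustering, the sketch below groups element names that differ only in letter case. This is a much weaker criterion than real misspelling detection and is not taken from jInfer; the class `NameClusteringSketch`, its `clusterCount` helper, and the choice of the first added member as representative are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: clusters element names that differ only in letter case. */
class NameClusteringSketch {
    // Normalized name (cluster key) -> members, in insertion order.
    private final Map<String, List<String>> clusters = new LinkedHashMap<>();

    /** Adds one element name to the cluster selected by its lower-cased form. */
    void add(String elementName) {
        String key = elementName.toLowerCase(Locale.ROOT);
        clusters.computeIfAbsent(key, k -> new ArrayList<>()).add(elementName);
    }

    /** Assumption: the first member added to a cluster acts as its representative. */
    String getRepresentantForItem(String elementName) {
        List<String> members = clusters.get(elementName.toLowerCase(Locale.ROOT));
        if (members == null) {
            throw new IllegalArgumentException("Item was never added: " + elementName);
        }
        return members.get(0);
    }

    /** Number of clusters built so far. */
    int clusterCount() {
        return clusters.size();
    }
}
```

With this criterion, "person" and "Person" land in one cluster and share a representative, while "address" forms a cluster of its own.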
The method getRepresentantForItem is used to obtain a single representative of an element, simple data node, or attribute when steps are added to the automaton (steps must satisfy A.equals(B) whenever nodes A and B are in the same cluster). The clusterer therefore has to process the right sides of elements as well, perhaps simply by doing:
    for (Node x : queue) {
      if (x.isElement()) {
        this.addAll(((Element) x).getSubnodes().getTokens());
      }
    }
When the automaton is created, getRepresentantForItem() is called for everything on the right side of an element's rule. The Clusterer therefore also has to deal with SimpleData (one cluster for all simple data nodes).
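The traversal described above can be sketched in a self-contained form. The types `SketchNode` and `RightSideClusteringSketch` are hypothetical stand-ins, not jInfer classes, and the `#simpledata` key is an arbitrary choice made here; the example only aims to show that sub-nodes get clustered too and that all simple data nodes end up with one shared representative. It assumes the input forms a tree, so no node is reached twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a node: either an element with sub-nodes or simple data. */
class SketchNode {
    final String name;
    final boolean element;
    final List<SketchNode> subnodes = new ArrayList<>();

    SketchNode(String name, boolean element) {
        this.name = name;
        this.element = element;
    }
}

/** Hypothetical sketch: clusters elements by name and all simple data into one cluster. */
class RightSideClusteringSketch {
    // Arbitrary key for the single simple data cluster; XML names cannot start with '#'.
    private static final String SIMPLE_DATA_KEY = "#simpledata";

    private final Map<String, List<SketchNode>> clusters = new LinkedHashMap<>();

    /** Clusters the given nodes and, breadth-first, everything on their right sides. */
    void addAll(List<SketchNode> roots) {
        Deque<SketchNode> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            SketchNode x = queue.poll();
            clusters.computeIfAbsent(keyOf(x), k -> new ArrayList<>()).add(x);
            if (x.element) {
                queue.addAll(x.subnodes); // the right side of the element
            }
        }
    }

    /** Assumption: the first node placed in a cluster is its representative. */
    SketchNode getRepresentantForItem(SketchNode item) {
        List<SketchNode> members = clusters.get(keyOf(item));
        if (members == null) {
            throw new IllegalArgumentException("Item was never added: " + item.name);
        }
        return members.get(0);
    }

    private static String keyOf(SketchNode n) {
        return n.element ? n.name : SIMPLE_DATA_KEY;
    }
}
```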
This Clusterer has to do more work on attributes. For each cluster, it has to collect the attributes of the cluster members and cluster those attributes separately, by some criterion. The simplifier may then ask for all attribute clusters by passing a representative to the getAttributeClusters method.
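One possible shape of this per-cluster attribute collection is sketched below. It is not the jInfer implementation: `SketchElement` and `AttributeClusteringSketch` are hypothetical, attributes are reduced to plain names, clusters are represented as plain lists instead of Cluster<S>, and "equal attribute name" is assumed as the clustering criterion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an element that carries attribute names. */
class SketchElement {
    final String name;
    final List<String> attributes;

    SketchElement(String name, List<String> attributes) {
        this.name = name;
        this.attributes = attributes;
    }
}

/** Hypothetical sketch: element clusters by name, attribute clusters per element cluster. */
class AttributeClusteringSketch {
    // Element name (cluster key) -> members of that element cluster.
    private final Map<String, List<SketchElement>> elementClusters = new LinkedHashMap<>();

    void add(SketchElement e) {
        elementClusters.computeIfAbsent(e.name, k -> new ArrayList<>()).add(e);
    }

    /** Assumption: the first member added to a cluster is its representative. */
    SketchElement getRepresentantForItem(SketchElement e) {
        List<SketchElement> members = elementClusters.get(e.name);
        if (members == null) {
            throw new IllegalArgumentException("Item was never added: " + e.name);
        }
        return members.get(0);
    }

    /** Collects the attributes of all members of the representative's cluster, grouped by name. */
    List<List<String>> getAttributeClusters(SketchElement representant) {
        List<SketchElement> members =
                elementClusters.getOrDefault(representant.name, Collections.emptyList());
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (SketchElement member : members) {
            for (String attribute : member.attributes) {
                byName.computeIfAbsent(attribute, k -> new ArrayList<>()).add(attribute);
            }
        }
        return new ArrayList<>(byName.values());
    }
}
```

Attributes of different element clusters never mix here: an `id` attribute on `person` and an `id` attribute on `address` end up in separate attribute clusters.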
The Clusterer may decide that an attribute should be converted into an element and update the clusters accordingly (removing the attribute and adding elements with the same content to the appropriate element cluster). Such a decision should be reported to the user in a log message, and the user may be consulted about it in some way.
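The mechanics of such a conversion might look like the following sketch. It deliberately leaves out the decision criterion, which the text does not specify, and only shows the bookkeeping and the log message. `AttributeConversionSketch` and its two maps are hypothetical; cluster contents are simplified to strings, and `java.util.logging` is used only because it ships with the JDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical sketch: moves an attribute cluster into the element cluster of the same name. */
class AttributeConversionSketch {
    private static final Logger LOG =
            Logger.getLogger(AttributeConversionSketch.class.getName());

    // Cluster name -> contents of its members (attribute values or element text).
    final Map<String, List<String>> attributeClusters = new LinkedHashMap<>();
    final Map<String, List<String>> elementClusters = new LinkedHashMap<>();

    /** Removes the attribute cluster and adds its contents to the matching element cluster. */
    void convertAttributeToElement(String attributeName) {
        List<String> contents = attributeClusters.remove(attributeName);
        if (contents == null) {
            return; // no such attribute cluster, nothing to convert
        }
        elementClusters.computeIfAbsent(attributeName, k -> new ArrayList<>()).addAll(contents);
        // Tell the user about the decision, as the documentation asks.
        LOG.info("Attribute '" + attributeName + "' was converted into an element ("
                + contents.size() + " occurrences).");
    }
}
```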
It is up to the simplifier to decide what to do with the attributes (simplifying).
Each item has to be in exactly one cluster.
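An implementation could check this invariant in a test with a helper along the following lines. `PartitionCheckSketch.isPartition` is a hypothetical utility, not part of jInfer. It compares items with equals and hashCode, which may need to be replaced by identity comparison for node types whose equals is defined per cluster, and it does not report cluster members that are missing from the item collection.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical test helper for the "exactly one cluster per item" invariant. */
class PartitionCheckSketch {

    /** Returns true when every item occurs exactly once across all given clusters. */
    static <T> boolean isPartition(Collection<T> items,
                                   List<? extends Collection<T>> clusters) {
        // Count how often each member occurs over all clusters.
        Map<T, Integer> occurrences = new HashMap<>();
        for (Collection<T> cluster : clusters) {
            for (T member : cluster) {
                occurrences.merge(member, 1, Integer::sum);
            }
        }
        // An item that is missing (0) or duplicated (2 or more) breaks the invariant.
        for (T item : items) {
            if (occurrences.getOrDefault(item, 0) != 1) {
                return false;
            }
        }
        return true;
    }
}
```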
Method Summary | |
---|---|
List<Cluster<S>> | getAttributeClusters(T representant) Returns all clusters of attributes for a given representative of a cluster (de facto, for a given element cluster). |
Methods inherited from interface cz.cuni.mff.ksi.jinfer.twostep.clustering.Clusterer |
---|
add, addAll, cluster, getClusters, getRepresentantForItem |
Method Detail |
---|
List<Cluster<S>> getAttributeClusters(T representant)
Parameters: representant - the representative of an element cluster
|