jInfer

cz.cuni.mff.ksi.jinfer.twostep.clustering
Interface ClustererWithAttributes<T,S>

All Superinterfaces:
Clusterer<T>
All Known Implementing Classes:
Iname

public interface ClustererWithAttributes<T,S>
extends Clusterer<T>

Extends the Clusterer interface with attributes in mind. Interface for implementations of clustering algorithms. Implementors will probably use AbstractNode as the generic type for Clusterer and distinguish the kinds of nodes submitted for clustering at runtime.

The purpose of clustering is to group elements into clusters by some criterion, generally by having the same name. Sometimes elements with the same name appear in documents with different semantics; sometimes misspelled element names cause semantically identical elements to have different names.

A clusterer has to deal with these issues.
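
The "same name" criterion can be illustrated with a minimal sketch. It assumes that elements are reduced to plain strings and that names differing only in letter case belong together; the class NameClusteringSketch and its method are hypothetical and are not part of jInfer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not a jInfer class.
class NameClusteringSketch {

  // Groups names into clusters, ignoring letter case (an assumed criterion).
  // Clusters keep insertion order, so the first name seen in each cluster
  // sits at index 0 and could serve as its representative.
  static List<List<String>> clusterByName(List<String> names) {
    Map<String, List<String>> byKey = new LinkedHashMap<>();
    for (String name : names) {
      String key = name.toLowerCase(Locale.ROOT);
      byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
    }
    return new ArrayList<>(byKey.values());
  }
}
```

A real clusterer would need a richer criterion to catch misspellings or same-named elements with different semantics; this sketch only shows the grouping step.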

The method getRepresentantForItem is used to get one representative of an element, simple data node, or attribute when adding steps to the automaton (the representatives must satisfy A.equals(B) when nodes A and B are in the same cluster). The clusterer therefore has to process the right-hand sides of elements as well, perhaps simply by doing:

 // Also add the nodes on the right-hand side of every element in the queue.
 for (Node x : queue) {
   if (x.isElement()) {
     this.addAll(((Element) x).getSubnodes().getTokens());
   }
 }
 
When the automaton is created, getRepresentantForItem() is called for everything on the right-hand side of an element's rule. So the clusterer has to deal with SimpleData as well (one cluster for all SimpleData nodes).

This clusterer has to do more work on attributes. For each element cluster, it has to collect the attributes of the cluster's members and cluster those attributes separately, by some criterion. The simplifier may then ask for all attribute clusters by passing the representative to the getAttributeClusters method.
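
The collecting step described above might look roughly like the following sketch. It assumes attribute name as the clustering criterion; Attr, Elem and clusterAttributes are hypothetical stand-ins introduced for illustration and are not jInfer's node classes or API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for jInfer's node types; not the real classes.
class AttributeClusteringSketch {

  static final class Attr {
    final String name;
    final String value;

    Attr(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  static final class Elem {
    final String name;
    final List<Attr> attributes;

    Elem(String name, List<Attr> attributes) {
      this.name = name;
      this.attributes = attributes;
    }
  }

  // Collects the attributes of all members of one element cluster and
  // groups them by attribute name (the assumed criterion).
  static List<List<Attr>> clusterAttributes(List<Elem> elementCluster) {
    Map<String, List<Attr>> byName = new LinkedHashMap<>();
    for (Elem member : elementCluster) {
      for (Attr attribute : member.attributes) {
        byName.computeIfAbsent(attribute.name, k -> new ArrayList<>()).add(attribute);
      }
    }
    return new ArrayList<>(byName.values());
  }
}
```

An implementation of this interface would run such a step once per element cluster and keep the result keyed by the cluster's representative.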

The clusterer may decide that an attribute should be converted into an element and update the clusters accordingly (removing the attribute and adding an element with the same content to the appropriate element cluster). Such a decision should be logged for the user, and it may be confirmed with the user in some way.

It is up to the simplifier to decide what to do with the attributes (simplifying).

Each item has to be in exactly one cluster.
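
The "exactly one cluster" requirement can be checked with a small helper such as the one sketched below. It assumes clusters are plain lists and that the items are distinct under equals(); PartitionCheckSketch is a hypothetical name, not part of jInfer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not a jInfer class.
class PartitionCheckSketch {

  // Returns true when every item occurs in exactly one cluster and the
  // clusters contain nothing besides the given items.
  // Assumes the items are pairwise distinct under equals().
  static <T> boolean isPartition(List<T> items, List<List<T>> clusters) {
    Map<T, Integer> counts = new HashMap<>();
    int totalMembers = 0;
    for (List<T> cluster : clusters) {
      for (T member : cluster) {
        counts.merge(member, 1, Integer::sum);
        totalMembers++;
      }
    }
    if (totalMembers != items.size()) {
      return false; // some item is missing, duplicated, or foreign
    }
    for (T item : items) {
      if (counts.getOrDefault(item, 0) != 1) {
        return false;
      }
    }
    return true;
  }
}
```

Such a check is mainly useful in tests of a clusterer implementation.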


Method Summary
 List<Cluster<S>> getAttributeClusters(T representant)
          Returns all attribute clusters for the given cluster representative (in effect, for the given element cluster).
 
Methods inherited from interface cz.cuni.mff.ksi.jinfer.twostep.clustering.Clusterer
add, addAll, cluster, getClusters, getRepresentantForItem
 

Method Detail

getAttributeClusters

List<Cluster<S>> getAttributeClusters(T representant)
Returns all attribute clusters for the given cluster representative (in effect, for the given element cluster). The attributes have to be collected from the elements.

Parameters:
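representant - representative of the element cluster whose attribute clusters are requested
Returns:
all attribute clusters of that element cluster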


Generated on Fri Dec 9 00:01:25 CET 2011