DANN:Naive Classifier

From Syncleus Wiki

dANN Information
Description: An Artificial Intelligence Library written in Java.
Last Activity: Today
License: OSCL Type C
IRC Room: #dANN on irc.freenode.org
Homepage: dANN
Distributions: Binary ZIP w/JavaDoc, Binary Tarball w/JavaDoc, Source ZIP, Source Tarball
Documentation: Javadoc repository, Javadoc for GIT master, Javadoc for stable release
Development: git://git.syncleus.com/dANN.git, TRAC Bug Tracking, Hudson Continuous Integration
Mailing Lists: dANN Announcements, dANN Development, Syncleus Announcements


A Naive classifier classifies an item by looking at its features independently of one another. Exactly what an item represents and which features you can extract from it depend on your implementation. A typical example is human-readable text, where a document is an item and the words in the document are its features. Determining what the features are and how to extract them is a vital step in writing a domain-specific naive classifier. The more useful features you can extract from an item, the better the classification results are likely to be. Since the features are considered independent (that's why it's called naive), these classifiers won't pick up on or benefit from relationships between features. Bayesian Networks are better suited to items whose features are interdependent.
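As a rough illustration of the feature-extraction step for text, the sketch below turns a document into a list of word features. It is not dANN code: the class name WordFeatureExtractor and its extract method are hypothetical, and the choice to lower-case words and drop punctuation is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of dANN: turns a text item into word features.
class WordFeatureExtractor {

    // Splits on every run of characters that are neither letters nor digits
    // and lower-cases each word, so "Money" and "money!" become the same feature.
    static List<String> extract(String item) {
        List<String> features = new ArrayList<>();
        for (String token : item.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                features.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Prints: [money, is, the, root, of, all, evil]
        System.out.println(extract("Money is the root of all evil!"));
    }
}
```

A different domain would need a different extractor (for example pixel statistics for images), but the classifier only ever sees the resulting list of features.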

There are several types of Naive Classifiers built into the dANN library; we are going to cover only the simple ones: SimpleNaiveClassifier, SimpleLanguageNaiveClassifier, and StemmingLanguageNaiveClassifier.

Example: Processing Natural Language

Using a Naive Classifier involves several steps that are best shown with a concrete example. The easiest example to understand is a SimpleLanguageNaiveClassifier used to process spoken or written phrases.

The first step is to create a new classifier we can work with:

TrainableLanguageNaiveClassifier<Integer> classifier =
     new SimpleLanguageNaiveClassifier<Integer>();

This creates a new classifier whose items are classified into categories represented by Integer values. Another classifier to consider is StemmingLanguageNaiveClassifier. It is used in exactly the same way, but it applies the Porter stemming algorithm to each word, so words like running and run are seen as the same feature. To use this classifier, instantiate it as follows:

TrainableLanguageNaiveClassifier<Integer> classifier =
     new StemmingLanguageNaiveClassifier<Integer>();
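To see why stemming merges features, here is a deliberately crude suffix stripper. It is neither the Porter algorithm nor dANN's implementation, and it mishandles many words that Porter treats correctly; the class name ToyStemmer and its stem method are hypothetical and exist only to show running and run collapsing into one feature.

```java
import java.util.Locale;

// Hypothetical toy stemmer, far simpler than the Porter algorithm.
class ToyStemmer {

    // Strips a trailing "ing" from longer words and then undoubles a trailing
    // doubled consonant, so "running" -> "runn" -> "run".
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 5 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);
            int n = w.length();
            boolean doubled = n >= 2 && w.charAt(n - 1) == w.charAt(n - 2);
            boolean consonant = n >= 1 && "aeiou".indexOf(w.charAt(n - 1)) < 0;
            if (doubled && consonant) {
                w = w.substring(0, n - 1);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Both calls produce the same feature string.
        System.out.println(stem("running"));
        System.out.println(stem("run"));
    }
}
```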


Next comes the real magic, training the classifier:

classifier.train("Money is the root of all evil!", 1);
classifier.train("Money destroys the soul", 1);
classifier.train("Money kills!", 1);
classifier.train("The quick brown fox.", 1);
classifier.train("Money should be here once", 2);
classifier.train("some nonsense to take up space", 2);
classifier.train("Even more nonsense cause we can", 2);
classifier.train("nonsense is the root of all good", 2);
classifier.train("just a filler to waste space", 2);

You'll notice we intentionally trained the classifier with several obvious patterns. Money appears most often in category 1, while nonsense and space prefer category 2. This will show up when we ask for some classifications in the next step:

assert (classifier.featureClassification("Money") == 1);
assert (classifier.featureClassification("Fox") == 1);
assert (classifier.featureClassification("Nonsense") == 2);
assert (classifier.featureClassification("Waste") == 2);
assert (classifier.featureClassification("Evil") == 1);
assert (classifier.featureClassification("Good") == 2);
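The patterns behind these assertions can be checked by hand with a simple per-category word count over the training phrases. The sketch below is not how dANN is implemented; WordCategoryCounts and its methods are hypothetical names, and picking the category where a word occurs most often is an assumed rule that happens to agree with the six assertions above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not dANN code: counts how often each word
// appears in the training phrases of each category.
class WordCategoryCounts {
    // category -> (lower-cased word -> occurrences in that category)
    private final Map<Integer, Map<String, Integer>> counts = new TreeMap<>();

    void train(String phrase, int category) {
        Map<String, Integer> perWord = counts.computeIfAbsent(category, c -> new HashMap<>());
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                perWord.merge(token, 1, Integer::sum);
            }
        }
    }

    int count(String word, int category) {
        Map<String, Integer> perWord = counts.get(category);
        if (perWord == null) {
            return 0;
        }
        return perWord.getOrDefault(word.toLowerCase(Locale.ROOT), 0);
    }

    // Category in which the word was seen most often; null if it was never seen.
    // On a tie the lowest category number wins.
    Integer mostFrequentCategory(String word) {
        Integer best = null;
        int bestCount = 0;
        for (Integer category : counts.keySet()) {
            int c = count(word, category);
            if (c > bestCount) {
                best = category;
                bestCount = c;
            }
        }
        return best;
    }

    // Same training phrases as in the article.
    static WordCategoryCounts trainedOnArticleExample() {
        WordCategoryCounts wc = new WordCategoryCounts();
        wc.train("Money is the root of all evil!", 1);
        wc.train("Money destroys the soul", 1);
        wc.train("Money kills!", 1);
        wc.train("The quick brown fox.", 1);
        wc.train("Money should be here once", 2);
        wc.train("some nonsense to take up space", 2);
        wc.train("Even more nonsense cause we can", 2);
        wc.train("nonsense is the root of all good", 2);
        wc.train("just a filler to waste space", 2);
        return wc;
    }

    public static void main(String[] args) {
        WordCategoryCounts wc = trainedOnArticleExample();
        // "money" occurs 3 times in category 1 and once in category 2.
        System.out.println(wc.count("Money", 1) + " vs " + wc.count("Money", 2));
        // "nonsense" never occurs in category 1 and 3 times in category 2.
        System.out.println(wc.count("Nonsense", 1) + " vs " + wc.count("Nonsense", 2));
    }
}
```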

As you can see, this simple class classifies the features (words) of a phrase into the categories it has previously learned. It can classify not only the features within an item but also the items themselves:

assert (classifier.classification("Money was here once") == 2);
assert (classifier.classification("Money destroys the quick brown fox!") == 1);
assert (classifier.classification("kills the soul") == 1);
assert (classifier.classification("nonsense is the root of good") == 1);
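For readers curious how independent features can be combined into a decision about a whole item, the sketch below is a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is an assumption-laden stand-in, not dANN's algorithm: the class NaiveBayesSketch and its methods are hypothetical, and on borderline phrases its answers need not match the dANN results asserted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical textbook sketch, not dANN code.
class NaiveBayesSketch {
    private final Map<Integer, Map<String, Integer>> wordCounts = new TreeMap<>();
    private final Map<Integer, Integer> totalWords = new HashMap<>();
    private final Map<Integer, Integer> itemCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalItems = 0;

    private static List<String> words(String phrase) {
        List<String> result = new ArrayList<>();
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    void train(String phrase, int category) {
        Map<String, Integer> perWord = wordCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String w : words(phrase)) {
            perWord.merge(w, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(w);
        }
        itemCounts.merge(category, 1, Integer::sum);
        totalItems++;
    }

    // Returns the category with the highest log score:
    // log P(category) + sum over words of log P(word | category),
    // where each word contributes independently (the "naive" assumption).
    Integer classify(String phrase) {
        Integer best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Integer category : wordCounts.keySet()) {
            double score = Math.log((double) itemCounts.get(category) / totalItems);
            int denominator = totalWords.getOrDefault(category, 0) + vocabulary.size();
            for (String w : words(phrase)) {
                int c = wordCounts.get(category).getOrDefault(w, 0);
                score += Math.log((c + 1.0) / denominator); // add-one smoothing
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    // Same training phrases as in the article.
    static NaiveBayesSketch trainedOnArticleExample() {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("Money is the root of all evil!", 1);
        nb.train("Money destroys the soul", 1);
        nb.train("Money kills!", 1);
        nb.train("The quick brown fox.", 1);
        nb.train("Money should be here once", 2);
        nb.train("some nonsense to take up space", 2);
        nb.train("Even more nonsense cause we can", 2);
        nb.train("nonsense is the root of all good", 2);
        nb.train("just a filler to waste space", 2);
        return nb;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = trainedOnArticleExample();
        System.out.println(nb.classify("Money kills"));
        System.out.println(nb.classify("nonsense to waste space"));
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word from driving a category's score to zero.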

See Also