As an alternative, I present an efficient, trainable algorithm that can be easily adapted to new text genres and to a range of natural languages. The algorithm uses a lexicon with part-of-speech probabilities and a feed-forward neural network that can be trained rapidly. The method requires minimal storage overhead and only a very small amount of training data. It overcomes the limitations of existing methods and achieves very high accuracy.
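To make the two ingredients named above concrete, the following is a minimal sketch of how part-of-speech probabilities from a lexicon might be fed to a small feed-forward network. It is an illustration under stated assumptions, not the implementation evaluated here: the tag set, the toy lexicon entries, the one-token context window, the two hidden units, and all names (`TAGS`, `LEXICON`, `context_vector`, `TinyNet`) are hypothetical.

```python
import math
import random

# Hypothetical tag set and toy lexicon: word -> P(tag | word). Illustrative only.
TAGS = ["noun", "proper_noun", "abbreviation", "other"]
LEXICON = {
    "dr": {"abbreviation": 1.0},
    "smith": {"proper_noun": 0.9, "noun": 0.1},
    "he": {"other": 1.0},
    "home": {"noun": 0.8, "other": 0.2},
}

def tag_probs(word):
    """Return the tag-probability vector for a word (uniform if unknown)."""
    entry = LEXICON.get(word.lower())
    if entry is None:
        return [1.0 / len(TAGS)] * len(TAGS)
    return [entry.get(tag, 0.0) for tag in TAGS]

def context_vector(tokens, i, k=1):
    """Concatenate tag probabilities of the k tokens on each side of tokens[i]."""
    vec = []
    for j in list(range(i - k, i)) + list(range(i + 1, i + k + 1)):
        in_range = 0 <= j < len(tokens)
        vec.extend(tag_probs(tokens[j]) if in_range else [0.0] * len(TAGS))
    return vec

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """One hidden layer, one sigmoid output: P(punctuation ends a sentence)."""

    def __init__(self, n_in, n_hidden=2, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One stochastic gradient step on the cross-entropy loss."""
        h, y = self.forward(x)
        delta_out = y - target
        xb, hb = x + [1.0], h + [1.0]
        # Update hidden weights first so they use the pre-update output weights.
        for j, row in enumerate(self.w1):
            delta_h = delta_out * self.w2[j] * h[j] * (1.0 - h[j])
            for i in range(len(row)):
                row[i] -= lr * delta_h * xb[i]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * delta_out * hb[j]

# Two toy training cases: a period after an abbreviation (label 0, no boundary)
# and a period that ends a sentence (label 1).
examples = [
    (context_vector(["Dr", ".", "Smith"], 1), 0.0),
    (context_vector(["home", ".", "He"], 1), 1.0),
]
net = TinyNet(n_in=len(examples[0][0]))
for _ in range(500):
    for x, target in examples:
        net.train_step(x, target)
```

Because the input is built from lexicon probabilities rather than from capitalization, a sketch of this kind lowercases every word before lookup and so behaves the same on text with no capital letters.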
The results presented demonstrate the successful implementation of the algorithm on a 27,294-sentence English corpus. Training time was less than one minute on a workstation, and the method correctly labeled over 98.5% of the sentence boundaries. The method was also successful in labeling texts containing no capital letters. The system has been successfully adapted to German and French; training times for these languages were similarly low, and the resulting accuracy exceeded 99%.