Texts, articles, and papers on Maximum Entropy classifiers tend to come in two varieties: the more popular "high-level" kind, and the more technical kind.
The popular variety is good at explaining the Maximum Entropy concept and why such classifiers are considered better than Naive Bayes classifiers (and also why they are harder to compute).
However, the discussions of how the coefficients are determined are much heavier going. Or they have been when I've tried to read them - perhaps I need some sleep and no customer interruptions :-)
Are there any intermediate descriptions of maximum entropy coefficient determination, especially for the gradient / quasi-Newton methods (rather than the iterative scaling methods)? I.e., how are these methods used to determine the coefficients that best fit the training data? I've looked at code, and I guess I have a conceptual problem understanding the connection between the classifier code itself and the gradient code (L-BFGS-B, CG, etc.).
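To make the question concrete, here is a minimal sketch of how I currently picture that connection, and it may well be wrong. The toy data, the L2 penalty, and the names `predict_proba`, `loss_and_grad`, and `minimize` are all my own assumptions for illustration. The `minimize` function is plain gradient descent, used only as a stand-in for a real L-BFGS-B or CG routine. The idea is that the classifier supplies a function returning the regularized negative log-likelihood and its gradient, and the optimizer treats that function as a black box.

```python
import math

# Toy training data (made up for illustration): each example is a list of
# feature values and the index of its true class.
DATA = [
    ([1.0, 0.0, 1.0], 0),
    ([1.0, 1.0, 0.0], 1),
    ([0.0, 1.0, 1.0], 1),
    ([1.0, 0.0, 0.0], 0),
]
N_CLASSES = 2
N_FEATURES = 3


def predict_proba(w, x):
    """Classifier side: p(c | x) is a softmax over the per-class scores w_c . x."""
    scores = [
        sum(w[c * N_FEATURES + j] * x[j] for j in range(N_FEATURES))
        for c in range(N_CLASSES)
    ]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def loss_and_grad(w, l2=0.1):
    """Objective side: negative log-likelihood plus an L2 penalty, with gradient."""
    loss = 0.5 * l2 * sum(v * v for v in w)
    grad = [l2 * v for v in w]
    for x, y in DATA:
        p = predict_proba(w, x)
        loss -= math.log(p[y])
        for c in range(N_CLASSES):
            # Model-expected feature value minus observed feature value.
            err = p[c] - (1.0 if c == y else 0.0)
            for j in range(N_FEATURES):
                grad[c * N_FEATURES + j] += err * x[j]
    return loss, grad


def minimize(fun, w0, step=0.2, iters=2000):
    """Stand-in optimizer (plain gradient descent, NOT L-BFGS-B or CG).

    It only ever calls fun(w) and knows nothing about the classifier.
    """
    w = list(w0)
    for _ in range(iters):
        _, g = fun(w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w


# Fit the coefficients, then use them in the classifier.
w_fit = minimize(loss_and_grad, [0.0] * (N_CLASSES * N_FEATURES))
```

If this picture is right, then swapping in a quasi-Newton routine would only change the `minimize` call. For example, as far as I can tell, `scipy.optimize.minimize` with `method="L-BFGS-B"` and `jac=True` accepts a function that returns the loss and gradient together. The classifier code would stay the same. Is that the correct way to think about it?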