Can’t be handled as Microsoft document

… Can’t be handled as Microsoft document.
java.lang.ArrayIndexOutOfBoundsException …..

If you see this kind of exception while parsing a Word document with Nutch, it indicates that the document has problematic content, such as unusual special characters, that the parser did not handle properly.
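One general way to keep a single bad document from aborting a larger job is to catch the unchecked exception around the parse call and skip that document. The following is only a minimal sketch of that pattern under stated assumptions: it is not Nutch's parser API, and `SafeParse`, `tryParse`, and `toyParser` are hypothetical names invented for illustration. The toy parser simply stands in for a real Word parser that fails on malformed input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

public class SafeParse {

    /**
     * Runs the given parser on the raw bytes and turns any unchecked
     * exception (for example ArrayIndexOutOfBoundsException) into an
     * empty result, so the caller can skip the document and continue.
     */
    static Optional<String> tryParse(Function<byte[], String> parser, byte[] content) {
        try {
            return Optional.ofNullable(parser.apply(content));
        } catch (RuntimeException e) {
            // Report the failure and move on instead of propagating it.
            System.err.println("Skipping unparseable document: " + e);
            return Optional.empty();
        }
    }

    /**
     * Hypothetical stand-in for a real Word parser (assumption, not Nutch code).
     * It treats the first byte as a text length and reads that many ASCII bytes,
     * so empty or truncated input triggers an index-out-of-bounds exception.
     */
    static String toyParser(byte[] content) {
        int length = content[0]; // throws ArrayIndexOutOfBoundsException on empty input
        return new String(content, 1, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] good = {2, 'h', 'i'};
        byte[] bad = {};

        System.out.println(tryParse(SafeParse::toyParser, good).orElse("<skipped>"));
        System.out.println(tryParse(SafeParse::toyParser, bad).orElse("<skipped>"));
    }
}
```

Catching the exception only hides the symptom. The document itself still contains content the parser cannot handle, so it is worth recording which file failed so it can be inspected or re-saved later.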