Detect whether a File seems to contain XML, taking encoding and whitespace into account
With the help of EncodingDetectingInputStream, we can make an educated guess as to whether a File is supposed to contain XML without actually parsing it (for performance reasons). Note that this check does NOT tell us whether the XML is well-formed or valid: it only looks at the first non-whitespace character.
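import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
// EncodingDetectingInputStream is not part of the JDK; import it from wherever it lives in your project.

public static boolean isXml(File file) {
    FileInputStream fis = null;
    InputStreamReader isr = null;
    try {
        fis = new FileInputStream(file);
        EncodingDetectingInputStream encodingDetectingInputStream = new EncodingDetectingInputStream(fis);
        final Charset charset = encodingDetectingInputStream.getCharset();
        isr = (charset != null) ? // UTF encoding detected by BOM
            new InputStreamReader(encodingDetectingInputStream, charset) :
            new InputStreamReader(encodingDetectingInputStream); // No BOM: platform default charset
        // Keep the result as an int: read() returns -1 at end of stream,
        // and casting to char first would turn that into '\uFFFF'
        int c;
        // Skip leading whitespace
        do {
            c = isr.read();
        } while (c != -1 && Character.isWhitespace(c));
        // If the first non-whitespace character is <, then assume XML. Your case may require reading more characters
        return c == '<';
    }
    catch (IOException ioex) {
        return false; // Unreadable file: treat it as not XML
    }
    finally {
        try {
            // Closing the reader also closes the streams it wraps
            if (isr != null)
                isr.close();
            if (fis != null)
                fis.close();
        }
        catch (IOException ioex) {
            // TODO: Handle error
        }
    }
}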
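The snippet above depends on EncodingDetectingInputStream, which is not part of the JDK. For readers who do not have that class, the following is a minimal JDK-only sketch of the same idea. The names XmlSniffer, consumeBom and looksLikeXml are made up for this illustration. It only recognises the UTF-8 and UTF-16 byte order marks, and it assumes UTF-8 when no BOM is present (the original falls back to the platform default instead), so a UTF-16 file without a BOM will be reported as not XML.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not the EncodingDetectingInputStream used above
public final class XmlSniffer {

    private XmlSniffer() {}

    // Returns the charset announced by a BOM and consumes the BOM bytes,
    // or returns null and leaves the stream position untouched.
    static Charset consumeBom(BufferedInputStream in) throws IOException {
        in.mark(3);
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        in.reset();

        Charset charset = null;
        int bomLength = 0;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (b0 == 0xFE && b1 == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (b0 == 0xFF && b1 == 0xFE) {
            // Simplification: a UTF-32LE BOM starts with the same two bytes
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }
        for (int i = 0; i < bomLength; i++) {
            in.read(); // skip past the BOM
        }
        return charset;
    }

    public static boolean looksLikeXml(File file) {
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(file))) {
            Charset bomCharset = consumeBom(in);
            // Assumption of this sketch: no BOM means UTF-8
            Charset charset = (bomCharset != null) ? bomCharset : StandardCharsets.UTF_8;
            Reader reader = new InputStreamReader(in, charset);
            int c;
            // Skip leading whitespace; -1 means end of stream
            do {
                c = reader.read();
            } while (c != -1 && Character.isWhitespace(c));
            return c == '<';
        } catch (IOException e) {
            return false; // Missing or unreadable file: treat it as not XML
        }
    }
}
```

Calling XmlSniffer.looksLikeXml(new File("some.xml")) is then a drop-in replacement for isXml(file) under the assumptions listed above. It uses try-with-resources, so it needs Java 7 or later.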
1 Feedback
Nice and simple solution. Probably the best compromise between accuracy and performance. - Olivier Friday 16, 2010 9:59 AM