Determine input encoding
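Checking whether input is UTF-8 is generally a matter of heuristics: no algorithm can definitively tell you "yes, this is UTF-8" or "no, it isn't". Whether a byte sequence is well-formed UTF-8 can be decided exactly, but well-formed UTF-8 may also be meaningful text in another encoding (pure ASCII, for example, is valid in both), so validity alone does not prove what the writer intended. The more elaborate the heuristic, the fewer false positives and negatives you will get, but there is no guaranteed way.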
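One cheap heuristic that the original answer does not mention is to look for a byte-order mark (BOM) at the start of the input. The sketch below is a minimal illustration, assuming the input is already in memory; the function name `encoding_from_bom` is hypothetical. A BOM is optional (UTF-8 files usually have none), so finding no BOM tells you nothing, and finding one is only a strong hint.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: guess an encoding from a leading byte-order mark.
// Returns an empty string when no BOM is present (which proves nothing).
std::string encoding_from_bom(const unsigned char* data, std::size_t size)
{
    // UTF-32 marks are checked first because the UTF-32LE mark (FF FE 00 00)
    // begins with the UTF-16LE mark (FF FE).
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "";  // no BOM: the encoding is still unknown
}
```

Because of that ordering, a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE, which is one more reason to treat the result as a guess rather than an answer.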
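For a ready-made UTF-8 validity check, see the UTF8-CPP library: http://utfcpp.sourceforge.net/. Its `utf8::is_valid` function reports whether a range of bytes is well-formed UTF-8, for example: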
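```cpp
#include <fstream>
#include <iterator>
#include "utf8.h"  // UTF8-CPP header

// Returns true if the whole file is well-formed UTF-8.
bool valid_utf8_file(const char* file_name)
{
    std::ifstream ifs(file_name, std::ios::binary);  // binary: no newline translation
    if (!ifs)
        return false;  // even better, throw here
    std::istreambuf_iterator<char> it(ifs.rdbuf());
    std::istreambuf_iterator<char> eos;  // default-constructed = end of stream
    return utf8::is_valid(it, eos);
}
```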
You can either use the library as is, or read its source to see how the check is implemented.
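If you would rather not pull in a library, a strict check is short enough to write yourself. The following is a from-scratch sketch, not UTF8-CPP's actual source, that applies the well-formedness rules of RFC 3629 (the same rules as the Unicode Standard's table of well-formed UTF-8 byte sequences); the name `is_well_formed_utf8` is hypothetical. It rejects stray continuation bytes, overlong encodings, UTF-16 surrogates, code points above U+10FFFF, and truncated sequences.

```cpp
#include <cstddef>
#include <string>

// From-scratch strict UTF-8 well-formedness check (RFC 3629 rules).
bool is_well_formed_utf8(const std::string& text)
{
    const auto* p = reinterpret_cast<const unsigned char*>(text.data());
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = p[i];
        if (lead < 0x80) { ++i; continue; }  // ASCII byte

        std::size_t len = 0;      // total bytes in this sequence
        unsigned char lo = 0x80;  // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (lead >= 0xC2 && lead <= 0xDF) {
            len = 2;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            len = 3;
            if (lead == 0xE0) lo = 0xA0;  // reject overlong 3-byte forms
            if (lead == 0xED) hi = 0x9F;  // reject surrogates U+D800..U+DFFF
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            len = 4;
            if (lead == 0xF0) lo = 0x90;  // reject overlong 4-byte forms
            if (lead == 0xF4) hi = 0x8F;  // reject code points above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }

        if (n - i < len) return false;  // truncated sequence
        if (p[i + 1] < lo || p[i + 1] > hi) return false;
        for (std::size_t k = 2; k < len; ++k)
            if (p[i + k] < 0x80 || p[i + k] > 0xBF) return false;  // not a continuation byte
        i += len;
    }
    return true;
}
```

For example, `is_well_formed_utf8("caf\xC3\xA9")` (UTF-8 "café") returns true, while `is_well_formed_utf8("caf\xE9")` (Latin-1 "café") returns false. The bytes C3 A9 also show the limit of any such check: read as Latin-1 they are the perfectly valid text "Ã©", so passing the check is evidence of UTF-8, not proof.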