denske.com

I am Ramshackle Blood

January 17, 2005

How to convert documents to different encodings with Python in one line

Filed under: — denske @ 10:16 pm

You can use Python’s standard library to convert a document from one encoding to another in (basically) one line of code. I have to do this quite often, and I can’t figure out where else to put this information.

Import encodings, set up your directories, read the old file’s raw bytes into contents_of_old_file, and the magic line is:

newEncodedFile = contents_of_old_file.decode('iso8859-15').encode(new_encoding, 'replace')

This decodes the old contents from iso8859-15 and re-encodes them as new_encoding, which I’ve defined earlier or taken from the encodings module (like cp1251 or something). Possible values for the errors argument are 'strict', which raises an exception; 'replace' (which I’ve used), which replaces characters that can’t be encoded with a ?; and 'ignore', which silently drops them.
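For anyone who wants the whole round trip rather than just the magic line, here is a minimal sketch for Python 3, assuming the file is small enough to read into memory in one go. The function names convert_file and demo_error_modes are made up for illustration; only open(), bytes.decode() and str.encode() come from Python itself.

```python
def convert_file(src_path, dst_path, old_encoding, new_encoding, errors="replace"):
    """Re-encode src_path from old_encoding to new_encoding and write dst_path.

    Hypothetical helper: the name and argument order are illustrative only.
    """
    # Read raw bytes so Python does not guess an encoding on our behalf.
    with open(src_path, "rb") as src:
        raw = src.read()
    # The one-liner: bytes -> str (decode), then str -> bytes (encode).
    converted = raw.decode(old_encoding).encode(new_encoding, errors)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
    return converted


def demo_error_modes(text, encoding):
    """Show what each errors value does when text does not fit in encoding.

    Hypothetical helper: returns a dict keyed by mode, with None where
    'strict' raised UnicodeEncodeError.
    """
    results = {}
    for mode in ("replace", "ignore", "strict"):
        try:
            results[mode] = text.encode(encoding, mode)
        except UnicodeEncodeError:
            results[mode] = None
    return results
```

Something like convert_file('old.txt', 'new.txt', 'iso8859-15', 'cp1251') would then do the conversion, where both file names are placeholders. Opening the files in binary mode is deliberate: it keeps the decode and encode steps explicit instead of letting open() pick a default encoding.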
