I've written about the joys of Unicode and software development before. Feeding unexpected data into your tests is usually a good way to flush out text encoding issues. Finding and fixing those bugs early could save your team from a host of related issues and hackery. Even if you don't expect to have unusual text content, this type of testing can help indicate whether all your systems are configured consistently. When they are not, users can end up seeing the dreaded Mojibake.

Mojibake, when encoding goes bad

I've recently created a Python package for generating random Unicode codepoints so they can be incorporated easily into your automated tests and tools. It's called Unicode Babel, and can be used to create a simple iterator for supplying 'international' text to your app:

    from unicode_babel import tools, filters
    genny = tools.CodePointGenerator()
    for point in genny.random_codepoints(10, filters.filter_out_if_no_name):
        pr...
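To make the Mojibake point concrete, here is a rough sketch using only the Python standard library. It is not how Unicode Babel is implemented: `random_named_codepoints` and `survives_round_trip` are hypothetical helpers invented for this illustration. The first picks random characters that have a Unicode name, and the second checks whether text comes back unchanged when one system writes it in one encoding and another reads it in a different one.

```python
import random
import unicodedata


def random_named_codepoints(count, seed=None):
    """Yield `count` random characters that have a Unicode name.

    Hypothetical stand-in for illustration only; it does not use
    or reproduce the Unicode Babel package.
    """
    rng = random.Random(seed)
    produced = 0
    while produced < count:
        char = chr(rng.randrange(0x20, 0x110000))
        # unicodedata.name() returns the default ("") for code points
        # without a name, which skips unassigned, surrogate, control
        # and private-use code points.
        if unicodedata.name(char, ""):
            produced += 1
            yield char


def survives_round_trip(text, write_encoding="utf-8", read_encoding="utf-8"):
    """Return True if `text` is unchanged after being encoded with
    `write_encoding` and decoded with `read_encoding`."""
    try:
        return text.encode(write_encoding).decode(read_encoding) == text
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False


if __name__ == "__main__":
    sample = "".join(random_named_codepoints(10, seed=42))

    # Both sides agree on UTF-8, so the text survives.
    print(survives_round_trip(sample))

    # Writer uses UTF-8, reader assumes Latin-1: classic Mojibake.
    print(survives_round_trip("café", read_encoding="latin-1"))
    print("café".encode("utf-8").decode("latin-1"))  # cafÃ©
```

In a real test suite the round trip would go through your actual stack (database, HTTP layer, file storage) rather than a bare encode/decode pair, but the assertion is the same: what comes out should equal what went in.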
My thoughts on developing & testing.