I've written about the joys of Unicode and software development before. Feeding unexpected data into your tests is usually a good way to expose text encoding issues. Finding and fixing those bugs early could save your team from a host of related issues and hackery.
Even if you don't expect to handle unusual text content, this type of testing can help show whether all your systems are configured consistently. If they aren't, users can end up seeing the dreaded Mojibake.
Figure: Mojibake, when encoding goes bad
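To make that concrete, here is a minimal sketch of how Mojibake appears when two systems disagree about the encoding. The sample string and the choice of Latin-1 as the "wrong" codec are just illustrative assumptions; any mismatched pair behaves similarly.

```python
# The same text, written as UTF-8 but read back with two different codecs.
original = "café"
raw_bytes = original.encode("utf-8")

# A consumer that wrongly assumes Latin-1 turns each UTF-8 byte into its own character.
garbled = raw_bytes.decode("latin-1")

# A consumer configured consistently with the producer recovers the text exactly.
recovered = raw_bytes.decode("utf-8")

print(garbled)                # cafÃ©
print(recovered == original)  # True
```

Plain ASCII text would pass through both paths unchanged, which is why this kind of misconfiguration tends to go unnoticed until non-ASCII data turns up.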
I've recently created a Python package for generating random Unicode code points so they can be easily incorporated into your automated tests and tools. It's called Unicode Babel, and it can be used to create a simple iterator that supplies 'international' text to your app:
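    from unicode_babel import tools, filters

    genny = tools.CodePointGenerator()
    # The count was lost from the original listing; 10 matches the sample output.
    for point in genny.random_codepoints(10, filters.filter_out_if_no_name):
        print(point)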
This will output something like:
ᓆ ᗡ ꋛ 販 ۅ 䶣 楨 蟷 䔉 ݥ
Or you can integrate it with your existing tools, such as Selenium WebDriver:
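    from unicode_babel import tools, filters
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    browser = webdriver.Chrome()
    browser.get("...")  # the URL was lost from the original listing

    # Class name assumed to be CodePointGenerator, as in the first example.
    data_genny = tools.CodePointGenerator()
    unusual_char = data_genny.get_random_codepoint(filters.filter_out_if_no_name)

    # The element name was lost from the original listing.
    # Selenium 4 replaces this call with find_element(By.NAME, ...).
    search_box = browser.find_element_by_name("...")
    # Keys.RETURN is an assumption; the key constant was lost from the listing.
    search_box.send_keys(unusual_char + Keys.RETURN)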
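If you'd rather not drive a browser, the same idea works at the unit-test level. The sketch below uses only the standard library as a stand-in for Unicode Babel, so `random_named_codepoint` and `round_trips` are hypothetical helpers of my own and not part of the package. It assumes that "has a Unicode name" is a reasonable approximation of the name filter used above.

```python
import random
import unicodedata


def random_named_codepoint(rng):
    """Return one random character that has a Unicode name (hypothetical helper)."""
    while True:
        candidate = chr(rng.randrange(0x20, 0x110000))
        # Unassigned code points and surrogates have no name, so they are skipped.
        if unicodedata.name(candidate, ""):
            return candidate


def round_trips(text, encoding="utf-8"):
    """Return True if text survives an encode/decode cycle unchanged."""
    return text.encode(encoding).decode(encoding) == text


rng = random.Random(1234)  # seeded so a failing run can be reproduced
sample = "".join(random_named_codepoint(rng) for _ in range(10))
assert round_trips(sample)
```

In a real test you would replace `round_trips` with a trip through your own stack, for example writing `sample` to the database and reading it back, then compare the result with what you sent.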
I hope it helps with your testing. Send me any bugs you find!