Tag Archives: utf-8

How to make Notepad save text in UTF-8 without a BOM?

Questions: I have a CSV file with accented characters, and I save it in Notepad with the UTF-8 encoding selected. When I read the file using Java, it reads the BOM characters too. So I want Notepad to save this file in UTF-8 without adding a BOM in the first place. Otherwise, is there any built-in class in…
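The JDK's UTF-8 decoder does not skip a leading BOM: the bytes EF BB BF come through as the character U+FEFF at the start of the first line. A minimal sketch of handling this on the reading side, assuming the CSV is UTF-8 and using a hypothetical helper class `BomAwareReader`, peeks at the first character and drops it only if it is U+FEFF:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BomAwareReader {
    // U+FEFF is what the UTF-8 bytes EF BB BF decode to.
    private static final int BOM = '\uFEFF';

    /** Reads all lines of a UTF-8 stream, dropping a leading BOM if there is one. */
    static List<String> readLines(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            reader.mark(1);              // remember the start of the stream
            if (reader.read() != BOM) {
                reader.reset();          // not a BOM: rewind so no data is lost
            }
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // A BOM followed by two CSV lines (U+00E9 is an accented e)
        byte[] csv = "\uFEFFname,city\nJos\u00E9,Lyon\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(csv))) {
            System.out.println(line);
        }
    }
}
```

Outside the JDK, Apache Commons IO's `BOMInputStream` is a third-party option for the same job.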

Reading InputStream as UTF-8

Questions: I’m trying to read from a text/plain file over the internet, line by line. The code I have right now is: URL url = new URL("http://kuehldesign.net/test.txt"); BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream())); LinkedList<String> lines = new LinkedList(); String readLine; while ((readLine = in.readLine()) != null) { lines.add(readLine); } for (String line : lines) { out.println("> "…
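The snippet in the question uses the one-argument `InputStreamReader(InputStream)` constructor, which decodes with the platform default charset (UTF-8 on JDK 18 and later per JEP 400, but the OS setting on earlier versions). A sketch that passes the charset explicitly, assuming the file really is served as UTF-8 and using a hypothetical class `Utf8LineReader`:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8LineReader {
    /** Reads every line of the stream, decoding its bytes as UTF-8, then closes it. */
    static List<String> readLines(InputStream in) throws IOException {
        // The two-argument constructor fixes the charset; the one-argument
        // constructor from the question uses the platform default instead.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }

    /** Same, for a URL; assumes the server sends UTF-8 text. */
    static List<String> readLines(URL url) throws IOException {
        return readLines(url.openStream()); // closed by readLines(InputStream)
    }

    public static void main(String[] args) throws IOException {
        List<String> lines;
        if (args.length > 0) {
            // e.g. pass http://kuehldesign.net/test.txt as the first argument
            lines = readLines(URI.create(args[0]).toURL());
        } else {
            // Offline demo with non-ASCII text encoded as UTF-8
            byte[] sample = "Gr\u00FC\u00DFe\nna\u00EFve\n".getBytes(StandardCharsets.UTF_8);
            lines = readLines(new ByteArrayInputStream(sample));
        }
        for (String line : lines) {
            System.out.println("> " + line);
        }
    }
}
```

The demo avoids the network unless a URL is passed, and uses `URI.create(...).toURL()` because the `URL(String)` constructor is deprecated in recent JDKs.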

Java properties UTF-8 encoding in Eclipse

Questions: I’ve recently had to switch the encoding of the webapp I’m working on from ISO-xx to UTF-8. Everything went smoothly, except for properties files. I added -Dfile.encoding=UTF-8 to eclipse.ini and normal files work fine. Properties files, however, show some strange behaviour. If I copy UTF-8 encoded properties from Notepad++ and paste them in Eclipse, they show and work…
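One commonly cited cause, offered as background rather than a diagnosis of this particular Eclipse setup: `Properties.load(InputStream)` always decodes the file as ISO-8859-1, whatever `-Dfile.encoding` says. A sketch that reads a UTF-8 properties file through `Properties.load(Reader)` instead, with a hypothetical helper class `Utf8Properties`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class Utf8Properties {
    /** Loads a .properties stream whose bytes are UTF-8 encoded. */
    static Properties load(InputStream in) throws IOException {
        Properties props = new Properties();
        // load(InputStream) would decode as ISO-8859-1; load(Reader) takes
        // characters, so decoding is controlled by the UTF-8 reader here.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // key=value where the value contains Polish letters outside ISO-8859-1
        byte[] file = "label=Cz\u0119\u015B\u0107 mowy\n".getBytes(StandardCharsets.UTF_8);
        Properties props = load(new ByteArrayInputStream(file));
        System.out.println(props.getProperty("label"));
    }
}
```

For `ResourceBundle`, Java 9 and later read properties bundles as UTF-8 by default (JEP 226), falling back to ISO-8859-1 if the file is not valid UTF-8.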

How to set UTF-8 encoding for a PHP file

Questions: I have a PHP script at http://cyber-flick.com/apiMorpho.php?method=getMorphoData&word=kot that displays some data in plain text: CzÄ�Ĺ�Ä� mowy: rzeczownik Przypadek: dopeĹ�niacz Rodzaj: ĹźeĹ�ski Liczba: mnoga As you can see, in place of the proper characters there are these “bushes”. What I would like to do is display this in a way so that people see in…

C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H

Questions: I have googled this topic and looked at every answer, but I still don’t get it. Basically, I need to convert a UTF-8 string to ISO-8859-1, and I do it using the following code: Encoding iso = Encoding.GetEncoding("ISO-8859-1"); Encoding utf8 = Encoding.UTF8; string msg = iso.GetString(utf8.GetBytes(Message)); My source string is Message = "ÄäÖöÕõÜü"…
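Both .NET and Java strings hold UTF-16 code units, so a string is not itself "UTF-8"; the meaningful operation is encoding the text to ISO-8859-1 bytes. The code in the question instead decodes UTF-8 bytes as if they were Latin-1, which yields mojibake. A Java sketch of both, translated from the question's C# and using a hypothetical class `Latin1Conversion`:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1Conversion {
    /** Encodes text as ISO-8859-1 bytes, e.g. for a consumer that expects Latin-1. */
    static byte[] toLatin1(String text) {
        // getBytes(Charset) replaces characters that Latin-1 cannot represent with '?'
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Reproduces the question's approach: UTF-8 bytes decoded as if they were Latin-1. */
    static String decodeUtf8BytesAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Escapes for the question's sample string (A-umlaut, a-umlaut, ... u-umlaut)
        String message = "\u00C4\u00E4\u00D6\u00F6\u00D5\u00F5\u00DC\u00FC";
        byte[] latin1 = toLatin1(message);
        System.out.println(latin1.length + " bytes: " + Arrays.toString(latin1));
        // Each character takes two bytes in UTF-8, so the misdecoded string is twice as long
        System.out.println(decodeUtf8BytesAsLatin1(message).length());
    }
}
```

All eight sample characters exist in ISO-8859-1, so the encoded form is one byte per character and decodes back to the original string.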

Regex to detect Invalid UTF-8 String

Questions: In PHP, we can use mb_check_encoding() to determine whether a string is valid UTF-8. But that’s not a portable solution, as it requires the mbstring extension to be compiled in and enabled. Additionally, it won’t tell us which character is invalid. Is there a regular expression (or another 100% portable method) that can…
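Not a regex, but a portable alternative in Java: the JDK's own UTF-8 decoder, set to report errors instead of replacing them, stops at the first malformed sequence, and its input position then gives the byte offset of the bad data. A sketch with a hypothetical `Utf8Validator` class:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Validator {
    /**
     * Returns the byte offset of the first malformed UTF-8 sequence,
     * or -1 if the whole array is valid UTF-8.
     */
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this buffer is big enough
        CharBuffer out = CharBuffer.allocate(bytes.length);
        // endOfInput = true makes a truncated sequence at the end count as malformed
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            return in.position(); // the malformed bytes begin at this position
        }
        result = decoder.flush(out);
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        byte[] good = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'a', 'b', (byte) 0xFF, 'c'};    // 0xFF never occurs in UTF-8
        System.out.println(firstInvalidOffset(good)); // -1
        System.out.println(firstInvalidOffset(bad));  // 2
    }
}
```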

Set UTF-8 display for Git GUI differences window

Questions: I can’t remember how I made Git GUI display UTF-8 encoded differences correctly, and I can’t find a guide using search engines. Now I need to do this at my new workplace. Could you write down instructions? OS: Windows 7 Answers: # Global setting for all your repositories > git config --global gui.encoding utf-8…

How to handle user input of invalid UTF-8 characters?

Questions: I’m looking for a general strategy or advice on how to handle invalid UTF-8 input from users. Even though my webapp uses UTF-8, some users somehow enter invalid characters. This causes errors in PHP’s json_encode() and generally seems like a bad thing to keep around. W3C I18N FAQ: Multilingual Forms says “If non-UTF-8 data is received,…
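Two common strategies are rejecting invalid input outright, or replacing each malformed sequence with U+FFFD (the Unicode replacement character) before passing the text on, for example to a JSON encoder. A Java sketch of both, with a hypothetical class `Utf8Input`; which strategy fits depends on the application:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Input {
    /** Strategy 1: accept the input only if every byte sequence is valid UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Strategy 2: keep the input, replacing each malformed sequence with U+FFFD. */
    static String sanitize(byte[] bytes) {
        // The String(byte[], Charset) constructor always substitutes the charset's
        // replacement (U+FFFD for UTF-8) instead of throwing.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence, but '!' cannot continue it
        byte[] input = {'h', 'i', (byte) 0xC3, '!'};
        System.out.println(isValidUtf8(input));
        System.out.println(sanitize(input).equals("hi\uFFFD!"));
    }
}
```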

UTF-8 in PHP regular expressions

Questions: I need help with regular expressions. My string contains Unicode characters, and the code below doesn’t work. The first four characters must be digits, then a comma, and then any alphabetic characters or whitespace… I have already read that I should add /u at the end of the regular expression, but it didn’t work for me… My code works with…
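As a Java sketch (the question is about PHP, where the analogous approach is `\p{L}` together with the `/u` modifier), the rule "four digits, a comma, then letters or whitespace" can use the Unicode property `\p{L}` instead of an ASCII-only class such as `[a-zA-Z]`. The exact pattern is an assumption about the intended rule; `\p{M}` is included so that letters written with separate combining accents also match. The class name `UnicodeRegex` is hypothetical:

```java
import java.util.regex.Pattern;

final class UnicodeRegex {
    // Four ASCII digits, a comma, then any mix of Unicode letters (\p{L}),
    // combining marks (\p{M}) and whitespace. Assumed reading of the rule.
    private static final Pattern LINE =
            Pattern.compile("\\d{4},[\\p{L}\\p{M}\\s]*");

    static boolean isValid(String s) {
        // matches() requires the whole string to match, so no ^...$ anchors are needed
        return LINE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("2024,Zielona G\u00F3ra")); // true
        System.out.println(isValid("2024,abc1"));               // false: digit after the comma
        System.out.println(isValid("202,abc"));                 // false: only three digits
    }
}
```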

SVG in HTML5 – when is the XML declaration `<?xml version="1.0" encoding="UTF-8"?>` needed?

Questions: Several questions here already clarify SVG namespace usage. When is the XML declaration <?xml version="1.0" encoding="UTF-8"?> needed when using SVG within HTML5 as inline images via <img> or as CSS background-images? This is slightly related to “Are SVG parameters such as ‘xmlns’ and ‘version’ needed”. The namespace issues are clarified as necessary…