The following are the places where the character encoding is defined, and how to set it in each.
Page content format:
Set the HTML file's character encoding format through HTTP headers or meta tags.
E.g.: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
If this is not set, browsers fall back to a default character encoding, traditionally ISO-8859-1.
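As a rough illustration of that fallback, the sketch below pulls the charset out of a Content-Type value and returns ISO-8859-1 when none is declared. The class name ContentTypeCharset and the method charsetOf are made up for this example, and the parsing is deliberately simplistic; it is not how any particular browser or server does it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or ISO-8859-1 when no usable charset parameter is present.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: use the default
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html")); // no charset declared
    }
}
```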
Form submits:
GET and POST request parameters are also encoded according to the page's encoding format.
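To make this concrete, here is a small sketch that uses the JDK's java.net.URLEncoder to show how the same parameter value is percent-encoded differently depending on the charset. It only imitates what a browser does on submit. The class name FormEncodingDemo and the sample value are assumptions for illustration, and the Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncodingDemo {
    // Percent-encodes a form value the way it would be sent for a page in the given charset.
    static String encodeParam(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(encodeParam(value, StandardCharsets.ISO_8859_1)); // caf%E9
        System.out.println(encodeParam(value, StandardCharsets.UTF_8));      // caf%C3%A9
    }
}
```

The server has to decode with the same charset the browser used, otherwise the single byte E9 and the two bytes C3 A9 will not map back to the same character.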
This can be overridden by using the accept-charset="UTF-8" attribute in the form tag.
Server Request Parameters:
In Servlets, JSPs and Portlets, the request parameter encoding format can be set with the following statement.
request.setCharacterEncoding("UTF-8");
If this is not set, the web server assumes ISO-8859-1 as the default format.
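The effect of that wrong assumption can be reproduced with plain JDK classes. The sketch below takes bytes that a browser would send as UTF-8 and decodes them as ISO-8859-1, the way a server does when no encoding has been set. The class and method names are invented for this example, and the repair step is a commonly used workaround rather than a substitute for setting the encoding correctly.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Decodes raw bytes the way a server does when it falls back to ISO-8859-1.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Workaround: recover the original bytes, then decode them as UTF-8.
    // This works because ISO-8859-1 maps every byte value to exactly one character.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                        // 4 characters
        byte[] sent = original.getBytes(StandardCharsets.UTF_8); // 5 bytes on the wire
        String garbled = decodeAsLatin1(sent);                   // 5 characters, last two are wrong
        System.out.println(garbled.length());                    // 5
        System.out.println(repair(garbled).equals(original));    // true
    }
}
```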
To apply this across the whole application, the encoding format can be set in the doFilter method of a Servlet Filter.
// Needs java.io.IOException and the javax.servlet.* types (FilterChain, ServletRequest, ...).
public void doFilter(ServletRequest request, ServletResponse response,
        FilterChain chain)
        throws IOException, ServletException {
    // Declared outside the if block so it is still in scope after the chain runs.
    final String encoding = "UTF-8";
    // Only set the encoding when the client did not declare one.
    if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding(encoding);
    }
    chain.doFilter(request, response);
    // do it again, since JSPs will set it to the default
    request.setCharacterEncoding(encoding);
}
One tricky point: if the form's encoding type (enctype) is "multipart/form-data", each value must be read with the encoding specified explicitly, as below.
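// FileItem as in Apache Commons FileUpload; iter iterates over the parsed form items.
FileItem item = (FileItem) iter.next();
if (item.isFormField()) {
    // Decode the field's bytes as UTF-8 instead of the platform default.
    value = item.getString("UTF-8").trim();
}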
The last place where the data encoding format must be handled is the database itself. It should be set when the database is created. If the format is changed after the database has been created, existing data can be corrupted.
Reference: http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/