Thursday, December 23, 2010

Accents becoming garbage on form submits and Character Encoding

Several places determine how non-ASCII data such as accents and apostrophes is displayed in HTML, read back from form fields, and stored in the database. The trick is that all of these places must use the same encoding, which is typically UTF-8.
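
The garbage itself can be reproduced in a few lines of plain Java. This is only a sketch: the class and method names (EncodingMismatchDemo, roundTrip) are made up for illustration. It encodes a string with one charset and decodes the bytes with another, which is roughly what happens when two of the places involved disagree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class EncodingMismatchDemo {

    // Encode the text with one charset, then decode the bytes with another.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, escaped so the
        // encoding of this source file does not matter.
        String original = "caf\u00e9";

        // Same charset on both sides: the text survives.
        String intact = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(original.equals(intact)); // true

        // Written as UTF-8 but read as ISO-8859-1: the two bytes of the
        // accented letter are decoded as two separate characters.
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 5 instead of 4
    }
}
```

The accented letter occupies two bytes in UTF-8, and ISO-8859-1 maps every byte to exactly one character, so one letter turns into two unrelated ones.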

The following are the places where the encoding is defined, and how it is done in each.

Page content format:
Set the HTML page's character encoding through HTTP headers or a meta tag.
Eg: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If this is not set, browsers fall back to a default encoding, typically ISO-8859-1.

Form submits:
GET and POST request parameters are also encoded using the page's encoding.
This can be overridden with the accept-charset="UTF-8" attribute on the form tag.
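
To see what the charset choice means on the wire, the sketch below percent-encodes a parameter value with the JDK's URLEncoder and decodes it again with URLDecoder. The class and helper names (FormParameterEncodingDemo, encode, decode) are hypothetical; the point is only that the same value produces different bytes under different charsets, and that decoding with a charset other than the one used for encoding does not give the original text back.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of any library.
class FormParameterEncodingDemo {

    // Percent-encode a value in application/x-www-form-urlencoded style.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Decode a percent-encoded value using the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e.
        String value = "caf\u00e9";

        System.out.println(encode(value, "UTF-8"));      // caf%C3%A9
        System.out.println(encode(value, "ISO-8859-1")); // caf%E9

        // Sent as UTF-8, decoded as ISO-8859-1: the value is garbled.
        String wrong = decode(encode(value, "UTF-8"), "ISO-8859-1");
        System.out.println(value.equals(wrong)); // false

        // Sent and decoded as UTF-8: the value is intact.
        String right = decode(encode(value, "UTF-8"), "UTF-8");
        System.out.println(value.equals(right)); // true
    }
}
```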


Server Request Parameters:
In Servlets, JSPs and Portlets, the request parameter encoding can be set with the following statement.
request.setCharacterEncoding("UTF-8");
It must be called before any request parameter is read. If it is not set, the web server assumes ISO-8859-1 as the default.
To make this more generic, the encoding can be set in the doFilter method of a Servlet Filter.

public void doFilter(ServletRequest request, ServletResponse response,
                     FilterChain chain)
        throws IOException, ServletException {
    // Declared at method scope so it is still visible after chain.doFilter()
    String encoding = "UTF-8";

    // Only set the encoding when the client did not declare one
    if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding(encoding);
    }

    chain.doFilter(request, response);

    // do it again, since JSPs will set it to the default
    request.setCharacterEncoding(encoding);
}

One tricky point: if the form's encoding type (enctype) is "multipart/form-data", each value must be
read by specifying the character encoding explicitly, as below.

FileItem item = (FileItem) iter.next();
if (item.isFormField()) {
    value = item.getString("UTF-8").trim();
}

The final place where the encoding must be handled is the database itself. It should be
set when the database is created. If it is changed after the database has been created, existing
data can be corrupted.


Reference : http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

Friday, December 10, 2010

Oracle License

Oracle’s technology products are licensed using two metrics: Named User Plus (NUP) or Processor.

Named User Plus Metric:
This metric is used in environments where users can be identified and counted. Named User Plus covers both human users and non-human operated devices. A licensed Named User Plus may access the program on any instance where it is deployed, provided that the minimum on each server is met.
The minimum is 25 Named User Plus licenses per Processor. The total number of Named User Plus licenses required is either the minimum for the total number of processors or the actual number of named users, whichever is greater.
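
The "whichever is greater" rule can be sketched as a small calculation. The class, constant and method names below are invented for illustration, and the sample numbers are arbitrary rather than taken from any real deployment.

```java
// Hypothetical helper, not an Oracle tool.
class NamedUserPlusCalculator {

    // Minimum stated above: 25 Named User Plus licenses per processor.
    static final int MIN_NUP_PER_PROCESSOR = 25;

    // Required licenses = the greater of the per-processor minimum
    // and the actual number of named users.
    static int requiredLicenses(int processors, int namedUsers) {
        int minimum = processors * MIN_NUP_PER_PROCESSOR;
        return Math.max(minimum, namedUsers);
    }

    public static void main(String[] args) {
        // 4 processors, 60 users: the minimum (4 * 25 = 100) applies.
        System.out.println(requiredLicenses(4, 60)); // 100

        // 2 processors, 80 users: the user count exceeds the minimum (50).
        System.out.println(requiredLicenses(2, 80)); // 80
    }
}
```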

Processor Metric:
This metric is mostly used in environments where the software users cannot be easily identified or counted, such as internet-based applications.
Total Number of Licensable Processors = (number of processors) * (number of cores per processor) * (multi-core factor)

Multi-core factor is:
0.25 for Sun's UltraSPARC T1 processors
0.50 for Intel and AMD processors
0.75 for all other multi-core processors
1.00 for single-core processors
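
As a worked example of the formula, the sketch below multiplies the three terms, treating the core count as a per-processor figure, which is an assumption about how the formula is meant to be read. The class and method names are hypothetical, and any rounding rule for fractional results is not covered here, so the raw product is returned.

```java
// Hypothetical helper, not an Oracle tool.
class ProcessorLicenseCalculator {

    // processors * cores per processor * multi-core factor, unrounded.
    static double licensableProcessors(int processors, int coresPerProcessor,
                                       double multiCoreFactor) {
        return processors * coresPerProcessor * multiCoreFactor;
    }

    public static void main(String[] args) {
        // Two quad-core Intel/AMD processors, factor 0.50.
        System.out.println(licensableProcessors(2, 4, 0.50)); // 4.0

        // One eight-core UltraSPARC T1 processor, factor 0.25.
        System.out.println(licensableProcessors(1, 8, 0.25)); // 2.0
    }
}
```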

Useful commands to check the server configuration:
Number of CPUs and cores => grep -i core /proc/cpuinfo
Processor model and all other info => cat /proc/cpuinfo
Linux version => cat /proc/version
