Data minimization

The European General Data Protection Regulation (GDPR) often refers to the principle of data minimization. But what does this mean, and how can it be applied in practice?

The EU website states the following: “The principle of “data minimisation” means that a data controller should limit the collection of personal information to what is directly relevant and necessary to accomplish a specified purpose. They should also retain the data only for as long as is necessary to fulfil that purpose. In other words, data controllers should collect only the personal data they really need, and should keep it only for as long as they need it.”

But what is necessary? In the following, I will give you an example of how I was able to implement data minimization in a new project.

Verification of training certificates

One of my customers has to train his employees so that they can use certain devices. The training certificates are printed out and kept. So if an employee wants to use the device, the employee has to look for the corresponding certificate at reception, and it cannot be verified whether the printout was really generated by the training system.

Now they wanted us to build an admin area in which the certificates are displayed so that reception can search for it in this area.

The first name, surname, birthday etc. of the employees should be saved with the certificate for a long time.

What does the GDPR say about this?
I need the data so that reception can see and check everything. It is necessary! Or is it not?

Let’s start with the purpose of the request. The customer wants to verify whether an employee has the certificate.

If I want to display a list of all employees in plain text, I need the data. But is this the only solution to achieve the purpose, or can I perhaps find a more data-minimizing option? The purpose says nothing about displaying the names in plain text. A hash comparison, for example, is sufficient for verification, as is usually done with passwords.

So, how do I implement the project? Whenever I issue a certificate, I hash the personal data. In the administration area, there is a dialog in which the recipient can enter the personal data to check whether a valid certificate is available. The data is also hashed, and a hash comparison is used to determine the match.
Instead of an application that juggles a large amount of personal data, my application no longer stores any personal data at all. And the purpose can still be achieved.

Conclusion

Data minimization is therefore not only to be considered in the direct implementation of a function. Data minimization starts with the design.

How to disable IP address logging for Apache web server and Tomcat

The General Data Protection Regulation (GDPR) prohibits excessive or unnecessary collection of personal data. IP adresses in server log files are considered personal data.

Logging of IP addresses is usually enabled by default in fresh web server installations. This article describes how to disable it for the Apache web server and the Tomcat application server, which are a common combination for JVM based web applications.

Apache Web Server

The apache web server logs the HTTP requests from web clients in log files called *access.log, which includes IP addresses. Another apache log file, which can contain IP addresses is error.log.

To disable IP address logging for Apache edit the main configuration file, usually called httpd.conf and scan it for LogFormat entries, for example:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

The format ‘verb’ %h stands for the hostname or IP address. This is what you want to remove. You may also want to remove the “Referer” and “User-Agent” components of the log format for privacy:

LogFormat "%l %u %t \"%r\" %>s %O" combined

Restart the server for the changes to take effect.

Tomcat Application Server

For Tomcat you have to configure the so-called AccessLogValve in the server.xml configuration file located in the $CATALINA_HOME/conf directory. The configuration entry looks like this:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t "%r" %s %b" />

The pattern attribute specifies the log format. Again, the relevant format verb you want to remove is %h for the hostname or IP address:

       pattern="%l %u %t "%r" %s %b" />

Restart the server for the changes to take effect.