IT specialists have discovered that the public document registers of state agencies are full of delicate personal information – from home addresses and passport numbers to the severity of people's disabilities.
Stacks of sensitive data lying unprotected
A team participating in this weekend's Garage48 hackathon, which concentrated on public and big data this year, announced that it cannot publish the results of its project because they include too much personal information the team stumbled upon in public registers. There are hundreds of such registers all over Estonia – every ministry, agency, local government, school, etc. has its own digital document register.
An analysis of freely accessible documents from a few years ago turned up delicate personal information that certainly should not be available to anyone with the skill to look for it.
Team Psii, which made the unpleasant discovery, was led by Lauri Ilison, chief of machine learning at Nortal, who delivered a memorably stern presentation about it at the hackathon.
Ilison said that in 48 hours the team managed to build a program that scours public registers and looks for documents that could hold delicate information. „We spent the most time waiting for downloads to finish,“ he said.
Next the team set about going over the documents their scanner had found. „We were shocked to discover we cannot publish these results. Instead we found we should shut the scanner down,“ Ilison said.
He said the team came across documents that listed people's names along with the severity of their disability. The team's program had discovered this particular data in the document registers of local governments.
„We initially thought we wouldn't find anything; however, the truth was we stumbled across something right away. We will not be providing any information on what exactly we found or how, because we want the data protection watchdog to clean up this mess,“ Ilison said. He added that the team has notified the agency of its findings.
A similar problem was discovered back in April by Estonian startup Texta, which created its own tool for analyzing document registers. Texta co-founder Silver Traat said they discovered a lot of highly detailed personal information in the document register of the education ministry.
„We held a workshop as part of a language technology conference where we did what the state lacks the capacity to do itself. We downloaded 150,000 documents from the ministry's document register and discovered that they held, among other things, people's personal identification numbers, bank account numbers, and addresses. We even came across some passport numbers,“ Traat said. He added that most of the information came from employment contracts.
„While a personal identification number does not constitute delicate personal information, a set of data that also includes the person's name, bank account number, and other things does,“ Traat said.
The company, which analyzed documents in the ministry's register at the request of Postimees, found 39 employment contracts among 3,000 documents in a single day. The state maintains hundreds of document registers that hold millions of documents.
„The problem goes much further. We could easily analyze the entire register, as well as those of other ministries, should the custodian take an interest,“ Traat said. The co-founder said Texta has notified the ministry of its findings.
While the Estonian Data Protection Inspectorate regularly checks the security of document registers, it does so by hand. Inspections are often followed by supervision proceedings and, less often, by fines.
„We look at document registers more closely and hold a major survey once a year,“ said the agency's PR adviser Maire Iro. The watchdog checks whether the registers of ministries, county governments, and government agencies offer public access to documents that should not be publicly available, and whether documents that need to be public are in fact accessible.
„We have launched numerous supervision proceedings; however, the need for control action often disappears as institutions realize their mistakes and correct them,“ Iro said.
That said, the inspectorate has been forced to bring misdemeanor proceedings in some cases: „When someone has published delicate personal information or privileged information in great volume. Misdemeanor proceedings result in fines,“ Iro explained.
The watchdog has imposed fines in cases where document registers have offered free access to health data or to documents with information on domestic violence or custody. One local government's document register offered public access to a forensic psychiatric examination report – its publication resulted in a misdemeanor proceeding. Fines have amounted to around €100.
The situation is said to have improved in recent years. „Officials are more aware and better able to pay attention to the protection of sensitive information. It is constant work as new documents are registered every day, people move and have to be trained on public information. That is why mistakes happen sometimes,“ Iro said.
The Ministry of Education and Research does not consider the problem serious. Terje Mäesalu, deputy director of the ministry's general department, admitted that registers may offer access to some letters that should not be public.
„Because authors of letters are responsible for their publication as well as for restricting access to them, it is possible some things are made public by accident. However, we usually correct these kinds of mistakes quickly upon learning of them. More often, people turn to us to ask why some documents aren't public,“ Mäesalu said.
She added that spot checks are carried out regularly. „Data presented by Texta makes it difficult to understand from which lines of which documents the information has been taken; that is why our findings were limited to names and personal identification numbers, both of which are generally public information,“ Mäesalu said.
Andres Kütt, adviser at the Estonian Information System's Authority, said that while document registers are in need of systematic reorganization, he does not perceive a direct security risk in the publication of sensitive personal information.
„Personal information has always been publicly accessible in those systems; it simply hasn't been highlighted like that before,“ Kütt said. He added that new IT solutions provide tools for highlighting mistakes in registers so that work can begin on correcting them.