Hold onto your calculators, statisticians!
After three years of fierce debates, conflicting academic papers and a lawsuit, the U.S. Census Bureau on Wednesday announced guidelines for how a controversial statistical method will be applied to the numbers used for drawing congressional and legislative districts. The method is meant to protect the privacy of people who participated in the 2020 census, though critics have claimed it favors confidentiality at the expense of accurate numbers.
The privacy method adds controlled “noise,” or intentional errors, to the data to obscure the identity of any given participant in the 2020 census while still providing statistically valid information. The final guidelines announced by the Census Bureau lean further toward accuracy, and away from privacy, than the earlier test versions the statistical agency released for interested parties to evaluate.
The debate over the method, known as differential privacy, has resulted in a nerd knife-fight of sorts among statisticians, demographers and redistricting experts, who have argued over whether applying it would make the numbers used for redrawing congressional and legislative districts unusable. Release of the specific guidelines could further intensify the ongoing debate about the accuracy of numbers gathered during a national headcount that took place in the midst of a global pandemic and an already supercharged political climate.
If you picture the privacy tool as a dial with lower settings offering the most privacy and higher settings providing the most accuracy, the Census Bureau dialed up the accuracy in the final guidelines. The statistical term for this dial is “epsilon,” and the bureau settled on an epsilon of 19.61, significantly higher than where the dial was set in earlier versions that critics raised concerns about.
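For readers who want to see the dial in action, here is a minimal sketch in Python. It uses the textbook Laplace mechanism on a single made-up count of 100 people, which is an assumption for illustration and not the bureau's production system. The function names `noisy_count` and `average_error` are hypothetical, and treating the whole epsilon as if it applied to one count is a simplification.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Assumes a simple counting query, where adding or removing one
    person changes the answer by at most 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two exponential draws with rate epsilon is
    # Laplace-distributed with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def average_error(epsilon, trials=20000, seed=0):
    """Average absolute error of the noisy count over many draws."""
    rng = random.Random(seed)
    true_count = 100  # made-up block population, for illustration only
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Lower epsilon means more noise (privacy); higher means less (accuracy).
    for eps in (0.1, 1.0, 19.61):
        print(f"epsilon={eps}: average error about {average_error(eps):.2f} people")
```

In this toy setup the typical error shrinks roughly in proportion to 1/epsilon, which is the trade-off the dial image describes: turning epsilon up buys accuracy and gives up privacy.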
“The decisions strike the best balance between the need to release detailed, usable statistics from the 2020 Census with our statutory responsibility to protect the privacy of individuals’ data,” said Ron Jarmin, acting director of the Census Bureau. “They were made after many years of research and candid feedback from data users and outside experts, whom we thank for their invaluable input.”
University of Minnesota demographer Steven Ruggles, who had raised accuracy concerns about earlier versions, said Wednesday that the epsilon in the final guidelines is now so high it won’t offer much privacy protection.
“The inventors of differential privacy regard such a high epsilon as pointless,” Ruggles said.
The state of Alabama has sued to stop differential privacy from being used at all on the redistricting data, claiming it would produce inaccurate numbers. A panel of three judges could make a decision any day.
The Census Bureau says more privacy protections are needed than in past decades, as technological innovations magnify the threat of people being identified through their census answers, which are confidential by law. Computing power is now so vast that it can easily crunch third-party data sets that combine personal information from credit ratings and social media companies, purchasing records, voting patterns and public documents, among other things.
The redistricting data is expected to be released in mid-August. In September, the Census Bureau will release test data from the 2010 census with differential privacy applied using the final guidelines so researchers can examine how accurate it is.
Princeton University researchers Ari Goldbloom-Helzner and Sam Wang said in an email Wednesday that the Census Bureau was responding to the concerns raised and targeting accuracy in smaller jurisdictions. Studying an earlier version with greater privacy restrictions, the two researchers had previously said applying differential privacy had no practical impact on redistricting data.
“At this point my team is confident that the data will be fully fit for redistricting,” Wang said.
(AP)