The Centers for Disease Control and Prevention (CDC) has been at the center of controversy in recent weeks as at least 135 datasets and files have been removed from its public data platform, data.cdc.gov, following President Trump’s inauguration. The removals are believed to be part of an effort to eliminate language deemed objectionable by the Trump administration.
One particular term that has drawn attention is “gender,” as Trump’s executive order on “gender ideology” explicitly forbids federal agencies from using the word “gender” instead of “sex.” As a result, at least 67 items containing the word “gender” have been removed from the CDC’s data platform.
Other datasets related to sensitive topics like gender identity and sexual orientation have also been removed, such as those from the Youth Risk Behavior Surveillance System (YRBSS) and the Behavioral Risk Factor Surveillance System (BRFSS).
Despite these removals, some datasets have been reuploaded with modifications to comply with the new guidelines. For example, the “Heart Disease Mortality Data Among US Adults (35+) by State/Territory and County – 2018-2020” dataset had instances of the term “gender” replaced with “sex” before being republished.
A similar process was seen with the “Alzheimer’s Disease and Healthy Aging Data” dataset, where instances of “gender” were replaced with “sex” before being reuploaded.
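A term replacement of this kind can be checked by diffing the archived copy of a file against the republished one. The following is a minimal sketch using Python's standard difflib module. The function name diff_versions and the two-row sample data are hypothetical illustrations, not actual CDC data and not necessarily the method STAT uses.

```python
import difflib


def diff_versions(archived: str, republished: str) -> list[str]:
    """Return unified-diff lines between two text versions of a dataset."""
    return list(
        difflib.unified_diff(
            archived.splitlines(),
            republished.splitlines(),
            fromfile="archived",
            tofile="republished",
            lineterm="",
        )
    )


# Hypothetical excerpts for illustration only; not real CDC rows.
archived = "Year,Gender,Value\n2019,Overall,1.0\n"
republished = "Year,Sex,Value\n2019,Overall,1.0\n"

for line in diff_versions(archived, republished):
    print(line)
```

Lines beginning with "-" appear only in the archived copy and lines beginning with "+" appear only in the republished copy, so a renamed column header shows up as one removed line paired with one added line.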
STAT has been monitoring the changes to the CDC’s data platform in real time since January 31 and has observed additional removals, bringing the total to 135. Despite some datasets being reuploaded, there are still discrepancies between the archived data and what is currently available on data.cdc.gov.
STAT has also made efforts to back up all available files from data.cdc.gov, providing a table for users to download original copies of removed datasets. However, it is important to note that these backup files may be out of date compared to what is currently available on the CDC’s website.
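One way to tell whether a downloaded backup still matches a freshly downloaded copy is to compare checksums. The sketch below assumes both files are already saved locally. The helper names sha256_of and differs are hypothetical. A mismatch only shows that the bytes differ, not what changed or which copy is newer.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differs(backup: Path, current: Path) -> bool:
    """Return True when the two files are not byte-for-byte identical."""
    return sha256_of(backup) != sha256_of(current)
```

For example, differs(Path("backup.csv"), Path("current.csv")) returns True if the two local files differ in any byte. Both file names here are placeholders.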
This situation is still developing, and STAT will continue to update its analysis as changes are detected in CDC data. It remains to be seen how further changes to federal health websites may impact public access to critical health data.
As the administrator of this page, I am dedicated to enhancing its functionality and usefulness for all users. There are several exciting updates that I plan to implement in the near future to provide a more comprehensive experience:
1. Full Diffs of Republished CDC Datasets: I will publish full diffs of all datasets that have been republished by the CDC. This will allow users to track changes and updates made to the data over time.
2. Entire Collection of Backup Files: I intend to make the entire collection of backup files available for easy access. Users will be able to download and access these files whenever needed.
3. Improved Search Functionality: I will enhance the search functionality on the page to make it easier for users to discover specific files and datasets. This will streamline the process of finding relevant information.
4. Enhanced Metadata for File Listings: I plan to add more metadata to file listings, including descriptions, publish and last updated times, and other relevant information. This will provide users with more context about the datasets available on the page.
If you have any suggestions, requests, or need assistance with anything on the page, please don’t hesitate to reach out. Your feedback is valuable in helping me improve the overall user experience.
For those interested in obtaining a complete archive of CDC data, I recommend using the collection available on the Internet Archive. This collection, which is equivalent to STAT’s archive, is distributed as a torrent file and can be downloaded using a BitTorrent client. The download requires approximately 112 GB of free disk space.
Thank you for your continued support, and I look forward to implementing these updates to make the page even more valuable for all users.