In November 2014, CMS made history by releasing its first batch of open data, comprising approximately 27 terabytes of proton-proton collision data collected in 2010 at a 7 TeV center-of-mass energy. This groundbreaking release marked the beginning of a new era in particle physics at the LHC, where researchers, educators, and enthusiasts worldwide could access and analyze real collider data.
Over the past decade, CMS has maintained its commitment to open science, regularly releasing new datasets and improving the tools and documentation available to users in accordance with its open data policy. The collaboration has consistently adhered to that policy, under which 50% of its analyzable data are made publicly available six years after collection and 100% are to be released ten years after collection. This dedication has resulted in the release of nearly 5 petabytes of data, including both real collision data and simulated events.
All CMS open data are available via the CERN Open Data Portal. The portal, which is also celebrating its ten-year anniversary, provides access to the datasets along with search functionality, documentation, and storage.
The impact of these releases has been substantial. Researchers unaffiliated with CMS have published over 70 novel papers using the open data, while students around the world have had the opportunity to "rediscover" the Higgs boson using real CMS data and to make other measurements in the context of the International Masterclasses. The initiative has also fostered collaboration between particle physicists and data scientists, leading to advancements in machine-learning applications for high-energy physics.
CMS's pioneering efforts in open data have significantly influenced CERN's overall open data policy. In December 2020, CERN announced a new policy for scientific experiments at the Large Hadron Collider, committing to release level 3 scientific data typically within five years of collection. This policy, which applies to all four main LHC experiments, was shaped by the success and experience gained from CMS's open data initiatives. It reflects a broader shift towards open science practices across particle physics, demonstrating the far-reaching impact of CMS's commitment to data sharing.
As we look back on a decade of open data from CMS, we see a legacy of innovation, collaboration, and scientific progress. The experiment's commitment to transparency has not only enhanced the field's scientific output but has also increased its societal impact and public engagement. This anniversary serves as a testament to the power of open science. It sets a strong foundation for future advancements in particle physics research and education and shapes the landscape of open data policies in high-energy physics.
This success would not have been possible without the support of the CMS Collaboration and CERN, in particular the Open Data team in CERN IT and Scientific Information Services.
All of the CMS Open Data releases to date are listed below:
- CMS releases first batch of high-level LHC Open Data
- CMS releases new batch of research data from LHC
- Observing the Higgs with over one petabyte of new CMS Open Data
- CMS releases open data for Machine Learning
- CMS completes 2010-2011 proton-proton data release
- CMS releases heavy-ion data from 2010 and 2011
- First CMS open data from LHC Run 2 released
- CMS completes the release of its entire Run-1 proton-proton data
- CMS completes Run-1 heavy ion open data collection
- CMS releases 13 TeV proton collision data from 2016