{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T05:38:01Z","timestamp":1769060281062,"version":"3.49.0"},"publisher-location":"Cham","reference-count":85,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031986819","type":"print"},{"value":"9783031986826","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/2.zoppoz.workers.dev:443\/https\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T00:00:00Z","timestamp":1753228800000},"content-version":"vor","delay-in-days":203,"URL":"https:\/\/2.zoppoz.workers.dev:443\/https\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>GPU computing is embracing weak memory concurrency for performance improvement. However, compared to CPUs, modern GPUs provide more fine-grained concurrency features such as scopes, have additional properties like divergence, and thereby follow different weak memory consistency models. These features and properties make concurrent programming on GPUs more complex and error-prone. To this end, we present <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\textsf{GPUMC}$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"https:\/\/2.zoppoz.workers.dev:443\/http\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>GPUMC<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula>, a stateless model checker to check the correctness of GPU shared-memory concurrent programs under scoped-RC11 weak memory concurrency model. <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\textsf{GPUMC}$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"https:\/\/2.zoppoz.workers.dev:443\/http\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>GPUMC<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> explores all possible executions in GPU programs to reveal various errors - races, barrier divergence, and assertion violations. In addition, <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\textsf{GPUMC}$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"https:\/\/2.zoppoz.workers.dev:443\/http\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>GPUMC<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> also automatically repairs these errors in the appropriate cases.<\/jats:p>\n          <jats:p>We evaluate <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\textsf{GPUMC}$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"https:\/\/2.zoppoz.workers.dev:443\/http\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>GPUMC<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> on benchmarks and real-life GPU programs. <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\textsf{GPUMC}$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"https:\/\/2.zoppoz.workers.dev:443\/http\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>GPUMC<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> is efficient both in time and memory in verifying large GPU programs where state-of-the-art tools are timed out. In addition, <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\textsf{GPUMC}$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"https:\/\/2.zoppoz.workers.dev:443\/http\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>GPUMC<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> identifies all known errors in these benchmarks compared to the state-of-the-art tools.<\/jats:p>","DOI":"10.1007\/978-3-031-98682-6_17","type":"book-chapter","created":{"date-parts":[[2025,7,22]],"date-time":"2025-07-22T03:15:21Z","timestamp":1753154121000},"page":"321-346","update-policy":"https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["$$\\textsf{GPUMC}$$: A Stateless Model Checker for\u00a0GPU Weak Memory Concurrency"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/2.zoppoz.workers.dev:443\/https\/orcid.org\/0000-0002-4454-2050","authenticated-orcid":false,"given":"Soham","family":"Chakraborty","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/2.zoppoz.workers.dev:443\/https\/orcid.org\/0000-0003-0925-398X","authenticated-orcid":false,"given":"S.","family":"Krishna","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/2.zoppoz.workers.dev:443\/https\/orcid.org\/0000-0002-8943-0722","authenticated-orcid":false,"given":"Andreas","family":"Pavlogiannis","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/2.zoppoz.workers.dev:443\/https\/orcid.org\/0009-0004-1197-5556","authenticated-orcid":false,"given":"Omkar","family":"Tuppe","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,23]]},"reference":[{"key":"17_CR1","unstructured":"Vulkan memory model. https:\/\/2.zoppoz.workers.dev:443\/https\/github.com\/KhronosGroup\/Vulkan-MemoryModel"},{"key":"17_CR2","unstructured":"Cuda C++ programming guide (2024). https:\/\/2.zoppoz.workers.dev:443\/https\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html"},{"key":"17_CR3","unstructured":"Cuda core compute libraries (CCCL) (2024). https:\/\/2.zoppoz.workers.dev:443\/https\/github.com\/NVIDIA\/cccl"},{"key":"17_CR4","unstructured":"Cutlass 3.6.0 (2024). https:\/\/2.zoppoz.workers.dev:443\/https\/github.com\/NVIDIA\/cutlass"},{"key":"17_CR5","unstructured":"The opencl$$^{\\text{TM}}$$ specification (2024). https:\/\/2.zoppoz.workers.dev:443\/https\/registry.khronos.org\/OpenCL\/specs\/3.0-unified\/html\/OpenCL_API.html"},{"key":"17_CR6","doi-asserted-by":"crossref","unstructured":"Abdulla, P.A., Aronis, S., Atig, M.F., Jonsson, B., Leonardsson, C., Sagonas, K.: Stateless model checking for TSO and PSO. In: Baier, C., Tinelli, C. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, pp. 353\u2013367. Springer, Heidelberg (2015)","DOI":"10.1007\/978-3-662-46681-0_28"},{"key":"17_CR7","doi-asserted-by":"publisher","unstructured":"Abdulla, P.A., Aronis, S., Jonsson, B., Sagonas, K.: Source sets: a foundation for optimal dynamic partial order reduction. J. ACM 64(4), 25:1\u201325:49 (2017). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3073408","DOI":"10.1145\/3073408"},{"key":"17_CR8","doi-asserted-by":"crossref","unstructured":"Abdulla, P.A., Atig, M.F., Das, S., Jonsson, B., Sagonas, K.: Parsimonious optimal dynamic partial order reduction. In: International Conference on Computer Aided Verification, pp. 19\u201343. Springer (2024)","DOI":"10.1007\/978-3-031-65630-9_2"},{"key":"17_CR9","doi-asserted-by":"publisher","unstructured":"Abdulla, P.A., Atig, M.F., Jonsson, B., L\u00e5ng, M., Ngo, T.P., Sagonas, K.: Optimal stateless model checking for reads-from equivalence under sequential consistency. Proc. ACM Program. Lang. 3(OOPSLA), 150:1\u2013150:29 (2019). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3360576","DOI":"10.1145\/3360576"},{"key":"17_CR10","doi-asserted-by":"publisher","unstructured":"Abdulla, P.A., Atig, M.F., Jonsson, B., Ngo, T.P.: Optimal stateless model checking under the release-acquire semantics. Proc. ACM Program. Lang. 2(OOPSLA) (2018). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3276505","DOI":"10.1145\/3276505"},{"key":"17_CR11","doi-asserted-by":"publisher","unstructured":"Abdulla, P.A., Atig, M.F., Krishna, S., Gupta, A., Tuppe, O.: Optimal stateless model checking for causal consistency. In: Proceedings of TACAS 2023, vol. 13993, pp. 105\u2013125. Springer (2023). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/978-3-031-30823-9_6","DOI":"10.1007\/978-3-031-30823-9_6"},{"key":"17_CR12","doi-asserted-by":"publisher","unstructured":"Alglave, J., et al.: GPU concurrency: weak behaviours and programming assumptions. In: PASPLOS 2015, pp. 577\u2013591. https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2694344.2694391","DOI":"10.1145\/2694344.2694391"},{"key":"17_CR13","doi-asserted-by":"publisher","unstructured":"Alglave, J., Deacon, W., Grisenthwaite, R., Hacquard, A., Maranget, L.: Armed cats: formal concurrency modelling at arm. ACM Trans. Program. Lang. Syst. 43(2) (2021). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3458926","DOI":"10.1145\/3458926"},{"key":"17_CR14","doi-asserted-by":"crossref","unstructured":"Alur, R., Devietti, J., Navarro Leija, O.S., Singhania, N.: GPUDrano: detecting uncoalesced accesses in GPU programs. In: Majumdar, R., Kun\u010dak, V. (eds.) Computer Aided Verification, pp. 507\u2013525. Springer, Cham (2017)","DOI":"10.1007\/978-3-319-63387-9_25"},{"key":"17_CR15","doi-asserted-by":"publisher","unstructured":"Batty, M., Donaldson, A.F., Wickerson, J.: Overhauling SC atomics in C11 and OpenCL. In: POPL 2016, pp. 634\u2013648. ACM (2016). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2837614.2837637","DOI":"10.1145\/2837614.2837637"},{"key":"17_CR16","doi-asserted-by":"publisher","unstructured":"Batty, M., Owens, S., Sarkar, S., Sewell, P., Weber, T.: Mathematizing C++ concurrency. In: POPL 2011, pp. 55\u201366. ACM (2011). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/1926385.1926394","DOI":"10.1145\/1926385.1926394"},{"key":"17_CR17","doi-asserted-by":"publisher","unstructured":"Betts, A., Chong, N., Donaldson, A.F., Qadeer, S., Thomson, P.: GPUverify: a verifier for GPU kernels. In: Leavens, G.T., Dwyer, M.B. (eds.) OOPSLA 2012, pp. 113\u2013132 (2012). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2384616.2384625","DOI":"10.1145\/2384616.2384625"},{"key":"17_CR18","doi-asserted-by":"publisher","unstructured":"Bui, T.L., Chatterjee, K., Gautam, T., Pavlogiannis, A., Toman, V.: The reads-from equivalence for the TSO and PSO memory models. Proc. ACM Program. Lang. 5(OOPSLA), 1\u201330 (2021). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3485541","DOI":"10.1145\/3485541"},{"key":"17_CR19","doi-asserted-by":"publisher","unstructured":"Chakraborty, S., Krishna, S., Pavlogiannis, A., Tuppe, O.: Supplementary material for GPUMC (2025). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.6084\/m9.figshare.29143991.v1, https:\/\/2.zoppoz.workers.dev:443\/https\/figshare.com\/articles\/dataset\/Supplementary_material_for_GPUMC\/29143991","DOI":"10.6084\/m9.figshare.29143991.v1"},{"key":"17_CR20","doi-asserted-by":"crossref","unstructured":"Chakraborty, S., Vafeiadis, V.: Formalizing the concurrency semantics of an LLVM fragment. In: CGO 2017, pp. 100\u2013110 (2017)","DOI":"10.1109\/CGO.2017.7863732"},{"key":"17_CR21","doi-asserted-by":"publisher","unstructured":"Chakraborty, S., Vafeiadis, V.: Grounding thin-air reads with event structures 3(POPL) (2019). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3290383","DOI":"10.1145\/3290383"},{"key":"17_CR22","doi-asserted-by":"publisher","unstructured":"Chalupa, M., Chatterjee, K., Pavlogiannis, A., Sinha, N., Vaidya, K.: Data-centric dynamic partial order reduction. Proc. ACM Program. Lang. 2(POPL) (2017). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3158119","DOI":"10.1145\/3158119"},{"key":"17_CR23","doi-asserted-by":"crossref","unstructured":"Clarke, E.M., Emerson, E.A., Sistla, A.P.: Automatic verification of finite state concurrent systems using temporal logic specifications: a practical approach. In: Proceedings of POPL 1983, pp. 117\u2013126. ACM Press (1983)","DOI":"10.1145\/567067.567080"},{"key":"17_CR24","doi-asserted-by":"publisher","unstructured":"Clarke, E.M., Grumberg, O., Minea, M., Peled, D.A.: State space reduction using partial order techniques. Int. J. Softw. Tools Technol. Transf. 2(3), 279\u2013287 (1999). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/s100090050035","DOI":"10.1007\/s100090050035"},{"key":"17_CR25","doi-asserted-by":"crossref","unstructured":"Cogumbreiro, T., Lange, J., Rong, D., Zicarelli, H.: Checking data-race freedom of GPU kernels, compositionally. In: Silva, A., Leino, K. (eds.) Computer Aided Verification, pp. 403\u2013426. Springer, Cham (2021)","DOI":"10.1007\/978-3-030-81685-8_19"},{"issue":"6","key":"17_CR26","doi-asserted-by":"publisher","first-page":"775","DOI":"10.1002\/cpe.2853","volume":"25","author":"B Coutinho","year":"2013","unstructured":"Coutinho, B., Sampaio, D., Pereira, F.M., Meira, W., Jr.: Profiling divergences in GPU applications. Concurrency Comput. Pract. Exp. 25(6), 775\u2013789 (2013)","journal-title":"Concurrency Comput. Pract. Exp."},{"key":"17_CR27","doi-asserted-by":"publisher","unstructured":"Eizenberg, A., Peng, Y., Pigli, T., Mansky, W., Devietti, J.: Barracuda: binary-level analysis of runtime races in Cuda programs. In: PLDI 2017. Association for Computing Machinery, New York (2017). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3062341.3062342","DOI":"10.1145\/3062341.3062342"},{"key":"17_CR28","doi-asserted-by":"publisher","unstructured":"Flanagan, C., Godefroid, P.: Dynamic partial-order reduction for model checking software. In: Palsberg, J., Abadi, M. (eds.) POPL 2005, pp. 110\u2013121. ACM (2005). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/1040305.1040315","DOI":"10.1145\/1040305.1040315"},{"key":"17_CR29","unstructured":"Francis, E.: Autonomous cars: no longer just science fiction (2014)"},{"issue":"1","key":"17_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2701618","volume":"12","author":"BR Gaster","year":"2015","unstructured":"Gaster, B.R., Hower, D., Howes, L.: HRF-relaxed: adapting HRF to the complexities of industrial heterogeneous memory models. ACM Trans. Arch. Code Optim. (TACO) 12(1), 1\u201326 (2015)","journal-title":"ACM Trans. Arch. Code Optim. (TACO)"},{"key":"17_CR31","doi-asserted-by":"publisher","unstructured":"Gibbons, P.B., Korach, E.: Testing shared memories. SIAM J. Comput. 26(4), 1208\u20131244 (1997). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1137\/S0097539794279614","DOI":"10.1137\/S0097539794279614"},{"key":"17_CR32","doi-asserted-by":"publisher","unstructured":"Godefroid, P. (ed.): Partial-Order Methods for the Verification of Concurrent Systems. LNCS, vol. 1032. Springer, Heidelberg (1996). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/3-540-60761-7","DOI":"10.1007\/3-540-60761-7"},{"key":"17_CR33","doi-asserted-by":"publisher","unstructured":"Goens, A., Chakraborty, S., Sarkar, S., Agarwal, S., Oswald, N., Nagarajan, V.: Compound memory models. Proc. ACM Program. Lang. 7(PLDI) (2023). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3591267","DOI":"10.1145\/3591267"},{"key":"17_CR34","doi-asserted-by":"publisher","unstructured":"Holey, A., Mekkat, V., Zhai, A.: HAccRG: hardware-accelerated data race detection in GPUs. In: 2013 42nd International Conference on Parallel Processing, pp. 60\u201369 (2013). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1109\/ICPP.2013.15","DOI":"10.1109\/ICPP.2013.15"},{"key":"17_CR35","doi-asserted-by":"crossref","unstructured":"Hower, D.R., et al.: Heterogeneous-race-free memory models. In: ASPLOS 2014, pp. 427\u2013440 (2014)","DOI":"10.1145\/2541940.2541981"},{"key":"17_CR36","unstructured":"ISO\/IEC 14882: Programming language C++ (2011)"},{"key":"17_CR37","unstructured":"ISO\/IEC 9899: Programming language C (2011)"},{"key":"17_CR38","doi-asserted-by":"publisher","unstructured":"Jeffrey, A., Riely, J.: On thin air reads towards an event structures model of relaxed memory. In: LICS 2016, pp. 759\u2013767 (2016). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2933575.2934536","DOI":"10.1145\/2933575.2934536"},{"key":"17_CR39","doi-asserted-by":"publisher","unstructured":"Kamath, A.K., Basu, A.: Iguard: in-GPU advanced race detection. In: SOSP 2021, pp. 49\u201365 (2021). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3477132.3483545","DOI":"10.1145\/3477132.3483545"},{"key":"17_CR40","doi-asserted-by":"publisher","unstructured":"Kamath, A.K., George, A.A., Basu, A.: ScoRD: a scoped race detector for GPUs. In: 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pp. 1036\u20131049 (2020). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1109\/ISCA45697.2020.00088","DOI":"10.1109\/ISCA45697.2020.00088"},{"key":"17_CR41","doi-asserted-by":"publisher","unstructured":"Kang, J., Hur, C.K., Lahav, O., Vafeiadis, V., Dreyer, D.: A promising semantics for relaxed-memory concurrency. In: POPL 2017, pp. 175\u2013189 (2017). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3009837.3009850","DOI":"10.1145\/3009837.3009850"},{"key":"17_CR42","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/J.SCICO.2017.04.009","volume":"148","author":"J Ketema","year":"2017","unstructured":"Ketema, J., Donaldson, A.F.: Termination analysis for GPU kernels. Sci. Comput. Program. 148, 107\u2013122 (2017). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1016\/J.SCICO.2017.04.009","journal-title":"Sci. Comput. Program."},{"key":"17_CR43","doi-asserted-by":"publisher","unstructured":"Kokologiannakis, M., Lahav, O., Sagonas, K., Vafeiadis, V.: Effective stateless model checking for C\/C++ concurrency. Proc. ACM Program. Lang. 2(POPL), 17:1\u201317:32 (2018). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3158105","DOI":"10.1145\/3158105"},{"key":"17_CR44","doi-asserted-by":"publisher","unstructured":"Kokologiannakis, M., Marmanis, I., Gladstein, V., Vafeiadis, V.: Truly stateless, optimal dynamic partial order reduction. Proc. ACM Program. Lang. 6(POPL), 1\u201328 (2022). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3498711","DOI":"10.1145\/3498711"},{"key":"17_CR45","doi-asserted-by":"crossref","unstructured":"Kokologiannakis, M., Raad, A., Vafeiadis, V.: Model checking for weakly consistent libraries. In: Proceedings of PLDI 2019, pp. 96\u2013110. ACM (2019)","DOI":"10.1145\/3314221.3314609"},{"key":"17_CR46","doi-asserted-by":"crossref","unstructured":"Kokologiannakis, M., Vafeiadis, V.: Bam: efficient model checking for barriers. In: International Conference on Networked Systems, pp. 223\u2013239. Springer (2021)","DOI":"10.1007\/978-3-030-91014-3_16"},{"key":"17_CR47","doi-asserted-by":"publisher","unstructured":"Lahav, O., Vafeiadis, V.: Owicki-Gries reasoning for weak memory models. In: ICALP 2015, pp. 311\u2013323 (2015). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/978-3-662-47666-6_25","DOI":"10.1007\/978-3-662-47666-6_25"},{"key":"17_CR48","doi-asserted-by":"publisher","unstructured":"Lahav, O., Vafeiadis, V., Kang, J., Hur, C.K., Dreyer, D.: Repairing sequential consistency in C\/C++11. In: PLDI 2017, pp. 618\u2013632 (2017). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3062341.3062352, https:\/\/2.zoppoz.workers.dev:443\/https\/plv.mpi-sws.org\/scfix\/full.pdf","DOI":"10.1145\/3062341.3062352"},{"issue":"9","key":"17_CR49","doi-asserted-by":"publisher","first-page":"690","DOI":"10.1109\/TC.1979.1675439","volume":"28","author":"L Lamport","year":"1979","unstructured":"Lamport, L.: How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. 28(9), 690\u2013691 (1979). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1109\/TC.1979.1675439","journal-title":"IEEE Trans. Comput."},{"key":"17_CR50","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1007\/978-3-030-99527-0_24","volume-title":"Tools and Algorithms for the Construction and Analysis of Systems","author":"H Ponce-de Le\u00f3n","year":"2022","unstructured":"Ponce-de Le\u00f3n, H., Haas, T., Meyer, R.: Dartagnan: SMT-based violation witness validation (competition contribution). In: Fisman, D., Rosu, G. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, pp. 418\u2013423. Springer International Publishing, Cham (2022)"},{"key":"17_CR51","doi-asserted-by":"publisher","unstructured":"Levine, R., Cho, M., McKee, D., Quinn, A., Sorensen, T.: GPUHarbor: testing GPU memory consistency at large (experience paper). In: ISSTA 2023, pp. 779\u2013791 (2023). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3597926.3598095","DOI":"10.1145\/3597926.3598095"},{"key":"17_CR52","unstructured":"Ponce\u00a0de Le\u00f3n, H.: Dat3m (2024). https:\/\/2.zoppoz.workers.dev:443\/https\/github.com\/hernanponcedeleon\/Dat3M"},{"key":"17_CR53","doi-asserted-by":"publisher","unstructured":"Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel functions. In: FSE 2010, Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 187\u2013196. Association for Computing Machinery, New York (2010). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/1882291.1882320","DOI":"10.1145\/1882291.1882320"},{"key":"17_CR54","doi-asserted-by":"publisher","unstructured":"Li, G., Gopalakrishnan, G.: Parameterized verification of GPU kernel programs. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, pp. 2450\u20132459 (2012). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1109\/IPDPSW.2012.302","DOI":"10.1109\/IPDPSW.2012.302"},{"key":"17_CR55","doi-asserted-by":"publisher","unstructured":"Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE: concolic verification and test generation for GPUs. In: PPOPP 2012, pp. 215\u2013224 (2012). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2145816.2145844","DOI":"10.1145\/2145816.2145844"},{"key":"17_CR56","doi-asserted-by":"publisher","unstructured":"Li, P., Li, G., Gopalakrishnan, G.: Practical symbolic race checking of GPU programs. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 179\u2013190 (2014). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1109\/SC.2014.20","DOI":"10.1109\/SC.2014.20"},{"key":"17_CR57","doi-asserted-by":"crossref","unstructured":"Li, P., et al.: LD: low-overhead GPU race detection without access monitoring. ACM Trans. Arch. Code Optim. (TACO) 14(1), 1\u201325 (2017)","DOI":"10.1145\/3046678"},{"key":"17_CR58","doi-asserted-by":"publisher","unstructured":"Lidbury, C., Donaldson, A.F.: Dynamic race detection for C++11. In: POPL 2017, pp. 443\u2013457 (2017). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3009837.3009857","DOI":"10.1145\/3009837.3009857"},{"key":"17_CR59","doi-asserted-by":"publisher","unstructured":"Luo, W., Demsky, B.: C11Tester: A Race Detector for C\/C++ Atomics, pp. 630\u2013646. Association for Computing Machinery, New York (2021). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3445814.3446711","DOI":"10.1145\/3445814.3446711"},{"key":"17_CR60","doi-asserted-by":"publisher","unstructured":"Lustig, D., Cooksey, S., Giroux, O.: Mixed-proxy extensions for the NVIDIA PTX memory consistency model: industrial product. In: ISCA 2022 (2022). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3470496.3533045","DOI":"10.1145\/3470496.3533045"},{"key":"17_CR61","doi-asserted-by":"publisher","unstructured":"Lustig, D., Sahasrabuddhe, S., Giroux, O.: A formal analysis of the NVIDIA PTX memory consistency model. In: ASPLOS 2019, pp. 257\u2013270. ACM (2019). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3297858.3304043","DOI":"10.1145\/3297858.3304043"},{"key":"17_CR62","doi-asserted-by":"publisher","unstructured":"Mathur, U., Pavlogiannis, A., Viswanathan, M.: The complexity of dynamic data race prediction. In: LICS 2020, pp. 713\u2013727. Association for Computing Machinery, New York (2020). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3373718.3394783","DOI":"10.1145\/3373718.3394783"},{"key":"17_CR63","doi-asserted-by":"publisher","unstructured":"Moiseenko, E., Kokologiannakis, M., Vafeiadis, V.: Model checking for a multi-execution memory model. Proc. ACM Program. Lang. 6(OOPSLA2) (2022). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3563315","DOI":"10.1145\/3563315"},{"key":"17_CR64","doi-asserted-by":"publisher","unstructured":"Norris, B., Demsky, B.: A practical approach for model checking C\/C++11 code. ACM Trans. Program. Lang. Syst. 38(3) (2016). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2806886","DOI":"10.1145\/2806886"},{"key":"17_CR65","doi-asserted-by":"publisher","unstructured":"Orr, M.S., Che, S., Yilmazer, A., Beckmann, B.M., Hill, M.D., Wood, D.A.: Synchronization using remote-scope promotion. In: ASPLOS 2015, pp. 73\u201386 (2015). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2694344.2694350","DOI":"10.1145\/2694344.2694350"},{"issue":"2","key":"17_CR66","doi-asserted-by":"publisher","first-page":"2840","DOI":"10.1007\/s11227-021-03980-5","volume":"78","author":"\u00d6 \u00d6zerk","year":"2022","unstructured":"\u00d6zerk, \u00d6., Elgezen, C., Mert, A.C., \u00d6zt\u00fcrk, E., Sava\u015f, E.: Efficient number theoretic transform implementation on GPU for homomorphic encryption. J. Supercomput. 78(2), 2840\u20132872 (2022)","journal-title":"J. Supercomput."},{"issue":"3","key":"17_CR67","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1038\/s42256-022-00463-x","volume":"4","author":"M Pandey","year":"2022","unstructured":"Pandey, M., et al.: The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4(3), 211\u2013221 (2022)","journal-title":"Nat. Mach. Intell."},{"key":"17_CR68","doi-asserted-by":"publisher","unstructured":"Courcoubetis, C. (ed.): CAV 1993. LNCS, vol. 697. Springer, Heidelberg (1993). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/3-540-56922-7","DOI":"10.1007\/3-540-56922-7"},{"key":"17_CR69","doi-asserted-by":"publisher","unstructured":"Peng, Y., Grover, V., Devietti, J.: Curd: a dynamic Cuda race detector. In: PLDI 2018, pp. 390\u2013403. Association for Computing Machinery, New York (2018). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3192366.3192368","DOI":"10.1145\/3192366.3192368"},{"key":"17_CR70","doi-asserted-by":"publisher","unstructured":"Podkopaev, A., Lahav, O., Vafeiadis, V.: Bridging the gap between programming languages and hardware weak memory models. Proc. ACM Program. Lang. 3(POPL) (2019). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3290382","DOI":"10.1145\/3290382"},{"key":"17_CR71","doi-asserted-by":"publisher","unstructured":"Pulte, C., Flur, S., Deacon, W., French, J., Sarkar, S., Sewell, P.: Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8. PACMPL 2(POPL), 19:1\u201319:29 (2018). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3158107","DOI":"10.1145\/3158107"},{"key":"17_CR72","doi-asserted-by":"publisher","unstructured":"Reinders, J., Ashbaugh, B., Brodman, J., Kinsner, M., Pennycook, J., Tian, X.: Data Parallel C++: Programming Accelerated Systems Using C++ and SYCL, 2 edn. Apress, Berkeley, CA (2023). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/978-1-4842-9691-2","DOI":"10.1007\/978-1-4842-9691-2"},{"key":"17_CR73","doi-asserted-by":"publisher","unstructured":"Reuther, A., Michaleas, P., Jones, M., Gadepally, V., Samsi, S., Kepner, J.: Survey and benchmarking of machine learning accelerators. In: 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp.\u00a01\u20139 (2019). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1109\/HPEC.2019.8916327","DOI":"10.1109\/HPEC.2019.8916327"},{"key":"17_CR74","doi-asserted-by":"publisher","unstructured":"Sorensen, T., Donaldson, A.F.: Exposing errors related to weak memory in GPU applications. In: Krintz, C., Berger, E.D. (eds.) Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13\u201317, 2016, pp. 100\u2013113. ACM (2016). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2908080.2908114","DOI":"10.1145\/2908080.2908114"},{"key":"17_CR75","doi-asserted-by":"publisher","unstructured":"Sorensen, T., Donaldson, A.F., Batty, M., Gopalakrishnan, G., Rakamari\u0107, Z.: Portable inter-workgroup barrier synchronisation for GPUs. In: OOPSLA 2016, Association for Computing Machinery, pp. 39\u201358 (2016). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2983990.2984032","DOI":"10.1145\/2983990.2984032"},{"key":"17_CR76","doi-asserted-by":"publisher","unstructured":"Sorensen, T., Evrard, H., Donaldson, A.F.: GPU schedulers: how fair is fair enough? In: Schewe, S., Zhang, L. (eds.) 29th International Conference on Concurrency Theory, CONCUR 2018, September 4\u20137, 2018, Beijing, China. LIPIcs, vol.\u00a0118, pp. 23:1\u201323:17 (2018). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.4230\/LIPICS.CONCUR.2018.23","DOI":"10.4230\/LIPICS.CONCUR.2018.23"},{"key":"17_CR77","doi-asserted-by":"publisher","unstructured":"Sorensen, T., Salvador, L.F., Raval, H., Evrard, H., Wickerson, J., Martonosi, M., Donaldson, A.F.: Specifying and testing GPU workgroup progress models. Proc. ACM Program. Lang. 5(OOPSLA), 1\u201330 (2021). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3485508","DOI":"10.1145\/3485508"},{"key":"17_CR78","doi-asserted-by":"publisher","unstructured":"Tong, H., Gavrilenko, N., Ponce\u00a0de Leon, H., Heljanko, K.: Towards unified analysis of GPU consistency. In: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, vol. 4, pp. 329\u2013344. ASPLOS 2024, Association for Computing Machinery, New York (2025). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3622781.3674174","DOI":"10.1145\/3622781.3674174"},{"key":"17_CR79","doi-asserted-by":"publisher","unstructured":"Tun\u00e7, H.C., Abdulla, P.A., Chakraborty, S., Krishna, S., Mathur, U., Pavlogiannis, A.: Optimal reads-from consistency checking for C11-style memory models. Proc. ACM Program. Lang. 7(PLDI), 137:761\u2013137:785 (2023). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3591251","DOI":"10.1145\/3591251"},{"key":"17_CR80","doi-asserted-by":"publisher","unstructured":"Wu, M., Ouyang, Y., Zhou, H., Zhang, L., Liu, C., Zhang, Y.: Simulee: detecting Cuda synchronization bugs via memory-access modeling. In: ICSE 2020, pp. 937\u2013948 (2020). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/3377811.3380358","DOI":"10.1145\/3377811.3380358"},{"key":"17_CR81","doi-asserted-by":"crossref","unstructured":"Xiao, S., Feng, W.C.: Inter-block GPU communication via fast barrier synchronization. In: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1\u201312. IEEE (2010)","DOI":"10.1109\/IPDPS.2010.5470477"},{"key":"17_CR82","unstructured":"Yalamanchili, P., et al.: ArrayFire - a high performance software library for parallel computing with an easy-to-use API (2015). https:\/\/2.zoppoz.workers.dev:443\/https\/github.com\/arrayfire\/arrayfire"},{"key":"17_CR83","doi-asserted-by":"publisher","unstructured":"Zhang, N., Kusano, M., Wang, C.: Dynamic partial order reduction for relaxed memory models. In: PLDI \u201915, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 250\u2013259. Association for Computing Machinery, New York (2015). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2737924.2737956","DOI":"10.1145\/2737924.2737956"},{"issue":"8","key":"17_CR84","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1145\/2038037.1941574","volume":"46","author":"M Zheng","year":"2011","unstructured":"Zheng, M., Ravi, V.T., Qin, F., Agrawal, G.: Grace: a low-overhead mechanism for detecting data races in GPU programs. SIGPLAN Not. 46(8), 135\u2013146 (2011). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1145\/2038037.1941574","journal-title":"SIGPLAN Not."},{"issue":"1","key":"17_CR85","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1109\/TPDS.2013.44","volume":"25","author":"M Zheng","year":"2014","unstructured":"Zheng, M., Ravi, V.T., Qin, F., Agrawal, G.: GMRace: detecting data races in GPU programs via a low-overhead scheme. IEEE Trans. Parallel Distrib. Syst. 25(1), 104\u2013115 (2014). https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1109\/TPDS.2013.44","journal-title":"IEEE Trans. Parallel Distrib. Syst."}],"container-title":["Lecture Notes in Computer Science","Computer Aided Verification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/2.zoppoz.workers.dev:443\/https\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-98682-6_17","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T08:40:40Z","timestamp":1759999240000},"score":1,"resource":{"primary":{"URL":"https:\/\/2.zoppoz.workers.dev:443\/https\/link.springer.com\/10.1007\/978-3-031-98682-6_17"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"ISBN":["9783031986819","9783031986826"],"references-count":85,"URL":"https:\/\/2.zoppoz.workers.dev:443\/https\/doi.org\/10.1007\/978-3-031-98682-6_17","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025]]},"assertion":[{"value":"23 July 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Disclosure of Interests"}},{"value":"CAV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Computer Aided Verification","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Zagreb","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Croatia","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2025","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"21 July 2025","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"25 July 2025","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"37","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cav2025","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/2.zoppoz.workers.dev:443\/https\/conferences.i-cav.org\/2025\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}