Thanks to Mita Williams for pointing to this Washington Post article, which makes it trivial to search and see whether any sites you're affiliated with are included in "Google’s C4 data set, a massive snapshot of the contents of 15 million websites that have been used to instruct some high-profile English-language AIs, called large language models, including Google’s T5 and Facebook’s LLaMA."