I think this should generally be true. The aggregation performed by model training is highly lossy, and the model itself is a derivative work at worst and is certainly fair use. It may produce stuff that violates copyright, but it's the way you use or distribute the product of the model that can violate copyright. Making it write code that’s a clone of copyrighted code or making it make pictures with copy ri