WebDataset format files are tar files, with two conventions: within each tar file, files that belong together and make up a training sample share the same basename when stripped of all filename extensions the shards of a tar file are numbered like something-000000.tar to something-012345.tar, usually specified using brace notation something-{000000..012345}.tar You can find a longer, more detailed