nlp-datasets

Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP). Most of the material here is raw, unstructured text data; if you are looking for annotated corpora or treebanks, refer to the sources at the bottom.

Datasets (English, multilang)

Apache Software Foundation Public Mail Archives: all publicly available Apache Software Foundation mail a