Here are some big corpora we use in NLP in addition to the ones already mentioned:

* ukWaC: a 2-billion-word corpus constructed from the Web by limiting the crawl to the .uk domain and using medium-frequency words from the BNC as seeds. The corpus was POS-tagged and lemmatized w...
