Challenges when working with document collections


We just published a blog post about our vision for integrating machine learning techniques into Uwazi. By combining human expert knowledge with machine intelligence, we hope to support the work of human rights defenders.
However, the specific needs of an organization often depend on the project, the documents, the resources, etc. I would like to ask researchers, lawyers, and other professionals: what are the biggest challenges you encounter when working with large collections of documents? And, imagining an (almost) ideal world, what magic feature should any document analysis or annotation tool have?