When a brief calls for research or public-domain reference material, xDocs pulls only from sources that are clearly free to use: Project Gutenberg, Standard Ebooks, and a handful of other catalogs with unambiguous public-domain status.
No scraped fan sites, no gray-area archives, no "probably fine" sources. Every citation a chapter draws on is traceable to its source, so what you publish is something you can stand behind.
It would be faster to point a crawler at the open web and let it grab whatever text matches. It would also be a liability you'd inherit the moment you published. We'd rather ship a slightly smaller source catalog than a legally murky one.