Dedoc
This sample demonstrates the use of Dedoc in combination with LangChain as a DocumentLoader.
Overview
Dedoc is an open-source library/service that extracts text, tables, attached files, and document structure (e.g., titles and list items) from files of various formats.
Dedoc supports DOCX, XLSX, PPTX, EML, HTML, PDF, images and more.
The full list of supported formats is available in the Dedoc documentation.
Integration details
| Class | Package | Local | Serializable | JS support |
|---|---|---|---|---|
| DedocFileLoader | langchain_community | ❌ | beta | ❌ |
| DedocPDFLoader | langchain_community | ❌ | beta | ❌ |
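
As a rough illustration of how the two loader classes in the table might be used, here is a minimal sketch. It assumes the `dedoc` and `langchain-community` packages are installed and that both classes can be imported from `langchain_community.document_loaders`. The helper names `pick_loader_name` and `load_documents`, and the rule of routing files by extension, are hypothetical and only serve this example.

```python
from pathlib import Path


def pick_loader_name(file_path: str) -> str:
    """Return the name of the loader class to use for a given file.

    Hypothetical rule for this sketch: PDFs go to the PDF-specific loader,
    everything else goes to the general-purpose file loader.
    """
    if Path(file_path).suffix.lower() == ".pdf":
        return "DedocPDFLoader"
    return "DedocFileLoader"


def load_documents(file_path: str):
    """Load a file into a list of LangChain Document objects via Dedoc."""
    # Imported lazily so that this module can be imported even when the
    # optional dependencies (langchain-community, dedoc) are not installed.
    import langchain_community.document_loaders as loaders

    loader_cls = getattr(loaders, pick_loader_name(file_path))
    loader = loader_cls(file_path)  # the file path is the first argument
    return loader.load()
```

Calling `load_documents("example.docx")` (a placeholder path) should return a list of `Document` objects whose `page_content` holds the extracted text. The exact metadata attached to each document depends on the Dedoc version and loader options, so treat this as a starting point rather than a reference implementation.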