Document files such as academic reports and legal contracts can contain far more information than chat text. However, Large Language Models (LLMs) natively process only text or images, which makes it hard to extract the rich contextual information in these files. As a result, application users often resort to manually copying and pasting large amounts of content into conversations with LLMs, which adds unnecessary operational overhead.
The file upload feature addresses this limitation by allowing files to be uploaded, parsed, referenced, and downloaded as File variables within workflow applications. This empowers developers to easily construct complex workflows capable of understanding and processing various media types, including images, audio, and video.
Both file upload and knowledge base provide additional contextual information for LLMs, but they differ significantly in usage scenarios and functionality:
Dify supports file uploads in both ChatFlow and WorkFlow type applications, processing them through variables for LLMs. Application developers can refer to the following methods to enable file upload functionality:
These two methods provide flexible file upload options for applications to meet the needs of different scenarios.
File Types
File variables and array[file] variables support the following file types and formats:
File Type | Supported Formats |
---|---|
Documents | TXT, MARKDOWN, PDF, HTML, XLSX, XLS, DOCX, CSV, EML, MSG, PPTX, PPT, XML, EPUB. |
Images | JPG, JPEG, PNG, GIF, WEBP, SVG. |
Audio | MP3, M4A, WAV, WEBM, AMR. |
Video | MP4, MOV, MPEG, MPGA. |
Others | Custom file extension support |
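As a rough illustration of the table above, the mapping from file extension to file type can be sketched in Python. The names SUPPORTED_FORMATS and classify_file are hypothetical and not part of Dify; the extension lists are copied from the table, and treating every unlisted extension as "other" is a simplifying assumption.

```python
from pathlib import Path

# Extension lists copied from the table above. The dict itself is illustrative,
# not Dify's internal data structure.
SUPPORTED_FORMATS = {
    "document": {"txt", "markdown", "pdf", "html", "xlsx", "xls", "docx",
                 "csv", "eml", "msg", "pptx", "ppt", "xml", "epub"},
    "image": {"jpg", "jpeg", "png", "gif", "webp", "svg"},
    "audio": {"mp3", "m4a", "wav", "webm", "amr"},
    "video": {"mp4", "mov", "mpeg", "mpga"},
}


def classify_file(filename: str) -> str:
    """Return the file type for a filename, or "other" for unlisted extensions."""
    extension = Path(filename).suffix.lower().lstrip(".")
    for file_type, extensions in SUPPORTED_FORMATS.items():
        if extension in extensions:
            return file_type
    return "other"
```

A lookup like this is case-insensitive, so `report.PDF` and `report.pdf` land in the same category.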
Some LLMs, such as Claude 3.5 Sonnet, now support direct processing and analysis of file content, enabling the use of file variables in the LLM node’s prompts.
To prevent potential issues, application developers should verify the supported file types on the LLM’s official website before utilizing the file variable.
Method 2: Enable File Upload in Application Chat Box (Chatflow Only)
Enabling this feature does not grant LLMs the ability to directly read files. A Document Extractor is still needed to parse documents into text for LLM comprehension.
Models such as gpt-4o-audio-preview that support multimodal input can process audio directly without additional extractors.
Files uploaded through the chat box are available through the sys.files variable in the input variables.
Once enabled, users can upload files and engage in conversations in the dialogue box. However, with this method, the LLM application cannot remember file contents, so files must be uploaded again for each conversation.
If you want the LLM to remember file contents during conversations, please refer to Method 3.
Method 3: Enable File Upload by Adding File Variables
1. Add File Variables in the “Start” Node
Add input fields in the application’s “Start” node, choosing either “Single File” or “File List” as the field type for the variable.
Single File: Allows the application user to upload only one file.
File List: Allows the application user to batch upload multiple files at once.
For ease of operation, we will use a single file variable as an example.
File Parsing
There are two main ways to use file variables:
The choice between these methods depends on the file type and your specific requirements. Next, we will detail the specific steps for both methods.
2. Add Document Extractor Node
After uploading, files are stored in single file variables, which LLMs cannot directly read. Therefore, a “Document Extractor” node needs to be added first to extract content from uploaded document files and send it to the LLM node for information processing.
Use the file variable from the “Start” node as the input variable for the “Document Extractor” node.
Fill in the output variable of the “Document Extractor” node in the system prompt of the LLM node.
After completing these settings, application users can paste file URLs or upload local files in the WebApp, then interact with the LLM about the document content. Users can replace files at any time during the conversation, and the LLM will obtain the latest file content.
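The data flow in these steps can be sketched as follows. This is a conceptual Python illustration, not Dify's implementation: extract_text stands in for the "Document Extractor" node and handles only plain-text files, while build_system_prompt stands in for referencing the extractor's output variable in the LLM node's system prompt. Both function names and the prompt wording are hypothetical.

```python
def extract_text(file_bytes: bytes, encoding: str = "utf-8") -> str:
    """Stand-in for the Document Extractor: decode a plain-text file to a string."""
    return file_bytes.decode(encoding, errors="replace")


def build_system_prompt(document_text: str) -> str:
    """Stand-in for placing the extractor's output variable in the system prompt."""
    return (
        "Answer the user's questions using only the document below.\n"
        "<document>\n"
        f"{document_text}\n"
        "</document>"
    )


# Simulate an uploaded file, then run it through both stand-in steps.
uploaded = "Clause 1: The contract term is 12 months.".encode("utf-8")
prompt = build_system_prompt(extract_text(uploaded))
print(prompt)
```

The key point is that the LLM only ever sees text: the file itself never reaches the model, only the string produced by the extraction step.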
Referencing File Variables in LLM Nodes
For certain file types (such as images), file variables can be directly used within LLM nodes. This method is particularly suitable for scenarios requiring visual analysis. Here are the specific steps:
sys.files variable.
Below is an example configuration:
Note that when file variables are used directly in an LLM node, developers need to ensure that the file variable contains only image files; otherwise, errors may occur. If users might upload different types of files, use a List Operation node to filter the files.
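The filtering step described above can be pictured with a small Python sketch. The UploadedFile record and its field names are assumptions made for this example and do not reflect Dify's actual file object; the function only shows the idea of passing image files on and dropping everything else.

```python
from dataclasses import dataclass


@dataclass
class UploadedFile:
    # Hypothetical record; the field names are assumptions for this sketch.
    name: str
    type: str  # "document", "image", "audio", or "video"


def keep_only_images(files: list[UploadedFile]) -> list[UploadedFile]:
    """Mimic a filter step that passes only image files on to the LLM node."""
    return [f for f in files if f.type == "image"]


uploads = [
    UploadedFile("chart.png", "image"),
    UploadedFile("contract.pdf", "document"),
    UploadedFile("photo.jpg", "image"),
]
images = keep_only_images(uploads)
print([f.name for f in images])
```

Filtering before the LLM node means a stray PDF in the upload is dropped quietly instead of causing an error at the model call.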
File Download
Placing file variables in answer nodes or end nodes will provide a file download card in the conversation box when the application reaches that node. Clicking the card allows for file download.
If you want the application to support uploading multiple types of files, such as allowing users to upload document files, images, and audio/video files simultaneously, you need to add a “File List” variable in the “Start” node and use the “List Operation” node to process different file types. For detailed instructions, please refer to the List Operation node.
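One way to think about handling a mixed file list is to split it into one list per file type, so that each branch of the workflow receives only the files it can handle. The Python sketch below assumes each file is a dict with a "type" key; that shape and the function name group_by_type are hypothetical and not Dify's data model.

```python
from collections import defaultdict


def group_by_type(files: list[dict]) -> dict[str, list[dict]]:
    """Split a mixed file list into one list per file type."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for file in files:
        groups[file["type"]].append(file)
    return dict(groups)


file_list = [
    {"name": "contract.pdf", "type": "document"},
    {"name": "chart.png", "type": "image"},
    {"name": "meeting.mp3", "type": "audio"},
    {"name": "notes.docx", "type": "document"},
]
groups = group_by_type(file_list)

# Documents could then go to an extraction step and images to a vision-capable model.
print(sorted(groups))
```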