This article explains how e-mail threads (conversations) are built in Legal Processing.
E-mail messages carry certain properties and headers that are used to recognize threads:
- Message-ID - specifies a globally unique identifier for the current message.
- In-Reply-To - may contain the Message-ID of the message to which the current message is a reply.
- References - may contain a list of the Message-IDs of the messages in the chain that leads to the current message, back to the start of the thread. If the thread is very long, the References field may not include all of the previous Message-IDs.
- Thread-Index (or Conversation Index) - defined in the Microsoft Exchange protocol to identify a message's position in the thread. Primarily used by Outlook.
- Thread-Topic (or Conversation Topic) - defined in the Microsoft Exchange protocol. This string describes the overall topic of the conversation and equals the Subject of the first (root) message. All messages within the thread have the same Conversation Topic.
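A Thread-Index value is opaque when read as a header, so a short sketch may help show how it encodes a message's position. The Python below assumes the publicly documented Outlook Conversation Index layout (a 22-byte header block followed by one 5-byte child block per reply level) and a base64-encoded header value; the `ThreadIndex` class is a hypothetical helper for illustration, not part of Legal Processing.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Assumed Conversation Index layout: 1 reserved byte, 5 bytes of a FILETIME,
# a 16-byte conversation GUID (22 bytes in total), then 5 bytes per reply level.
HEADER_LEN = 22
CHILD_LEN = 5


@dataclass
class ThreadIndex:
    raw: bytes

    @classmethod
    def from_base64(cls, value: str) -> "ThreadIndex":
        raw = base64.b64decode(value)
        # Reject values that do not fit the header + child-block structure.
        if len(raw) < HEADER_LEN or (len(raw) - HEADER_LEN) % CHILD_LEN:
            raise ValueError("malformed Thread-Index")
        return cls(raw)

    @property
    def conversation_guid(self) -> bytes:
        # Shared by every message in the same conversation.
        return self.raw[6:HEADER_LEN]

    @property
    def depth(self) -> int:
        # 0 for the root message, 1 for a direct reply, and so on.
        return (len(self.raw) - HEADER_LEN) // CHILD_LEN

    def parent(self) -> Optional["ThreadIndex"]:
        # Dropping the last child block gives the parent's Thread-Index.
        if self.depth == 0:
            return None
        return ThreadIndex(self.raw[:-CHILD_LEN])


# Usage: a root index and a direct reply to it.
root = bytes([1]) + bytes(5) + bytes(range(16))
reply = ThreadIndex.from_base64(base64.b64encode(root + bytes(5)).decode())
print(reply.depth)                 # 1
print(reply.parent().raw == root)  # True
```

Because each reply only appends a block, the index of any ancestor is a prefix of the descendant's index, which is what makes the position in the thread recoverable.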
Legal Processing extracts this information from MSG files. The References and Thread-Index properties (each referred to below as a threading-field) are used to determine and build the hierarchical message chains whenever either of them is present. Some messages in a thread may have only References fields while others have only partial Thread-Index values; this mixed scenario is also supported, provided there is enough data to stitch the parts together. Messages that have threading-fields are displayed in a hierarchical view.
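A minimal sketch of how threading-fields can link messages into a hierarchy, including the mixed case, is shown below. It is an illustration only: the `Message` record, `find_parent`, and `build_forest` are hypothetical names, consulting In-Reply-To alongside References is an assumption, and the Thread-Index lookup assumes the 22-byte header / 5-byte child-block layout, in which stripping child blocks yields an ancestor's index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

HEADER_LEN = 22  # assumed Thread-Index header block size
CHILD_LEN = 5    # assumed size of one Thread-Index child block


@dataclass
class Message:
    message_id: str
    in_reply_to: Optional[str] = None
    references: List[str] = field(default_factory=list)
    thread_index: Optional[bytes] = None  # raw bytes, already base64-decoded
    children: List["Message"] = field(default_factory=list)


def find_parent(msg: Message, by_id: Dict[str, Message],
                by_index: Dict[bytes, Message]) -> Optional[Message]:
    # 1. References / In-Reply-To: the nearest listed ancestor we actually have.
    candidates = msg.references + ([msg.in_reply_to] if msg.in_reply_to else [])
    for ref in reversed(candidates):
        if ref in by_id and ref != msg.message_id:
            return by_id[ref]
    # 2. Thread-Index: strip child blocks until a known ancestor index is found.
    index = msg.thread_index
    while index is not None and len(index) > HEADER_LEN:
        index = index[:-CHILD_LEN]
        if index in by_index:
            return by_index[index]
    return None


def build_forest(messages: List[Message]) -> List[Message]:
    """Link messages into trees; returns the root of each tree."""
    by_id = {m.message_id: m for m in messages}
    by_index = {m.thread_index: m for m in messages if m.thread_index}
    roots = []
    for m in messages:
        parent = find_parent(m, by_id, by_index)
        if parent is None:
            roots.append(m)
        else:
            parent.children.append(m)
    return roots


# Mixed scenario: <b> has only References, <c> only a Thread-Index.
root_index = bytes(22)
a = Message("<a>", thread_index=root_index)
b = Message("<b>", references=["<a>"])
c = Message("<c>", thread_index=root_index + bytes(5))
d = Message("<d>", in_reply_to="<b>")
roots = build_forest([a, b, c, d])
print([m.message_id for m in a.children])  # ['<b>', '<c>']
```

Walking References from the end, and stripping Thread-Index blocks one at a time, lets a message attach to its nearest available ancestor when intermediate messages are missing from the collection. The sketch assumes the threading data contains no cycles.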
The second way to recognize a thread is to use the Subject/Topic fields. In most cases, messages from the same thread have the same Subject, apart from prefixes such as "Re:" and "Fwd:" that are usually added over the course of the conversation. The Topic can be defined as the Subject of the first message in the conversation. This simple but meaningful criterion makes it easy to determine the set of messages that belong to the same thread. However, the Topic only helps to group messages into a thread; it cannot determine the position of the messages within it. Messages that have no threading-fields but share the same Topic are grouped into one thread and displayed in a flat view.
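Deriving a Topic from a Subject essentially means stripping the reply and forward prefixes. The sketch below is illustrative only: the prefix set (`Re`, `Fw`, `Fwd`, in any letter case, possibly stacked) and the helper names `normalize_topic` and `group_by_topic` are hypothetical, and real mail clients also use other, localized prefixes.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical prefix set: "Re:", "Fw:", "Fwd:" in any letter case.
_PREFIX = re.compile(r"^\s*(?:re|fwd?)\s*:\s*", re.IGNORECASE)


def normalize_topic(subject: Optional[str]) -> str:
    """Strip leading reply/forward prefixes, however many are stacked."""
    topic = subject or ""
    while True:
        stripped = _PREFIX.sub("", topic, count=1)
        if stripped == topic:
            return topic.strip()
        topic = stripped


def group_by_topic(subjects: List[str]) -> Dict[str, List[str]]:
    """Group subjects whose normalized Topic is identical (empty Topics are skipped)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for subject in subjects:
        topic = normalize_topic(subject)
        if topic:
            groups[topic].append(subject)
    return dict(groups)


print(normalize_topic("RE: Fwd: re: Budget 2020"))  # Budget 2020
print(group_by_topic(["Budget", "Re: Budget", "FW: Budget", "Lunch"]))
```

The prefix pattern requires a colon right after the prefix, so a subject such as "Rebate: terms" is left unchanged. As the text notes, grouping this way says nothing about the order of messages inside a group.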
If messages share the same Topic but only some of them have threading-fields, the messages with threading information are displayed hierarchically and the others are displayed under the root message of the thread.
Messages that have neither threading-fields nor a Topic each form a thread of their own.
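The placement rules above (hierarchical, mixed, flat, and single-message threads) can be combined as in the following sketch. It assumes each message already has a normalized Topic and, where threading-fields exist, a resolved parent; the `Msg` record, `assign_threads`, the `"topic:"` key prefix, and picking the first tree when several share a Topic are all hypothetical choices, not Legal Processing's documented behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Msg:
    message_id: str
    topic: str                       # normalized Subject/Topic, "" if absent
    has_threading: bool              # References and/or Thread-Index present
    parent_id: Optional[str] = None  # resolved from threading-fields; None for a root


def assign_threads(messages: List[Msg]) -> Dict[str, List[Tuple[str, Optional[str]]]]:
    """Return thread key -> [(message_id, displayed_under)], None meaning top level."""
    by_id = {m.message_id: m for m in messages}
    threads: Dict[str, List[Tuple[str, Optional[str]]]] = defaultdict(list)
    root_for_topic: Dict[str, str] = {}
    placed = set()

    def tree_root(m: Msg) -> str:
        # Walk up the threading-field parents (assumed acyclic and resolvable).
        while m.parent_id is not None:
            m = by_id[m.parent_id]
        return m.message_id

    def place(thread: str, msg_id: str, under: Optional[str]) -> None:
        if msg_id not in placed:
            placed.add(msg_id)
            threads[thread].append((msg_id, under))

    # 1. Messages with threading-fields: hierarchical trees.
    for m in messages:
        if m.has_threading:
            root = tree_root(m)
            place(root, root, None)  # the root may itself lack threading-fields
            place(root, m.message_id, m.parent_id)
            if m.topic:
                root_for_topic.setdefault(m.topic, root)

    # 2. Everything else is placed by Topic, or becomes its own thread.
    for m in messages:
        if m.message_id in placed:
            continue
        if m.topic in root_for_topic:
            root = root_for_topic[m.topic]
            place(root, m.message_id, root)                # mixed: under the root
        elif m.topic:
            place("topic:" + m.topic, m.message_id, None)  # flat Topic thread
        else:
            place(m.message_id, m.message_id, None)        # single-message thread
    return dict(threads)


threads = assign_threads([
    Msg("<a>", "Budget", True),
    Msg("<b>", "Budget", True, "<a>"),
    Msg("<c>", "Budget", False),
    Msg("<d>", "Lunch", False),
    Msg("<e>", "Lunch", False),
    Msg("<f>", "", False),
])
print(threads["<a>"])  # [('<a>', None), ('<b>', '<a>'), ('<c>', '<a>')]
```

Here `<c>` has no threading-fields but shares the Topic of the `<a>` tree, so it is shown under `<a>`; `<d>` and `<e>` form a flat Topic thread; `<f>` stands alone.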
Additionally, PST files contain e-mail messages (along with other types of data) that originated from Outlook. Messages from "internal conversations" (sent within a single Exchange server) have only the Microsoft Conversation Index and Conversation Topic properties. Because the Conversation Index is the Thread-Index threading-field, threads for these messages are still built correctly.