Telegram evidence beyond the cloud
In September 2024, Telegram quietly rewrote its own mythology. For years, the platform had positioned itself as the privacy-conscious alternative to WhatsApp: end-to-end encryption for secret chats, no cooperation with governments except in terrorism cases, and a founder who publicly clashed with Russian authorities. Then came the transparency report. Between January and December 2024, Telegram responded to 900 U.S. law enforcement requests, handing over phone numbers and IP addresses for 2,253 users. The policy shift, reflected in Telegram’s official privacy policy, extended cooperation beyond terrorism to any criminal activity, including cybercrime. The app that investigators once filed in the “too hard” drawer is now significantly more cooperative. That context matters, but it does not change the core forensic reality: the most valuable Telegram artifacts are almost never in what the platform will send you on request.

This guide focuses on practical Telegram forensic analysis across the artifacts investigators actually parse in the field: Android cache4.db, iOS db_sqlite, Telegram Desktop tdata, and time-sensitive WAL/freelist remnants.
The architecture investigators actually face
Telegram organizes its storage around a fundamental distinction that shapes every forensic decision you make. Regular chats (cloud chats) are synchronized on Telegram’s servers, accessible from any device, and stored server-side in a format Telegram can read. Secret chats are device-only, established via a Diffie-Hellman key exchange, and genuinely end-to-end encrypted: Telegram holds no copy, no key, nothing. This distinction has direct investigative impact. Since the September 2024 policy change, a legal request to Telegram in a criminal case may return phone numbers and IP addresses. Do not expect message content, even from regular cloud chats: the platform generally declines to provide it across most jurisdictions.
This means that for any investigation where message content matters, the device is your primary target. The server is a fallback for identity attribution, not content recovery. Understanding that asymmetry is the prerequisite for everything that follows.
Version boundaries and validation discipline
Telegram forensic behavior is version-sensitive. Storage schema details, WAL persistence windows, and cache retention patterns can shift across Telegram releases and operating system updates. In practice, every parser output should be treated as a version-scoped hypothesis until confirmed with manual byte-level validation on the same build family.
If your lab workflow includes iOS acquisitions, the methodology discussed in this article should be read together with this practical guide to iOS evidence acquisition without jailbreak, especially for AFU/BFU handling and extraction constraints.
Field priorities in the first 60 minutes
If you have a narrow acquisition window, sequence matters more than tool choice.
- Preserve volatile state first: keep the device powered, isolate network paths when possible, and document lock state (AFU/BFU on iOS).
- Acquire app containers before broad triage exports: for Telegram, that means grabbing cache4.db and its WAL artifacts on Android, and postbox/db_sqlite plus its WAL on iOS.
- Pull attachment caches immediately after database extraction, because media lifetimes are often shorter than message metadata lifetimes.
- Defer enrichment steps (timeline decoration, attribution correlation, reporting formatting) until core artifacts are safely duplicated and hashed.
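The duplicate-and-hash step is worth scripting so that it runs identically on every case. Below is a minimal sketch. It assumes the acquired files already sit in a local working directory. The function names and the manifest layout are illustrative choices, not a forensic standard:

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in fixed-size chunks so large media caches need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256 digest.

    WAL, SHM and journal companions are ordinary files here, so their presence
    and contents are recorded alongside the main database.
    """
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def write_manifest(root: Path, out_file: Path) -> None:
    """Write '<digest>  <relative path>' lines, the layout sha256sum also prints."""
    lines = [f"{digest}  {name}" for name, digest in build_manifest(root).items()]
    out_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Run it once immediately after acquisition and again before analysis. Matching manifests show that the working copies were not altered in between. Keep the manifest file outside the hashed directory so it does not end up hashing itself.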
Android artifacts in cache4.db
On Android, Telegram stores its local data under /data/data/org.telegram.messenger/files/ inside the app’s private sandbox. Access generally requires root, a full filesystem/physical acquisition method, or a forensic agent with sufficient privileges (legacy ADB backup methods may work only on older devices and app versions). The main database is cache4.db, a standard SQLite file. Depending on journal mode, it may be accompanied by rollback journal artifacts or WAL files (cache4.db-wal and cache4.db-shm). The schema is readable without any proprietary decryption. For broader context on mobile artifact persistence on recent Android versions, this analysis of Android pattern-of-life artifacts is a useful companion.
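Check the journal mode from the raw file header before opening cache4.db in any SQLite client. When the last connection to a WAL-mode database closes, SQLite normally checkpoints the WAL into the main file and deletes it, which destroys exactly the remnants you want. The sketch below follows the documented SQLite file header layout. Bytes 18 and 19 are the write and read format versions, where 2 means WAL. The first freelist trunk page sits at offset 32 and the freelist page count at offset 36. The function name and the returned keys are illustrative:

```python
import struct
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"


def inspect_sqlite_header(db_path: Path) -> dict:
    """Read journaling and freelist hints from the raw 100-byte SQLite header.

    The file is only read as bytes, so no SQLite connection is opened and no
    checkpoint can run against the copy being inspected.
    """
    with db_path.open("rb") as handle:
        header = handle.read(100)
    if len(header) < 100 or not header.startswith(SQLITE_MAGIC):
        raise ValueError(f"{db_path} does not start with a SQLite 3 header")
    page_size = struct.unpack_from(">H", header, 16)[0]
    if page_size == 1:  # the header encodes a 65536-byte page size as 1
        page_size = 65536
    freelist_trunk, freelist_pages = struct.unpack_from(">II", header, 32)

    def companion(suffix: str) -> bool:
        return db_path.with_name(db_path.name + suffix).exists()

    return {
        "page_size": page_size,
        # Offsets 18/19: file format write/read versions (1 = rollback journal, 2 = WAL).
        "wal_mode": header[18] == 2 and header[19] == 2,
        "first_freelist_trunk": freelist_trunk,
        "freelist_pages": freelist_pages,
        "wal_file": companion("-wal"),
        "shm_file": companion("-shm"),
        "journal_file": companion("-journal"),
    }
```

If `wal_mode` is true but no `-wal` file was acquired, record that fact. It may mean the WAL was checkpointed before or during acquisition.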
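The tables you want first are these. Table and column names shift between builds (recent versions use messages_v2, for example, and keep most message fields, including the text, inside the serialized data BLOB), so confirm the layout with .schema before querying: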
- messages: stores message ID, date (Unix timestamp), dialog ID, sender ID, message text, and a data BLOB for media and service messages
- users: maps numeric user IDs to first name, last name, username, and phone number where available
- dialogs: maps dialog IDs to type (user, group, channel, secret chat), unread count, and top message ID
- media: stores references to local file paths for attachments that have been downloaded to the device
- chat_settings_v2: stores per-chat notification and privacy settings
A basic query to reconstruct a conversation timeline in SQLite3:
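-- Column names follow the simplified layout above; confirm them with .schema first.
SELECT
  m.date,
  datetime(m.date, 'unixepoch') AS readable_time,  -- rendered in UTC
  COALESCE(u.first_name, '') || ' ' || COALESCE(u.last_name, '') AS sender,
  m.message
FROM messages m
-- Assumes uid identifies the sender; where a build stores the dialog ID in uid,
-- take the sender from the decoded data BLOB instead.
LEFT JOIN users u ON m.uid = u.uid
WHERE m.dialog_id = <target_dialog_id>
ORDER BY m.date ASC;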
The data column in messages is a serialized BLOB for non-text messages (photos, voice notes, documents, geo-pins). Parsing it requires understanding the MTProto TL serialization format, which Telegram uses internally. The teleparser tool by RealityNet handles this deserialization and extracts structured records from the BLOB fields, including media file references, forwarded message metadata, and geo-coordinates. For manual verification, remember that every boxed TL object begins with its four-byte constructor ID in little-endian order. The first four bytes of a BLOB identify the outermost object, and nested records (such as the media attached to a message) carry their own constructor IDs further in. For example, 0x9c4e19c1 identifies a messageMediaPhoto record. Constructor IDs change between TL layers, so check each one against the schema for the layer in use.
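The constructor check takes only a few lines of Python. Below is a minimal sketch. It assumes you have already exported a single BLOB as bytes. `KNOWN_CONSTRUCTORS` contains only the example ID quoted above, and any entries you add should come from the TL schema for the layer in use:

```python
import struct

# Only the example ID from the text; extend from the TL schema of the relevant layer.
KNOWN_CONSTRUCTORS = {0x9C4E19C1: "messageMediaPhoto"}


def constructor_at(blob: bytes, offset: int = 0) -> int:
    """Return the 4-byte little-endian TL constructor ID starting at offset."""
    return struct.unpack_from("<I", blob, offset)[0]


def find_constructor(blob: bytes, constructor_id: int) -> list[int]:
    """List every offset where the little-endian encoding of constructor_id occurs.

    Hits are candidates only: the same four bytes can occur by chance inside
    unrelated data, so each one has to be parsed outward and checked.
    """
    needle = struct.pack("<I", constructor_id)
    offsets, pos = [], blob.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(needle, pos + 1)
    return offsets


def describe(blob: bytes) -> str:
    """Label the outermost object of a BLOB by its leading constructor ID."""
    cid = constructor_at(blob)
    return f"0x{cid:08x} ({KNOWN_CONSTRUCTORS.get(cid, 'unknown')})"
```

Because of the little-endian order, the example ID appears on disk as the byte sequence `c1 19 4e 9c`. That is the pattern to search for in a hex editor.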
Group membership history and admin changes live in chat_settings_v2 and in service messages within the messages table itself. For those rows, the data BLOB contains a TL-encoded service message whose action is a messageActionChatAddUser or messageActionChatDeleteUser object. These are forensically significant for establishing who was present in a group at a given time.
The cache4.db also stores search index data for in-app full-text search, which means that message fragments from conversations the user has actively searched for can persist in the database even if the conversation itself has been archived or the messages deleted. This is an underexplored artifact source.
iOS artifacts in db_sqlite
On iOS, Telegram stores data outside the standard app backup mechanism. iTunes and iCloud backups do not contain Telegram data. The only reliable path is a full filesystem extraction via an agent-based method (Belkasoft X, Cellebrite UFED with iOS agent), checkm8-based physical acquisition on supported devices (A5-A11 chipsets), or a jailbroken device. For devices in AFU (After First Unlock) state, the keychain is accessible and the Telegram app-group container can be decrypted at acquisition time.
After acquisition, Telegram data sits under:
/private/var/mobile/Containers/Shared/AppGroup/{GUID}/telegram-data/{account_id}/postbox/
The GUID is assigned at install time and changes if the app is reinstalled. Inside the postbox/ directory you will find a db/ subfolder containing db_sqlite and its associated WAL file (db_sqlite-wal), and a media/ folder with cached attachments.
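The GUID changes on reinstall, and one device can hold several accounts. It is therefore safer to enumerate every matching database in the extraction than to assume there is only one. Below is a minimal sketch. It assumes the extraction has been unpacked to a local folder that mirrors /private/var/mobile/Containers/Shared/AppGroup/. The GUID and account directory names are wildcards because both vary per install:

```python
from pathlib import Path


def find_postbox_databases(app_group_root: Path) -> list[Path]:
    """Locate every {GUID}/telegram-data/{account}/postbox/db/db_sqlite under the root."""
    return sorted(app_group_root.glob("*/telegram-data/*/postbox/db/db_sqlite"))


def summarize(app_group_root: Path) -> list[tuple[str, str, bool]]:
    """Return (GUID, account directory, WAL present) for each database found."""
    rows = []
    for db in find_postbox_databases(app_group_root):
        account_dir = db.parents[2]  # .../telegram-data/{account}
        guid_dir = db.parents[4]     # .../AppGroup/{GUID}
        wal_present = db.with_name("db_sqlite-wal").exists()
        rows.append((guid_dir.name, account_dir.name, wal_present))
    return rows
```

App groups that belong to other apps are skipped automatically, because the pattern requires the telegram-data component.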
The db_sqlite schema on iOS is not equivalent to the Android cache4.db. Telegram iOS stores records in a custom binary serialization format across numbered tables (t0, t1, t2, ...), three of which matter most here:
- t0: account settings, including the account owner’s numeric Telegram ID
- t2: chat and contact metadata (names, profile photos, chat types)
- t7: message records (one row per message, with key = chat+timestamp, value = message data)
Both key and value columns in these tables store raw binary data, not text. The critical operational detail is endianness: the key column uses big-endian encoding, while the value column uses little-endian. Ignoring this will produce wrong numeric values for every ID and timestamp you decode.
A practical example from t7: to extract the Unix timestamp of a message, read bytes 13-16 (one-based), or offsets 12-15 (zero-based), of the key column as a big-endian uint32. In Python:
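import sqlite3
import struct
from datetime import datetime, timezone

# Run against a working copy of db_sqlite (with its -wal/-shm), never the original evidence.
conn = sqlite3.connect('file:db_sqlite?mode=ro', uri=True)
cursor = conn.execute("SELECT key, value FROM t7")
for row in cursor:
    key_bytes = row[0]  # each row is a (key, value) tuple; decode the key column only
    # Chat ID: bytes 0-7, big-endian int64
    chat_id = struct.unpack_from('>q', key_bytes, 0)[0]
    # Timestamp: bytes 12-15, big-endian uint32 (Unix seconds)
    timestamp = struct.unpack_from('>I', key_bytes, 12)[0]
    readable = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
    print(f"Chat: {chat_id}, Timestamp: {timestamp} ({readable})")
conn.close()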
For message content, the value column requires parsing Telegram iOS’s internal binary serialization. Text messages store the sender ID (8 bytes, little-endian int64, starting at byte offset 20 in the value), followed by a 4-byte length prefix and the UTF-8 message string. Geolocation pins store latitude and longitude under the keys la and lo as little-endian float64 values. The DFRWS 2025 paper by Jaeckel is currently one of the best public references for these offsets across recent iOS Telegram versions.
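A short decoder is enough to check that value layout against a known message. Below is a minimal sketch that hard-codes the offsets quoted above: the sender ID at 20, so the length prefix sits at 28 and the text starts at 32. Two assumptions need flagging. First, the length prefix is read as a little-endian unsigned integer, consistent with the value column's byte order. Second, every offset must be re-confirmed on the build under examination:

```python
import struct


def parse_text_value(value: bytes) -> tuple[int, str]:
    """Decode (sender_id, text) from a t7 value using the offsets described in the text.

    Assumed layout: little-endian int64 sender ID at offset 20, little-endian
    4-byte length at offset 28, then that many UTF-8 bytes from offset 32.
    """
    if len(value) < 32:
        raise ValueError("value too short for the assumed text-message layout")
    sender_id = struct.unpack_from("<q", value, 20)[0]
    (length,) = struct.unpack_from("<I", value, 28)
    if 32 + length > len(value):
        raise ValueError("length prefix runs past the end of the value; wrong layout?")
    # Strict decoding: a UnicodeDecodeError here usually signals misaligned offsets.
    text = value[32:32 + length].decode("utf-8")
    return sender_id, text
```

Strict UTF-8 decoding is a deliberate choice here. A misaligned offset tends to fail loudly rather than produce plausible-looking garbage.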
One artifact category deserves specific attention: the t2 table caches search results from Telegram’s global search even when the user has no interaction history with the returned accounts. If an investigator queries t2 and retrieves contacts or channel names, those may reflect search activity rather than actual communication. A channel appearing in t2 with zero associated messages in t7 should be treated as a likely search cache artifact, not as evidence of membership.
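Once both sides are decoded, the cross-check is a set difference. Below is a minimal sketch. It assumes you have already extracted peer IDs from t2 with a build-specific parser that uses the same ID encoding as t7. Decoding t2 keys is not shown and is treated as an input. The t7 chat IDs are read from the key layout described earlier (bytes 0-7, big-endian):

```python
import struct
from collections.abc import Iterable


def t7_chat_ids(keys: Iterable[bytes]) -> set[int]:
    """Collect chat IDs from t7 keys (bytes 0-7, big-endian int64)."""
    return {struct.unpack_from(">q", key, 0)[0] for key in keys}


def search_only_candidates(t2_ids: Iterable[int], t7_keys: Iterable[bytes]) -> set[int]:
    """Return IDs present in t2 that have no message key in t7.

    These are review candidates, not conclusions: an ID can also be missing
    from t7 because its messages were deleted.
    """
    return set(t2_ids) - t7_chat_ids(t7_keys)
```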
Decoding pitfalls that can break conclusions
Most Telegram parsing errors are small at byte level and catastrophic at reporting level.
- Endianness drift: decoding iOS key and value fields with the same byte order silently corrupts IDs and timestamps.
- Tuple handling errors in scripts: sqlite3 returns each row as a tuple. Passing row instead of row[0] to the decoder either raises an error or, in hand-written slicing code, silently operates on the tuple rather than the key bytes.
- Constructor confusion: matching on plain text instead of TL constructor IDs leads to false positives in binary scans.
- Cache over-interpretation: entries in t2 or desktop caches are often evidence of discovery/search, not evidence of communication or membership.
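Endianness drift is easy to demonstrate, and the demonstration shows why it goes unnoticed: the wrong byte order never raises an error. The sketch below uses an arbitrary example timestamp (1,700,000,000), not case data:

```python
import struct
from datetime import datetime, timezone

# Four bytes as they would sit in a big-endian key field.
raw = struct.pack(">I", 1_700_000_000)

correct = struct.unpack(">I", raw)[0]  # byte order matches the field
drifted = struct.unpack("<I", raw)[0]  # wrong byte order: no exception, just a wrong number

for label, value in (("big-endian", correct), ("little-endian", drifted)):
    print(label, value, datetime.fromtimestamp(value, tz=timezone.utc).isoformat())
```

The big-endian read returns 1700000000 (2023-11-14 22:13:20 UTC). The little-endian read of the same bytes returns 15815525, which renders as 1970-07-03 01:12:05 UTC. A 1970 date in a Telegram timeline is therefore a strong hint that a byte order was misapplied.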
Artifact reliability matrix
The table below is a practical way to communicate evidential weight before drafting conclusions.
| Artifact source | Typical volatility | Evidential value | Main interpretation risk |
|---|---|---|---|
| Main SQLite records (cache4.db, db_sqlite) | Medium | High for message chronology and account linkage | Parsed fields may be wrong if offsets or endianness are misapplied |
| WAL files (*-wal) | High | High for recent activity and deleted/modified records | Time-window assumptions may be overstated without checkpoint context |
| Freelist pages | Very high | Medium to high when validated with parser + manual checks | False positives from partial or overwritten binary fragments |
| Search/cache metadata (t2, desktop cache indexes) | Medium | Medium for discovery behavior, low for communication proof | Confusing search/discovery artifacts with actual interaction |
| Provider legal-response metadata (IP, phone) | Low to medium (provider retention dependent) | Medium for attribution and correlation | Over-attributing a person/device without multi-source corroboration |
Secret chat artifacts on device and under time pressure
The secret chat architecture is genuinely end-to-end encrypted during transit: keys are negotiated via a Diffie-Hellman exchange between the two client devices, and Telegram’s servers never hold the plaintext or the session keys. What is less widely understood is that once a message arrives on the device, it is stored in the local database without that end-to-end layer. The encryption protects the message in transit, not the local copy.
On iOS, secret chat messages appear in t7 under separate dialog IDs. Belkasoft X identifies them as anonymous dialogs in the tree view (no sender details) and resolves the sender name from message metadata at display time. On Android, secret chat messages are stored in cache4.db with a dialog_id that maps to an entry in dialogs with type = 2 (encrypted chat).
The recovery window for deleted secret chat messages is substantially narrower than for regular chats. Based on empirical analysis reported by Belkasoft, deleted secret chat records do not survive in SQLite freelist or WAL entries beyond a few minutes, while deleted messages from regular one-to-one chats can persist in WAL records for one hour to approximately two days depending on iOS version and database checkpoint behavior. If you are acquiring a device in response to an incident where secret chats are the primary target, acquisition latency directly affects evidential yield.
This timing problem is consistent with the broader mobile evidence theme that local notification and cache artifacts can outlive user expectations of deletion, as also discussed in this Signal-focused case study on iPhone notification forensics.
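The checkpoint behavior behind the WAL persistence windows described above can be read from the WAL file itself. Each frame header records the page number it replaces and a commit marker. It also carries two salt values, which change when the WAL restarts after a checkpoint. The sketch below follows the documented SQLite WAL format: a 32-byte file header and 24-byte frame headers, all big-endian. It does not verify checksums, so treat the output as a frame inventory rather than a validated replay:

```python
import struct
from dataclasses import dataclass
from pathlib import Path

WAL_MAGICS = (0x377F0682, 0x377F0683)  # the two values differ only in checksum byte order
WAL_HEADER_SIZE = 32
FRAME_HEADER_SIZE = 24


@dataclass
class WalFrame:
    index: int          # 0-based position of the frame in the WAL file
    page_number: int    # database page this frame holds a copy of
    commit_size: int    # DB size in pages on commit frames, 0 otherwise
    current_salt: bool  # True if the frame salts match the WAL header


def list_wal_frames(wal_path: Path) -> list[WalFrame]:
    """Inventory frame headers in a SQLite WAL file. Checksums are not verified."""
    data = wal_path.read_bytes()
    if len(data) < WAL_HEADER_SIZE:
        return []
    magic, _version, page_size, _ckpt_seq, salt1, salt2 = struct.unpack_from(">6I", data, 0)
    if magic not in WAL_MAGICS:
        raise ValueError(f"{wal_path} is not a SQLite WAL file")
    if page_size < 512 or page_size & (page_size - 1):
        raise ValueError(f"implausible page size {page_size}")
    frames = []
    offset = WAL_HEADER_SIZE
    while offset + FRAME_HEADER_SIZE + page_size <= len(data):
        page_no, commit_size, f_salt1, f_salt2 = struct.unpack_from(">4I", data, offset)
        frames.append(WalFrame(len(frames), page_no, commit_size,
                               (f_salt1, f_salt2) == (salt1, salt2)))
        offset += FRAME_HEADER_SIZE + page_size
    return frames
```

Frames whose salts do not match the header belong to an earlier WAL generation. SQLite ignores them, but they can still hold older versions of pages, which is where deleted or modified records tend to surface.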
For a standard SQLite database, the recovery path is: check db_sqlite-wal for uncommitted or recently committed records, then scan freelist pages in the main database file for deleted records that have not yet been overwritten. For db_sqlite, the freelist approach is complicated by the binary serialization format: you cannot simply grep for text strings. You need to scan for known TL constructor IDs at page boundaries and parse outward from those anchors.
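The freelist half of that path can be sketched from the documented SQLite file format. The header stores the first freelist trunk page at offset 32 and the freelist page count at offset 36. Each trunk page holds the next trunk pointer, a leaf count, and the leaf page numbers. The sketch below walks that chain and searches the free pages for caller-supplied constructor IDs, for example the one quoted in the Android section. It reports anchors only and does not attempt record reconstruction:

```python
import struct
from pathlib import Path


def _page_size(data: bytes) -> int:
    size = struct.unpack_from(">H", data, 16)[0]
    return 65536 if size == 1 else size  # the header stores 65536 as 1


def freelist_pages(data: bytes) -> list[int]:
    """Walk the freelist trunk chain; return trunk and leaf page numbers (1-based)."""
    page_size = _page_size(data)
    trunk, total = struct.unpack_from(">II", data, 32)
    pages, seen = [], set()
    while trunk and trunk not in seen and len(pages) < total:
        base = (trunk - 1) * page_size
        if base + page_size > len(data):
            break  # truncated file or corrupt chain
        seen.add(trunk)
        next_trunk, leaf_count = struct.unpack_from(">II", data, base)
        leaf_count = min(leaf_count, page_size // 4 - 2)  # clamp corrupt counts
        pages.append(trunk)
        pages.extend(struct.unpack_from(f">{leaf_count}I", data, base + 8))
        trunk = next_trunk
    return pages


def scan_free_pages(db_path: Path, constructor_ids: list[int]) -> list[tuple[int, int, int]]:
    """Return sorted (page, offset_in_page, constructor_id) hits inside free pages."""
    data = db_path.read_bytes()
    page_size = _page_size(data)
    hits = []
    for page in freelist_pages(data):
        chunk = data[(page - 1) * page_size: page * page_size]
        for cid in constructor_ids:
            needle = struct.pack("<I", cid)  # constructor IDs are little-endian on disk
            pos = chunk.find(needle)
            while pos != -1:
                hits.append((page, pos, cid))
                pos = chunk.find(needle, pos + 1)
    return sorted(hits)
```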
Telegram Desktop artifacts in tdata
On Windows, Telegram Desktop stores its local data in:
%APPDATA%\Telegram Desktop\tdata
Files in this directory use Telegram Desktop’s own container format, which begins with the TDF$ magic bytes. The encryption uses a key derived from the user’s local passcode (or a static default key if no passcode is set, which is the majority of installations). In the no-passcode case, the key material is stored in key_datas within the same directory, making decryption trivial for anyone with filesystem access.
The tg-desktop-investigation toolkit handles decryption of tdata files and extraction of structured records including message history, contacts, and media references. The log.txt file in the same directory records session timestamps in plaintext and survives even if the user attempts to clear the cache, making it reliable for establishing last-use timelines.
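Establishing a first/last-use window from log.txt is a simple text scan. Below is a minimal sketch. It assumes each entry starts with a bracketed timestamp in the form [YYYY.MM.DD HH:MM:SS]. That pattern is an assumption to confirm against the file in hand, which is why it is passed in as a parameter. Parsed values are kept as naive timestamps, because the host's timezone has to be established separately:

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed entry prefix, e.g. "[2024.03.01 14:22:05] ..."; verify on the actual file.
LOG_LINE = re.compile(r"^\[(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2})\]")


def log_time_range(log_path: Path, pattern: re.Pattern = LOG_LINE) -> Optional[tuple[datetime, datetime]]:
    """Return the earliest and latest timestamps found in the log, or None if none match."""
    stamps = []
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = pattern.match(line)
            if match:
                stamps.append(datetime.strptime(match.group(1), "%Y.%m.%d %H:%M:%S"))
    if not stamps:
        return None
    return min(stamps), max(stamps)
```

The function uses min and max rather than the first and last lines, so entries written out of order do not skew the window.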
A behavior documented on this blog in 2020, and still relevant in more recent versions: Telegram Desktop’s in-app cache clearing function does not reliably purge files from the filesystem. A user who believes they have cleared their local Telegram cache may have done nothing of the sort. In practice, this means that tdata directories on investigated machines frequently contain media files, message drafts, and session artifacts going back months or years, regardless of any in-app deletion attempts.
On macOS, the equivalent path is ~/Library/Containers/org.telegram.desktop/Data/Library/Application Support/Telegram Desktop/tdata/. Depending on which Telegram client is installed, investigators may instead encounter the native macOS app, whose storage follows the Postbox layout seen on iOS rather than tdata. Public reverse-engineering work by stek29 provides a practical starting point for PostBox-style extraction workflows.
What legal requests can realistically return
The September 2024 policy shift changed Telegram’s practical utility for investigations, but it is worth being precise about what it changed and what it did not. Telegram now responds to legal requests for any criminal offense, not just terrorism. In practice, the returned data is limited to the account’s phone number and IP address.
For investigators, this is useful for identity attribution at the account-opening stage and for correlating Telegram activity with network logs. The IP addresses provided reflect connection metadata retained by Telegram’s infrastructure, so they are constrained by applicable jurisdictional scope and provider retention policies. The transparency report for 2024 showed India leading with 14,641 requests affecting 23,535 users, which indicates aggressive use of this channel across very different legal systems with very different standards for what constitutes sufficient legal basis.
Message content from cloud chats remains unavailable through this channel. Telegram’s infrastructure does hold non-secret messages server-side in a form it can decrypt, because cloud-chat encryption runs between each client and Telegram’s servers rather than end to end. The platform nevertheless does not produce that content in response to legal process. This gap makes device-side acquisition the non-negotiable first step whenever message content is material, as discussed in the broader context of cloud forensic jurisdictional constraints covered here. A legal request gives you identity metadata. The device gives you the conversation.
Reporting caveats for court-facing deliverables
When findings move from lab notes to legal documents, wording discipline becomes part of forensic quality.
- Separate observation from inference in every paragraph (for example: “artifact present in t2” versus “user communicated with this entity”).
- Always state artifact provenance and volatility (main DB, WAL, freelist, cache), because persistence characteristics change evidential weight.
- Time claims should include decode method and uncertainty source (timezone transforms, endian parsing assumptions, checkpoint timing).
- For secret chats, explicitly document acquisition delay from seizure to extraction, because that delay directly constrains recoverability claims.
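These caveats are easier to keep attached to each finding when they are fields rather than free text. Below is a minimal sketch of one way to structure them. The `Finding` fields and helper names are illustrative, not a reporting standard. The delay helper refuses naive datetimes so that timezone transforms cannot slip in unnoticed:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Finding:
    """One reportable item, keeping what was observed apart from what is inferred."""
    artifact_source: str  # e.g. "main DB", "WAL", "freelist", "cache"
    observation: str      # what is literally present in the artifact
    inference: str        # the examiner's interpretation, stated separately
    decode_method: str    # e.g. "t7 key bytes 12-15, big-endian uint32, UTC"


def acquisition_delay(seized_at: datetime, extracted_at: datetime) -> timedelta:
    """Delay from seizure to extraction; both inputs must be timezone-aware."""
    if seized_at.tzinfo is None or extracted_at.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid silent offset errors")
    delay = extracted_at - seized_at
    if delay < timedelta(0):
        raise ValueError("extraction precedes seizure; check the recorded times")
    return delay


def utc_iso(moment: datetime) -> str:
    """Render an aware datetime in UTC for consistent report timestamps."""
    return moment.astimezone(timezone.utc).isoformat()
```

For example, a seizure logged at 09:00 UTC and an extraction logged at 10:30 local time (UTC+1) yield a 30-minute delay, not 90 minutes.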
Defensible wording for legal reports
Poor wording: “The suspect contacted channel X.”
Defensible wording: “A channel identifier associated with X appears in Telegram t2 search cache records; no corresponding message exchange with X was recovered in t7 for the analyzed extraction.”
Poor wording: “Telegram deleted messages were fully recovered.”
Defensible wording: “Recovered records include deleted-message remnants from WAL and freelist artifacts; completeness cannot be guaranteed because persistence depends on checkpoint timing and overwrite activity.”