Token-count three documents.
Three pre-loaded documents — a short customer email, a thread with quoted history, and a Letter of Credit. Estimate the token count first (characters ÷ 4), then verify in a counter, then log which window each fits. The middle doc is the trap.
How to run Lab 2.1: Three documents are pre-loaded for you: a short customer email, an email thread with quoted history, and a Letter of Credit. For each one, write your estimate first (use the auto-fill or do chars ÷ 4 in your head). Then open a counter (try OpenAI's tokenizer or the Tokenizer Playground), copy the doc, paste it in, and log the actual number. The "Off by" indicator shows how close your estimate was.
The middle doc is the one most people get wrong — quoted history quietly stacks up.
Rule of thumb: characters ÷ 4, or words × 1.3.
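If you would rather script the estimate than do it in your head, the rule of thumb can be sketched as below. The function names are made up for this lab, and both formulas are only approximations: a real tokenizer will usually give a different number, which is the point of the "Actual" column.

```python
def estimate_from_chars(text: str) -> int:
    """Rule of thumb: characters / 4, rounded up."""
    return (len(text) + 3) // 4


def estimate_from_words(text: str) -> int:
    """Rule of thumb: words x 1.3, rounded up.

    Integer math (x 13, then divide by 10) avoids floating-point drift.
    """
    words = len(text.split())
    return (words * 13 + 9) // 10


# Illustrative sample text, not one of the lab documents.
sample = "Hi team, my order arrived damaged. Can you send a replacement?"
print("chars / 4   :", estimate_from_chars(sample))
print("words x 1.3 :", estimate_from_words(sample))
```

The two estimates rarely agree exactly. Treat either as a first guess to check against the counter, not as a substitute for it.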
Your log
| Document | Characters | Estimate (tokens) | Actual (tokens) | Fits (window) | Surprise |
|---|---|---|---|---|---|
| Doc A — short customer email | 294 | — | — | — | — |
| Doc B — email thread with quoted history | 2,243 | — | — | — | — |
| Doc C — Letter of Credit | 3,113 | — | — | — | — |
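To pre-fill the Estimate and Fits columns, here is a minimal sketch that applies chars ÷ 4 to the character counts in the table. The window names and sizes are hypothetical placeholders, since the lab does not fix them here; swap in the limits of the models you are comparing. The `off_by` helper mirrors the "Off by" indicator once you have an actual count from the counter.

```python
# Character counts copied from the log table above.
DOCS = {
    "Doc A": 294,
    "Doc B": 2243,
    "Doc C": 3113,
}

# Hypothetical window sizes in tokens (assumption, not from the lab).
WINDOWS = {"small": 512, "medium": 4096, "large": 32768}


def estimate_tokens(chars: int) -> int:
    """Rule of thumb: characters / 4, rounded up."""
    return (chars + 3) // 4


def smallest_fit(tokens: int) -> str:
    """Name of the smallest window that holds `tokens`, or 'none'."""
    for name, limit in sorted(WINDOWS.items(), key=lambda kv: kv[1]):
        if tokens <= limit:
            return name
    return "none"


def off_by(estimate: int, actual: int) -> int:
    """Signed gap: negative means the estimate was too low."""
    return estimate - actual


for name, chars in DOCS.items():
    est = estimate_tokens(chars)
    print(f"{name}: ~{est} tokens, smallest window that fits: {smallest_fit(est)}")
```

These results come from the estimate only. Log the number from the counter before deciding which window a document really fits, especially for the middle doc.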
Stretch: Find one document that's too big for a small window. What would you cut to make it fit: the quoted history, the appendix, the boilerplate? Note it in the Surprise column.
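One way to try the "cut the quoted history" option is sketched below. It assumes quoted replies are marked with a leading `>` on each line, which is a common email convention but may not match how the lab's thread is formatted. The sample thread is invented for illustration.

```python
def strip_quoted_history(email: str) -> str:
    """Drop lines that look like quoted replies (assumes a '>' prefix)."""
    kept = [
        line for line in email.splitlines()
        if not line.lstrip().startswith(">")
    ]
    return "\n".join(kept).strip()


def estimate_tokens(text: str) -> int:
    """Rule of thumb: characters / 4, rounded up."""
    return (len(text) + 3) // 4


# Made-up thread, not Doc B from the lab.
thread = """Thanks, that works for me.

> Can we move the call to Friday?
>> Original proposal: Thursday at 10.
"""

trimmed = strip_quoted_history(thread)
print(estimate_tokens(thread), "->", estimate_tokens(trimmed))
```

Compare the before and after estimates, then confirm the trimmed text in the counter as you did for the three lab documents.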