Further thoughts on learning with texts tools
24 Dec 2025
A couple of years ago, i wrote about what i liked and didn’t like about learning with texts tools. For the uninitiated, this is a category of software used for learning to read long-form text in a foreign language by segmenting the text and highlighting each word in a different colour depending on your familiarity with it.
Conceptually, this is great. As i wrote back then, staring at a page of chinese characters feels terrifying. But seeing that on that page, only a handful are new words and the rest are words i already know gives me the motivation i need to start reading, and to slowly work my way through. For the last couple of years, i used lute, a local web app, to read texts in chinese, and i think it was very beneficial. But over the last year, i’ve moved towards a more specialised tool, because there were a few things that continued to cause headaches for me.
To cut to the chase, the program i am using now is called chinese text analyser. It’s a really great piece of software which focuses on a subset of the learning with texts functionality. It is not intended to be a reader, but rather to give an analysis of the words used within a text, and some simple tools so that you can choose to learn some of those words.
The biggest issue i had using lute is that there are a lot of words in a text. Every time i came to a word highlighted red for unknown, i had to stop reading and either decide whether i knew it or check the meaning. A better flow would be to just keep reading over words i was vague on, so as to slowly increase my reading speed, and then either hope that the word in question would come up again later, or that i would never see it again.
Chinese text analyser scans all the words in a text and gives me a table showing how many times each one occurs, along with a cumulative percentage showing how much of the total text those words cover. To illustrate, here is the listing for the eleven most frequent words or characters in Yu Hua’s Chronicle of a Blood Merchant:
| # | Word | Frequency | %Frequency | Cumulative %Frequency | First Occurrence | HSK Level | TOCFL Level | Length |
| 1 | 了 | 3217 | 4.054139 | 4.054139 | 394 | 1 | 1,4 | 1 |
| 2 | 的 | 3196 | 4.027675 | 8.081814 | 90 | 1 | 1 | 1 |
| 3 | 我 | 1993 | 2.511626 | 10.593439 | 1419 | 1 | 1 | 1 |
| 4 | 一 | 1805 | 2.274704 | 12.868143 | 323 | 1 | 1 | 1 |
| 5 | 说 | 1797 | 2.264622 | 15.132765 | 1566 | 1 | 1 | 1 |
| 6 | 他 | 1665 | 2.098272 | 17.231037 | 710 | 1 | 1 | 1 |
| 7 | 你 | 1625 | 2.047863 | 19.278900 | 9482 | 1 | 1 | 1 |
| 8 | 许三观 | 1423 | 1.793298 | 21.072198 | 4 | * | * | 3 |
| 9 | 乐 | 1258 | 1.585361 | 22.657559 | 51979 | * | * | 1 |
| 10 | 在 | 1060 | 1.335837 | 23.993396 | 659 | 1 | 1,3,4 | 1 |
| 11 | 就 | 959 | 1.208554 | 25.201951 | 725 | 2 | 2,3,4,5 | 1 |
Here we can see that with only eleven words we can read a quarter of the text, although this doesn’t necessarily translate to comprehending that much. All of these words are basic chinese words, except for the name of the main character and another word that doesn’t appear until a long way into the book. We also have some word statistics:
| Total | 79,351 |
| Total known | 71,441 |
| Total known (%) | 90.03% |
| Total unknown | 7,910 |
| Total unknown (%) | 9.97% |
| Unique | 4,695 |
| Unique known | 1,574 |
| Unique known (%) | 33.53% |
| Unique unknown | 3,121 |
| Unique unknown (%) | 66.47% |
This makes it fairly clear how easy it would be to read the book. Although there are a lot of unknown words, they mostly only appear a couple of times each, and the majority of words are covered by my existing knowledge. The coverage is this high because i exported a list of all the words i didn’t know, used them to make flashcards, and then set all of those words as known within chinese text analyser. The more i use the program, the more accurate the percentage values become; at the moment, i still quite frequently come across words that i know, but that aren’t marked as known within the software.
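As a rough illustration of where the numbers in the two tables come from, here is a minimal Python sketch. It is not how chinese text analyser works internally, which isn't documented here. It assumes the text has already been segmented into a list of words (the hard part for chinese, and something the real program does for you), and the names `frequency_table`, `coverage_stats` and `known` are made up for this example.

```python
from collections import Counter


def frequency_table(words):
    """Return (word, count, percent, cumulative_percent) rows, most frequent first."""
    if not words:
        return []
    counts = Counter(words)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for word, count in counts.most_common():
        percent = 100.0 * count / total
        cumulative += percent  # running share of the whole text
        rows.append((word, count, percent, cumulative))
    return rows


def coverage_stats(words, known):
    """Count known words by total occurrences and by unique entries.

    `known` is a hypothetical set of words already marked as known.
    """
    counts = Counter(words)
    total = sum(counts.values())
    unique = len(counts)
    total_known = sum(c for w, c in counts.items() if w in known)
    unique_known = sum(1 for w in counts if w in known)
    return {
        "total": total,
        "total_known": total_known,
        "total_known_pct": 100.0 * total_known / total if total else 0.0,
        "unique": unique,
        "unique_known": unique_known,
        "unique_known_pct": 100.0 * unique_known / unique if unique else 0.0,
    }


# Toy input standing in for a segmented book.
words = ["我", "说", "了", "我", "的", "血", "我", "说"]
known = {"我", "说", "了", "的"}

table = frequency_table(words)
stats = coverage_stats(words, known)
```

The gap between the two percentages in the statistics table falls out of this directly: "total known" weights each word by how often it occurs, while "unique known" counts every word once, however rare it is.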
This workflow fits me better for two reasons. First of all, a lot of the things that i want to read cannot be easily imported into lute in a timely manner, so weaning myself off a visual indication of how much of a text i understand is useful for me. Second, as can be seen from the second table printed here, most books have a long tail of words which appear only rarely. I don’t want these low-frequency words to interfere with my willingness to start reading a text in the first place, so seeing that i understand most of the text, and being able to skim over infrequent words and attempt to guess the meaning from context, encourages me to read more. Learning words takes time, and i’d rather read more with marginally lower comprehension than read less with perfect comprehension.
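The long tail can be put into numbers using nothing more than the statistics table above: 7,910 unknown occurrences spread over 3,121 unique unknown words is about 2.5 occurrences per word, against roughly 45 per known word. The sketch below does that arithmetic and adds a small hypothetical helper, `long_tail`, which measures the tail on any already-segmented word list. It is an illustration under those assumptions, not a feature of chinese text analyser.

```python
from collections import Counter


def long_tail(words, max_count=2):
    """Summarise the words that occur at most `max_count` times."""
    counts = Counter(words)
    if not counts:
        return {"rare_unique": 0, "rare_unique_pct": 0.0, "rare_token_pct": 0.0}
    total = sum(counts.values())
    rare = {w: c for w, c in counts.items() if c <= max_count}
    return {
        "rare_unique": len(rare),
        # share of the vocabulary list made up of rare words
        "rare_unique_pct": 100.0 * len(rare) / len(counts),
        # share of the running text those rare words account for
        "rare_token_pct": 100.0 * sum(rare.values()) / total,
    }


# Averages implied by the statistics quoted in the post.
avg_unknown = 7910 / 3121   # occurrences per unique unknown word
avg_known = 71441 / 1574    # occurrences per unique known word

# Toy input: words seen only once make up most of the vocabulary
# but a much smaller share of the running text.
words = ["我", "说", "了", "我", "的", "血", "我", "说"]
tail = long_tail(words, max_count=1)
```

An average is only a rough guide, since a few unknown words may occur often while most occur once, but it matches the impression that skipping rare words costs little comprehension.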
Chinese text analyser costs money, but it comes with a free trial. I paid, because i think it’s a good value proposition, and it has helped me to continue increasing the amount of reading i do in chinese. The software is fast and robust, it runs on my computer, and it doesn’t have any features that detract from its main focus. If it seems like it could be for you, i recommend taking advantage of the trial to give it a try.