Tagged: Anytxt full text search feature
November 28, 2022 at 9:14 pm #7508 Perry Guest
AnyText does a great job with foreign languages; however, I can’t search for a single multibyte character in Japanese or Chinese. AnyText reports an error and asks me to enter a minimum of two letters. I understand this requirement for Western European languages, but Far Eastern languages are constructed somewhat differently. Suppose, for example, that I want to find the Chinese verb for “to go”, which is “qù” in pinyin but consists of only one character, “去”, in Chinese. This character is actually a double-byte character (at least in a two-byte encoding such as UTF-16), which is to say it is made up of two bytes, which would qualify it under the two-character rule. But AnyText won’t accept it as such and wants a Chinese word that consists of two Chinese characters. That might work for words like “饭店”, but that doesn’t help me in this case. It wouldn’t take much to adjust the code for AnyText to check for Unicode characters over a certain value, as Chinese and Japanese characters have very high Unicode values, over hex 2E83.
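The code point check suggested above could look roughly like the following Python sketch. It is only an illustration and not AnyText's actual code: the names `MIN_QUERY_LENGTH`, `CJK_RANGES`, `is_cjk_ideograph` and `is_query_allowed` are hypothetical, and the choice of Unicode blocks is an assumption. Hiragana and katakana are deliberately left out, so only ideographs are exempt from the two-character minimum.

```python
# Hypothetical sketch, not AnyText's real implementation.
MIN_QUERY_LENGTH = 2  # the two-character minimum described in the post

# Unicode blocks treated as ideographic here (an assumption; a real
# implementation might cover more blocks, e.g. the supplementary planes).
CJK_RANGES = (
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_cjk_ideograph(ch: str) -> bool:
    """Return True if the single character ch falls in one of CJK_RANGES."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in CJK_RANGES)


def is_query_allowed(query: str) -> bool:
    """Accept queries of two or more characters, or one CJK ideograph."""
    query = query.strip()
    if len(query) >= MIN_QUERY_LENGTH:
        return True
    return len(query) == 1 and is_cjk_ideograph(query)
```

With these assumptions, `is_query_allowed("去")` and `is_query_allowed("饭店")` are accepted, while `is_query_allowed("a")` is still rejected. Note that the check counts characters (code points), not bytes, so the result does not depend on the file encoding.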
Thanks and keep up the good work!
December 3, 2022 at 10:42 am #7693 admin Keymaster
Searching for a single character can be slow, and some characters appear in nearly every file, such as ‘的’ in Chinese and ‘a’ in English. That is why a one-character search is considered meaningless.
By the way, in what scenario do you need to search for a single character? Thanks.
December 6, 2022 at 2:19 pm #7818 perry Guest
As I said in my original post, I fully understand why one does not search for a single letter of the alphabet in languages written using the Latin alphabet, and that is the reason you gave in this post. I also understand the case if one were to search for hiragana or katakana in Japanese, as they are as frequent in Japanese as the letter “a” in English texts. However, I have Japanese and Chinese texts on my PC and need to search for infrequent characters from time to time, not ones like “的”, which is ubiquitous in both languages. I can’t, in fact, understand why anyone would want to search for characters like “的”. The reason I am searching for these characters is that I am building a database of characters for a PHP / MySQL program, using characters from children’s stories in these languages. Because I keep these stories in their own folder, and because there are not hundreds of them, a search like the one I’m suggesting does not take long. Anyway, I hope you will consider it. One way to program around this is to keep the default setting that you have in AnyText, but add a checkbox or menu option to override the default and allow single Unicode character searches on an as-needed basis. After such a search, the default setting could kick in again.
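The override idea in the paragraph above might be sketched in Python as a one-shot flag that is cleared once a single-character search has run. Everything here is hypothetical: `SearchSettings`, `allow_single_char_once` and `validate_query` are illustrative names, not part of AnyText.

```python
from dataclasses import dataclass


@dataclass
class SearchSettings:
    """Hypothetical settings object; field names are illustrative only."""
    min_query_length: int = 2             # default two-character minimum
    allow_single_char_once: bool = False  # stands in for the checkbox


def validate_query(query: str, settings: SearchSettings) -> bool:
    """Return True if the query may run under the current settings.

    A single-character query is accepted only while the override is set,
    and accepting it clears the override so the default applies again.
    """
    length = len(query.strip())
    if length >= settings.min_query_length:
        return True
    if length == 1 and settings.allow_single_char_once:
        settings.allow_single_char_once = False  # default kicks in again
        return True
    return False
```

In this sketch the user would tick the option (set `allow_single_char_once = True`), run one search for a character such as “去”, and the next single-character query would be rejected again until the option is re-enabled.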
By the way, has anyone told you about the latest problem in AnyText? My previous version finished scanning my hard disks after a day or so, and I got a green arrow in the upper right-hand corner of the program. The current version has been scanning my hard drives for several weeks now (I leave the program running 24/7) and informs me that my searches may not be reliable until it has finished. However, spot-check searches in the most remote parts of my drives show that AnyText does deliver correct results. This may be a bug in the program, whereby the background search does not inform the main program that it has completed its analysis of my HDDs.
Anyway, I enjoy using the program despite all this and will contribute at Christmas time.
Best wishes
December 19, 2022 at 12:29 pm #8355 admin Keymaster
Thank you very much