Reply To: >15,000 pdfs indexed but zero c:/ drive matches


#16095
A Guy
Guest

Hi Abbie, without specific details I cannot guess what, specifically or generically, you are referring to as “the anti-virus software”. But I believe that avenue may not be relevant in this case.

The following is intended as feedback to the programme’s developer(s), and for the benefit of other users of this programme who may encounter unexpected outcomes from it:

For the past few days, I have been ‘characterising’ the Anytxt programme (checking and testing what I can, toward trying to understand and summarise its behaviour, to be able to – hopefully – use it effectively), and found the following:

(1) It now (finally) is returning matches from the C:/ drive, and with (large) counts of file numbers as might be expected, and for all file formats.

(2) However, its status indicators for files indexed/processing are multi-layered, and confusing:

(2a) On start up – pop-up messages are displayed, including:

“Loading drive C: index database.”

“Load drive E: index database successful.”

“Load all index database successful.”

However, these do not mean it has finished processing all files.

(2b) After those messages another pop-up is displayed:

“The Full Text Search Engine is starting, please wait a moment…”

Accompanied by a ‘doing something’ segmented spinning indicator in the top right corner of the programme window.

(2c) While waiting for the ‘moment’ to pass, I checked Option(s) > ‘Index Rules’:

About an hour after programme launch, PDF files status was ‘Pending’, all other file formats ‘Finish’ [sic].

About 4 hours after programme launch, PDF files status was ‘Finish’ [sic; should be ‘Finished’].

It was some time between 1 and 12 hours after launch that text-search matches from the C:/ drive were (finally) being returned.

(2d) While STILL waiting for the promised ‘moment’ to pass, I checked the Index Store Path, via:

Windows Explorer  > PC > C:/ drive > Program Data > AnyTxt > data

The following time series shows respective drives’ data file sizes:

Hours since launch:   ~1       ~4        ~8          ~11         ~12         ~30 to ~80*

C.ati (KB)            65,064   668,260   3,974,372   6,229,748   7,626,620   same

E.ati (KB)            71,332   same      same        same        same        same

F.ati (KB)            80       same      same        same        same        same
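For anyone wanting to repeat this check without opening Windows Explorer every few hours, here is a minimal Python sketch that records the sizes of the .ati files in a folder. The function names are my own, and the folder is passed in as a parameter because the exact Index Store Path may differ between installations; nothing here is part of Anytxt itself.

```python
from datetime import datetime
from pathlib import Path


def snapshot_index_sizes(data_dir):
    """Return {file name: size in whole KB} for every .ati file in data_dir."""
    return {
        p.name: p.stat().st_size // 1024
        for p in sorted(Path(data_dir).glob("*.ati"))
    }


def format_snapshot(sizes, when=None):
    """Format one snapshot as a single timestamped log line."""
    when = when or datetime.now()
    parts = [f"{name}={kb:,} KB" for name, kb in sizes.items()]
    return f"{when:%Y-%m-%d %H:%M}  " + "  ".join(parts)
```

Calling format_snapshot(snapshot_index_sizes(folder)) once an hour, where folder is your own Index Store Path, and appending the line to a text file would give the same kind of time series as above. A run of identical lines suggests the index file has stopped growing, though it cannot show whether the programme has actually finished.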

(2e) *BUT, after more than three days of running the PC and the Anytxt programme continuously, the ‘moment’ has still not finished. This is despite the C.ati data file size appearing to plateau, and the match counts for certain search terms likewise (e.g. 3597 matches for one search term became 3599 24 hours later, 7568 matches for another became 7569 24 hours later, and 9332 matches for a third remained at 9332 24 hours later). In other words, I am still waiting for the programme to finish doing what it’s doing, assuming it ever will.

Three days after the third launch, the programme still states:

“Load all index database successful. But some files are still being processed and your search results may be incomplete at this time.”

“… search results may be incomplete at this time as some files are still being processed … when the status in the upper right corner changes to green [tick], all scans are completed.”

 

SUGGESTION 1: It would seem that, instead of a “wait a moment” message, where the ‘moment‘ may equate to days, a user-meaningful processing progress status indicator would be useful, such as a 0 to 100% progress bar or other graphic, or an estimated time-to-completion counting-down clock.
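To illustrate how little is needed for such an indicator, here is a minimal Python sketch of a percentage and a time-to-completion estimate. It assumes the programme knows how many files it has queued and how many it has processed so far; I do not know how Anytxt tracks this internally, so the function and its parameters are hypothetical.

```python
def estimate_progress(files_done, files_total, elapsed_hours):
    """Return (percent complete, estimated hours remaining or None)."""
    if files_total <= 0:
        raise ValueError("files_total must be positive")
    percent = 100.0 * files_done / files_total
    if files_done == 0 or elapsed_hours <= 0:
        return percent, None  # no throughput measured yet, so no estimate
    rate = files_done / elapsed_hours  # average files per hour so far
    remaining_hours = (files_total - files_done) / rate
    return percent, remaining_hours
```

With made-up figures, estimate_progress(5000, 15000, 4) gives about 33% done and 8 hours remaining. The estimate assumes the average rate so far continues, which a few very large PDFs could easily upset, but even a rough figure would tell users whether to wait minutes or days.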

 

(3) Further ‘characterisation’ leads to further improvement suggestions. . .

 

There are currently the following discrete search method options for users:

(1) Whole Match

(2) Advanced Search (with various search syntax variable inputs possible, per the Help(H) menu.)

(3) Regular Match

SUGGESTION 2: However, none of these appear to equate necessarily to an “Exact Match“.

For example, if one wishes to search the term ‘MS’ the programme returns matches including:

ms (for milliseconds), Ms. (short for Miss), msg, msgbox, msn, sums, mums, films, hamster, etc.

But the same (number of) matches are returned if one instead enters the search term ‘ MS ’ (space, em, ess, space).

Or one might try to ‘outsmart’ the programme by entering, for example, the search term ‘ MS  !ms.’

This might be expected to return instances of ‘MS’ but not of ‘MS.’.

But instead, no matches for either string are returned.

It seems (unless I am incorrect) that spaces and full-stops, if not also other non-alphanumeric characters, are not supported by the programme’s current search capabilities. It therefore seems not possible for users to cut the results down to a shortlist of desired, as against unhelpful, matches when such a term as ‘MS’ may be critical and urgent, as in the context of it standing for ‘Multiple Sclerosis’.
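To make clear what I mean by an “Exact Match”, here is a minimal sketch in Python (not in Anytxt’s own search syntax, and I have not confirmed whether its ‘Regular Match’ option accepts patterns like this). It is case-sensitive and requires that no letter or digit sits directly before or after the term; the optional reject_after parameter is my own invention for excluding cases such as ‘MS.’.

```python
import re


def exact_term_pattern(term, reject_after=""):
    """Case-sensitive pattern: no letter or digit directly before the term,
    and no letter, digit, or reject_after character directly after it."""
    before = r"(?<![A-Za-z0-9])"
    after = r"(?![A-Za-z0-9" + re.escape(reject_after) + r"])"
    return re.compile(before + re.escape(term) + after)


samples = ["MS", "ms", "Ms.", "msgbox", "sums", "hamster",
           "diagnosed with MS in 2010", "MS."]

plain = exact_term_pattern("MS")
no_full_stop = exact_term_pattern("MS", reject_after=".")

plain_hits = [s for s in samples if plain.search(s)]
strict_hits = [s for s in samples if no_full_stop.search(s)]
```

Here plain_hits keeps only ‘MS’, ‘diagnosed with MS in 2010’ and ‘MS.’, and strict_hits drops ‘MS.’ as well. None of ms, Ms., msgbox, sums or hamster is matched in either case.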

 

SUGGESTION 3: Also, none of the current search options appear to facilitate a “Proximity Match” (a feature in some PDF reader software).

‘Proximity’ with such software equates to, for example:

The presence of two or more search terms, within the same paragraph, or page, or document.

It appears from the results fields that whole-document matches are returned by default. But Anytxt’s indexing already incorporates line or row numbers, so it should be easy to facilitate finer ‘Proximity Matches’ by enabling users to choose, if not set as a default, for example: ‘Within plus-or-minus X lines or rows of each other’. (If X were 30 to 50 rows, that would equate to approximately one page for many texts, depending on their respective font and page size etc.)

This search functionality would enable users to drill down to a subset of file matches ‘most likely‘ to contain what is being looked for, while ignoring potentially large numbers of less likely if not outright time-wasting files.
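As a rough illustration of the ‘within X lines’ idea, here is a minimal Python sketch that works on a document already split into lines. The function name and parameters are hypothetical, the matching is a simple case-insensitive substring test, and it says nothing about how Anytxt stores its index.

```python
def proximity_matches(lines, term_a, term_b, max_distance):
    """Return 1-based (line_a, line_b) pairs where term_a and term_b
    occur within max_distance lines of each other (case-insensitive)."""
    a, b = term_a.lower(), term_b.lower()
    a_lines = [i for i, line in enumerate(lines, 1) if a in line.lower()]
    b_lines = [i for i, line in enumerate(lines, 1) if b in line.lower()]
    return [(i, j) for i in a_lines for j in b_lines
            if abs(i - j) <= max_distance]


document = [
    "Multiple Sclerosis overview",
    "",
    "",
    "Treatment options",
    "Unrelated appendix",
]

near = proximity_matches(document, "sclerosis", "treatment", 3)
too_far = proximity_matches(document, "sclerosis", "treatment", 2)
```

The two terms are three lines apart, so near contains the pair (1, 4) while too_far is empty. A document would be shortlisted only when the list is non-empty.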

 

SUGGESTION 4: There are four column/field headers for search-term matches: Name, Modified (date), Path, (file) Type.

However, results displayed initially do not appear to be in any A-Z or Z-A order based upon any of those columns/fields.

If one clicks on one of the headers, that column becomes sorted. However, it may be useful for some users to be able to have, if not set as a default, one of those headers in sorted order, e.g. Date Z-A (descending order, newest at top, oldest at bottom).

It may be even more useful for some users to be able to sort by, if not set as default, more than one column in priority order, e.g. first by Type then by Date, or vice-versa, or first by Date, then Type, then Name, etc.
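Multi-column sorting of this kind is a standard technique. Here is a minimal Python sketch using made-up result rows with the same fields as the results list; the file names, dates and the sort_results helper are all hypothetical.

```python
from datetime import date
from operator import itemgetter

results = [
    {"name": "notes.pdf", "modified": date(2023, 5, 1), "type": "pdf"},
    {"name": "report.docx", "modified": date(2024, 1, 15), "type": "docx"},
    {"name": "paper.pdf", "modified": date(2024, 3, 9), "type": "pdf"},
    {"name": "draft.docx", "modified": date(2022, 11, 30), "type": "docx"},
]


def sort_results(rows, keys):
    """Sort rows by several (field, descending) keys, first key most significant."""
    ordered = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last gives the combined ordering.
    for field, descending in reversed(keys):
        ordered.sort(key=itemgetter(field), reverse=descending)
    return ordered


by_type_then_newest = sort_results(results, [("type", False), ("modified", True)])
names = [row["name"] for row in by_type_then_newest]
```

This sorts by Type A-Z and, within each type, by Date newest first, so names comes out as report.docx, draft.docx, paper.pdf, notes.pdf.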

 

Lastly, to recap on indexing status, now being 10 days after my initial download and launch of the programme (version 1.3.1380), and more than three days after leaving it and my PC running continuously but STILL NO GREEN TICK, I continue to wonder when, or even if, I may be able to expect the programme to ever finish processing? (And, consequently, how many, if any, matching files for certain search terms may remain missing from the programme’s results.) Is it actually still processing, or stuck in a loop, or on one (or a few) particular file(s)? There seems to be no (easy) way for users to know.