Start with 10K of the most commonly used nouns and verbs (AKA
"proposed terms"). Generate an overall 'score' (in terms of
weighted "degrees of separation") from each Type (and the Type's
other 'true meanings') to each proposed term. If a proposed term
scores in the top half (the fiftieth-percentile cut) for a given
Type, that term contains the element (or the bit) that the Type
represents.
When each proposed term has been 'vetted' by all eight Types,
its basic index is completely determined. This process is
"exhaustive but straightforward", which makes it an ideal task for
a computer.
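A minimal sketch of that vetting pass (in Python), assuming a
hypothetical score() helper (the 'score' and 'degrees of separation'
it relies on are defined just below), placeholder names for seven of
the eight Types (only Self appears by name here), and that the "top
half" is the half with the smallest weighted separation:

```python
# Hypothetical sketch only: score(), the placeholder Type names, and the
# "smaller X = closer" convention are assumptions, not part of the design.
from statistics import median

TYPES = ["Self", "Type2", "Type3", "Type4",
         "Type5", "Type6", "Type7", "Type8"]   # placeholder names

def vet_terms(proposed_terms, score):
    """Return {term: 8-bit basic index}, given score(type_name, term) -> X."""
    index = {term: 0 for term in proposed_terms}
    for bit, type_name in enumerate(TYPES):
        scores = {t: score(type_name, t) for t in proposed_terms}
        cutoff = median(scores.values())        # the fiftieth percentile
        for term, x in scores.items():
            if x <= cutoff:                     # closest half of the terms
                index[term] |= 1 << bit         # term carries this Type's bit
    return index
```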
Definitions of terms used above:
commonly used - on the internet, of course (welcome to the
Twenty-First Century!)
For safety's sake, these would include the current 4K
vocabulary (mostly)
score - a composite value (X) summarizing the following
degrees of separation - reaching the term requires X links (or
clicks)
X is determined by the closest Direct connection (+/-): same
sentence, paragraph, page, article, or website
true meaning - words that are not closely connected as above,
but that express an essential facet of the Type (such as
Supremacy; self, action, intention, science, unique, ownership,
etc.)
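To make the 'composite value' idea concrete, here is one possible
weighting; the numbers are invented for illustration, and only the
ordering (same sentence is the tightest connection, same website the
loosest) comes from the definitions above:

```python
# Illustrative weights only; the real weighting is an open design question.
LEVEL_WEIGHTS = {
    "sentence": 1,     # tightest direct connection
    "paragraph": 2,
    "page": 4,
    "article": 8,
    "website": 16,     # loosest direct connection
}

def composite_score(separations):
    """separations: {level: links (or clicks) needed at that level}.
    Returns X; a smaller X means the term sits closer to the Type."""
    return sum(LEVEL_WEIGHTS[level] * links
               for level, links in separations.items())

# e.g. composite_score({"sentence": 0, "paragraph": 1, "page": 3}) == 14
```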
The 'weighting' algorithm would run as follows (a sketch of the
loop appears after this list):
1. Each proposed term would be checked for the first "Direct
   connection" (Type in same sentence) with the applicable Type
   (Self, for example).
   - For efficiency's sake, same sentence, paragraph and page might
     be checked simultaneously, but only same sentence would matter
     at this point.
2. The larger group of terms (found or not found) would be checked
   for the first connection with the next 'true meaning' of that
   same Type (probably Supremacy).
   - "Trueness of meaning" is still a judgment call.
   - A different order of 'true meaning' checks might provide
     better term grouping.
3. The sub-group that contains the fiftieth percentile would be
   checked against the next true meaning (Ownership perhaps?).
   - Each iteration checks a smaller number of terms.
4. Repeat step three until the 'need' and 'don't need' groups are
   of equal size.
   - This may not require checking all direct connections or any
     indirect connections.
5. When done with one Type, repeat the process with the next Type.
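A rough sketch of that loop, with a hypothetical found_in(meaning,
term) check standing in for the actual corpus lookup; the meaning
order follows the Self, Supremacy, Ownership examples above:

```python
def split_for_type(terms, meanings, found_in):
    """Partition terms into (need, dont_need) halves for one Type's bit.

    meanings : the Type itself followed by its 'true meanings', in check
               order (e.g. Self, then Supremacy, then Ownership, ...).
    found_in : found_in(meaning, term) -> True if the term has a direct
               (same-sentence) connection to that meaning in the corpus.
    """
    half = len(terms) // 2
    need = []                  # terms confirmed to carry the bit
    undecided = list(terms)    # the group straddling the half-way mark
    for meaning in meanings:
        hits = [t for t in undecided if found_in(meaning, t)]
        hit_set = set(hits)
        misses = [t for t in undecided if t not in hit_set]
        if len(need) + len(hits) <= half:
            need += hits           # every hit lands above the boundary
            undecided = misses     # the boundary lies among the misses
        else:
            undecided = hits       # the boundary lies among the hits
        if len(need) == half or not undecided:
            break                  # the two groups are now of equal size
    # If the meanings run out first, looser connection levels (or indirect
    # links) would be needed to finish the split.
    need_set = set(need)
    dont_need = [t for t in terms if t not in need_set]
    return need, dont_need
```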
The only requirement is a repository of connected words (like the
internet or a huge, general-purpose wiki) and the computer resources
required to 'sift' it. If anyone knows of any organization that
would be willing to support this project, let me know. Unfortunately,
my resources are limited. (I don't have the space to *download*
Wikipedia, much less process it.)
I'm only specifying ten thousand words, because "Non-descriptive
conversational phrases" should fill most of the remaining six
thousand basic slots. Common conversational phrases (and other
handy terms) should be supplied by users to fill their own most
usual or pressing needs. Personal Vocabularies should be used for
code words, shared references, local slang and inside jokes (not
that Personal Vocabularies are actually available just yet, you
understand).
Going from there.
Arranging the (64) related terms in each "Created Division" of
the description layer and creating adjectival and adverbial
variations is the hard part (or at least the manual part).
Generation of plurals and degree forms, on the other hand, is
mostly trivial. Picking the 'common' (#00) term for the "Neutral
Description" can be simplified with usage counts.
Unfortunately, the relationships between the terms and the Types
are much less self-evident at this level, and index collisions are
bound to occur. The problem is that a large number of these
decisions are essentially judgment calls. Generating action and
entity nouns and determining which nouns are "Non-Descriptive" is
hard to delegate to a machine. These processes can be
semi-automated by using dictionary checks, but no dictionary can
ever provide one hundred percent coverage.
Homonyms will need to be identified and clarified (bark as in dog or
bark as in tree?). Verbs will also need to be nouned (nounified? ...
noun-o-matized?). Most verbs can be treated as nouns without
modification (a run, a look), but there are always suspicious transients
loitering around (a see? ['scene' {entity} and 'sight' {action} are
the more usual suspects]).
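One possible way to semi-automate those checks, using WordNet
through NLTK purely as an example resource (any machine-readable
dictionary would do, and this is only a crude first-pass filter):

```python
# Requires the nltk package and its 'wordnet' data.
from nltk.corpus import wordnet as wn

def needs_human_review(word):
    """Flag words that the dictionary check alone cannot settle."""
    noun_senses = wn.synsets(word, pos=wn.NOUN)
    verb_senses = wn.synsets(word, pos=wn.VERB)
    homonym_candidate = len(noun_senses) > 1        # bark (dog) vs. bark (tree)
    verb_without_noun = bool(verb_senses) and not noun_senses
    return homonym_candidate or verb_without_noun
```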
This is where an active, involved user base
would actually come in handy. "With enough eyeballs, all meaning is
shallow." The application needs an integral 'feedback' button for
inarguable fixes. The feedback button would automatically create a
transaction that could propose a new (or changed) term for a given
location. A "Swap Multiple Terms" button would be very handy also
(no idea how it would work just yet).
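For what it's worth, the transaction might look something like this;
every field name is a guess, since all that is specified above is
"propose a new (or changed) term for a given location":

```python
from dataclasses import dataclass

@dataclass
class FeedbackTransaction:
    user_id: str           # who proposed the change
    location: int          # the index/slot being corrected
    current_term: str      # what occupies the slot now (empty if none)
    proposed_term: str     # the suggested new or replacement term
    rationale: str = ""    # free-text justification for reviewers
```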
Of course, a feedback button won't even be practical until after a
vocabulary is computer-generated, and the button wouldn't be
available on the free version of the application. We need to limit
"easy updates" to people who are serious about the results. If you
actually pay to use something, you are less likely to try to screw it
up for vandalistic ego boo.
The Specialization Layer
There is a list of Specialties that have Focal
vocabularies. These Specialties could be distributed using the same
process as the basic terms. This is only the first step to actually
creating the Specialization Layer, but it would (hopefully) be a
good place to start.
Collecting all of the specialized terms is not difficult, but
arranging them logically could prove to be the work of several
lifetimes.