Neural Knowledge Network Proposal

You've probably heard the joke a million times before. The first person asks, "Do you have any programming experience?" The second person, quite pleased, responds, "Of course! I know a little HTML and C++!" I'm the second person, but I'm under no illusions of competency, and I'm not particularly pleased with my ignorance. What I lack in technical skill, however, I hope you'll see I more than make up for in imagination. My goal is to provide the framework for creating something that may change the world. I'm producing pseudocode for possible lines of attack in meeting the objectives and will be making observations about the necessary structure of the database and of the system as a whole. However, I cannot actually program the database or the connections I propose it make with other existing systems like Wikipedia. And though I admit I possess limited programming expertise, I must also admit my greater ignorance: I often don't know what I don't know, and that can be crippling when designing the framework. That's where you come in. I need your help. This system has the potential to change your life, and everyone's, for the better. It's ambitious, it's going to take a long time to produce, and there's a chance it will fall apart later or even soon after it starts, but this project is my passion, and I want to see it through to the end. If you're a programmer, I'd love your help. I don't want money; I just want a small slice of your time. I will be constantly updating this document, but if we can get a small community of big-picture thinkers together, we can pool our talents and make this happen. It's baby steps that will finish this project, and small contributions will make all the difference. In the past I have paid programmers to get this off the ground by creating the initial database, but I've never seen anything for my money. It's time to take a different approach. I want people who can share my vision. I want people who can see how it all fits together. I want people who want to make this happen too. If you're interested, drop me a line. I'd love to talk with you if you have more ideas, see a problem, want more information, or just want to chat (I was a philosophy major and am a law student, so I love to talk about Schopenhauer and society).

Yours,
jdgn0

P.S. I have worked on this project at different times over the years; some of the pieces may not fit together perfectly, but use your imagination and you'll see the big picture.
Copyright jdgn0 2010, 2011, 2012; all rights reserved. FEEL FREE TO REDISTRIBUTE PROPOSAL AND ANY DERIVATIVE CODE SUBJECT TO COPYLEFT/GPL3 T&C AND IDEALS!

Objectives: First, to create a brain-emulating knowledge database after the fashion of a semantic network; second, to distribute the burden of creating and teaching the brain across the internet, harnessing the power of human contributors and existing service providers; third, to create error-correcting, security, and other necessary mechanisms so that the system runs automatically, without subsequent human intervention; and lastly, to make the information contained in each neuron of the brain, as well as every service in the entire system, free to all who desire it, in formats easily accessible to human users and to other computer programs that want to harness its power. Ultimately, the brain may serve only as a stepping stone for other projects, as a brain without a nervous system remains in futile, solipsistic isolation. Connectivity with other projects like Wikipedia and Wolfram Alpha would offer preexisting databases of knowledge and computational ability to meet the first two objectives, and the addition of language-parsing bots and text-to-speech human-computer interfaces might allow the entire system to reach saturation among human users. Eventually, the system should become entirely automatic, distributing and hosting itself in various forms across the ever-burgeoning internet until it becomes completely self-sustaining.

The idea: The project must take place in several stages, likely following the order of the objectives. The base of the project must be a brain-emulating knowledge database. This database will function on several principles and with several overarching goals in mind:
1. Create a new model for understanding phenomena and discrete facts: a holistic model that in its very structure acknowledges the interconnectedness of life, the universe, and everything.
2. Create and expand a database of information that emulates brain activity by tracing information (an electrochemical signal) down a synapsed pathway (nodal path) to reach a memory (terminal node).
3. Make the database and its information available to the body of internet users via an easy-to-use user interface that can output data in several useful formats; additionally, allow the database to communicate information to human users in multiple ways, e.g. a visual display of the connections between nodes (neurons) and the relative strength of each connection.
4. Develop interfaces (central nervous system) between existing internet-based services (appendages, sense organs, etc.) that can perform additional functions for and from the database. These services may include text/speech conversion, language parsing, language translation, mathematical computation, data parsing and organization, human-computer interfacing, CPU/GPU processing, eventually hosting the entire system, and others.
5. Make every aspect of the system run automatically and without human intervention; develop automatic error-correcting mechanisms; create parameters to help the system automatically detect useful services and knowledge to integrate; and create parameters for the system to develop new parameters for detecting and integrating useful services and knowledge.
6. Create a multi-layered security protocol to protect the integrity of every aspect of the system, including data origin, data modification, system modification, etc.
7. Remain imaginative. Life is more complex than the human perspective alone can comprehend. A human body (the holistic system created by the integrated brain/network and body/services) may manipulate its environment with arms and communicate via spoken language, but a cephalopod manipulates its environment with tentacles and changes its body colors to communicate. To see where the human (computer) can go next, you have to think outside the box, and outside your existing physical and mental constraints. Life builds on itself; let this be the first step on a long flight of stairs to the heavens and other destinations unknown.

Points to consider: Information and the connections between discrete quanta of it are the primary concern of both the thinker and the database user. Nonetheless, this information is useless without context and without the ability to access and understand it. For this reason, the graphical interface showing interactions between nodes is of significant importance: at a glance the user can see context, connections, and the node of immediate interest. More localized context must also be provided for each node. Ideally, the type of connection between two nodes (e.g. one of historical simultaneity, one of structural similarity, etc.) would be recognized automatically by the brain and be visible or otherwise observable by the thinker. I recognize, however, that this would be no easy task to program.
Emergent structure of the database: Consequently, and perhaps in anticipation of such automatic recognition becoming possible further in the future, for the time being the type of connection must be coded manually into either the node or the connection itself. I believe the easier of the two would be not to define relationships between nodes on the connection but instead to store the data in a tag on the node itself. See the figure at right.

Since identity[1] is the easiest and most primitive connection to make, it is a solid starting point for connecting nodes together. Because there is no single type of connection, multiple tags must be created for every node. Examples of tags for the node "Roger Williams" might be any of the following: "1600s," "1631," "emigrated to," "citizen of," "lived during," "lived at the same time as," and so on. By querying any one tag, you can find all other nodes with the same tag. Each tag, in turn, would have a metatag to explain its general reference. The purpose of this is to allow querying nodes by type of connection, as well as merely identifying similar nodes by the fact of their connection. An example of a metatag for the tags "1600s" or "founded the city" might be "history." Note that not all nodal tags can be immediately compared to others through identity (though the metatag may suffice): for example, you would have to complete the nodal tag "founded the city" as "founded the city Providence, RI" for a connection at the tag level to be made. In this case, the addition that completes the tag ("Providence, RI") would be hyperlinked to Wikipedia to provide immediate context for that tag. In many cases, as with this specific tag, it's possible that a node would be the only node to have a complete tag[2] with the same full string. A tag could, however, just as likely be attached to multiple nodes: for example, both Romulus and Remus, as discrete nodes, could be tagged with the same string "founded the city." But the node itself would merely be an isolated noun if not for a greater context, which is to say the actual information about it. The top tier of information stored in the node is in fact not information at all but a link to the information stored off-site (and beyond our bandwidth limits) on Wikipedia. In short, each node (or neuron) stores data that provides:
1. The identity of the node
2. An array of connections to other nodes (identified via nodal tags)
3. The type of connections to other nodes (identified via metatags)
4. Information about the node (in the form of a link to the information)
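The four items a node stores can be sketched as a small record type. This is a minimal illustration only, assuming Python dataclasses as the storage model; the class names `Tag` and `Node`, their fields, and the helper methods are hypothetical and not part of the proposal, and a real system would keep these records in a database rather than in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical nodal tag: the shared string, optional completing info, and a metatag."""
    text: str                      # the tag proper, e.g. "founded the city"
    extra: Optional[str] = None    # additional information, e.g. "Providence, RI" (not part of the tag)
    metatag: Optional[str] = None  # general reference of the tag, e.g. "history"

    def full_string(self) -> str:
        """The complete tag: tag text plus any additional information."""
        return self.text if self.extra is None else f"{self.text} {self.extra}"


@dataclass
class Node:
    """Hypothetical node (neuron) record covering the four items listed above."""
    name: str                                      # 1. identity of the node
    tags: list[Tag] = field(default_factory=list)  # 2. connections, via shared tag strings
    info_url: Optional[str] = None                 # 4. link to off-site information

    def tag_strings(self) -> set[str]:
        return {t.text for t in self.tags}

    def metatags(self) -> set[str]:
        """3. The types of this node's connections."""
        return {t.metatag for t in self.tags if t.metatag is not None}


# Example: the "Roger Williams" node from the text, plus a "Romulus" node.
williams = Node(
    name="Roger Williams",
    tags=[
        Tag("1600s", metatag="history"),
        Tag("founded the city", extra="Providence, RI", metatag="history"),
    ],
    info_url="https://en.wikipedia.org/wiki/Roger_Williams",
)
romulus = Node(name="Romulus", tags=[Tag("founded the city", metatag="history")])

print(williams.tags[1].full_string())                  # founded the city Providence, RI
print(williams.tag_strings() & romulus.tag_strings())  # {'founded the city'}
```

In this sketch two nodes are compared only on the tag text; comparing `full_string()` instead would connect only nodes whose completed tags match, which is the stricter reading of the Providence example.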

Since nodes are the primary keys (indeed, the superkeys), there should be no two identically named nodes with different information; indeed, there should be no two identically named nodes at all. Tags, however, are not unique, nor are metatags; moreover, as tiers of data (i.e. node, tag, and metatag, respectively) become more removed from the identity of the node, the tiers (and their tags) will likely become more and more generalized. "Roger Williams" is the most specific, for example, followed by "1600s" (delineating a time in which he lived), followed by "history" (describing the type of tag this metatag indicates). Thus far, most of the discussion has centered on the nodes, their structure, and the nature of their constituent parts. The most important part of this project, however, is the actual connection between nodes. What kinds of connections are there? How many connections are there? In short, there are as many kinds of connections as there are kinds of phenomena, but the chains of connections one can trace are endless because, when completed, the database forms a closed loop. This doesn't mean that the database is a closed system, unable to cope with dynamic inputs, new node creation, and adaptation to the world observed and entered by the users. Rather, because everything is connected to something at least once (besides the reflexive connection that it exists in this database), there is no end to the connectivity. For example, see the connection between Parmenides (an ancient philosopher who lived in Elea), Elea (a town in what is now Italy), and Italy (home of Parmenides).

[1] i.e. sameness. E.g. "apple" equals (or, for the purposes of the database, is related to) "apple" due to their identically formulated strings.
[2] i.e. with any additional information besides the tag itself. E.g. the string "founded the city" is the tag, while "Providence, RI" is additional information that is not actually part of the tag.

Querying the database: There are several ways to query the database; the first two are the primary queries:
1. One-predicate node query: brings up a display of the node and its neighboring nodes, with the connections between them. You can expand the search to nodes up to a limiting number N of steps away from the original node.
2. Two- (or three-) predicate node query: brings up a display of the two nodes with a path or paths between them. The path is limited by a number N (the number of nodes between the terminal nodes). If the number of nodes between the terminal nodes is greater than N, the path is neither calculated nor displayed. No neighboring nodes are displayed, either at the terminal nodes or off intermediate nodes.
3. Tag query: brings up a list of all nodes with that tag.
4. Metatag query, tag: brings up a list of all tags with that metatag.
5. Metatag query, node: brings up a list of all nodes with that metatag.
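Queries 1-3 can be sketched over an in-memory graph in which two nodes are connected when they share an identical tag string, with an inverted index from each tag to the nodes that carry it. This is a hedged sketch rather than a design: the `Brain` class, its method names, and the use of breadth-first search are assumptions chosen for illustration, node names are treated as unique keys, and the place-name tags on the Parmenides, Elea, and Italy nodes are hypothetical. Metatag queries (4 and 5) would work the same way with a second index keyed by metatag.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Optional


class Brain:
    """Hypothetical in-memory network: nodes connect when they share an identical tag string."""

    def __init__(self) -> None:
        self.tags_of: dict[str, set[str]] = {}                   # node name -> its tags
        self.nodes_with: dict[str, set[str]] = defaultdict(set)  # tag -> nodes (inverted index)

    def add_node(self, name: str, *tags: str) -> None:
        if name in self.tags_of:  # node names are unique keys
            raise ValueError(f"duplicate node name: {name!r}")
        self.tags_of[name] = set(tags)
        for tag in tags:
            self.nodes_with[tag].add(name)

    def tag_query(self, tag: str) -> set[str]:
        """Query 3: every node carrying the given tag."""
        return set(self.nodes_with.get(tag, set()))

    def neighbours(self, name: str) -> set[str]:
        """Nodes sharing at least one tag with `name`, excluding the node itself."""
        found: set[str] = set()
        for tag in self.tags_of[name]:
            found |= self.nodes_with[tag]
        found.discard(name)
        return found

    def neighbourhood(self, start: str, n: int) -> dict[str, int]:
        """Query 1: every node within n steps of `start`, mapped to its distance."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if dist[current] == n:
                continue  # do not expand past the limit
            for nb in self.neighbours(current):
                if nb not in dist:
                    dist[nb] = dist[current] + 1
                    queue.append(nb)
        return dist

    def path(self, a: str, b: str, n: int) -> Optional[list[str]]:
        """Query 2: a shortest path from a to b with at most n intermediate nodes, else None."""
        previous: dict[str, Optional[str]] = {a: None}
        queue = deque([(a, 0)])  # (node, steps taken from a)
        while queue:
            current, steps = queue.popleft()
            if current == b:
                route = []
                node: Optional[str] = current
                while node is not None:  # walk back to a
                    route.append(node)
                    node = previous[node]
                return route[::-1]
            if steps > n:  # n intermediate nodes means at most n + 1 steps
                continue
            for nb in self.neighbours(current):
                if nb not in previous:
                    previous[nb] = current
                    queue.append((nb, steps + 1))
        return None


brain = Brain()
brain.add_node("Parmenides", "Elea")     # lived in Elea
brain.add_node("Elea", "Elea", "Italy")  # a town in what is now Italy
brain.add_node("Italy", "Italy")

print(brain.neighbourhood("Parmenides", 1))  # {'Parmenides': 0, 'Elea': 1}
print(brain.path("Parmenides", "Italy", 0))  # None
print(brain.path("Parmenides", "Italy", 1))  # ['Parmenides', 'Elea', 'Italy']
print(sorted(brain.tag_query("Italy")))      # ['Elea', 'Italy']
```

Breadth-first search is used here because it returns the path with the fewest intermediate nodes first, which matches the N limit in query 2.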

Each query can also be modified:
1. For queries 1-3, display/find only nodes with:
   a. a defined tag (queries 1-2)
   b. a defined metatag (queries 1-3)
2. For queries 1 and 2, display/find only nodes with a relationship strength[3] of at least R between the terminal nodes
3. For query 2, display/find only nodes within N nodes of another defined node

[3] Explored later in this proposal.


These additional search parameters are not important enough to be included in the first draft of the database. Likewise, queries 3-5 need not be included in the first draft should time or resources not allow.

Supplementary systems:
1. Pathing system: to allow users to save thoughts, a system for saving paths between nodes should be created. A simple algorithm for creating a string that replicates the path is possible: each node has its own unique ID, which can be combined with each successive node's ID to create an (albeit long) unique path ID that can be entered somewhere on the site to present the same path. Just as a thought is a neural impulse down a nerve pathway, so too can these pathed nodes act as thoughts that can be saved and shared. For an example, see note [4].
2. Degree of association system: a system for determining the strength of connection between two nodes would be ideal at a later stage of development. This would allow for filtering results and finding more appropriate results for a query. In the brain, the strength of a memory is determined by degree of association (i.e. the quantity of other neurons connected to that neuron). The strength of a node, after this fashion, could be determined simply by tallying the number of connections from other nodes to that node. The strength of a relationship is different, however, and more pertinent to the discussion: the strength of association of one node to another could be determined by how many tags and metatags the nodes have in common. Likewise, data mining could reveal additional ways of calculating association.
3. Data mining system: a system to mine data could look at:
   a. Common two-predicate queries: by determining which two nodes are most often associated by query, the database could predict a relationship between them, for example by adding a "common search terms" tag or something similar to each node.
   b. Existing saved paths: if a user has saved a path between two nodes, these terminal nodes must somehow be related, so a tag including a hyperlink to the saved path is added.
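The pathing ID, the two strength measures, and the query-log mining of item 3a can be sketched in a few small functions. Everything here is an assumption for illustration: nodes are given hypothetical integer IDs, a path ID is the base-36 form of each ID joined with hyphens (one reversible choice among many), node strength reads from a precomputed adjacency map, and relationship strength is simply the count of shared tags plus shared metatags, as the text suggests.

```python
from __future__ import annotations

import string
from collections import Counter

DIGITS = string.digits + string.ascii_lowercase  # base-36 alphabet: 0-9, a-z


def to_base36(n: int) -> str:
    """Render a non-negative integer node ID in base 36 to shorten path IDs."""
    if n < 0:
        raise ValueError("node IDs must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def encode_path(node_ids: list[int]) -> str:
    """Pathing system: build a shareable path ID from successive node IDs."""
    return "-".join(to_base36(i) for i in node_ids)


def decode_path(path_id: str) -> list[int]:
    """Recover the node IDs so the same path can be displayed again."""
    return [int(part, 36) for part in path_id.split("-")]


def node_strength(name: str, adjacency: dict[str, set[str]]) -> int:
    """Degree of association of a node: how many other nodes connect to it."""
    return len(adjacency.get(name, set()))


def relationship_strength(tags_a: set[str], tags_b: set[str],
                          meta_a: set[str], meta_b: set[str]) -> int:
    """Strength between two nodes: number of shared tags plus shared metatags."""
    return len(tags_a & tags_b) + len(meta_a & meta_b)


def frequent_pairs(query_log: list[tuple[str, str]], k: int) -> list[tuple[frozenset[str], int]]:
    """Data mining (3a): the k node pairs most often queried together, order ignored."""
    return Counter(frozenset(pair) for pair in query_log).most_common(k)


path_id = encode_path([12, 345, 6789])
print(path_id)               # c-9l-58l
print(decode_path(path_id))  # [12, 345, 6789]
print(relationship_strength({"1600s", "founded the city"}, {"founded the city"},
                            {"history"}, {"history"}))  # 2
log = [("Romulus", "Remus"), ("Remus", "Romulus"), ("Romulus", "Roger Williams")]
print(frequent_pairs(log, 1))
```

Using `frozenset` for each query pair means that asking for Romulus-to-Remus and Remus-to-Romulus counts as the same association.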

[4] Look at the URL. If you change any of the talent settings, it will change the string on top that identifies the location and quantity of points spent.

Nuts and bolts of the database: I believe that developing a system that creates nodes from fields populated by a user (node: required; tag: optional; metatag: optional) would be simple. Connecting the nodes would also be simple and, more importantly, automatic: nodes would merely be connected by tag. By clicking a node, a user may also be able to point manually to another node (referencing that node's unique ID) and define a tag and metatag shared by both nodes to complete the connection. Besides building a system for creating and connecting nodes (because the actual creation and connection will be done by users, at least initially), the only real work is to create the database and the parameters for filling it and, most importantly, a GUI for users to input relevant information that will then be stored in the database.

Querying the database should also not be too hard. Indexing the database is one way of finding connections, but if there were a way to go from node to node using a tag or metatag as the search string, no indexing would be needed and the search would thus complete faster. Indexing, however, is necessary in the long term, at least for nodes, to allow their quick identification and recall. In querying the database, three concerns are of import:
1. (Efficiency) Speed of delivering query results
2. (Effectiveness) Accuracy of results: only related nodes are returned
3. (Effectiveness) Precision of results: only relevant nodes are returned
Speed and precision are not as great concerns starting out as accuracy, and accuracy should be easily achieved with appropriate tagging.

Integration with Wikipedia: The site's (and database's) close integration with Wikipedia is a key part of the third principle: to make the database and its information available to the body of internet users via an easy-to-use, simple graphical user interface. First, as previously mentioned, each node will have, where applicable, a reference to a Wikipedia (or, preferably, a Citizendium) article. This will provide actual information that the database, in its infancy, cannot yet provide through association with other nodes. Second, with the advent of data mining capabilities, a bot will be able to passively comb Wikipedia for tags to add to nodes that are linked to that specific article.
Certain phrases and words will be sought to add as tags, expanding the connectivity of certain nodes. Third, and finally, using an improved version of the aforementioned bot, a web crawler will be able to comb Wikipedia with pre-mapped semantic algorithms to intelligently search for phrases, keywords, and discrete facts to add as tags (with their associated metatags) to certain nodes. For example, if the bot read "was born in" and "1995" within N words of each other, it could piece together that the person the article is about (and whose name serves as the node name in the database) was born in 1995. From this, having recognized the phrase and pattern, it could create the tag "born" with the additional information "1995" and the metatag "personal history."
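The "was born in" proximity heuristic can be sketched as below. It is deliberately naive and entirely hypothetical: the function name, the regular-expression word tokenizer, and the reading of "within N words" (a four-digit year no more than N words after the end of the phrase, or before its start) are assumptions, and the example sentence about "Jane Doe" is invented. A real crawler would need genuine language parsing.

```python
from __future__ import annotations

import re
from typing import Optional

PHRASE = ["was", "born", "in"]  # the pre-mapped pattern from the text


def extract_birth_tag(text: str, n: int = 5) -> Optional[tuple[str, str, str]]:
    """Return (tag, additional information, metatag) if the pattern is found, else None."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    years = [i for i, w in enumerate(words) if re.fullmatch(r"\d{4}", w)]
    for start in range(len(words) - len(PHRASE) + 1):
        if words[start:start + len(PHRASE)] != PHRASE:
            continue
        end = start + len(PHRASE) - 1  # index of the phrase's last word
        for y in years:
            # distance in words from the nearer edge of the phrase
            distance = y - end if y > end else start - y
            if distance <= n:
                return ("born", words[y], "personal history")
    return None


print(extract_birth_tag("Jane Doe was born in Ohio in 1995.", n=3))
# ('born', '1995', 'personal history')
print(extract_birth_tag("Jane Doe was born in Ohio in 1995.", n=2))
# None
```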

Ultimately, the third step in integration, when carried to its logical end, results in a violation of the holistic principle. The more intelligent the crawling algorithms and pattern recognition, the more tags will be produced. The more tags produced, the more information stored on each node. At first glance this seems like a good idea, but it may not be. The more information stored on each node, the more each node is about the discrete node itself and not about the patterns and similarities between nodes. Granted, the information is technically being stored as an association in a tag; nonetheless, the nodes and their associated tags could be recombined, reversing step 3's integration, to create a reduced (in both senses of the word) version of Wikipedia. The intention of this database is not to recreate Wikipedia, but to use it to a different end. For this reason, care must be taken to keep the focus on associations and holism at all times.

Data integrity: The value of information returned from a database purporting to contain knowledge is directly proportional to the probability that the information is true. As such, the database must have a system for evaluating the likelihood that any information it returns to a user is actually true. This might best be accomplished in two stages. First, create a truth value tag for the node, and create a truth value tag for every tag. The node's truth value would be an overall weighted average rating of the probability that the node accurately reflects the information constituting what it represents (0-100%). Each tag on the node, which is actually the information about the node itself, should have links to other tags that support what it says, and should carry its own probability truth value, like the node. The probability values would be populated based on variables such as the number of times the node name (e.g. "brain") is mentioned in the equivalent Wikipedia article (weighted for article size as judged by number of sentences), or the number of times Google search results return summaries that mention the node name in the presence of similar nodes' names (e.g. "brain" mentioned in the same page summary as "psychology"), and so on. Second, create a separate truth value tag that can be populated by a voting system on each node. Users may provide input as to the accuracy, on a 0-100% scale, of various aspects of the node and its constituent tags. This system can be weighted against the first to develop an overall truth probability average that can be used as a parameter when querying the database. This will give a better picture of whether the information returned from the database should be trusted (and maybe we can force Rupert Murdoch and Ted Turner to use it too).

Community opportunity: The system of node and network building must become a social activity that people want to engage in, to encourage its growth and secure its continued maintenance. Additionally, at every point in the development process, a dedicated core of volunteer programmers will be required. While many of the abilities the database could acquire exist in services already provided on the internet, the interfaces required to take advantage of these services do not. Furthermore, the structure of the brain does not yet exist in computer form, and expertise will be required to move from seed to sapling and beyond. Once the programming needed to create the brain has been completed, discussion between users about nodes, the connections between them (tags, metatags), and so on could take place in several ways in order to populate the brain more effectively:
1. Node/tag/metatag-based discussion pages (requires membership to post)
2. A forum with the ability to directly reference certain nodes/tags/metatags in a thread/post (requires membership to post)
3. An anonymous single board on which to post comments, requests, etc.
4. ?
A medium for user-to-user communication within the confines of the site is absolutely necessary early in the system's development. SourceForge has an excellent open source software community, but I would be willing to host a site dedicated to the project if it would be significantly helpful or necessary.
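Returning to the Data integrity section, its two-stage truth value can be sketched as a weighted average of tag truth values (stage one) blended with the mean of user votes (stage two). The function names, the equal default weights, and the 50/50 blend are hypothetical choices; how the automatic tag scores would be derived from Wikipedia mentions or search results is left open here, as it is in the text.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional


def weighted_truth(scores: dict[str, float], weights: Optional[dict[str, float]] = None) -> float:
    """Stage one: the node's truth value as a weighted average of its tags' values (0-100)."""
    if not scores:
        raise ValueError("a node needs at least one scored tag")
    weights = weights or {tag: 1.0 for tag in scores}  # equal weights by default
    total = sum(weights[tag] for tag in scores)
    return sum(scores[tag] * weights[tag] for tag in scores) / total


def combined_truth(automatic: float, votes: list[float], vote_weight: float = 0.5) -> float:
    """Stage two: blend the automatic value with the mean user vote (0-100 scale)."""
    if any(not 0 <= v <= 100 for v in votes):
        raise ValueError("votes must be on a 0-100 scale")
    if not votes:
        return automatic  # no votes yet: rely on the automatic value alone
    return (1 - vote_weight) * automatic + vote_weight * mean(votes)


auto = weighted_truth({"born 1995": 80.0, "citizen of": 60.0})  # 70.0
print(combined_truth(auto, [100.0, 80.0]))                      # 80.0
```

Keeping `vote_weight` as a parameter leaves room to trust user votes more or less as the community grows.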
