Collaborative Tagging and Taxonomy by Vector Space Approach

International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: 2454-4248
Volume: 3 Issue: 11 396 – 402

R. Sathya (1), Mrs. K.V. Sumathi (2), Mrs. K. K. Kavitha (3)
1 M.Phil Scholar, Department of Computer Science, Selvamm Arts and Science College (Autonomous), Namakkal (Tk) (Dt) – 637003.
2 MCA., M.Phil, Assistant Professor, Department of Computer Science, Selvamm Arts and Science College (Autonomous), Namakkal (Tk) (Dt) – 637003.
3 M.C.A., M.Phil., SET., (Ph.D)., Vice Principal, Head of the Department of Computer Science, Selvamm Arts and Science College (Autonomous), Namakkal (Tk) (Dt) – 637003.
Abstract:- Collaborative tagging, or group tagging, is tagging performed by a group of users, usually to support re-finding items. The flexibility of tagging allows users to classify their collections of items in the ways they find useful, but the personalized variety of expressions can present challenges when searching and browsing. When users can freely choose tags (users create and apply public tags to online items based on their own feedback, as opposed to selecting terms from a prescribed vocabulary), the resulting metadata can contain homonyms (the same tag used with different meanings) and synonyms (multiple tags for the same concept), which may lead to inappropriate connections between items and inefficient searches for information about a subject.
Collaborative tagging requires mechanisms that enable users to protect their privacy by allowing them to hide certain user-generated content without making it useless for the purposes for which it was provided in a given online service. This means that privacy-preserving mechanisms must not negatively affect the accuracy and usefulness of the service. The proposed approach protects user privacy to a certain degree by suppressing the tags that make a user profile reveal a bias toward certain categories of interest or feedback.
__________________________________________________*****_________________________________________________
INTRODUCTION
GENERAL BACKGROUND
Data mining, also popularly referred to as Knowledge Discovery from Data (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, the Web, other massive information repositories, or data streams.
Data mining is a multidisciplinary field that draws on work from areas including database technology, machine learning, statistics, pattern recognition, information retrieval, neural networks, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization. Research on discovering patterns hidden in large data sets focuses on issues relating to feasibility, usefulness, effectiveness, and scalability.
[Figure labels: Data mining and analysis; Problem Identification; Creation of Analytical Data Environment; Application of Data Mining Tools; Implementation and Tracking; Business Strategies; Develop Tactics and Programs; Evaluate and Measure]
Fig 1.1 Data Mining Basics
IJFRCSCE | November 2017, Available @ http://www.ijfrcsce.org
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. It is useful for computing science students, application developers, and business professionals, as well as researchers.
Generally, data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.
DIFFERENT LEVELS OF DATA MINING
Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
[Figure labels: Data Mining divides into Predictive (Classification, Regression, Time Series Analysis, Prediction) and Descriptive (Clustering, Summarization, Association Rule, Sequence Discovery)]
Fig 1.2 Data Mining Level
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that can be applied to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
PURPOSE OF DATA MINING
The important general difference in focus and purpose between Data Mining and traditional Exploratory Data Analysis (EDA) is that Data Mining is more oriented towards applications than towards the basic nature of the underlying phenomena.
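To make the nearest neighbor method listed among the techniques above more concrete, the following is a minimal sketch in Python. It assumes numeric feature vectors, Euclidean distance, and a simple majority vote among the k closest historical records. The function name `knn_classify` and the toy dataset are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k nearest labelled records.

    `history` is a list of (features, label) pairs standing in for the
    "historical dataset"; Euclidean distance is an assumption of this sketch.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Keep the k historical records closest to the record being classified.
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    # Combine the classes of those k records with a simple majority vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy historical dataset (hypothetical values): (features, class label).
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
]

label = knn_classify((1.1, 1.0), history, k=3)
```

With k = 1 the method reduces to copying the class of the single most similar record. Larger values of k smooth out noise at the cost of blurring class boundaries.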
[Figure labels: Data collection; Data display; Data reduction; Conclusions: drawing/verifying]
Fig 1.3 Data Mining Usages
Data Mining is often considered to be "a blend of statistics, AI [artificial intelligence], and data base research" which until very recently was not commonly recognized as a field of interest for statisticians, and was even considered by some "a dirty word in Statistics".
WEB MINING
Web mining helps to extract useful information from web pages. Various web mining techniques are used to extract knowledge from web data, web documents, and the hyperlinks between documents. The web is a universal information platform that can be accessed by companies, universities, business people, and others. Generally, the web holds numerous sources of information, including internal sources and external sources.
Fig 1.4 Web Mining
Internal sources are those that include an organization's own information, and external sources are those that include information about clients, vendors, suppliers, intranets, extranets, and so on. In this research paper, web mining is divided into three categories:
a) Web Structure Mining.
b) Web Usage Mining.
c) Web Content Mining.
Web Structure Mining
It models the web as a graph with web pages as nodes and hyperlinks as edges connecting related pages. It describes the structural layout of the web. It also uses the connectivity among websites, known as “hyperlinks”. Hyperlinks are further divided into two categories, listed below:
• Internal hyperlinks, which lead to a location within the same web page.
• External hyperlinks, which lead to other web pages.
Document structure is described by a schema language for XML, which helps to define valid XML documents.
Web Usage Mining
It covers the knowledge discovered from users navigating through websites, as recorded in log files. It is further divided into two categories, listed as follows:
• Application Server Data: Business transactions are captured and stored in the application server log.
• Web Server Data: These logs are generated by the web server. They include fields such as the IP address, the number of web pages accessed, and the access times.
Web Content Mining
It covers knowledge discovery from the contents of web pages, such as images and videos. Intelligent agents help to solve the indexing problem in search engines, which would otherwise deliver imprecise results and cause information overload. They also help to select more relevant documents.
WEB MINING SECURITY
Web mining has emerged as an important branch of data mining. This is mainly due to the tremendous amount of information available from the Web, which has attracted many research communities, and to the recent interest in e-commerce.
OBJECTIVES
• To introduce a user-assisted friend grouping mechanism that enhances traditional group-based policy management approaches. Assisted friend grouping leverages proven clustering techniques to aid users in grouping their friends more effectively and efficiently.
• To establish measurable agreement between clusters and user-defined relationship groups. In addition, user perceptions of the improvements should be encouraging.
• To introduce a new privacy management model that is an enhancement over conventional group-based policy management approaches.
• To leverage a user's memory and opinion of their friends to set policies for other associated friends, which is referred to as Same-As Policy Management.
PROBLEM DEFINITION
Collaborative tagging systems such as Delicious, Last.fm, and BibSonomy are valuable components of the Social Web. They allow users, firstly, to organize their own data with a level of freedom not possible in traditional taxonomic filing systems, whether it is web bookmarks, music collections, or academic journal references. Secondly, they provide users a means to openly share this information so that friends and colleagues can easily communicate with each other about their latest discoveries. The major issues of the existing approaches are:
• These systems allow anyone to use the collective knowledge of others to discover new resources and perhaps even new friends. These benefits, however, are only as powerful as the system is trustworthy. As with any open adaptive system, maintaining the integrity of a tagging system presents a considerable logistical problem.
• Resources are loosely classified based on end users' feedback, expressed in the form of free-text labels (i.e., tags). Such an approach to content and resource categorization has, in recent years, been seen as a challenging research topic.
• The undefined semantics of tags, which are per se ambiguous and expressed in multiple languages, make it difficult to enforce semantic interoperability and to guarantee a reasonable level of accuracy when determining the “meaning” of a tag.
• Tag prediction concerns the possibility of identifying the most probable tags to be associated with a non-tagged resource, whereas tag recommendation is meant to suggest to users the tags to be used to describe resources they are bookmarking. In both cases, existing approaches apply techniques usually used in recommendation systems.
• Another interesting issue concerns the exploitation of the “explicit” relationships between users (i.e., the actual social network underlying a folksonomy) to address issues like annotation relevance and/or trustworthiness. Privacy protection in social tagging services is another issue that has not been thoroughly investigated.
SCOPE OF RESEARCH WORK
The proposed system's methodology could be implemented in the following real-world applications:
• Marketing strategy analysis
• Social communication applications of an individual organization
• Online rating processes
The proposed system is an enhanced collaborative tagging system that consists of a “traditional” bookmarking service, such as Delicious, and two main additional services built on top of it in a tree view. These services address two main issues. The former allows end users to specify policies that can be used either to explicitly denote resources of interest or to enforce blocking conditions on the browsed data. The latter features a specific privacy-enhancing technology (PET), namely tag suppression, to preserve the privacy of registered users by hiding specific characteristics of their profiles. Such an architecture is a specific implementation of the multilayer framework presented in [5], with the relevant difference that in [5] the privacy layer is missing. Lastly, we would also like to emphasize that our approach is not limited to the specific bookmarking application considered here, i.e., Delicious. As a matter of fact, it could be built on top of any collaborative tagging system.
Nevertheless, even if tags were not sensitive information per se, they could easily be exploited to infer users' personal
information, such as personal interests, preferences, and opinions. This is even easier when it is possible to statistically analyze huge collections of tags, such as those made publicly available by social bookmarking services, thus obtaining accurate tag-based user profiles. In this field, privacy-preserving techniques should guarantee, at the same time, privacy protection and the correctness of the results obtained by analyzing the data set.
LITERATURE REVIEW
Wu, L. Zhang, and Y. Yu et al. [2] explore a complementary approach that focuses on the "social annotations of the web", which are annotations manually made by ordinary web users without a pre-defined formal ontology. Compared to formal annotations, although social annotations are coarse-grained, informal, and vague, they are also accessible to more people and better reflect the meaning of web resources from the users' point of view during their actual usage of those resources. However, this system focuses only on web-based bookmarking systems and does not consider the tagging approach.
B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, and S. Gerd et al. [3] build an evaluation framework to compare various general folksonomy-based similarity measures, which are derived from several established information-theoretic, statistical, and practical measures. Their framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes, they focus on similarity between tags and between resources and consider different methods to aggregate annotations across users. This approach shows that relations between users can be derived from tags, which is important from a privacy-preserving point of view.
C. Marlow, M. Naaman et al. [4] provide a short description of the related academic work to date. They offer a model of tagging systems, specifically in the context of web-based systems, to help illustrate the possible benefits of these tools. Since many such systems already exist, they provide a taxonomy of tagging systems to help inform their analysis and design, and thus enable researchers to frame and compare evidence for the sustainability of such systems. They also provide a simple taxonomy of incentives and contribution models to inform potential evaluative frameworks. They present a preliminary study of the photo-sharing and tagging system Flickr to demonstrate their model and explore some of the issues in one sample system. Hence this paper gives the basic idea of how tag functionality works in web-based systems, which is important for clarifying the basic ideas about tagging.
METHODOLOGY
INTRODUCTION
TAG
A tag is user-contributed metadata that describes an information or content item, created freely by users with personally salient keywords or labels. The process of labeling is called tagging. Users can re-find the information later by means of the tags they have created. Also, by tagging, users can store resources for future retrieval.
COLLABORATIVE TAGGING SYSTEM
Collaborative tagging is a classification by the users and for the users. It is a social, decentralized, and complex network where many annotations, generally provided by interrelated groups of individuals, are organized so as to link resources and tags. Each resource item can be associated with many different tags, rather than with a single branch of a hierarchy.
With tags chosen freely from common language and associated with web resources that are interesting for users (such as photographs, videos, web links, and documents), collaborative tagging offers a sense of community in managing resources and results in a process of knowledge construction. Users can share their resources with others, discover resources through the collaborative network, and contact people with similar interests. The benefit of collaborative tagging systems comes from the many views of the mass, rather than from a dominant opinion supplied by a few.
In fact, the form of tagging tends to stabilize over time because people usually choose tags in three ways:
• Imitation: users are easily influenced by the tags that were previously applied by others to the same page;
• Habit: users re-use tags that they have already used on other pages, according to their background and culture;
• Recommendation: users choose tags that are suggested by a given interface.
COLLABORATIVE METHODS
Several recommendation systems use a hybrid approach that combines collaborative and content-based methods, which helps to avoid certain limitations of purely content-based and purely collaborative systems. The different ways to combine collaborative and content-based methods into a hybrid recommender system can be classified as follows:
• Implementing collaborative and content-based methods separately and combining their predictions,
• Incorporating some content-based characteristics into a collaborative approach,
• Incorporating some collaborative characteristics into a content-based approach, and
• Constructing a general unifying model that incorporates both content-based and collaborative characteristics.
TAG SUPPRESSION
In our scenario of collaborative tagging, users tag resources on the web, for example, music, pictures, videos, or bookmarks, according to their personal preferences. Users therefore contribute to describing and classifying those resources, but this is inevitably at the expense of revealing their profile. To avoid being accurately profiled by tagging systems or, in general, by any attacker able to collect such information, users may adopt a privacy-enhancing technology based on data perturbation. The data-perturbative technology considered in this work is tag suppression, a technique that allows a user to refrain from tagging certain resources in such a manner that the profile resulting from this perturbation does not capture their interests so precisely.
PROPOSED METHOD PROCEDURE
The proposed methodology of the thesis is implemented and evaluated with the following modules and procedures:
• User Registration
• Post Content
• View Posts
• General Suppression Word
• Strict Suppression Word
• My Tag Cloud
• Add Child User
• View Child Users
• Policy/ Resource Recommendation
• Policy/ Parental Control
• Change Password
EXPERIMENTAL RESULTS AND DISCUSSION
IMPLEMENTATION SOFTWARE
The empirical system is designed and implemented using Microsoft Visual Studio .NET as the front-end tool. The coding language is C#.NET, and Microsoft SQL Server is used as the back-end tool. Visual Studio is an integrated development environment, which is used in this thesis for designing the experiments.
FEATURES OF C#.NET
C# is Microsoft's new language designed for its new platform, “.NET”. It is a fully object-oriented language like Java and is the first component-oriented language, because it contains integral support for writing software components. C# is designed for building robust, reliable, and durable components to handle real-world applications. The C# language specification states the following objectives and features of C#:
• It is a simple, modern, general-purpose, object-oriented programming language.
• It provides support for software engineering principles such as strong type checking, array bounds checking, detection of attempts to use uninitialized variables, and automatic garbage collection.
• It is useful for developing software components that are suitable for deployment in distributed environments. It also supports internationalization.
CHARACTERISTICS OF C#
• Garbage Collection: This memory management feature handles all managed objects. Garbage collection is a feature of .NET, which C# uses at run time.
• Indexers: C# has indexers, which allow values in a class to be accessed with an array-like syntax.
• Exception Handling: .NET standardizes exception handling across languages. C# offers conditional keywords to control the flow and make the code more readable.
• Versioning: C# supports versioning. .NET solves the versioning problem and enables the software developer to specify version dependencies between different pieces of software.
• Extensive Inter-operability: All enterprise software applications can be managed easily in a type-safe environment. This extensive inter-operability makes C# the obvious choice for software developers.
EXPERIMENTAL RESULTS
INPUT FORM DESIGN
[Figure: USER LOGIN PAGE]
[Figure: USER OPTION]
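The tag suppression technique described in the methodology can be illustrated with a short sketch. The paper does not give the suppression algorithm, so everything here is an assumption made for illustration. The user profile is taken to be the relative frequency of the tag categories used, privacy risk is measured as the Kullback-Leibler divergence between the user profile and a population profile, and tags are dropped greedily from the most over-represented category up to a suppression rate. The names `profile`, `kl_divergence`, `suppress_tags`, and `rate` are hypothetical. Python is used for brevity rather than the C#.NET of the implemented system.

```python
from collections import Counter
import math


def profile(tags):
    """Relative frequency of each tag category in a list of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; q must be positive wherever p is."""
    return sum(pv * math.log2(pv / q[cat]) for cat, pv in p.items() if pv > 0)


def suppress_tags(user_tags, population, rate):
    """Greedy heuristic (an assumption, not the paper's algorithm):
    drop at most rate * len(user_tags) tags, each time from the category
    where the user's share most exceeds the population's share."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    kept = list(user_tags)
    budget = int(rate * len(user_tags))
    for _ in range(budget):
        p = profile(kept)
        # Category in which the user stands out most from the population.
        worst = max(p, key=lambda cat: p[cat] - population.get(cat, 0.0))
        if p[worst] - population.get(worst, 0.0) <= 0:
            break  # the profile no longer stands out in any category
        kept.remove(worst)
    return kept


# Hypothetical data: a user who tags "health" far more often than average.
user_tags = ["health"] * 6 + ["music"] * 2 + ["sports"] * 2
population = {"health": 0.2, "music": 0.4, "sports": 0.4}

kept = suppress_tags(user_tags, population, rate=0.4)
risk_before = kl_divergence(profile(user_tags), population)
risk_after = kl_divergence(profile(kept), population)
```

In this sketch a higher `rate` hides more of the user's bias but also removes more of the tags that make the service useful, which is the privacy and utility trade-off noted in the abstract.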
CONCLUSION
Collaborative tagging is currently an enormously popular online service. Although nowadays it is basically used to support resource search and browsing, its potential is still to be fully exploited. One of these potential applications is the provision of web access functionalities such as content filtering and discovery. For this to become a reality, however, it would be necessary to extend the architecture of current collaborative tagging services so as to include a policy layer that supports the enforcement of user preferences.
As collaborative tagging has gained popularity, the need for privacy protection has become more evident, not only because tags are sensitive information but also because of the risk of cross-referencing. In addition to what the existing approaches provide, the proposed system takes care of multi-language tagging.
FUTURE ENHANCEMENTS
If privacy-preserving collaborative tagging is applied to content in various languages, it becomes more efficient and more beneficial to end users. Future work includes the development of a full prototype of the experimental system and its testing and use in further scenarios.
The proposed methodology can be enhanced and implemented in various applications such as:
• Official information sites,
• Employee skills registry portals,
• Other online management systems for official or personal confidential information.
In future, the algorithm will be modified or redefined to improve the efficiency of the methodology.
REFERENCES
[1] Adomavicius, G. and Tuzhilin, A., “Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions,” IEEE Trans. Knowledge Data Eng., vol. 17, no. 6, pp. 734-749, June 2005.
[2] Barnes, S.B., “A Privacy Paradox: Social Networking in the United States,” First Monday, vol. 11, no. 9, Sept. 2006.
[3] Bischoff, K., Firan, C.S., Nejdl, W., and Paiu, R., “Can All Tags Be Used for Search?” Proc. 17th ACM Conf. Information and Knowledge Management (CIKM), pp. 193-202, 2008.
[4] Bundschus, M., Yu, S., Tresp, V., Rettinger, A., Dejori, M., and Kriegel, H.P., “Hierarchical Bayesian Models for Collaborative Tagging Systems,” Proc. IEEE Int'l Conf. Data Mining (ICDM), pp. 728-733, 2009.
[5] Carminati, B., Ferrari, E., and Perego, A., “Combining Social Networks and Semantic Web Technologies for Personalizing Web Access,” Proc. Fourth Int'l Conf. Collaborative Computing: Networking, Applications and Worksharing, pp. 126-144, 2008.
[6] Frías-Martínez, E., Cebrián, M., and Jaimes, A., “A Study on the Granularity of User Modeling for Tag Prediction,” Proc. IEEE/WIC/ACM Int'l Conf. Web Intelligence and Intelligent Agent Technology (WIIAT), pp. 828-831, 2008.