- Big data tools: Jaspersoft
The Jaspersoft package is one of the open source leaders for producing reports from
database columns. The software is well-polished and already installed in many
businesses, turning SQL tables into PDFs that everyone can scrutinize at meetings.
The company is jumping on the big data train, and this means adding a software layer
to connect its report-generating software to the places where big data gets stored.
JasperReports Server now offers software to suck up data from many of the major
storage platforms, including MongoDB, Cassandra, Redis, Riak, CouchDB, and
Neo4j. Hadoop is also well represented, with JasperReports providing a Hive
connector to reach inside of HBase. This effort still feels like it is starting up: many
pages of the documentation wiki are blank, and the tools are not fully integrated. The
visual query designer, for instance, doesn't work yet with Cassandra's CQL, so you
get to type those queries out by hand. Once you get the data from these sources,
Jaspersoft's server will boil it down to interactive tables and graphs. The reports can
be quite sophisticated interactive tools that let you drill down into various corners,
asking for more and more detail as you need it. This is a well-developed
corner of the software world, and Jaspersoft is expanding by making it easier to use
these sophisticated reports with newer sources of data. Jaspersoft isn't offering
particularly new ways to look at the data, just more sophisticated ways to access data
stored in new locations. I found this surprisingly useful. The aggregation of my data
was enough to make basic sense of who was going to the website and when they were
going there.
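Since the visual designer can't build CQL for you yet, those queries end up written by hand. Here is a minimal sketch of what that looks like from the DataStax Python driver; the keyspace, table, and columns are hypothetical, and `page` is assumed to be the partition key so the WHERE clause is valid:

```python
# A hand-written CQL query via the DataStax Python driver (cassandra-driver).
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])            # hypothetical Cassandra node
session = cluster.connect("web_analytics")  # hypothetical keyspace

# The kind of query the visual designer can't yet build for you;
# assumes `page` is the partition key of the hypothetical page_visits table.
rows = session.execute(
    "SELECT page, visitor_id, visit_time FROM page_visits "
    "WHERE page = 'home' LIMIT 100"
)
for row in rows:
    print(row.page, row.visitor_id, row.visit_time)

cluster.shutdown()
```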
- Big data tools: Tableau Desktop
Tableau Desktop is a visualization tool that makes it easy to look at your data in new
ways, then slice it up and look at it from a different angle. You can even mix the data
with other data and examine it in yet another light. The tool is optimized to give you
all the columns of the data and let you mix them before stuffing them into one of the dozens of
graphical templates provided. Tableau Software started embracing Hadoop several
versions ago, and now you can treat Hadoop "just like you would with any data
connection." Tableau relies upon Hive to structure the queries, then tries its best to
cache as much information in memory to allow the tool to be interactive. While many
of the other reporting tools are built on a tradition of generating the reports offline,
Tableau wants to offer an interactive mechanism so that you can slice and dice your
data again and again. Caching helps deal with some of the latency of a Hadoop
cluster. The software is well-polished and aesthetically pleasing. I often found myself
reslicing the data just to see it in yet another graph, even though there wasn't much
new to be learned by switching from a pie chart to a bar graph and beyond. The
software team clearly includes a number of people with some artistic talent.
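Tableau's caching layer isn't public, but the pattern described above is easy to sketch: run a HiveQL query once, keep the rows in memory, and answer repeat requests from the cache so Hive's latency is only paid the first time. A minimal sketch using the PyHive library; the host name and `visits` table are hypothetical:

```python
from pyhive import hive  # assumes a reachable HiveServer2 endpoint

conn = hive.Connection(host="hive.example.com", port=10000)  # hypothetical host
_cache = {}  # in-memory result cache that hides Hive's latency on repeat queries

def run(query):
    # First execution pays the full Hive round trip; repeats are served instantly.
    if query not in _cache:
        cur = conn.cursor()
        cur.execute(query)
        _cache[query] = cur.fetchall()
    return _cache[query]

rows = run("SELECT region, COUNT(*) FROM visits GROUP BY region")  # hypothetical table
for region, n in rows:
    print(region, n)
```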
- Big data tools: Splunk
Splunk is a bit different from the other options. It's not exactly a report-generating
tool or a collection of AI routines, although it accomplishes much of that along the
way. It creates an index of your data as if your data were a book or a block of text.
Yes, databases also build indices, but Splunk's approach is much closer to a text
search process.
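To make the contrast concrete, here is a toy inverted index in Python. This is not Splunk's implementation, just the text-search idea it is built on: map every word to the set of records that contain it, then answer queries by intersecting those sets.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in enumerate(records):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(rec_id)
    return index

logs = [
    "ERROR disk full on node7",
    "INFO backup completed",
    "ERROR timeout contacting node7",
]
index = build_index(logs)

# Query like text search: records containing both "error" and "node7".
hits = index["error"] & index["node7"]
print([logs[i] for i in sorted(hits)])  # prints the two ERROR lines
```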
- Privacy: Privacy is the most sensitive issue, with conceptual, legal, and technological
implications. Its importance only grows in the context of big data. Privacy can also be
understood in a broader sense, encompassing companies wishing to protect their
competitiveness and their consumers, and states eager to preserve their sovereignty
and their citizens.
- To rethink security for information sharing in big data use cases: Many online services
today require us to share private information (e.g., on Facebook and LinkedIn), but beyond
record-level access control we do not understand what it means to share data, how the
shared data can be linked, and how to give users fine-grained control over this sharing.
The sheer size of big data structures is also a crucial point that can constrain the performance
of the system. Managing large and rapidly increasing volumes of data has been a
challenging issue for many decades. In the past, this challenge was mitigated by
processors getting faster, which provided the resources needed to cope with
increasing volumes of data. But there is a fundamental shift under way now: data
volume is scaling faster than compute resources.
- Size issue: The larger the data set to be processed, the longer it takes to analyze. A
system designed to deal effectively with size is likely also to process a given data set
faster. However, speed in the context of big data usually means more than this; there
is also an acquisition-rate challenge in the ETL process. Scanning the entire data set
to find suitable elements is obviously impractical, so index structures are created in
advance to permit finding qualifying elements quickly (see the sketch after this list).
- Working with new data sources: The relevance and severity of these challenges will
vary depending on the type of analysis being conducted and on the type of decisions
the data might eventually inform. The core challenge is to analyse what the data are
really telling us in a fully transparent manner.
- In multimedia big data: The semantic gap between high-level semantics and low-level
visual appearance is a challenge for automated ontology-driven video annotation. An
ontology builds a formal and explicit representation of semantic hierarchies for the
concepts and their relationships in video events, and allows reasoning to derive
implicit knowledge. The scale makes this urgent: with the rapid growth of video
resources on the World Wide Web, 35 hours of video are uploaded to YouTube alone
every minute, and over 700 billion videos were watched in 2010.
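As promised in the size bullet above, here is a toy sketch of why prebuilt index structures beat scanning. With records kept sorted by timestamp in advance, a range lookup becomes a pair of binary searches over the index rather than a pass over the whole data set; the timestamps below are made up for illustration.

```python
import bisect

# Index built in advance: record timestamps kept in sorted order (toy data).
timestamps = [3, 7, 12, 18, 25, 31, 44, 59]

def range_query(lo, hi):
    # Two binary searches locate the qualifying slice in O(log n),
    # versus O(n) for scanning the entire data set.
    left = bisect.bisect_left(timestamps, lo)
    right = bisect.bisect_right(timestamps, hi)
    return timestamps[left:right]

print(range_query(10, 30))  # -> [12, 18, 25]
```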
Conclusion
- In conclusion, effective big data management helps companies locate valuable
information in large sets of unstructured data and semi-structured data from a variety
of sources, including call detail records, system logs and social media sites. Most big
data environments go beyond relational databases and traditional data warehouse
platforms to incorporate technologies that are suited to processing and storing non-
transactional forms of data. The increasing focus on collecting and analysing big data
is shaping new platforms that combine the traditional data warehouse with big data
systems in a logical data warehousing architecture. As part of the process, companies
must decide which data must be kept for compliance reasons, which data can be
disposed of, and which data should be kept and analysed in order to improve current
business processes or give the business a competitive advantage. This process requires
careful data classification so that ultimately, smaller sets of data can be analysed
quickly and productively.
References
Borne, K. (2014, April 14). Top 10 data challenges: A serious look at 10 big data V's.
Retrieved from https://mapr.com/blog/top-10-big-data-challenges-serious-look-10-big-data-vs/
Big data management. (n.d.). Retrieved from http://searchdatamanagement.techtarget.com/definition/big-data-management
Wayner, P. (2012, April 18). 7 top tools for taming big data. Retrieved from
http://www.infoworld.com/article/2616959/big-data/7-top-tools-for-taming-big-data.html
Emran, N. A. (2015). Data completeness measures. In Advances in Intelligent Systems and
Computing (pp. 117–130).
Leza, F. N. M., & Emran, N. A. (2014). Data accessibility model using QR code for lifetime
healthcare records. World Applied Sciences Journal, 30(30), 395–402.