Abstract—YouTube hosts a large number of videos from which datasets can be built. Each YouTube video has characteristic values: the number of views, likes, dislikes, and comments. Association rules mining is able to find the most dominant items in a dataset. This research investigates 40 random YouTube videos by applying an association rules mining algorithm to find the ingredients most used in Indonesian cooking recipes. It found that the most liked videos use two main ingredients, garlic and onion. This research also implements the IST-EFP algorithm to reduce the dimension of the dataset without losing the important rules obtained. IST-EFP was able to reduce the dataset dimension by 19% with a 0.7% loss in the rules obtained.

Keywords—association rule mining, YouTube dataset, IST-EFP algorithm, cooking recipes ingredients

I. INTRODUCTION
YouTube is one of the largest video hosting platforms in existence, and datasets can be produced from it [1]. One YouTube video genre is cooking recipes, and Indonesian cooking recipes are one kind within that genre. Every YouTube video has characteristic values such as views, likes, dislikes, and comments. If a video receives many likes, it means users like it.

Cooking recipes consist of several ingredients, and the composition of those ingredients can be recognized [2].

Association Rules Mining (ARM) is a data mining technique for identifying relations between several items on a [...]

[...] IV describes the data preparation, including the data structures used. Section V presents the IST-EFP algorithm. Section VI discusses the experiments, including the numbers obtained. Finally, Section VII draws the conclusion.

II. METHOD
The research method applied to achieve the goal is divided into three processes. The first is to create a dataset by pre-processing YouTube videos. Pre-processing is done by implementing an ETL mechanism using the Oracle SQL Developer tool [5][6]. The second process applies an association rules mining algorithm directly to the YouTube cooking recipes dataset to obtain ingredient patterns. The third reduces the original dataset with the IST-EFP algorithm to obtain a reduced dataset, then processes it with association rules mining to obtain another set of ingredient patterns [3]. The two sets of rules obtained are compared on their ingredient patterns to find the level of similarity.

[Figure: research method flowchart. Recoverable labels: Start; YouTube Cooking Recipes Data; Pre-Process]
[Figure: entity-relationship diagram. Recoverable attribute labels: Dislikes, Comments, Access_Date]
Fig. 2. Cooking Recipes ERD

IV. DATA PREPARATIONS
The first step of this research is collecting the YouTube cooking recipes dataset. This is done by watching videos on YouTube and collecting the cooking recipes manually. The data obtained is then transferred into a database by implementing an ETL mechanism using Oracle SQL Developer. This research uses 40 YouTube videos published between January 2018 and May 2018. The period is limited in order to obtain convenient statistics, because if the period is too long there may be customized duplicate videos published by several users.

TABLE I. RECIPES TABLE STRUCTURES

RECIPES
ATTRIBUTE         DATA_TYPE
Id (PK)           NUMBER
Title             VARCHAR2(128)
Link              VARCHAR2(64)
Date_Published    DATE
Views             NUMBER
Likes             NUMBER
Dislikes          NUMBER
Comments          NUMBER
Access_Date       DATE

V. IST-EFP ALGORITHM
The IST-EFP algorithm is able to reduce the dimension of a time series dataset by about 2.33% [3]. It applies the intersection operation of set theory within the EFP (Expand FP-Growth) algorithm. The EFP algorithm itself is an FP-Growth algorithm integrated with database tables [7]. In this research, the IST-EFP algorithm is implemented as PL/SQL scripts [3, 4, 8, 10, 14, 15].

IST-EFP(Dataset, minSupCount)
1. X = Dataset
2. X1 = CREATE temporary table FROM X WHERE COUNT(*) > minSupCount
3. Y1 = CREATE EFP table FROM X1
4. Z = Y1 ∩ X ON Y1.previtem IS NOT NULL
5. Return Z

Fig. 3. IST-EFP Algorithm
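
As a rough illustration of the steps in Fig. 3, the following Python sketch mimics them on in-memory rows instead of PL/SQL tables. Several details are assumptions made only for illustration, since the figure does not spell them out: the EFP table is taken to hold one row per (transaction, item) with a `previtem` column naming the preceding item in the same transaction, items are ordered by descending frequency as in FP-Growth, and step 4 is read as "keep the original rows that appear in the EFP table and belong to a transaction with at least one non-null `previtem`". The function name `ist_efp_sketch` and the sample rows are hypothetical; this is not the authors' implementation.

```python
from collections import Counter


def ist_efp_sketch(dataset, min_sup_count):
    """Reduce a list of (trx_id, item) rows, loosely following Fig. 3.

    Assumption: each EFP row is (trx_id, item, previtem), where previtem
    is the preceding item in the same transaction (None for the first).
    """
    # Step 1: X = Dataset
    x = list(dataset)

    # Step 2: keep only items whose count exceeds minSupCount
    counts = Counter(item for _, item in x)
    x1 = [(t, i) for t, i in x if counts[i] > min_sup_count]

    # Step 3: build the assumed EFP rows; items inside a transaction are
    # ordered by descending frequency (ties broken alphabetically)
    by_trx = {}
    for t, i in x1:
        by_trx.setdefault(t, []).append(i)
    y1 = []
    for t, items in by_trx.items():
        prev = None
        for i in sorted(set(items), key=lambda it: (-counts[it], it)):
            y1.append((t, i, prev))
            prev = i

    # Step 4 (assumed reading): intersect X with the EFP rows, keeping
    # only transactions that have at least one non-null previtem
    linked_trx = {t for t, _, prev in y1 if prev is not None}
    efp_rows = {(t, i) for t, i, _ in y1}

    # Step 5: return Z
    return [row for row in x if row in efp_rows and row[0] in linked_trx]


# Made-up sample rows (not the paper's data)
rows = [(1, "garlic"), (1, "onion"), (1, "salt"),
        (2, "garlic"), (2, "onion"), (3, "garlic"), (4, "chili")]
reduced = ist_efp_sketch(rows, 1)
print(len(rows), "->", len(reduced))
```

Under these assumptions, rows for infrequent items and for transactions left with a single frequent item are dropped, because such rows cannot contribute to a rule linking two frequent ingredients.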
VI. EXPERIMENTS & DISCUSSIONS
Based on the research method stated, this research carries out three main processes. Each process follows software engineering flows [12] and database design theories [13], and each is tested with black box testing to make sure the output is valid compared with manual calculations [9][11].

In the first process, data gathering has been completed and the data stored in the database using the structures in Table I and Table II; 71 ingredients were found. In the second process, the YouTube cooking recipes are processed with the association rules algorithm, and the support values obtained are shown in Table III. Table III shows that onion and garlic have the strongest relation of all ingredients. Table IV shows that garlic and onion have the strongest confidence value. This means that most Indonesian cooking recipes use onion and garlic together.
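
The support and confidence values discussed here follow the standard association rules definitions: the support of an itemset is the share of transactions that contain it, and the confidence of a rule X → Y is the number of transactions containing both X and Y divided by the number containing X. The Python sketch below applies these definitions. The function names and the toy transactions are illustrative assumptions; only the figure of 30 out of 41 transactions for onion and garlic is taken from Table III.

```python
def support_pct(support_count, total_trx):
    """Support of an itemset as a percentage of all transactions."""
    return round(100.0 * support_count / total_trx, 2)


def confidence(transactions, x, y):
    """Confidence of the rule x -> y: count(x and y) / count(x)."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)


# Figures reported in Table III for {onion, garlic}: 30 of 41 transactions
print(support_pct(30, 41))  # 73.17

# Toy transactions, made up for illustration (not the paper's data)
toy = [{"onion", "garlic", "chili"}, {"onion", "garlic"},
       {"onion", "salt"}, {"onion"}]
print(confidence(toy, "onion", "garlic"))  # 0.5
print(confidence(toy, "garlic", "onion"))  # 1.0
```

The two confidence calls show why a rule and its reverse are reported separately: the same pair of items can give different values depending on which item is the antecedent.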
TABLE III. SUPPORT VALUES

ITEMSETS         SUPPORT COUNT    LENGTH    TOTAL TRX    SUPPORT PCT
onion, garlic    30               2         41           73.17

TABLE VI. IST-EFP’S CONFIDENCE VALUES

X    Y    CONFXY    CONFYX