
FUNDAMENTAL GUIDE FOR MIGRATING CONTENT TO AEM
www.tothenew.com

CONTENTS
1. Introduction
2. Manual vs. Automated Content Migration
3. Content Migration Flow
4. ETL Cycle
4.1 Extraction
4.2 Transformation
4.3 Load
5. Content Migration via ETL Tool (Talend)
5.1 Pre-Migration Cleanup
5.2 Extraction & Transformation
5.3 Post Migration Cleanup
5.4 Packaging
6. Conclusion
7. About the Author
8. About TO THE NEW


1. INTRODUCTION
Organizations, big or small, almost always have to deal with upgrading their systems because of the fast-evolving technology landscape. Data produced by legacy systems is an important asset that an organization will want to carry over to the new system. For instance, any company would like to retain the attendance records of all employees, policy documents, agreements and so on in the new system. Moving this “old” data into a “new” system/application requires migration.

While content migration has many use cases, this ebook focuses on migration to Adobe Experience Manager (AEM).


2. MANUAL VS. AUTOMATED CONTENT MIGRATION
Content migration to AEM involves migrating all digital assets (pages, images, videos, and even code files such as CSS, JavaScript, JSP, Java etc.) over to the new system. There are two broad ways to go about content migration:

1. Manual
Put in a team of 10 awesome content authors and let them loose, manually migrating content from the old system over to the new system.

2. Automated
Use tools and methodologies to automate the entire process. The initial development effort is high and can be complex (depending on the content being migrated), but for large migration projects automation is the only practical way forward.

Automated migrations can be divided into 3 phases: Extract, Transform and Load (ETL). There are ETL tools that can help with migration; one such tool, Talend, is discussed in this ebook.

To sum up, the following is a comparative analysis of manual and automated content migration:

| Manual | Automated |
|---|---|
| Easy (“lift and shift” approach) | Initially difficult (requires custom development and an initial ramp-up to work with the tool) |
| Tight control over the migration process and over cleaning up outdated information residing in the existing system | Not all content can be easily migrated, such as backend code, complex objects tied to the legacy system etc. |
| Time consuming (expensive): 10 days for a small site of ~1000 pages; 15-30 days for ~15K pages (figures are indicative) | Time efficient: 2-3 days for ~1000 pages; 4-6 days for ~15K pages (indicative figures, including any custom development time) |
| For plain content migration, no technical/coding skills required | Requires some technical/coding skills |
| Error prone due to manual intervention (link rewriting, tag mapping etc.) | Once tested properly, should lead to far fewer defects over a large number of pages |


3. CONTENT MIGRATION FLOW


The basic content migration flow needs to cater to the following points:

1. Obtain an inventory of the content that needs to be migrated. For a web application this would typically include digital assets (images, videos, CSS, JS, HTML files and other binary objects).
2. Clean up/sanitize the source content for potential issues such as encoding problems (especially with Asian languages), broken links, tag mapping, preserving parent-child relationships etc.
3. Organize the content in a logical hierarchy as per the new site structure in AEM. For example, when migrating from a legacy CMS to AEM, all pages in the legacy CMS might be maintained within a single root, whereas in AEM they would typically be bucketed using some convention such as page publish date or page creation date.
   For instance, old content URL: www.legacycms.com/new-festivities-begin.html
   New content path: /content/newcms/en/2015/jun/new-festivities-begin.html
   Similarly, digital assets (images, videos, binary content etc.) could be migrated to the AEM DAM.
4. Extending from the previous point, ensure that URLs and bookmarks work after migration. Even though the internal content structure changes when content is organized logically for better manageability, end users should not experience broken links or redirection to error pages. This could also be handled post migration via web server configuration.
5. Upload the transformed content to AEM. Some common approaches are discussed in this ebook.
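Points 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not the tooling used in this ebook: it assumes the publish date of each legacy page is known (for example from the export metadata), reuses the site root /content/newcms/en from the example above as a hypothetical value, and emits a plain Apache mod_alias Redirect directive per page. The actual redirect mechanism (web server rules, dispatcher configuration or AEM resource mapping) depends on the setup.

```python
from datetime import date
from urllib.parse import urlparse
import posixpath

# Hypothetical AEM site root, taken from the example in point 3.
SITE_ROOT = "/content/newcms/en"
# Fixed month names, so the result does not depend on the system locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def legacy_path(legacy_url: str) -> str:
    """Return the path part of a legacy URL such as 'www.legacycms.com/x.html'."""
    if "://" not in legacy_url:
        legacy_url = "http://" + legacy_url  # without a scheme, urlparse puts the host in .path
    return urlparse(legacy_url).path

def new_content_path(legacy_url: str, published: date, site_root: str = SITE_ROOT) -> str:
    """Bucket a page from the flat legacy root into <site_root>/<year>/<month>/<page>."""
    page = posixpath.basename(legacy_path(legacy_url))
    return f"{site_root}/{published.year}/{MONTHS[published.month - 1]}/{page}"

def redirect_rule(legacy_url: str, new_path: str) -> str:
    """Apache mod_alias directive that keeps old bookmarks working."""
    return f"Redirect 301 {legacy_path(legacy_url)} {new_path}"

if __name__ == "__main__":
    old = "www.legacycms.com/new-festivities-begin.html"
    new = new_content_path(old, date(2015, 6, 12))
    print(new)  # /content/newcms/en/2015/jun/new-festivities-begin.html
    print(redirect_rule(old, new))
```

Generating the redirect rules from the same mapping that decides the new paths keeps the two in sync, so every relocated page automatically gets a rule.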

4. ETL CYCLE
ETL tools extract data from heterogeneous or homogeneous sources, transform it into a consistent format (traditionally for analysis), and load the data into a target system. There are many such tools available on the market; Talend Open Studio is one of them and is the main focus of this ebook.

4.1 Extraction
Content extraction from the legacy system is one of the most important tasks. Usually, legacy systems export data in one form or another. Some export content in flat file structures such as text, XML, CSV etc., while others might just expose their underlying storage (such as a database), from which content can be extracted using API-level integrations.

4.2 Transformation
Content transformation is the step where source content is transformed into the content structure that the new system requires. Extracted content may not always be in the desired format, and hence it becomes necessary to do a pre-migration cleanup/sanitization of the source content. Basic cleanup/sanitization includes:

• Link rewriting, tag mapping etc.
• Leaving out information that need not be moved to the new system, e.g. outdated information, unused assets etc.
• Special character handling and encoding, which are common issues during migration and should be handled during transformation.

4.3 Load
This step refers to the process of loading content into the target (AEM) system. It is a major step in the content migration process because the choice of load method strongly influences how the previous (Transformation) step is done.
Listed below are three ways to load content into AEM.

(i) Using Sling Post Servlet
AEM uses Apache Sling as the underlying web application framework. Apache Sling provides the Sling Default POST Servlet implementation, which is a great way to modify or create content in a JCR repository via simple HTTP requests.
In this approach, source content sanitization and organizing content in the new site structure would typically be carried out beforehand. This sanitized and logically structured content could then be POSTed to the Sling Default POST Servlet, which creates the necessary node structure in the underlying JCR repository to represent that content.

(ii) Content Loader in CRX
AEM's repository (CRX) provides a Content Loader feature, which is one of the ways to upload content to the repository. Note, however, that this feature is deprecated in the latest versions of CRX.

(iii) Package Manager
One of the most common ways to install/upload content (including deployable code) to an AEM repository is the Package Manager utility. It is an advanced tool for defining, creating, and managing content packages. Content packages can be uploaded to and downloaded from the repository, as well as created and downloaded on-the-fly. Package definition supports multiple node hierarchies and advanced content filtering for maximum flexibility.
A package is essentially a zip file holding repository content in the form of a file-system serialization (called “vault” serialization). Packages include content, both page content and project-related content, selected using filters.
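Option (i) can be illustrated with a small Python sketch that prepares one Sling POST request creating a page node. It assumes a local author instance at http://localhost:4502 with basic authentication, and uses hypothetical values for the page path and resource type; the request is only built here and is sent when post() is called. The _charset_ field tells Sling to decode the form parameters as UTF-8, which matters for the encoding issues mentioned under 4.2. Depending on the AEM version and its security configuration (for example CSRF or referrer filters), additional headers may be needed.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local AEM author instance; adjust for your environment.
AEM_HOST = "http://localhost:4502"

def build_page_request(page_path: str, title: str, resource_type: str,
                       user: str, password: str, host: str = AEM_HOST) -> Request:
    """Prepare a Sling POST request that creates a cq:Page node with a jcr:content child."""
    fields = {
        "_charset_": "utf-8",                    # tell Sling how the form data is encoded
        "jcr:primaryType": "cq:Page",            # node type of the node at page_path
        "jcr:content/jcr:primaryType": "cq:PageContent",
        "jcr:content/jcr:title": title,
        "jcr:content/sling:resourceType": resource_type,
    }
    body = urlencode(fields).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        host + page_path,  # a JCR node path, without the .html extension
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )

def post(request: Request) -> int:
    """Send the request and return the HTTP status code."""
    with urlopen(request) as response:
        return response.status
```

For example, build_page_request("/content/newcms/en/2015/jun/new-festivities-begin", "New festivities begin", "newcms/components/page", "admin", "admin") prepares the request for one migrated page (the credentials and resource type are placeholders). Because each page is a separate request, a migration script can loop over thousands of pages and retry failures individually.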

| Default Sling POST Servlet | Package Manager |
|---|---|
| Versatile utility that gives a lot of flexibility for data handling. | Very nifty tool that can handle most use cases but may fall short in some edge cases. |
| Could be used to create lots of content. There is no hard limit on the amount of content that can be uploaded to/created in AEM using the Sling POST Servlet, since content can be sent in many separate requests. | Packages over 2 GB in size may run into issues. |
| Good where the content size is huge (>10 GB). | Good method for small to medium content migrations. It could also be used for large content migrations by logically segregating content into multiple packages and ensuring each package stays under 2 GB. |
| Does not offer a rollback feature in case something goes wrong: overwritten content cannot be rolled back to its previous state. | Offers a rollback feature: if wrong content has been uploaded, uninstalling the package restores AEM to its previous state. This is a very powerful feature of packages in AEM. In addition, packages are a great way of transferring content across different AEM servers. |
| Could overwrite existing content, though content is not deleted (unless explicitly requested). | If not properly configured, can overwrite and/or delete existing data. If configured, allows merging new data with existing data. This strategy is useful for a phased migration of a large website, where the entire website content cannot be migrated in one go and needs to be migrated section by section. |
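For the phased-migration scenario in the last row of the table, each section of the site can go into its own package whose workspace filter limits it to that section. The Python sketch below writes the text of a META-INF/vault/filter.xml with one filter root per section; the paths are hypothetical. FileVault filters accept a mode attribute: replace (the default) replaces the content under the root, merge only adds nodes that do not exist yet, and update adds new content and updates existing nodes without deleting anything.

```python
import xml.etree.ElementTree as ET

# Import modes understood by FileVault workspace filters.
VALID_MODES = {"replace", "merge", "update"}

def build_filter_xml(roots, mode="replace"):
    """Return the text of a META-INF/vault/filter.xml covering the given roots."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported filter mode: {mode!r}")
    workspace = ET.Element("workspaceFilter", version="1.0")
    for root in roots:
        ET.SubElement(workspace, "filter", root=root, mode=mode)
    ET.indent(workspace, space="    ")  # pretty-printing; requires Python 3.9+
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(workspace, encoding="unicode") + "\n")

if __name__ == "__main__":
    # Hypothetical phase: migrate the 2015 pages and their DAM assets without touching older content.
    print(build_filter_xml(["/content/newcms/en/2015", "/content/dam/newcms/2015"], mode="merge"))
```

Keeping one filter root per migrated section also makes the package size easy to reason about, which helps keep each package under the 2 GB mark mentioned above.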


5. CONTENT MIGRATION VIA ETL TOOL (TALEND)
Talend is an open source software vendor that provides data integration, data management, enterprise application integration and big data software and services.

When working with Talend Studio, you will often come across terms such as repository, project, workspace, Job, component and item. The most important of these is the Job, which can contain sub-jobs. Talend jobs provide process orchestration, which defines the processes that are executed in a predefined sequence. A job translates business needs into code, routines and programs. A sub-job can in turn consist of modularized components, each used to perform a specific data integration/transformation operation. For instance, tFileInputXML is a component used to read an input XML file.

Below is a real-life use case for a client migrating from a legacy CMS (Drupal) to Adobe Experience Manager 6.0.

Use Case
A client is using a legacy CMS, Drupal, to manage its ever-increasing digital content and would like to migrate to AEM to provide enhanced features to its users and site authors. The legacy CMS can provide an export of the source content pages in one unified XML file.

Solution Overview
The solution was designed in Talend Open Studio, which was the ETL tool of choice.
The whole ETL process was segregated into multiple Talend jobs, each with a specific responsibility. Some of these jobs are explained in detail below.
A basic migration job in this case would look like Fig 5.1. This job consists of four sub-jobs:
1. Pre-migration Cleanup
2. Extraction & Transformation
3. Post Migration Cleanup
4. Packaging

Fig 5.1 Solution Overview
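Talend generates and runs its own (Java) code for jobs, so the following is only a conceptual Python analogue of the orchestration described above, not Talend code: the four sub-jobs of Fig 5.1 run in a fixed order over a shared context, and a failure stops the job before any later sub-job runs. The sub-job bodies are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

# A sub-job is a name plus a function operating on a shared context dictionary.
SubJob = Tuple[str, Callable[[Dict], None]]

SUB_JOB_NAMES = ["Pre-migration Cleanup", "Extraction & Transformation",
                 "Post Migration Cleanup", "Packaging"]

def run_job(sub_jobs: List[SubJob], context: Dict) -> List[str]:
    """Run sub-jobs in their predefined order; stop at the first failure."""
    completed: List[str] = []
    for name, sub_job in sub_jobs:
        try:
            sub_job(context)
        except Exception as exc:
            raise RuntimeError(f"sub-job {name!r} failed after {completed}") from exc
        completed.append(name)
    return completed

def placeholder(name: str) -> Callable[[Dict], None]:
    """Hypothetical stand-in that only records that the sub-job ran."""
    def step(context: Dict) -> None:
        context.setdefault("log", []).append(name)
    return step

if __name__ == "__main__":
    ctx: Dict = {}
    print(run_job([(n, placeholder(n)) for n in SUB_JOB_NAMES], ctx))
```

Stopping at the first failure mirrors why the sub-jobs are ordered: packaging pages that failed transformation would only move the problem into AEM.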


5.1 Pre-Migration Cleanup
A pre-migration cleanup job in this case would look like Fig 5.2. This job reads the input content (XML in this case) and breaks it into smaller, manageable XML files, each representing a unique page in the source system. This could be achieved by writing custom code or by leveraging an out-of-the-box Talend component such as tAdvancedFileOutputXML.
The generated smaller XML files are kept on the file system. The next process step (transformation) works with these XMLs to transform them into AEM-compatible XML files.
In addition, this job is used for:
• Correcting character encoding issues
• URL handling and rewriting issues

Fig 5.2 Pre Migration Clean Up

5.2 Extraction & Transformation
The transformation job in this case would look like Fig 5.3. This job reads the XMLs created in the previous step one by one, transforms each XML into the AEM-specific XML schema, places it under the new site structure (jcr_root hierarchy on the file system), and renames the XML to “.content.xml” for an AEM page.

Fig 5.3 Extraction & Transformation

5.3 Post Migration Cleanup
This job is required if any cleanup is needed after migration. Typical post-migration cleanup jobs could be:
• Tags mapping
• URL mapping
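Sub-jobs 5.1 and 5.2 can be approximated outside Talend with a short Python sketch. It assumes a hypothetical export format (an <export> root containing <page> elements with title, slug and published children), a hypothetical site root, page resource type and template, and it keeps the per-page records in memory instead of writing intermediate files. Only the page title is carried over; real pages would also hold their body content in component nodes, and the parent year/month nodes are left out for brevity. Unicode NFC normalization stands in for the encoding cleanup mentioned in 5.1.

```python
import os
import unicodedata
import xml.etree.ElementTree as ET
from datetime import date

# Namespace URIs used in AEM/FileVault .content.xml files.
JCR = "http://www.jcp.org/jcr/1.0"
CQ = "http://www.day.com/jcr/cq/1.0"
SLING = "http://sling.apache.org/jcr/sling/1.0"
for _prefix, _uri in (("jcr", JCR), ("cq", CQ), ("sling", SLING)):
    ET.register_namespace(_prefix, _uri)

# Hypothetical target structure; replace with the real site's values.
SITE_ROOT = ("content", "newcms", "en")
RESOURCE_TYPE = "newcms/components/page"
TEMPLATE = "/apps/newcms/templates/page"
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def clean(text):
    """Trim and NFC-normalize text so equivalent characters share one encoding."""
    return unicodedata.normalize("NFC", (text or "").strip())

def split_export(export_xml):
    """5.1: break the unified export into one dictionary per legacy page."""
    for page in ET.fromstring(export_xml).iter("page"):
        yield {child.tag: clean(child.text) for child in page}

def to_content_xml(record):
    """5.2: serialize one page record as an AEM cq:Page .content.xml."""
    page = ET.Element(f"{{{JCR}}}root", {f"{{{JCR}}}primaryType": "cq:Page"})
    ET.SubElement(page, f"{{{JCR}}}content", {
        f"{{{JCR}}}primaryType": "cq:PageContent",
        f"{{{JCR}}}title": record["title"],
        f"{{{CQ}}}template": TEMPLATE,
        f"{{{SLING}}}resourceType": RESOURCE_TYPE,
    })
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(page, encoding="unicode")

def target_path(record, base_dir):
    """Place the page under jcr_root/<site>/<year>/<month>/<slug>/.content.xml."""
    published = date.fromisoformat(record["published"])
    return os.path.join(base_dir, "jcr_root", *SITE_ROOT, str(published.year),
                        MONTHS[published.month - 1], record["slug"], ".content.xml")

def write_pages(export_xml, base_dir):
    """Run both steps and write every page to disk as UTF-8; return the written paths."""
    written = []
    for record in split_export(export_xml):
        path = target_path(record, base_dir)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(to_content_xml(record))
        written.append(path)
    return written
```

Building the XML with ElementTree rather than string concatenation means characters such as & or < in legacy titles are escaped automatically.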


5.4 Packaging
This is the final step of the migration, which creates the archive for the migrated pages. It is a straightforward component that does the job for us. Keep in mind that the package needs to be AEM compatible, i.e. it should contain the jcr_root and META-INF folders and the associated metadata properties as per the AEM packaging standard.

Fig 5.4 Packaging
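A minimal Python sketch of this packaging step, assuming the jcr_root tree produced by the transformation step already exists under base_dir. It writes META-INF/vault/filter.xml and META-INF/vault/properties.xml and zips them together with jcr_root; the package name, group and version are hypothetical, and packages built by Package Manager or Maven-based tooling usually carry additional metadata.

```python
import os
import tempfile
import zipfile
from xml.sax.saxutils import escape, quoteattr

def filter_xml(roots):
    """Workspace filter limiting the package to the migrated roots."""
    lines = "".join(f"    <filter root={quoteattr(root)}/>\n" for root in roots)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<workspaceFilter version="1.0">\n{lines}</workspaceFilter>\n')

def properties_xml(name, group, version):
    """Java-properties XML carrying the package name, group and version."""
    entries = "".join(f'<entry key="{key}">{escape(value)}</entry>\n'
                      for key, value in (("name", name), ("group", group), ("version", version)))
    return ('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
            '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">\n'
            f"<properties>\n{entries}</properties>\n")

def build_package(base_dir, zip_path, roots, name, group="migration", version="1.0"):
    """Zip jcr_root plus META-INF/vault metadata into a content package."""
    jcr_root = os.path.join(base_dir, "jcr_root")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("META-INF/vault/filter.xml", filter_xml(roots))
        package.writestr("META-INF/vault/properties.xml", properties_xml(name, group, version))
        for dirpath, _dirnames, filenames in os.walk(jcr_root):
            for filename in sorted(filenames):
                full_path = os.path.join(dirpath, filename)
                # Zip entries always use forward slashes, e.g. jcr_root/content/.../.content.xml
                arcname = os.path.relpath(full_path, base_dir).replace(os.sep, "/")
                package.write(full_path, arcname)
    return zip_path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        page_dir = os.path.join(work, "jcr_root", "content", "newcms", "en")
        os.makedirs(page_dir)
        with open(os.path.join(page_dir, ".content.xml"), "w", encoding="utf-8") as fh:
            fh.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                     '<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" '
                     'xmlns:cq="http://www.day.com/jcr/cq/1.0" jcr:primaryType="cq:Page"/>\n')
        path = build_package(work, os.path.join(work, "newcms-migration.zip"),
                             ["/content/newcms/en"], "newcms-migration")
        with zipfile.ZipFile(path) as archive:
            print(archive.namelist())
```

The resulting zip can then be uploaded and installed through Package Manager, which also gives the uninstall-based rollback described in section 4.3.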

6. CONCLUSION
Content migration is an important activity in the redevelopment of a website. Though the migration process can be automated, a little manual effort will always be required to sign off the entire migration. No matter which option you choose for migration, manual or automated, careful planning is a must when it comes to migrating an existing website.

7. ABOUT THE AUTHOR


Geetika Chhabra | AEM Consultant

Geetika is a technology enthusiast, a quick learner and has vast experience in CQ/AEM development. She is an Oracle Certified Java Programmer (OCJP). She is a great team player and is also a part of the infrastructure group at TO THE NEW. In her idle time, you can find her reading novels, playing table-tennis and listening to music.


8. ABOUT TO THE NEW


TO THE NEW is a digital technology company that builds disruptive products and transforms businesses. We leverage the power of experience design, cutting-edge engineering, cloud and analytics led marketing to enable digital transformation.

Our passionate team of 750+ people includes passionate technologists, digital analytics experts, video specialists and creative mavericks who have transformed businesses of more than 300 companies spread across 30 countries worldwide. We take pride in our culture which is driven by passion for making an impact through technology.

LET'S CONNECT
info@tothenew.com
www.tothenew.com
