GUIDE FOR MIGRATING CONTENT TO AEM
www.tothenew.com
FUNDAMENTAL GUIDE FOR MIGRATING CONTENT TO AEM
CONTENTS
1. Introduction
4. ETL Cycle
4.1 Extraction
4.2 Transformation
4.3 Load
5.3 Post Migration Cleanup
5.4 Packaging
6. Conclusion
1. INTRODUCTION
Organizations, big or small, almost always have to deal with upgrading their systems because of the fast-evolving technology landscape. Data produced using legacy systems is an important asset that an organization would like to migrate to the new system. For instance, any company would like to retain attendance records of all employees, policy documents, agreements and so on, and carry them over to the new system.

This moving of “old” data into a “new” system/application requires migration. While there are many use cases of content migration, this ebook focuses on handling migration to Adobe Experience Manager (AEM).
…JavaScript, JSP, Java etc.) over to the new system. Two broad ways to go about content …

… the entire process. While initial development effort would be high and may be complex …

| Manual | Automated |
|---|---|
| Easy (“Lift” and “Shift” approach) | Initial difficulty (requires custom development and an initial ramp-up to work with the tool) |
| Tight control over the migration process and over cleaning up outdated information residing in the existing system | Not all content can be easily migrated, such as content residing in backend code, complex objects tied to the legacy system etc. |
| For plain content migration, no technical/coding skills are required | Requires some amount of technical/coding skills |
| Error prone due to manual intervention (links rewriting, tags mapping etc.) | Once tested properly, should lead to far fewer defects over a large number of pages |
4. ETL CYCLE
ETL tools extract data from heterogeneous or homogeneous sources, transform it so that it is stored in a common format for analysis, and load the data into a target system. There are many such tools available in the market; Talend Open Studio is one of them and is the main focus of this ebook.

4.1 Extraction
Content extraction from the legacy system is one of the most important tasks. Legacy systems usually export data in some form or other. Common export formats are flat-file structures such as text, XML, CSV etc., while other systems might just expose their underlying storage (such as a database), from which content can be extracted using API-level integrations.

4.2 Transformation
Content transformation is the step where source content is transformed into the desired content structure to meet the new system’s requirements. Extracted content may not always be in the desired format, and hence it becomes necessary to do a pre-migration cleanup/sanitization of the source content. Basic cleanup/sanitization includes:
• Links rewriting, tags mapping etc.
• Dropping information that need not be moved to the new system, e.g. outdated information, unused assets etc.
• Handling special characters and encoding, which are common issues during migration and should be dealt with during transformation.

4.3 Load
This step refers to the process of loading content into the target (AEM) system. It is a major step in the content migration process, because the method of loading content has a strong bearing on the previous (Transformation) step. For example, a Package Manager load expects the transformed content as a jcr_root file hierarchy, whereas the Sling POST Servlet expects it as HTTP request parameters.
Listed below are 3 ways to load content into AEM.

(i) Using Sling Post Servlet
AEM uses Apache Sling as the underlying web application framework. Apache Sling provides a default POST servlet implementation (the SlingPostServlet), which is a great way to modify or create content in a JCR repository via simple HTTP requests.
In this approach, sanitizing the source content and organizing it in the new site structure would typically be carried out beforehand. This sanitized and logically structured content could then be POSTed to the SlingPostServlet, which creates the necessary node structure in the underlying JCR repository to represent that content.

(ii) Content Loader in CRX
AEM’s repository (CRX) provides a Content Loader feature, which is one of the ways to upload content to the repository. Note, however, that this feature is deprecated in the latest versions of CRX.

(iii) Package Manager
One of the most common ways to install/upload content (including deployable code) to an AEM repository is the Package Manager utility. It is an advanced tool for defining, creating, and managing content packages. Content packages can be uploaded to and downloaded from the repository, as well as created and downloaded on-the-fly. Package definition supports multiple node hierarchies and advanced content filtering for maximum flexibility.
A package is essentially a zip file holding repository content in the form of a file-system serialization (called “vault” serialization). Packages include content, both page content and project-related content, selected using filters.
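Filters are declared in the package's META-INF/vault/filter.xml. As a minimal sketch in Python (3.9+ for `ET.indent`), the helper below generates such a file with several filter roots, an optional import mode and ordered include/exclude rules. The function name, the /content/mysite paths and the regex patterns are hypothetical examples, not part of the original guide.

```python
import xml.etree.ElementTree as ET


def build_filter_xml(filters):
    """Return the text of a META-INF/vault/filter.xml for the given definitions.

    Each definition is a dict with a 'root' path, an optional import 'mode'
    (e.g. 'replace', 'merge' or 'update') and an optional ordered list of
    ('include' | 'exclude', regex) tuples under 'rules'.
    """
    workspace = ET.Element("workspaceFilter", version="1.0")
    for definition in filters:
        attrs = {"root": definition["root"]}
        if "mode" in definition:
            attrs["mode"] = definition["mode"]
        node = ET.SubElement(workspace, "filter", attrs)
        # Rule order matters, so rules are written exactly in the order given.
        for kind, pattern in definition.get("rules", []):
            if kind not in ("include", "exclude"):
                raise ValueError(f"unknown rule type: {kind}")
            ET.SubElement(node, kind, pattern=pattern)
    ET.indent(workspace, space="    ")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(workspace, encoding="unicode") + "\n")


# Usage with hypothetical site and asset roots:
print(build_filter_xml([
    {"root": "/content/mysite", "mode": "merge",
     "rules": [("include", "/content/mysite/en(/.*)?"),
               ("exclude", "/content/mysite/en/archive(/.*)?")]},
    {"root": "/content/dam/mysite"},
]))
```

In FileVault, "merge" keeps existing nodes and only adds new ones, while the default "replace" mode replaces everything under a filter root, so the mode chosen here decides whether an install can overwrite or delete existing data.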
| Sling POST Servlet | Package Manager |
|---|---|
| Could be used to create lots of content. There is no limit to the amount of content that could be uploaded to/created in AEM using the SlingPostServlet. | Packages over 2 GB in size may run into issues. |
| Good where content size is huge (>10 GB). | Good method for relatively small to medium content migration. It could be used for large content migrations as well, by logically segregating content into multiple packages and ensuring each package remains under 2 GB. |
| Does not offer a rollback feature in case something goes wrong. Overwritten content cannot be rolled back to its previous state. | Offers a rollback feature: if wrong content has been uploaded, the package uninstall feature restores AEM to its previous state. |
| Could overwrite existing content, though content deletion does not happen (unless explicitly asked for). | If not properly configured, can overwrite and/or delete existing data. If configured, allows merging new data with existing data. |
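The Sling POST Servlet approach compared above works with plain HTTP form posts. As a minimal sketch, assuming a hypothetical site at /content/mysite, a hypothetical page component mysite/components/page and basic authentication, the Python below (standard library only) builds the request that would create one page node; `build_page_request` is a hypothetical helper name. Sending it is left to `urllib.request.urlopen`, and real instances may additionally require CSRF or referrer-filter configuration, which this sketch does not handle.

```python
import base64
import urllib.parse
import urllib.request


def build_page_request(host, page_path, title, resource_type, user, password):
    """Build (but do not send) a Sling POST request that creates a page node."""
    fields = {
        "_charset_": "utf-8",                              # tells Sling how values are encoded
        "jcr:primaryType": "cq:Page",                      # node type of the page itself
        "jcr:content/jcr:primaryType": "cq:PageContent",   # relative names address a child node
        "jcr:content/jcr:title": title,
        "jcr:content/sling:resourceType": resource_type,
    }
    body = urllib.parse.urlencode(fields).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=host.rstrip("/") + page_path,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + credentials,
            "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        },
    )


# Usage against a hypothetical local author instance; nothing is sent here.
request = build_page_request("http://localhost:4502", "/content/mysite/en/about",
                             "About Us", "mysite/components/page", "admin", "admin")
# with urllib.request.urlopen(request) as response: print(response.status)
```

Because each page is an independent request, a migration script can loop over thousands of pages without any single upload growing large, which matches the "no size limit" row above; the trade-off is that nothing is rolled back if a later request fails.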
USE CASE
[Figure: 4. Packaging]
A pre-migration cleanup job in this case would look like Fig 5.2. This job reads the input content (XML in this case) and breaks it into smaller, manageable XML files, each of which represents a unique page in the source system. This could be achieved by writing some custom code or by leveraging an out-of-the-box Talend component such as tAdvancedFileOutputXML.

The transformation job in this case would look like Fig 5.3. This job reads the XMLs created in the previous step one by one, transforms each XML into the AEM-specific XML schema, places it under the new site structure (jcr_root hierarchy on the file system), and renames the XML to “.content.xml” for an AEM page.
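The two jobs above can be approximated outside Talend. The plain-Python sketch below is an illustration of the same steps, not the author's Talend job: `split_export` writes each page of an export to its own file, and `transform_page` maps one of those files to a `.content.xml` under a jcr_root hierarchy, applying tags mapping and attribute escaping along the way. The source schema (`<export>`, `<page id path>`, `<title>`, `<tag>`), `TAG_MAP`, the /content/mysite root and the resource type are all hypothetical assumptions.

```python
import os
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical mapping from legacy tag names to AEM tag IDs (tags mapping).
TAG_MAP = {"News": "mysite:topic/news", "Events": "mysite:topic/events"}

# Minimal AEM page serialization (.content.xml) declaring the namespaces it uses.
PAGE_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="cq:Page">
    <jcr:content
        jcr:primaryType="cq:PageContent"
        jcr:title={title}
        cq:tags={tags}
        sling:resourceType={resource_type}/>
</jcr:root>
"""


def split_export(export_xml, out_dir):
    """Pre-migration cleanup: write each <page> of the export to its own XML file."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for page in ET.fromstring(export_xml).iter("page"):
        path = os.path.join(out_dir, page.get("id") + ".xml")
        ET.ElementTree(page).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written


def transform_page(page_file, jcr_root, site_root="/content/mysite",
                   resource_type="mysite/components/page"):
    """Transformation: map one page XML to <jcr_root>/<site_root><path>/.content.xml."""
    page = ET.parse(page_file).getroot()
    # Keep only tags that have an AEM equivalent; unmapped legacy tags are dropped.
    tags = [TAG_MAP[t.text] for t in page.findall("tag") if t.text in TAG_MAP]
    repo_path = (site_root.rstrip("/") + page.get("path")).strip("/")
    target_dir = os.path.join(jcr_root, *repo_path.split("/"))
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, ".content.xml")
    # quoteattr escapes &, <, > and quotes so special characters stay valid XML.
    # FileVault gives values starting with "[" or "{" a special meaning; such
    # titles would need extra escaping, which this sketch does not handle.
    with open(target, "w", encoding="utf-8") as out:
        out.write(PAGE_TEMPLATE.format(
            title=quoteattr(page.findtext("title", default="")),
            tags=quoteattr("[" + ",".join(tags) + "]"),
            resource_type=quoteattr(resource_type)))
    return target
```

Keeping the split and the transformation as separate functions mirrors the two separate jobs in Fig 5.2 and Fig 5.3: the intermediate per-page files can be inspected or re-run individually when a single page fails to transform.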
5.4 Packaging
This is the final step of migration, which creates the archive for the migrated pages. It is a straightforward component that does the job for us. Keep in mind that the package needs to be AEM compatible, i.e. it should contain the jcr_root and META-INF folders and the associated metadata properties as per the AEM packaging standard.
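In the guide's Talend job a single component produces this archive. As a plain-Python alternative sketch (not that component), the hypothetical `build_package` function below zips a jcr_root folder together with a minimal META-INF/vault/filter.xml and properties.xml. The function name, the default group/version and the 2 GB check on the compressed zip are assumptions for illustration; packages built by standard tooling such as the Maven content-package plugin carry additional metadata.

```python
import os
import zipfile
from xml.sax.saxutils import escape, quoteattr

# Java-properties XML giving the package name, group and version.
PROPERTIES_TEMPLATE = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="name">{name}</entry>
<entry key="group">{group}</entry>
<entry key="version">{version}</entry>
</properties>
"""

TWO_GB = 2 * 1024 ** 3  # size above which the guide warns packages may run into issues


def build_package(jcr_root, zip_path, roots, name, group="migration", version="1.0",
                  max_bytes=TWO_GB):
    """Zip a jcr_root folder together with META-INF/vault metadata."""
    filter_xml = ('<?xml version="1.0" encoding="UTF-8"?>\n<workspaceFilter version="1.0">\n'
                  + "".join(f"    <filter root={quoteattr(root)}/>\n" for root in roots)
                  + "</workspaceFilter>\n")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("META-INF/vault/filter.xml", filter_xml)
        zf.writestr("META-INF/vault/properties.xml", PROPERTIES_TEMPLATE.format(
            name=escape(name), group=escape(group), version=escape(version)))
        # Store every file under jcr_root/ using forward slashes, as zip paths require.
        for dirpath, _dirnames, filenames in os.walk(jcr_root):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                relative = os.path.relpath(full_path, jcr_root).replace(os.sep, "/")
                zf.write(full_path, "jcr_root/" + relative)
    if os.path.getsize(zip_path) > max_bytes:
        raise ValueError(f"{zip_path} exceeds {max_bytes} bytes; split the content "
                         "into several packages")
    return zip_path
```

Raising an error instead of silently producing an oversized archive is one way to enforce the guide's advice of keeping each package under 2 GB; the caller can then split the site into several roots and build one package per root.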
6. CONCLUSION
Content migration is an important activity in the redevelopment of a website. Though we can automate the migration process, a little manual effort will always be required to sign off the entire migration. No matter which option you choose for migration, manual or automated, careful planning is a must when it comes to migrating an existing website.
Geetika is a technology enthusiast and a quick learner, and has vast experience in CQ/AEM development. She is an Oracle Certified Java Programmer (OCJP). She is a great team player and is also a part of the infrastructure group at TO THE NEW. In her idle time, you can find her reading novels, playing table tennis and listening to music.
LET'S CONNECT
info@tothenew.com
www.tothenew.com