Tuesday 11 July 2017

AEM Workbench Best Practices - Exception Handling

AEM Workbench processes do not have a built-in exception management mechanism like normal Java code (try/catch/throw), which often leads developers to build applications without solid exception handling. At 8 BIT PLATOON we've developed Workbench-based systems for many years, and here's how we approach the problem:

Effectively all our Workbench processes have the following two output variables:

  • respCode: int (Default = -1)
  • respMsg: string[255] (Default = "")


Only under very specific circumstances do we ever create a process that does not include these two output variables. When doing code review, this is the first thing I look for.

If the process executes successfully, the respCode variable must be set to 0 and the respMsg variable must be set to some suitable message.

For each potential error that can occur in the process:

  • the respCode must be set to a positive, non-zero integer (typically 1, 2, 3, etc., but these can also be unique numbers that correlate with a database of system exceptions)
  • the respMsg must be set to a suitable message that describes the error


Determining whether an error has occurred depends on the specific process, but typically errors come in two forms:

  • A built-in Workbench operation has thrown an exception, or
  • A sub-process has returned a non-zero respCode

I'll show each one individually with an example.

Built-in Operation throws an exception
Here is a screenshot of a simple process that reads a file from the file system. If the file is not found, the exception is caught and the respCode/respMsg are set accordingly:


Sub-process returned a non-zero respCode:
Below is a screenshot that shows how exceptions from sub-processes are caught. Notice the following about the process:

  • In addition to the standard respCode and respMsg variables, this process also has subRespCode and subRespMsg variables.
  • When calling any sub-process, the respCode and respMsg from the sub-process are mapped to the parent process's subRespCode and subRespMsg variables.
  • The parent process step has two branches: a "Success" branch, which has an expression, and a default branch for all other cases.
  • The Success branch checks for the expression "subRespCode=0".
  • A subRespCode of -1 indicates that the sub-process failed unexpectedly prior to setting the respCode - this is handled as an exception.
  • A subRespCode of any positive integer indicates that a handled error occurred in the sub-process.
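There is nothing Workbench-specific about this contract, so here is a rough sketch of it in shell terms (the function and variable names are illustrative only, not Workbench artifacts): a sub-process initialises respCode to -1 so that an unexpected crash is distinguishable from a deliberately reported error, and the parent maps the results into subRespCode/subRespMsg before branching.

```shell
#!/bin/sh
# Sketch of the respCode/respMsg contract (illustrative names, not a Workbench API).
sub_process() {
  respCode=-1                       # default: -1 means "died before setting a result"
  respMsg=""
  if [ ! -f "$1" ]; then
    respCode=1                      # handled error: positive, non-zero
    respMsg="File not found: $1"
    return
  fi
  respCode=0                        # success
  respMsg="OK"
}

# The parent maps the sub-process results into its own subRespCode/subRespMsg
# variables and branches on them, like the "Success" / default branches above.
sub_process "/no/such/file"
subRespCode=$respCode
subRespMsg=$respMsg
if [ "$subRespCode" -eq 0 ]; then
  echo "Success branch"
else
  echo "Error branch: $subRespCode - $subRespMsg"   # prints: Error branch: 1 - File not found: /no/such/file
fi
```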

By following this approach you can manage exceptions in even the most complex Workbench processes.

Of course, we are slowly replacing Workbench processes at our clients with AEM Workflows; however, Workbench will remain in use for some time to come, and it always makes sense to build robust systems.


Sunday 26 February 2017

Dumping HTTP Requests and Responses in WildFly 10

Requirement / Problem

During late nights (or early mornings) of debugging weird issues, you may need to see exactly what HTTP requests are passed into a Web Service.

Depending on your requirements you have two options available:

  • Option 1: Use the Undertow Request Dumper to log the request and response headers
  • Option 2: Use a system property to log the full request and response messages

I'll show how to do both below:

Option 1: Undertow Request Dumper

While WildFly is shut down, make the following change in standalone.xml:

 <subsystem xmlns="urn:jboss:domain:undertow:3.1">
    ...
    <server name="default-server">
       ...
       <host name="default-host" alias="localhost">
          ...
          <filter-ref name="request-dumper"/>
       </host>
    </server>
    ...
    <filters>
        ...
        <filter name="request-dumper" module="io.undertow.core" class-name="io.undertow.server.handlers.RequestDumpingHandler"/>
    </filters>
 </subsystem>
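As an alternative to editing standalone.xml by hand, the same filter can (if I recall the resource paths correctly; verify against your Undertow subsystem version) be added with the JBoss CLI:

```shell
# Connect first with bin/jboss-cli.sh --connect, then:
/subsystem=undertow/configuration=filter/custom-filter=request-dumper:add(class-name=io.undertow.server.handlers.RequestDumpingHandler, module=io.undertow.core)
/subsystem=undertow/server=default-server/host=default-host/filter-ref=request-dumper:add
# A reload may be required for the change to take effect:
reload
```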

Now restart. WildFly should log the headers of your requests and responses nicely, as follows (this example is a SOAP request sent from SoapUI):

 08:31:17,302 INFO  [io.undertow.request.dump] (default task-1)
 ----------------------------REQUEST---------------------------
                URI=/rmoservice/SARBFormWebServiceImpl
  characterEncoding=null
      contentLength=825
        contentType=[multipart/related; type="text/xml"; start="<rootpart@soapui.org>"; boundary="----=_Part_2_1535191634.1488004277175"]
             header=Connection=Keep-Alive
             header=SOAPAction="http://tempuri.org/XXXXXX"
             header=Accept-Encoding=gzip,deflate
             header=Content-Type=multipart/related; type="text/xml"; start="<rootpart@soapui.org>"; boundary="----=_Part_2_1535191634.1488004277175"
             header=Content-Length=825
             header=MIME-Version=1.0
             header=User-Agent=Apache-HttpClient/4.1.1 (java 1.5)
             header=Host=localhost:8080
             locale=[]
             method=POST
           protocol=HTTP/1.1
        queryString=
         remoteAddr=/127.0.0.1:35646
         remoteHost=localhost
             scheme=http
               host=localhost:8080
         serverPort=8080
 --------------------------RESPONSE--------------------------
      contentLength=286
        contentType=text/xml;charset=UTF-8
             header=Connection=keep-alive
             header=X-Powered-By=Undertow/1
             header=Server=WildFly/10
             header=Content-Type=text/xml;charset=UTF-8
             header=Content-Length=286
             header=Date=Sat, 25 Feb 2017 06:31:17 GMT
             status=200
 ==============================================================

Option 2: HttpAdapter.dump

If the headers are not enough for you, because you need the entire message, you can also use the HttpAdapter.dump system property.

Once again, open standalone.xml while WildFly is shut down. Add the following entry just after the <extensions> section:

 <system-properties>
    <property name="com.sun.xml.ws.transport.http.HttpAdapter.dump" value="true"/>
 </system-properties>
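For one-off debugging sessions, the same property can also be passed on the command line when starting WildFly, which avoids editing standalone.xml at all (path assumes the standard standalone launcher):

```shell
# Same effect as the <system-properties> entry above, but only for this run:
./bin/standalone.sh -Dcom.sun.xml.ws.transport.http.HttpAdapter.dump=true
```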

Restart WildFly.

Now WildFly will log the entire inbound and outbound messages. Here's an example from my test service:

 08:35:52,292 INFO  [org.apache.cxf.services.XXXX.XXXXX.XXXXX] (default task-2) Inbound Message
 ----------------------------
 ID: 2
 Address: http://localhost:8080/XXXX/XXXXX
 Encoding: UTF-8
 Http-Method: POST
 Content-Type: text/xml;charset=UTF-8
 Headers: {accept-encoding=[gzip,deflate], connection=[Keep-Alive], Content-Length=[384], content-type=[text/xml;charset=UTF-8], Host=[localhost:8080], SOAPAction=["http://tempuri.org/XXXX"], User-Agent=[Apache-HttpClient/4.1.1 (java 1.5)]}
 Payload: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:tem="http://tempuri.org/">
    <soapenv:Header/>
    <soapenv:Body>
       <tem:XXXX>
          <tem:data>
             <tem:string>testing</tem:string>
          </tem:data>
       </tem:XXXX>
    </soapenv:Body>
 </soapenv:Envelope>
 --------------------------------------
 08:35:52,295 INFO  [org.apache.cxf.services.XXXX.XXXX.XXXX] (default task-2) Outbound Message
 ---------------------------
 ID: 2
 Response-Code: 200
 Encoding: UTF-8
 Content-Type: text/xml
 Headers: {}
 Payload: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><XXXXResponse xmlns="http://tempuri.org/"><XXXXResult>YYYYYYYYYYYYYYYYYY</XXXXResult></XXXXResponse></soap:Body>  </soap:Envelope>

Summary

This is super useful if you need to trace very precisely what is being passed to the server.

Happy Coding

Friday 20 May 2016

The AEM MicroKernel (extremely high altitude view)


The MicroKernel is the layer on which all other things are built. It determines the type of storage that AEM is going to use. To put it simply, it is the persistence layer for AEM.


The Options


There are two choices available. Each of these has specific pros and cons that must be considered when setting up an AEM environment.

TarMK

The TarMK option is the one that is deployed by default on all new AEM author instances, and publish instances will most likely always be deployed as TarMK. It is designed specifically for performance, but is not as strong in terms of scalability.

MongoMK

The primary reason for choosing the MongoMK persistence backend over TarMK is to scale the instances horizontally. 
This means having two or more active author instances running at all times and using MongoDB as the persistence storage system. 

The need to run more than one author instance results generally from the fact that the CPU and memory capacity of a single server, supporting all concurrent authoring activities, is no longer sustainable.

It is not recommended to deploy MongoMK for publish instances.


Moving from one to the other

It is possible to migrate from TarMK to MongoMK. 

Adobe recommends first deploying to TarMK and then moving to MongoMK, unless you foresee conditions that demand MongoMK within the first 18 months after a deployment.

How to decide which MicroKernel fits your scenario best (for Author instances)

You will first need to know the following:
  • Number of named users connected in a day: in the thousands or more.
  • Number of concurrent users: in the hundreds or more.
  • Volume of asset ingestions per day: in hundreds of thousands or more.
  • Volume of page edits per day: in hundreds of thousands or more (including automated updates via Multi Site Manager or news feed ingestions for example).
  • Volume of searches per day: in tens of thousands or more.
A tool called Tough Day can be used to evaluate an existing deployment, in terms of the hardware configuration.

Once you have the above details, the decision process is as follows:

[decision tree diagram]

Sources: 

https://docs.adobe.com/docs/en/aem/6-0/deploy/recommended-deploys.html
https://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/data-store-config.html




Wednesday 20 April 2016

Setting up an AEM server from scratch - part 4

AEM Dispatchers

(Note: I'm not diving too deep into the Dispatcher here; I'm merely explaining according to the AEM layout above. For more information, you can visit: https://docs.adobe.com/docs/en/dispatcher.html & https://docs.adobe.com/docs/en/dispatcher/disp-config.html)

The Dispatcher is Adobe Experience Manager's caching and/or load balancing tool. Using AEM's Dispatcher also helps to protect your AEM server from attack. Therefore, you can increase the security of your AEM instance by using the Dispatcher in conjunction with an enterprise-class web server.

Note: The most common use of the Dispatcher is to cache responses from an AEM publish instance, to increase the responsiveness and security of your externally facing published website. Most of the discussion focuses on this case.

But, the Dispatcher can also be used to increase the responsiveness of your author instance, particularly if you have a large number users editing and updating your website. For details specific to this case see Using a Dispatcher with an Author Server, below.

Dispatcher Farms:

The /farms property defines one or more sets of Dispatcher behaviors, where each set is associated with different web sites or URLs. The /farms property can include a single farm or multiple farms:


  • Use a single farm when you want Dispatcher to handle all of your web pages or web sites in the same way.
  • Create multiple farms when different areas of your web site or different web sites require different Dispatcher behavior.


The /farms property is a top-level property in the configuration structure. To define a farm, add a child property to the /farms property. Use a property name that uniquely identifies the farm within the Dispatcher instance.

The /farmname property is multi-valued, and contains other properties that define Dispatcher behavior:


  • The URLs of the pages that the farm applies to.
  • One or more service URLs (typically of AEM publish instances) to use for rendering documents.
  • The statistics to use for load-balancing multiple document renderers.
  • Several other behaviors, such as which files to cache and where.


The value can include any alphanumeric (a-z, 0-9) character. The following example shows the skeleton definition for two farms named /daycom and /docdaycom:

#name of dispatcher
/name "day sites"
#farms section defines a list of farms or sites
/farms
{
   /daycom
   {
       ...
   }
   /docdaycom
   {
      ...
   }
}

You can include other files that contribute to the configuration (this is the config we are using for the main setup):


  • If your configuration file is large, you can split it into several smaller files (that are easier to manage), then include these.
  • To include files that are generated automatically.


For example, to include the file myFarm.any in the /farms configuration use the following code:
#farms section defines a list of farms or sites
/farms
{
   #myFarm.any
}

Use the asterisk ("*") as a wildcard to specify a range of files to include.
For example, if the files farm_1.any through to farm_5.any contain the configuration of farms one to five, you can include them as follows:

#farms section defines a list of farms or sites
/farms
{
   #farm_*.any
}

After you have set up your /farms, let us configure the dispatcher (.any) files:
(Note: Again, I'm showing a typical setup, if you need more info, please visit: https://docs.adobe.com/docs/en/dispatcher/disp-config.html )

A sample of a .any file setup.

/myFarm
    {  
    /clientheaders
      {
      "*"
      }
    /virtualhosts
      {
      "yourdomain.com" 
     }
    /renders
      {
      /rend01
        {
        /hostname "YOUR.IP.AD.DRESS"
        /port "4503"
        # /timeout "0"
        }
      }
    /filter
      {
      /0001 { /type "deny" /glob "*" }
      /0002 { /type "allow" /method "GET" /url "*" /query "page=*"} # enable query strings
      /0022 { /type "allow" /url "/bin/*"  }
      /0023 { /type "allow" /url "/content*" }  # disable this rule to allow mapped content only
      /0026 { /type "allow" /url "/home/*"  }
      /0030 { /type "allow" /glob "* /is/image*"   }
      /0041 { /type "allow" /url "*.css"  }  # enable css
      /0042 { /type "allow" /url "*.gif"  }  # enable gifs
      /0043 { /type "allow" /url "*.ico"  }  # enable icos
      /0044 { /type "allow" /url "*.js"   }  # enable javascript
      /0045 { /type "allow" /url "*.png"  }  # enable png
      /0046 { /type "allow" /url "*.swf"  }  # enable flash
      /0047 { /type "allow" /url "*.jpg"  }  # enable jpg
      /0048 { /type "allow" /url "*.jpeg" }  # enable jpeg
      /0049 { /type "allow" /url "*.svg"  }  # enable svg
      /0062 { /type "allow" /url "/libs/cq/personalization/*"  }  # enable personalization
      /0081 { /type "allow"  /url "*.json" }
      /0083 { /type "deny"  /url "*.sysview.xml"   }
      /0085 { /type "deny"  /url "*.docview.xml"  }
      /0089 { /type "deny"  /url "*.feed.xml"  }
      /0091 { /type "allow"  /glob "GET / *" }
      /0092 { /type "allow"  /glob "GET /index.html *" }
      /0093 { /type "allow"  /glob "GET /geohome.html *" }
      /0094 { /type "allow"  /glob "GET /*.html *" }
      }
    /cache
      {
      /docroot "/var/www/html/yourdomain/content/yoursitename/en"
      /statfile  "/var/www/html/yoursitename/.stat"
      /statfileslevel "3"
      /allowAuthorized "1"
      /serveStaleOnError "0"
      /rules
        {
        /0000
          {
          /glob "*"
          /type "true"
          }
        }
      /invalidate
        {
        /0000
          {
          /glob "*"
          /type "deny"
          }
        /0001
          {
          # Consider all HTML files stale after an activation.
          /glob "*.html"
          /type "allow"
          }
        /0002
          {
          /glob "/etc/segmentation.segment.js"
          /type "allow"
          }
        /0003
          {
          /glob "*/analytics.sitecatalyst.js"
          /type "allow"
          }
        }
      /allowedClients
        {
        }
/ignoreUrlParams
{
  /0001 { /type "deny" /glob "*" }
}
      }
    /statistics
      {
      /categories
        {
        /html
          {
          /glob "*.html"
          }
        /others
          {
          /glob "*"
          }
        }
      }
    }
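Once everything is live, the /filter section above can be spot-checked from the outside with curl (hypothetical domain; substitute your /virtualhosts entry). Denied requests typically come back as 404 from the Dispatcher:

```shell
# Should be allowed by rule /0094 (GET /*.html):
curl -sI http://yourdomain.com/index.html | head -n 1
# Should be denied by rule /0083 (*.sysview.xml):
curl -sI http://yourdomain.com/content/page.sysview.xml | head -n 1
```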

Once you have this configured, we need to configure the httpd-vhosts file (the Apache side):

#NameVirtualHost *:80
#NameVirtualHost *:443

##############################################################################
#YOURSITENAME CONFIGURATION
<VirtualHost *:80>
    ServerAdmin webmaster@yoursite.com
    DocumentRoot "/var/www/html/yoursitename"
    ServerName yourdomainname.com 
    ErrorLog "logs/customname.log"
    CustomLog "logs/customname-access.log" common
  RewriteEngine On

  #Redirect error 404
  ErrorDocument 404 /404.html
  
  RewriteRule ^/$ /content/yoursitename/en.html [PT,L]
  RewriteCond %{REQUEST_URI} !^/apps
  RewriteCond %{REQUEST_URI} !^/bin
  RewriteCond %{REQUEST_URI} !^/content
  RewriteCond %{REQUEST_URI} !^/etc
  RewriteCond %{REQUEST_URI} !^/home
  RewriteCond %{REQUEST_URI} !^/libs
  RewriteCond %{REQUEST_URI} !^/tmp
  RewriteCond %{REQUEST_URI} !^/var
  RewriteRule ^/(.*)$ /content/yoursitename/en/$1 [PT,L]
  #Prevent cross - domain access
  RewriteCond %{REQUEST_URI} ^/content
  RewriteCond %{REQUEST_URI} !^/content/campaigns
  RewriteCond %{REQUEST_URI} !^/content/dam
  RewriteRule !^/content/yoursitename/en - [R=404,L,NC] 
<Directory /var/www/html/yoursitename>
     <IfModule disp_apache2.c>
       SetHandler dispatcher-handler
       ModMimeUsePathInfo On
     </IfModule>
    Options FollowSymLinks
     AllowOverride None
   
   </Directory>
</VirtualHost>
##############################################################################
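Before restarting Apache with the new vhost, it is worth validating the configuration first; a syntax error here takes the whole web tier down:

```shell
# Check the configuration files for syntax errors, then reload gracefully:
apachectl configtest
apachectl -k graceful
```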

Some more information on Apache: 

Apache mod_rewrite

After defining mappings (and probably adding an appropriate domain to the hosts file) we can enjoy our multi-domain CQ installation with short links. There is only one problem: the dispatcher. If we use some standard dispatcher configuration, there will be one cache directory for all sites. If the user requests the page geometrixx.com/products.html, the dispatcher will create the file /products.html in the cache dir. Now, if some other user requests the page geometrixx.de/products.html, the dispatcher will find its cached English version and will serve it to the German user.

In order to avoid such problems we should reflect the JCR directory structure in the dispatcher. The easiest way to expand shortened paths is to use the Apache rewrite engine. Basically, we will try to simulate the Sling resolving mechanism. The following rules will do the job:
00   RewriteEngine On  
01   RewriteRule ^/$ /content/geometrixx/en.html [PT,L]
02   RewriteCond %{REQUEST_URI} !^/apps
03   RewriteCond %{REQUEST_URI} !^/bin 
04   RewriteCond %{REQUEST_URI} !^/content 
05   RewriteCond %{REQUEST_URI} !^/etc 
06   RewriteCond %{REQUEST_URI} !^/home 
07   RewriteCond %{REQUEST_URI} !^/libs 
08   RewriteCond %{REQUEST_URI} !^/tmp 
09   RewriteCond %{REQUEST_URI} !^/var 
10   RewriteRule ^/(.*)$ /content/geometrixx/en/$1 [PT,L]
At the beginning (01) we check if the entered URL contains an empty path (e.g. http://geometrixx.com/). If so, the user will be forwarded to the homepage. Otherwise, we check if the entered path is shortened (it does not begin with apps, content, home, etc. - lines 02-09). If it is, the rewrite engine will add /content/geometrixx/en while creating the absolute path (10).

Apache VirtualHost
As you can see, this rule is valid only for the geometrixx.com domain, so we need similar rules for each domain and some mechanism for recognizing a current domain. Such a mechanism in Apache is called VirtualHost. A sample configuration file of the Apache2 VirtualHost looks as follows:

<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName geometrixx.com

    DocumentRoot /opt/cq/dispatcher/publish
    <Directory /opt/cq/dispatcher/publish>
        Options FollowSymLinks
        AllowOverride None
    </Directory>

    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>

[... above rewrite rules ...]

    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access-geo-en.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-en.log
</VirtualHost>

All VirtualHosts can use a shared dispatcher directory. Create similar files for each domain.

Cross-domain injection threat
Because users are able to enter a full content path after a given domain name, e.g. geometrixx.com/content/geometrixx/en/products.html, they may as well get a page that belongs to some other domain, e.g. geometrixx.com/content/geometrixx/fr/products.html. In order to avoid such a situation, we need to check all requests for paths beginning with /content and reject those that are not related to any campaign, DAM or the current domain:

RewriteCond %{REQUEST_URI} ^/content
RewriteCond %{REQUEST_URI} !^/content/campaigns
RewriteCond %{REQUEST_URI} !^/content/dam
RewriteRule !^/content/geometrixx/en - [R=404,L,NC]

Macros
Our rewrite configuration has become quite complicated and (what is worse) has to be included in each Apache VirtualHost configuration. Fortunately, we can avoid repetitions using the Apache macro module. Add the following expand-cq-paths file to your conf.d directory:

<Macro ExpandCqPaths $path>
        RewriteEngine On

        RewriteRule ^/$ $path.html [PT,L]

        RewriteCond %{REQUEST_URI} ^/content
        RewriteCond %{REQUEST_URI} !^/content/campaigns
        RewriteCond %{REQUEST_URI} !^/content/dam
        RewriteRule !^$path - [R=404,L,NC]

        RewriteCond %{REQUEST_URI} !^/apps
        RewriteCond %{REQUEST_URI} !^/content
        RewriteCond %{REQUEST_URI} !^/etc
        RewriteCond %{REQUEST_URI} !^/home
        RewriteCond %{REQUEST_URI} !^/libs
        RewriteCond %{REQUEST_URI} !^/tmp
        RewriteCond %{REQUEST_URI} !^/var
        RewriteRule ^/(.*)$ $path/$1 [PT,L]
</Macro>

After that you can include a macro in each VirtualHost with the Use directive:
Use ExpandCqPaths /content/geometrixx/en

Because the Macro module is an external Apache2 library, you might need to install it separately. On Debian you can install and enable it using two commands:

# apt-get install libapache2-mod-macro
# a2enmod macro

If you use any other Linux distribution or Windows, please find the appropriate version of the module and the installation instruction on the mod_macro homepage.

Source: https://www.cognifide.com/our-blogs/cq/multidomain-cq-mappings-and-apache-configuration/

Sling Mappings

I will illustrate the setup of Sling mappings according to the first diagram, with the assumption that we have multiple sites on this setup.
For more information, go here: http://aem.matelli.org/url-mapping-and-deep-linking/

Sling Mappings (Resource Mapping) are used to define redirects, vanity URLs and virtual hosts for AEM.

For example, you can use these mappings to prefix all requests with /content so that the internal structure is hidden from the visitors to your website.

One possible HTTP mapping prefixes all requests to localhost:4503 with /content. A mapping like this hides the internal structure from the visitors to the website, as it allows:
         localhost:4503/content/geometrixx/en/products.html
to be accessed using:
         localhost:4503/geometrixx/en/products.html
since the mapping automatically adds the prefix /content to /geometrixx/en/products.html.

CREATING MAPPING DEFINITIONS IN AEM

In a standard installation of AEM you can find the folder:

/etc/map/http

This is the structure used when defining mappings for the HTTP protocol. Other folders (sling:Folder) can be created under /etc/map for any other protocols that you want to map.
Configuring an Internal Redirect to /content

To create the mapping that prefixes any request to http://localhost:4503/ with /content:

  • Using CRXDE navigate to /etc/map/http.
  • Create a new node:
      • Type: sling:Mapping
        (this node type is intended for such mappings, though its use is not mandatory)
      • Name: localhost_any
  • Click Save All.
  • Add the following properties to this node:
      • Name: sling:match, Type: String, Value: localhost.4503/
      • Name: sling:internalRedirect, Type: String, Value: /content/
  • Click Save All.

This will handle a request such as:
         localhost:4503/geometrixx/en/products.html
as if:
        localhost:4503/content/geometrixx/en/products.html
had been requested.
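If you prefer the command line over CRXDE, the same node can be created via the Sling POST servlet (assuming the default admin credentials on a local instance; adjust host, port and credentials for your setup):

```shell
# Creates /etc/map/http/localhost_any as a sling:Mapping node with the
# sling:match and sling:internalRedirect properties described above.
curl -u admin:admin \
     -F "jcr:primaryType=sling:Mapping" \
     -F "sling:match=localhost.4503/" \
     -F "sling:internalRedirect=/content/" \
     http://localhost:4503/etc/map/http/localhost_any
```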

Here are some screenshots to illustrate:






For more info on this, you can also check out: https://docs.adobe.com/docs/en/aem/6-1/deploy/configuring/resource-mapping.html

Final word:

There are a few ways to skin an AEM environment setup; it all depends on the client's needs and infrastructure. You as the architect should know the framework (AEM) well and, according to your knowledge, apply what will be best for the client.

In this post, the setup is based on a real-world scenario. Multiple websites are run from this environment, and in another post I will go into the MSM (Multi Site Manager) bit of AEM and how that can be an advantage to have in your company. I will also point out the pros and cons of the Translator service in AEM.

Till then! Happy Coding folks!

*JumpHost - a server that one connects to in order to access other servers, with a program like mRemoteNG. This is for security.

Wednesday 13 April 2016

Setting up an AEM server from scratch - part 3

Configuring a Dispatcher Flush agent

Default agents are included with the installation. However, certain configuration is still needed, and the same applies if you are defining a new agent:

  • Open the Tools tab in AEM.
  • Select Replication, then Agents on publish in the left panel.
  • Double-click on the Dispatcher Flush item to open the overview.
  • Click Edit - the Agent Settings dialog will open:
  • In the Settings tab:


    • Activate Enabled.
    • Enter a Description.
    • Leave the Serialization Type as Dispatcher Flush, or set it as such if creating a new agent.

  • In the Transport tab, enter the required URI for the new publish instance; for example:

  • http://localhost:80/dispatcher/invalidate.cache.
  • Enter the site-specific user account used for replication.
  • You can configure other parameters as required.

For Dispatcher Flush agents, the URI property is used only if you use path-based virtual host entries to differentiate between farms; in that case, you use this field to target the farm to invalidate. For example, farm #1 has a virtual host of www.mysite.com/path1/* and farm #2 has a virtual host of www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.
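You can also issue the same kind of request manually with curl to verify that the Dispatcher is reachable and invalidating; this is the conventional shape of a flush request (the content path is an example):

```shell
# Manually invalidate the cache for a content path, as the flush agent would:
curl -X POST \
     -H "CQ-Action: Activate" \
     -H "CQ-Handle: /content/yoursitename/en" \
     -H "Content-Length: 0" \
     http://localhost:80/dispatcher/invalidate.cache
```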

Return to the Tools tab; from here you can Activate the Dispatcher Flush agent (Agents on publish).

The Dispatcher Flush replication agent is not active on author. You can access the same page in the publish environment by using the equivalent URI; for example, http://localhost:4503/etc/replication/agents.publish/flush.html.

And for the final bit on Replication: Monitoring!


  • Access the Tools tab in AEM.
  • Click Replication.
  • Double-click the link to agents for the appropriate environment (either the left or the right pane); for example Agents on author.
  • The resulting window shows an overview of all your replication agents for the author environment, including their target and status.
  • Click the appropriate agent name (which is a link) to show detailed information on that agent:
  • Here you can:

  • See whether the agent is enabled.
  • See the target of any replications.
  • See whether the replication queue is currently active (enabled).
  • See whether there are any items in the queue.
  • Refresh or Clear to update the display of queue entries; this helps you see items enter and leave the queue.
  • View Log to access the log of any actions by the replication agent.
  • Test Connection to the target instance.
  • Force Retry on any queue items if required.



CAUTION
Do not use the "Test Connection" link for the Reverse Replication Outbox on a publish instance.
If a replication test is performed for an Outbox queue, any items that are older than the test replication will be re-processed with every reverse replication.
If such items already exist in a queue, they can be found with the following XPath JCR query and should be removed: /jcr:root/var/replication/outbox//*[@cq:repActionType='TEST']


Source and for more info, please review: 
https://docs.adobe.com/docs/en/aem/6-2/deploy/configuring/replication.html

OK, it's time for some coffee. When you are back, we will set up the Sling mappings on the publish server and also set up the Dispatcher.




See you in Part 4 :)

Sunday 13 March 2016

Setting up an AEM Server from scratch - Part 2

Setting up of the Replication Agents

After you have set up the Author and Publish instances, you need to set up the Replication Agents.

Replication agents are central to Adobe Experience Manager (AEM) as the mechanism used to:

  • Publish (activate) content from an author to a publish environment.
  • Explicitly flush content from the Dispatcher cache.
  • Return user input (for example, form input) from the publish environment to the author environment (under control of the author environment).
  • Requests are queued to the appropriate agent for processing.

Replicating from Author to Publish

Replication, to a publish instance or dispatcher, takes place in several steps:

  • the author requests that certain content be published (activated); this can be initiated by a manual request, or by automatic triggers which have been preconfigured.
  • the request is passed to the appropriate default replication agent; an environment can have several default agents which will always be selected for such actions.
  • the replication agent "packages" the content and places it in the replication queue.
  • in the Websites tab the colored status indicator is set for the individual pages.
  • the content is lifted from the queue and transported to the publish environment using the configured protocol; usually this is HTTP.
  • a servlet in the publish environment receives the request and publishes the received content; the default servlet is http://localhost:4503/bin/receive?sling:authRequestLogin=1.
  • multiple author and publish environments can be configured.



Replicating from Publish to Author

Some features, such as commenting, allow users to enter data on a publish instance.

In some cases, a type of replication known as reverse replication is needed to return this data to the author environment, from where it is redistributed to other publish environments. Due to security considerations, any traffic from the publish to the author environment must be strictly controlled.

Reverse replication uses an agent in the publish environment which references the author environment. This agent places the data into an outbox. This outbox is matched with replication listeners in the author environment. The listeners poll the outboxes to collect any data entered and then distribute it as necessary. This ensures that the author environment controls all traffic.

In other cases, such as for Communities features (for example, forums, blogs, comments, and reviews), the amount of user generated content (UGC) being entered in the publish environment is difficult to efficiently synchronize across AEM instances using replication.

Now, for the fun part, let us do the actual setting up for the agents:

  • Go to the following section on the author site: Tools.
  • On the left menu, find the Replication section and click on the word Replication.
  • You will see in the main pane all the replication agents that are currently set up.
  • Click on ‘New’ and then on ‘New Page’.
  • A modal will pop up – it will look like the modal from the new page creation section.
  • Fill in the Title field and the Name field of the new replication agent.
  • Select ‘Replication Agent’ from the list below the Name field and click on Create.
  • You have now created a new replication agent, but you still need to set up the properties of the agent.
  • Double click on the agent and it will open up the properties section of the Replication agent.
  • Find the “Edit” button and click on that.

  • There will be a popup modal and you need to complete some of the fields, listed below:

  • Name: This will be pre-populated; you can change it, but there is no need to.
  • Description: Give your agent a proper description. (Whatever rocks your boat)
  • Enabled: Tick the box to enable the replication agent.
  • Serialization Type: From the dropdown, select ‘Default’.
  • Retry Delay: Make this 6000 (that is 6 seconds).
  • Log Level: From the dropdown, select Info.
  • On the next tab, Transport, complete the following:
  • URI: the publish server’s receive servlet, e.g. http://<publish-host>:4505/bin/receive?sling:authRequestLogin=1 (replace <publish-host> with your publish server’s hostname).
  • User: The user must be set up on both Author and Publish and must have the correct credentials to activate assets.
  • Password: The user’s password.
  • Once this is all completed, click on OK.

  • You should see that the agent is now active and you will be able to publish pages to the publish server. You can do a test to see if it works – but first you need to set up the Sling mappings on the publish servers.
  • To test your connection, click on the Test Connection link; this will open a new tab in your browser with some test results. If everything is set up correctly, you will get the message: ‘Replication test succeeded.’
  • If you don’t get that message, the test output will tell you where the issue is.
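For reference, the agent you have just configured is stored in the repository (under /etc/replication/agents.author/<agent-name>/jcr:content in a classic setup). Below is a rough, hedged sketch of what the resulting properties look like – the property names are those used by stock AEM replication agents, the values come from the steps above, and the ‘Default’ serialization type is stored as durbo. Treat the exact values (host, port, user) as placeholders for your own environment:

```xml
<jcr:content
    jcr:primaryType="cq:PageContent"
    sling:resourceType="cq/replication/components/agent"
    enabled="{Boolean}true"
    serializationType="durbo"
    retryDelay="6000"
    logLevel="info"
    transportUri="http://localhost:4503/bin/receive?sling:authRequestLogin=1"
    transportUser="replication-user"
    transportPassword="<your-password>"/>
```

Knowing where this lives is handy when you want to script agent creation with a content package instead of clicking through the dialog every time.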


Below are a few images to illustrate the above explanation:

In the Tools Section, go to the Replication -> Agents on Author



Click on New -> New Page...


Choose a suitable name for your agent (007 is not recommended!)





Once it is created, find your agent in the list on the right and double-click on it; you will see the same view as below.



We now need to configure the agent. Click on 'Edit' in the Settings bar.




After you have configured the agent, you need to test it and see if it does in fact connect to the publish server. Click on the Test Connection link to test.



Once you see the above image and the message 'Replication test succeeded', you are on the money. If not, the error message will usually tell you where the error is. You can also review the logs.

Reverse replication from Publish to Author

Reverse replication is used to get user content generated on a publish instance back to an author instance. This is commonly used for features such as surveys and registration forms.

For security reasons, most network topologies do not allow connections initiated from the "Demilitarized Zone" (DMZ, a subnetwork that exposes external services to an untrusted network such as the Internet) into the internal network.

As the publish environment is usually in the DMZ, to get content back to the author environment the connection must be initiated from the author instance. This is done with:
  • an outbox in the publish environment where the content is placed.
  • an agent in the author environment which periodically polls the outbox for new content.
A reverse replication agent in the author environment
This acts as the active component to collect information from the outbox in the publish environment:
If you want to use reverse replication then ensure that this agent is activated.



A reverse replication agent in the publish environment (an outbox)
This is the passive element as it acts as an "outbox". User input is placed here, from where it is collected by the agent in the author environment.



Ok, it's time for some coffee... When you are back, we will set up the Sling mappings on the publish server and also set up the Dispatcher Flush agents.


See you in Part 3 :)