Wednesday 20 April 2016

Setting up an AEM server from scratch - part 4

AEM Dispatchers

(Note: I'm not diving too deep into the Dispatcher here; I'm only explaining it in terms of the AEM layout above. For more information, visit https://docs.adobe.com/docs/en/dispatcher.html and https://docs.adobe.com/docs/en/dispatcher/disp-config.html)

The Dispatcher is Adobe Experience Manager's caching and/or load balancing tool. Using AEM's Dispatcher also helps to protect your AEM server from attack. Therefore, you can increase the security of your AEM instance by using the Dispatcher in conjunction with an enterprise-class web server.

Note: The most common use of the Dispatcher is to cache responses from an AEM publish instance, to increase the responsiveness and security of your externally facing published website. Most of the discussion focuses on this case.

However, the Dispatcher can also be used to increase the responsiveness of your author instance, particularly if you have a large number of users editing and updating your website. For details specific to this case, see "Using a Dispatcher with an Author Server" in the Adobe Dispatcher documentation.

Dispatcher Farms:

The /farms property defines one or more sets of Dispatcher behaviors, where each set is associated with different web sites or URLs. The /farms property can include a single farm or multiple farms:


  • Use a single farm when you want Dispatcher to handle all of your web pages or web sites in the same way.
  • Create multiple farms when different areas of your web site or different web sites require different Dispatcher behavior.


The /farms property is a top-level property in the configuration structure. To define a farm, add a child property to the /farms property. Use a property name that uniquely identifies the farm within the Dispatcher instance.

The /farmname property is multi-valued, and contains other properties that define Dispatcher behavior:


  • The URLs of the pages that the farm applies to.
  • One or more service URLs (typically of AEM publish instances) to use for rendering documents.
  • The statistics to use for load-balancing multiple document renderers.
  • Several other behaviors, such as which files to cache and where.


The farm name can include any alphanumeric (a-z, 0-9) character. The following example shows the skeleton definition for two farms named /daycom and /docdaycom:

#name of dispatcher
/name "day sites"
#farms section defines a list of farms or sites
/farms
{
   /daycom
   {
       ...
   }
   /docdaycom
   {
      ...
   }
}

You can include other files that contribute to the configuration (this is the approach we are using for the main setup). This is useful in two cases:


  • If your configuration file is large, you can split it into several smaller files (that are easier to manage) and then include these.
  • If some of your configuration files are generated automatically, you can include them as well.


For example, to include the file myFarm.any in the /farms configuration use the following code:
#farms section defines a list of farms or sites
/farms
{
   $include "myFarm.any"
}

Use the asterisk ("*") as a wildcard to specify a range of files to include.
For example, if the files farm_1.any through to farm_5.any contain the configuration of farms one to five, you can include them as follows:

#farms section defines a list of farms or sites
/farms
{
   $include "farm_*.any"
}
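
As a rough illustration of what the wildcard selects, the following Python sketch applies the same kind of pattern to a list of file names. The file names and the function name are hypothetical, and Python's fnmatch only stands in for the Dispatcher's own matching, so treat it as an approximation rather than the Dispatcher's actual behavior.

```python
from fnmatch import fnmatch

def select_includes(pattern, filenames):
    """Return the file names that match a wildcard include pattern."""
    return sorted(name for name in filenames if fnmatch(name, pattern))

# Hypothetical contents of the Dispatcher configuration directory.
conf_dir = ["dispatcher.any", "farm_1.any", "farm_2.any", "farm_5.any", "myFarm.any"]

included = select_includes("farm_*.any", conf_dir)
print(included)
```

Only the files whose names start with farm_ and end with .any are selected; dispatcher.any and myFarm.any are left out.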

After you have set up your /farms, let us configure the Dispatcher farm (.any) files.
(Note: Again, I'm showing a typical setup. If you need more info, please visit: https://docs.adobe.com/docs/en/dispatcher/disp-config.html )

A sample .any file setup:

/myFarm
    {  
    /clientheaders
      {
      "*"
      }
    /virtualhosts
      {
      "yourdomain.com" 
      }
    /renders
      {
      /rend01
        {
        /hostname "YOUR.IP.AD.DRESS"
        /port "4503"
        # /timeout "0"
        }
      }
    /filter
      {
      /0001 { /type "deny" /glob "*" }
      /0002 { /type "allow" /method "GET" /url "*" /query "page=*"} # enable query strings
      /0022 { /type "allow" /url "/bin/*"  }
      /0023 { /type "allow" /url "/content*" }  # disable this rule to allow mapped content only
      /0026 { /type "allow" /url "/home/*"  }
      /0030 { /type "allow" /glob "* /is/image*"   }
      /0041 { /type "allow" /url "*.css"  }  # enable css
      /0042 { /type "allow" /url "*.gif"  }  # enable gifs
      /0043 { /type "allow" /url "*.ico"  }  # enable icos
      /0044 { /type "allow" /url "*.js"   }  # enable javascript
      /0045 { /type "allow" /url "*.png"  }  # enable png
      /0046 { /type "allow" /url "*.swf"  }  # enable flash
      /0047 { /type "allow" /url "*.jpg"  }  # enable jpg
      /0048 { /type "allow" /url "*.jpeg" }  # enable jpeg
      /0049 { /type "allow" /url "*.svg"  }  # enable svg
      /0062 { /type "allow" /url "/libs/cq/personalization/*"  }  # enable personalization
      /0081 { /type "allow"  /url "*.json" }
      /0083 { /type "deny"  /url "*.sysview.xml"   }
      /0085 { /type "deny"  /url "*.docview.xml"  }
      /0089 { /type "deny"  /url "*.feed.xml"  }
      /0091 { /type "allow"  /glob "GET / *" }
      /0092 { /type "allow"  /glob "GET /index.html *" }
      /0093 { /type "allow"  /glob "GET /geohome.html *" }
      /0094 { /type "allow"  /glob "GET /*.html *" }
      }
    /cache
      {
      /docroot "/var/www/html/yourdomain/content/yoursitename/en"
      /statfile  "/var/www/html/yoursitename/.stat"
      /statfileslevel "3"
      /allowAuthorized "1"
      /serveStaleOnError "0"
      /rules
        {
        /0000
          {
          /glob "*"
          /type "allow"
          }
        }
      /invalidate
        {
        /0000
          {
          /glob "*"
          /type "deny"
          }
        /0001
          {
          # Consider all HTML files stale after an activation.
          /glob "*.html"
          /type "allow"
          }
        /0002
          {
          /glob "/etc/segmentation.segment.js"
          /type "allow"
          }
        /0003
          {
          /glob "*/analytics.sitecatalyst.js"
          /type "allow"
          }
        }
      /allowedClients
        {
        }
      /ignoreUrlParams
        {
        /0001 { /type "deny" /glob "*" }
        }
      }
    /statistics
      {
      /categories
        {
        /html
          {
          /glob "*.html"
          }
        /others
          {
          /glob "*"
          }
        }
      }
    }
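
To make the /filter section above easier to reason about, here is a small Python sketch of its evaluation order: the rules are applied from top to bottom, and the last rule that matches a request decides the outcome. This is a simplification under stated assumptions: it models only a subset of the rules, treats every rule as a pattern on the URL path, ignores /method, /query and request-line /glob matching, and uses Python's fnmatchcase in place of the Dispatcher's own pattern matching. The names FILTER_RULES and is_allowed are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the /filter rules above, in order: (type, URL pattern).
FILTER_RULES = [
    ("deny", "*"),               # /0001 (a /glob rule, modeled here as a URL pattern)
    ("allow", "/bin/*"),         # /0022
    ("allow", "/content*"),      # /0023
    ("allow", "*.css"),          # /0041
    ("allow", "*.js"),           # /0044
    ("allow", "*.json"),         # /0081
    ("deny", "*.sysview.xml"),   # /0083
    ("deny", "*.docview.xml"),   # /0085
]

def is_allowed(url, rules=FILTER_RULES):
    """Return True if the last matching rule allows the URL."""
    decision = "deny"  # assumption: nothing is served unless a rule allows it
    for rule_type, pattern in rules:
        if fnmatchcase(url, pattern):
            decision = rule_type  # later matches override earlier ones
    return decision == "allow"

print(is_allowed("/content/yoursitename/en.html"))
print(is_allowed("/content/yoursitename/en.sysview.xml"))
```

In this model /content/yoursitename/en.html is allowed because /0023 is the last rule that matches it, while /content/yoursitename/en.sysview.xml is denied because the later rule /0083 overrides /0023. This is why the order of the rules matters.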

Once this is configured, we need to configure the httpd-vhosts file (the Apache side):

#NameVirtualHost *:80
#NameVirtualHost *:443

##############################################################################
#YOURSITENAME CONFIGURATION
<VirtualHost *:80>
    ServerAdmin webmaster@yoursite.com
    DocumentRoot "/var/www/html/yoursitename"
    ServerName yourdomainname.com 
    ErrorLog "logs/customname.log"
    CustomLog "logs/customname-access.log" common
    RewriteEngine On

    #Redirect error 404
    ErrorDocument 404 /404.html

    RewriteRule ^/$ /content/yoursitename/en.html [PT,L]
    RewriteCond %{REQUEST_URI} !^/apps
    RewriteCond %{REQUEST_URI} !^/bin
    RewriteCond %{REQUEST_URI} !^/content
    RewriteCond %{REQUEST_URI} !^/etc
    RewriteCond %{REQUEST_URI} !^/home
    RewriteCond %{REQUEST_URI} !^/libs
    RewriteCond %{REQUEST_URI} !^/tmp
    RewriteCond %{REQUEST_URI} !^/var
    RewriteRule ^/(.*)$ /content/yoursitename/en/$1 [PT,L]

    #Prevent cross-domain access
    RewriteCond %{REQUEST_URI} ^/content
    RewriteCond %{REQUEST_URI} !^/content/campaigns
    RewriteCond %{REQUEST_URI} !^/content/dam
    RewriteRule !^/content/yoursitename/en - [R=404,L,NC]

    <Directory /var/www/html/yoursitename>
        <IfModule disp_apache2.c>
            SetHandler dispatcher-handler
            ModMimeUsePathInfo On
        </IfModule>
        Options FollowSymLinks
        AllowOverride None
    </Directory>
</VirtualHost>
##############################################################################

Some more information on Apache: 

Apache mod_rewrite

After defining mappings (and probably adding an appropriate domain to the hosts file) we can enjoy our multi-domain CQ installation with short links. There is only one problem: the Dispatcher. With a standard Dispatcher configuration, there is one cache directory for all sites. If a user requests the page geometrixx.com/products.html, the Dispatcher creates the file /products.html in the cache directory. If another user then requests geometrixx.de/products.html, the Dispatcher finds the cached English version and serves it to the German user.

To avoid such problems, the Dispatcher cache should reflect the JCR directory structure. The easiest way to expand shortened paths is to use the Apache rewrite engine. Basically, we will try to simulate the Sling resolving mechanism. The following rules will do the job:
00   RewriteEngine On
01   RewriteRule ^/$ /content/geometrixx/en.html [PT,L]
02   RewriteCond %{REQUEST_URI} !^/apps
03   RewriteCond %{REQUEST_URI} !^/bin
04   RewriteCond %{REQUEST_URI} !^/content
05   RewriteCond %{REQUEST_URI} !^/etc
06   RewriteCond %{REQUEST_URI} !^/home
07   RewriteCond %{REQUEST_URI} !^/libs
08   RewriteCond %{REQUEST_URI} !^/tmp
09   RewriteCond %{REQUEST_URI} !^/var
10   RewriteRule ^/(.*)$ /content/geometrixx/en/$1 [PT,L]

At the beginning (line 01) we check if the entered URL contains an empty path (e.g. http://geometrixx.com/). If so, the user will be forwarded to the homepage. Otherwise, we check if the entered path is shortened, meaning that it does not begin with apps, bin, content, etc. (lines 02-09). If it is, the rewrite engine will add /content/geometrixx/en while creating the absolute path (line 10).
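
The effect of these rules can be sketched in a few lines of Python. This is only a model of the path handling, assuming the request path always starts with a slash; it is not how Apache evaluates the rules internally, and the names RESERVED and expand_path are hypothetical.

```python
# Top-level repository paths that the RewriteCond lines leave untouched.
RESERVED = ("/apps", "/bin", "/content", "/etc", "/home", "/libs", "/tmp", "/var")

def expand_path(uri, base="/content/geometrixx/en"):
    """Mimic the rewrite rules for a single request path."""
    if uri == "/":
        return base + ".html"        # empty path: go to the homepage
    if uri.startswith(RESERVED):
        return uri                   # already a full repository path
    return base + "/" + uri[1:]      # shortened path: prepend the site root

print(expand_path("/products.html"))
print(expand_path("/products.html", base="/content/geometrixx/de"))
```

With a different base per domain, the same short link /products.html expands to a different full path for each site, so the Dispatcher stores each language version in its own place in the cache.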

Apache VirtualHost
As you can see, these rules are valid only for the geometrixx.com domain, so we need similar rules for each domain and some mechanism for recognizing the current domain. In Apache, this mechanism is called VirtualHost. A sample Apache2 VirtualHost configuration file looks as follows:

<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName geometrixx.com

    DocumentRoot /opt/cq/dispatcher/publish
    <Directory /opt/cq/dispatcher/publish>
        Options FollowSymLinks
        AllowOverride None
    </Directory>

    <IfModule disp_apache2.c>
        SetHandler dispatcher-handler
    </IfModule>

[... above rewrite rules ...]

    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access-geo-en.log combined
    ErrorLog ${APACHE_LOG_DIR}/error-geo-en.log
</VirtualHost>

All VirtualHosts can use a shared dispatcher directory. Create similar files for each domain.

Cross-domain injection threat
Because users are able to enter a full content path after a given domain name, e.g. geometrixx.com/content/geometrixx/en/products.html, they may as well get a page that belongs to some other domain, e.g. geometrixx.com/content/geometrixx/fr/products.html. To avoid such a situation, we need to check all requests for paths beginning with /content and reject those that are not related to a campaign, the DAM or the current domain:

RewriteCond %{REQUEST_URI} ^/content
RewriteCond %{REQUEST_URI} !^/content/campaigns
RewriteCond %{REQUEST_URI} !^/content/dam
RewriteRule !^/content/geometrixx/en - [R=404,L,NC]
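
The decision these four lines make can be written out as a short Python sketch. It is a simplified model under the assumption that only the request path matters; the function name is_rejected and the sample paths are hypothetical.

```python
def is_rejected(uri, site_root="/content/geometrixx/en"):
    """Return True when the request would be answered with a 404."""
    if not uri.startswith("/content"):
        return False  # the rule only looks at /content paths
    if uri.startswith(("/content/campaigns", "/content/dam")):
        return False  # shared campaign and DAM content stays reachable
    # The NC flag makes the comparison with the site root case-insensitive.
    return not uri.lower().startswith(site_root.lower())

print(is_rejected("/content/geometrixx/fr/products.html"))
print(is_rejected("/content/geometrixx/en/products.html"))
```

In this model the French page is rejected on the English domain, while the English page, DAM assets and anything outside /content pass through.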

Macros
Our rewrite configuration has become quite complicated and (what is worse) has to be included in each Apache VirtualHost configuration. Fortunately, we can avoid repetitions using the Apache macro module. Add the following expand-cq-paths file to your conf.d directory:

<Macro ExpandCqPaths $path>
        RewriteEngine On

        RewriteRule ^/$ $path.html [PT,L]

        RewriteCond %{REQUEST_URI} ^/content
        RewriteCond %{REQUEST_URI} !^/content/campaigns
        RewriteCond %{REQUEST_URI} !^/content/dam
        RewriteRule !^$path - [R=404,L,NC]

        RewriteCond %{REQUEST_URI} !^/apps
        RewriteCond %{REQUEST_URI} !^/content
        RewriteCond %{REQUEST_URI} !^/etc
        RewriteCond %{REQUEST_URI} !^/home
        RewriteCond %{REQUEST_URI} !^/libs
        RewriteCond %{REQUEST_URI} !^/tmp
        RewriteCond %{REQUEST_URI} !^/var
        RewriteRule ^/(.*)$ $path/$1 [PT,L]
</Macro>

After that you can include a macro in each VirtualHost with the Use directive:
Use ExpandCqPaths /content/geometrixx/en
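
Conceptually, the Use directive replaces every $path in the macro body with the argument you pass. The Python sketch below imitates that substitution on an abbreviated copy of the macro body. It only illustrates the idea and is not how mod_macro is implemented; MACRO_BODY and use_macro are hypothetical names.

```python
# Abbreviated macro body: only the rules that contain $path.
MACRO_BODY = """\
RewriteEngine On
RewriteRule ^/$ $path.html [PT,L]
RewriteRule !^$path - [R=404,L,NC]
RewriteRule ^/(.*)$ $path/$1 [PT,L]
"""

def use_macro(body, path):
    """Replace every $path placeholder with the given argument."""
    return body.replace("$path", path)

expanded = use_macro(MACRO_BODY, "/content/geometrixx/en")
print(expanded)
```

The output is the same set of rules that was written by hand earlier, so each VirtualHost only needs a single Use line with its own content path.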

Because the Macro module is an external Apache2 library, you might need to install it separately. On Debian you can install and enable it using two commands:

# apt-get install libapache2-mod-macro
# a2enmod macro

If you use any other Linux distribution or Windows, please find the appropriate version of the module and the installation instructions on the mod_macro homepage.

Source: https://www.cognifide.com/our-blogs/cq/multidomain-cq-mappings-and-apache-configuration/

Sling Mappings

I will illustrate the setup of Sling mappings according to the first diagram, with the assumption that we have multiple sites on this setup.
For more information, go here: http://aem.matelli.org/url-mapping-and-deep-linking/

Sling Mappings (Resource Mapping) are used to define redirects, vanity URLs and virtual hosts for AEM.

For example, you can use these mappings to:
Prefix all requests with /content so that the internal structure is hidden from the visitors to your website.

One possible HTTP mapping prefixes all requests to localhost:4503 with /content. A mapping like this could be used to hide the internal structure from the visitors to the website as it allows: localhost:4503/content/geometrixx/en/products.html
to be accessed using: localhost:4503/geometrixx/en/products.html as the mapping will automatically add the prefix /content to /geometrixx/en/products.html.

CREATING MAPPING DEFINITIONS IN AEM

In a standard installation of AEM you can find the folder:

/etc/map/http

This is the structure used when defining mappings for the HTTP protocol. Other folders (sling:Folder) can be created under /etc/map for any other protocols that you want to map.

Configuring an Internal Redirect to /content

To create the mapping that prefixes any request to http://localhost:4503/ with /content:

  • Using CRXDE, navigate to /etc/map/http.
  • Create a new node:
    • Type sling:Mapping (this node type is intended for such mappings, though its use is not mandatory).
    • Name localhost_any
  • Click Save All.
  • Add the following properties to this node:
    • Name sling:match, Type String, Value localhost.4503/
    • Name sling:internalRedirect, Type String, Value /content/
  • Click Save All.

This will handle a request such as:
         localhost:4503/geometrixx/en/products.html
as if:
        localhost:4503/content/geometrixx/en/products.html
had been requested.
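
A rough way to picture what this mapping does is the Python sketch below. It rests on an assumption about Sling's behavior that you should verify against the resource mapping documentation: the request is flattened into a string of the form scheme/host.port/path, and entries under /etc/map/http are matched against it as regular expressions prefixed with http/. The function name resolve is hypothetical.

```python
import re

def resolve(scheme, host, port, path):
    """Apply the localhost_any mapping to a request in a simplified way."""
    # Assumed flattened form, e.g. "http/localhost.4503/geometrixx/en/products.html".
    request = f"{scheme}/{host}.{port}{path}"
    match_pattern = r"^http/localhost.4503/"   # sling:match under /etc/map/http
    internal_redirect = "/content/"            # sling:internalRedirect
    if re.match(match_pattern, request):
        return re.sub(match_pattern, internal_redirect, request, count=1)
    return path  # no mapping applies, the path is used as requested

print(resolve("http", "localhost", 4503, "/geometrixx/en/products.html"))
```

Because sling:match is treated as a regular expression, the dot in localhost.4503 matches any single character, which is how the mapping matches the host and port of the request.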

For more info on this, you can also check out: https://docs.adobe.com/docs/en/aem/6-1/deploy/configuring/resource-mapping.html

Final word:

There are a few ways to skin an AEM environment setup. It all depends on the client's needs and infrastructure. You as the architect should know the framework (AEM) well and, based on that knowledge, apply what will be best for the client.

In this post, the setup is based on a real-world scenario. Multiple websites are run from this environment, and in another post I will go into the MSM (Multi Site Manager) part of AEM and how it can be an advantage for your company. I will also point out the pros and cons of the Translator service in AEM.

Till then! Happy Coding folks!

*JumpHost - a server that one connects to in order to access other servers, using a program like mRemoteNG. This is for security.
