Removing Duplicate Paragraphs

I've called this post 'removing duplicate paragraphs', but actually it's a bit more than that: it removes paragraphs using a filtering function that compares the current paragraph with the next. By default, it removes duplicate blank paragraphs in a document, which is a fairly common problem. Negotiating the Docs object model is always a bit tricky, so I find this a handy clean-up function.

How to use

removeFilteredParas (container element , optional function to compare this para with next);

Examples

Remove any duplicate blank paragraphs.
removeFilteredParas (body);
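
The examples here assume that body already holds the document's Body. As a minimal sketch of where it might come from, assuming the script is bound to a Google Doc and removeFilteredParas is in the same project (the wrapper name cleanActiveDocument is hypothetical):

```javascript
// Hypothetical wrapper: the name cleanActiveDocument is not part of the original post.
// Assumes a script bound to a Google Doc, with removeFilteredParas defined in the same project.
function cleanActiveDocument() {
  // getBody() returns the Body that removeFilteredParas expects as its first argument
  var body = DocumentApp.getActiveDocument().getBody();

  // no second argument, so the default blank-paragraph filter is used
  return removeFilteredParas(body);
}
```

For a standalone script you would probably open the document with DocumentApp.openById instead.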

Remove any paragraph whose text is repeated by the paragraph that follows it.
removeFilteredParas ( body , function (thisPara, nextPara) {
    return thisPara.getText() === nextPara.getText();
});

Because the second argument is an optional function, you can have the paragraph represented by thisPara removed on any criteria you like, simply by returning true.
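
As an illustration of a different criterion, here is a sketch of a filter that treats two paragraphs as repeats even when they differ in letter case or spacing. The name sameTextLoosely is made up for this example, and it relies only on getText().

```javascript
// Hypothetical filter: the name sameTextLoosely is illustrative only.
// Treats two paragraphs as repeats even when case or spacing differs.
function sameTextLoosely(thisPara, nextPara) {
  // collapse runs of whitespace, trim the ends and ignore case before comparing
  function normalize(para) {
    return para.getText().replace(/\s+/g, ' ').trim().toLowerCase();
  }
  return normalize(thisPara) === normalize(nextPara);
}

// usage, assuming body is the document Body:
// removeFilteredParas(body, sameTextLoosely);
```

Note that two consecutive blank (or whitespace-only) paragraphs also count as repeats under this test.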

However, your function doesn't need to compare anything. For example, you could delete paragraphs 8-10 (child indexes 7 to 9, counting from zero) like this.
  removeFilteredParas(body, function (thisPara, nextPara) {
    var index = thisPara.getParent().getChildIndex(thisPara);
    return index > 6 && index < 10;
  });

If your call passes a function, the arguments passed to it are as follows. Note that the last paragraph it will see as thisPara is the second-last one in the container, because the final paragraph has no next sibling; in other words, both arguments are always populated. If your function returns true, thisPara will be deleted.
 argument   contains
 thisPara   the current paragraph element
 nextPara   the next paragraph element
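
To make the pairing concrete, here is a plain-JavaScript simulation with no Docs objects involved: each string stands in for a paragraph's text. The names selectForRemoval and bothBlank are hypothetical; bothBlank applies the same test as the default filter.

```javascript
// Plain-JavaScript simulation (no Docs objects): each string stands in for a
// paragraph's text. selectForRemoval is a hypothetical name for illustration.
function selectForRemoval(texts, filterFunc) {
  var removed = [];
  texts.forEach(function (thisText, i) {
    // the last entry has no next neighbour, so the filter is never called for it
    if (i + 1 < texts.length && filterFunc(thisText, texts[i + 1])) {
      removed.push(i);
    }
  });
  return removed;
}

// same test as the default filter: both this entry and the next one are empty
function bothBlank(thisText, nextText) {
  return !(thisText.length + nextText.length);
}

var removedIndexes = selectForRemoval(['Intro', '', '', '', 'Body', ''], bothBlank);
```

Here removedIndexes holds 1 and 2: the last blank in the run survives because it is followed by 'Body', and the trailing blank survives because the filter is never called for the final entry.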


The code

Here's the function code.
 /**
  * remove paragraphs selected by a filter function (by default, duplicate blank paragraphs)
  * @param {Body} body the body
  * @param {function} [filterFunc=multipleBlanks] returns true if para should be removed
  * @return {Body} the body for chaining
  */
  function removeFilteredParas (body , filterFunc) {
    
    // set default filter function if required
    filterFunc = filterFunc || multipleBlanks;
    
    // remove every paragraph the filter selects
    body.getParagraphs()
    .filter( function (d) {
      var nextPara = d.getNextSibling();
      return nextPara && filterFunc(d, nextPara);
    })
    .reverse()
    .forEach(function(d) {
      d.removeFromParent();
    });
    
    /**
     * default function to check if this para needs to be removed
     * @param {Paragraph} thisPara the current paragraph
     * @param {Paragraph} nextPara the next paragraph
     * @return {boolean} true if this paragraph should be removed
     */
    function multipleBlanks ( thisPara, nextPara ) {
      return ! (thisPara.getText().length + nextPara.getText().length);
    }

    return body;
  }
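
If your document mixes paragraphs with other elements such as tables, the element returned by getNextSibling() may not be an ordinary paragraph. As a sketch only, assuming you would rather skip those cases, a small wrapper (whenNextIsParagraph is a hypothetical name, not part of the function above) can guard any filter:

```javascript
// Hypothetical helper, not part of the original function: wraps a filter so it
// only runs when the next sibling is an ordinary paragraph.
function whenNextIsParagraph(filterFunc) {
  return function (thisPara, nextPara) {
    // getType() reports the element type; anything other than PARAGRAPH is skipped
    return nextPara.getType() === DocumentApp.ElementType.PARAGRAPH &&
      filterFunc(thisPara, nextPara);
  };
}

// usage, assuming body is the document Body:
// removeFilteredParas(body, whenNextIsParagraph(function (thisPara, nextPara) {
//   return thisPara.getText() === nextPara.getText();
// }));
```

With this wrapper, a paragraph that is followed by a table or a list item is always kept, whatever the wrapped filter would have returned.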
  

For more like this, see Google Apps Scripts snippets. Why not join our forum, follow the blog or follow me on Twitter to ensure you get updates when they are available.

You want to learn Google Apps Script?

Learning Apps Script (and transitioning from VBA) is covered comprehensively in my book, Going Gas - from VBA to Apps Script. All formats are available now from O'Reilly, Amazon and all good bookshops. You can also read a preview on O'Reilly.

If you prefer video-style learning, I also have two courses available, also published by O'Reilly: Google Apps Script for Developers and Google Apps Script for Beginners.





