Monday, 16 November 2020

Scraping eTenders One-Liner

 

curl --request POST https://irl.eu-supply.com/ctm/Supplier/publictenders/PublicTenders -d 'SearchFilter.SortField=None&SearchFilter.SortDirection=None&SearchFilter.ShortDescription=&SearchFilter.Reference=&SearchFilter.TenderId=0&SearchFilter.OperatorId=1&Branding=ETENDERS_SIMPLE&SavedCategoryId=&SavedUnitAndName=&SearchFilter.PublishType=1&TextFilter=&SearchFilter.FromDate=16%2F11%2F2011&SearchFilter.ToDate=16%2F11%2F2020&CpvContainer.CpvCodes=&CpvContainer.CpvIds=&CpvContainer.CpvMain=&CpvContainer.ContractType=&CpvContainer.CpvIds=&CpvContainer.IsMandatory=True&SearchFilter.ShowExpiredRft=false&SearchFilter.PagingInfo.PageNumber=1&SearchFilter.PagingInfo.PageSize=25000' | egrep 'title=|</tr>|<a href|class="js-tooltip-content">|Tender name' | sed 's/.*class="js-tooltip-content">.*/NO TENDER REF/' | sed 's/.*title="//' | sed 's/">//' | sed 's+.*</tr>+###+' | sed 's/                <a href=.*target="_blank//' | sed 's+</a>++' | sed 's/^M//g' | sed 's/,/ /g' | tr '\n' ',' | sed 's/,###/\n/g' | sed 's/^,//' > /mnt/d/dev/etenders/etenders.csv



Some notes:


Run in a terminal with curl, grep, sed and tr installed.

Outputs to CSV. The file path is the redirect at the very end of the command, so change it to suit your machine.

SearchFilter.PagingInfo.PageSize=25000 ... amazingly, you can set this to anything you want. You don't need to paginate; you can get all the results in one go. Just 789 tenders right now.


If you are typing this in, watch out for the ^M character. It is a literal carriage return, not a caret followed by M, so you'll have to enter it with CTRL+V, CTRL+M in a terminal.
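If typing the control character is awkward, one alternative that should behave the same is to swap the sed 's/^M//g' stage for tr -d '\r'. A minimal sketch, where strip_cr is just a hypothetical helper name and the sample line is made up for illustration:

```shell
# Remove every carriage return from stdin, no literal ^M needed.
strip_cr() {
    tr -d '\r'
}

# Illustration only: a made-up line with a Windows-style \r\n ending.
# od -c shows the bytes, so you can confirm the \r is gone.
printf 'Tender name\r\n' | strip_cr | od -c
```

In the one-liner this would mean replacing | sed 's/^M//g' | with | tr -d '\r' | and leaving the other stages alone.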

Edit SearchFilter.ToDate to make sure you are querying all tenders. The dates are dd/mm/yyyy with each / encoded as %2F.

Edit SearchFilter.FromDate if you just want to look for tenders since you last looked.

SearchFilter.ShowExpiredRft=false ... you might want to set this to true if you want to see the complete history and not just live tenders
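Rather than editing the dates by hand each time, the ToDate value can probably be generated with date. This is only a sketch: the variable names from_date and to_date are my own, and it assumes the server keeps accepting dd/mm/yyyy with the slashes encoded as %2F, as in the command above.

```shell
# In a date format string, %% produces a literal percent sign,
# so this prints today's date as dd%2Fmm%2Fyyyy.
to_date=$(date +%d%%2F%m%%2F%Y)

# Start date copied from the original command; change it to the
# last time you looked if you only want newer tenders.
from_date='16%2F11%2F2011'

# The two form fields as they would appear in the -d string.
echo "SearchFilter.FromDate=${from_date}&SearchFilter.ToDate=${to_date}"
```

The -d argument in the one-liner is in single quotes, so the shell will not expand variables inside it. To use these values, the string has to be switched to double quotes, or the single quotes closed and reopened around each variable.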


Surprisingly structured and easy to scrape!


Sample output here ... etenders.csv