curl --request POST https://irl.eu-supply.com/ctm/Supplier/publictenders/PublicTenders -d 'SearchFilter.SortField=None&SearchFilter.SortDirection=None&SearchFilter.ShortDescription=&SearchFilter.Reference=&SearchFilter.TenderId=0&SearchFilter.OperatorId=1&Branding=ETENDERS_SIMPLE&SavedCategoryId=&SavedUnitAndName=&SearchFilter.PublishType=1&TextFilter=&SearchFilter.FromDate=16%2F11%2F2011&SearchFilter.ToDate=16%2F11%2F2020&CpvContainer.CpvCodes=&CpvContainer.CpvIds=&CpvContainer.CpvMain=&CpvContainer.ContractType=&CpvContainer.CpvIds=&CpvContainer.IsMandatory=True&SearchFilter.ShowExpiredRft=false&SearchFilter.PagingInfo.PageNumber=1&SearchFilter.PagingInfo.PageSize=25000' | egrep 'title=|</tr>|<a href|class="js-tooltip-content">|Tender name' | sed 's/.*class="js-tooltip-content">.*/NO TENDER REF/' | sed 's/.*title="//' | sed 's/">//' | sed 's+.*</tr>+###+' | sed 's/ <a href=.*target="_blank//' | sed 's+</a>++' | sed 's/^M//g' | sed 's/,/ /g' | tr '\n' ',' | sed 's/,###/\n/g' | sed 's/^,//' > /mnt/d/dev/etenders/etenders.csv
Some notes:
Run it in a terminal with curl, grep, sed and tr installed.
It writes the results to a CSV file (/mnt/d/dev/etenders/etenders.csv in the command above; change the path to suit). Commas inside a field are replaced with spaces so they don't break the columns.
SearchFilter.PagingInfo.PageSize=25000 ... amazingly, you can set this to anything you want. You don't need to paginate; you can get all the results in one go. Just 789 tenders right now.
If you are typing this in, watch out for the ^M character in sed 's/^M//g'. It is a single carriage return character, not a caret followed by M, so you'll have to enter it with CTRL+V, CTRL+M in a terminal.
Edit SearchFilter.ToDate to make sure you are querying all tenders. Dates are dd/mm/yyyy with the slashes URL-encoded as %2F, and the value above stops at 16/11/2020.
Edit SearchFilter.FromDate if you just want to look for tenders since you last looked.
SearchFilter.ShowExpiredRft=false ... you might want to set this to true if you want to see the complete history and not just live tenders.
Surprisingly structured and easy to scrape!
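The notes on the date filters and the ^M character can be sketched as two small shell helpers. This is only an illustration: encode_date and strip_cr are made-up names, not part of the command above, and using tr -d '\r' in place of the sed stage is an alternative I am assuming behaves the same for this output.

```shell
#!/bin/sh
# Hypothetical helper: URL-encode a dd/mm/yyyy date for
# SearchFilter.FromDate / SearchFilter.ToDate by turning "/" into "%2F".
encode_date() {
  printf '%s\n' "$1" | sed 's,/,%2F,g'
}

# Hypothetical helper: strip carriage returns without needing a literal ^M.
# Could stand in for the sed 's/^M//g' stage of the pipeline.
strip_cr() {
  tr -d '\r'
}

# Build the two date parameters: fixed start date, today as the end date.
from_date=$(encode_date "16/11/2011")
to_date=$(encode_date "$(date +%d/%m/%Y)")
echo "SearchFilter.FromDate=${from_date}&SearchFilter.ToDate=${to_date}"
```

The echoed string could be pasted into the -d argument in place of the hard-coded dates, so the end date always tracks today.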
Sample output here ... etenders.csv
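Since everything comes back in one page, a quick sanity check is to count the rows in the CSV and confirm the number is below the PageSize you asked for. A minimal sketch, assuming one tender per non-empty line; check_not_truncated is a hypothetical helper, not part of the original command.

```shell
#!/bin/sh
# Hypothetical helper: report the row count and succeed only if it is
# below the requested page size (i.e. the result was probably not cut off).
check_not_truncated() {
  csv_file=$1
  page_size=$2
  # Count non-empty lines; each one is assumed to be a single tender.
  rows=$(grep -c . "$csv_file")
  echo "rows: $rows"
  [ "$rows" -lt "$page_size" ]
}
```

For example: check_not_truncated /mnt/d/dev/etenders/etenders.csv 25000 || echo "row count reached PageSize, try a bigger value"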