Monday, 16 November 2020

Scraping eTenders One-Liner

 

curl --request POST https://irl.eu-supply.com/ctm/Supplier/publictenders/PublicTenders -d 'SearchFilter.SortField=None&SearchFilter.SortDirection=None&SearchFilter.ShortDescription=&SearchFilter.Reference=&SearchFilter.TenderId=0&SearchFilter.OperatorId=1&Branding=ETENDERS_SIMPLE&SavedCategoryId=&SavedUnitAndName=&SearchFilter.PublishType=1&TextFilter=&SearchFilter.FromDate=16%2F11%2F2011&SearchFilter.ToDate=16%2F11%2F2020&CpvContainer.CpvCodes=&CpvContainer.CpvIds=&CpvContainer.CpvMain=&CpvContainer.ContractType=&CpvContainer.CpvIds=&CpvContainer.IsMandatory=True&SearchFilter.ShowExpiredRft=false&SearchFilter.PagingInfo.PageNumber=1&SearchFilter.PagingInfo.PageSize=25000' | egrep 'title=|</tr>|<a href|class="js-tooltip-content">|Tender name' | sed 's/.*class="js-tooltip-content">.*/NO TENDER REF/' | sed 's/.*title="//' | sed 's/">//' | sed 's+.*</tr>+###+' | sed 's/                <a href=.*target="_blank//' | sed 's+</a>++' | sed 's/^M//g' | sed 's/,/ /g' | tr '\n' ',' | sed 's/,###/\n/g' | sed 's/^,//' > /mnt/d/dev/etenders/etenders.csv



Some notes:


Run in a terminal with curl, grep, sed and tr installed

outputs to csv

SearchFilter.PagingInfo.PageSize=25000 ... amazingly you can set this to anything you want.  You don't need to paginate it, you can get all the results in one go.  Just 789 tenders right now.


If you are typing this in, watch out for the ^M character. It is a literal carriage return, not a caret followed by an M, so you'll have to enter it with CTRL+V, CTRL+M in a terminal.

edit SearchFilter.ToDate to make sure you are querying all tenders

edit SearchFilter.FromDate if you want to just look for tenders since you last looked

SearchFilter.ShowExpiredRft=false ... you might want to set this to true if you want to see the complete history and not just live tenders
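The parameters discussed in these notes can also be assembled programmatically instead of hand-editing the URL-encoded string. The sketch below is only an illustration: `build_search_body` is a made-up helper name, and it emits just the handful of fields mentioned in the notes. Whether the site accepts a request without the remaining fields from the one-liner is an assumption I have not checked, so add them back if needed.

```python
from datetime import date
from urllib.parse import urlencode


def build_search_body(from_date, to_date, page_size=25000, show_expired=False):
    """Build a URL-encoded form body for the fields discussed in the notes.

    Hypothetical helper: only a subset of the fields in the one-liner is
    included here; the rest would have to be appended the same way.
    """
    fields = [
        ("SearchFilter.PublishType", "1"),
        # The site expects dd/mm/yyyy; urlencode turns "/" into %2F.
        ("SearchFilter.FromDate", from_date.strftime("%d/%m/%Y")),
        ("SearchFilter.ToDate", to_date.strftime("%d/%m/%Y")),
        ("SearchFilter.ShowExpiredRft", "true" if show_expired else "false"),
        ("SearchFilter.PagingInfo.PageNumber", "1"),
        ("SearchFilter.PagingInfo.PageSize", str(page_size)),
    ]
    return urlencode(fields)


if __name__ == "__main__":
    print(build_search_body(date(2011, 11, 16), date(2020, 11, 16)))
```

The resulting string could be passed to curl with -d in place of the hard-coded dates.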


Surprisingly structured and easy to scrape!


Sample output here ... etenders.csv
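For anyone who finds the tail end of the pipeline hard to read, here is roughly what the last few stages do, written out as a Python sketch: strip carriage returns, replace commas inside fields with spaces, then join the lines of each tender into one CSV row, using the ### marker (which the pipeline substitutes for each closing table row tag) as the record separator. The function name `join_records` is mine, and this is an approximation of the sed/tr steps, not a drop-in replacement for them.

```python
def join_records(lines, marker="###"):
    """Join already-filtered lines into CSV rows, one row per marker.

    Mirrors the final stages of the one-liner: drop carriage returns,
    turn commas inside a field into spaces, and end a row at each marker.
    """
    rows = []
    fields = []
    for raw in lines:
        line = raw.replace("\r", "").replace(",", " ")
        if line == marker:
            rows.append(",".join(fields))
            fields = []
        else:
            fields.append(line)
    return rows


if __name__ == "__main__":
    sample = ["NO TENDER REF", "Supply of widgets, assorted\r", "###"]
    for row in join_records(sample):
        print(row)
```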

Thursday, 9 June 2011

Spring Data Mongo Repositories

Understandably, spring data is very new so the documentation isn't 100% accurate. There are several mistakes in the documentation at the moment, but there was one that got me stuck for ages. I was getting this exception in setting up the PersonRepository:


Unsupported id class! Only class java.lang.String,class org.bson.types.ObjectId,class java.math.BigInteger are supported!

The problem was that I had this interface definition:

public interface PersonRepository extends PagingAndSortingRepository<Person, Long> {

whereas I should have had:

public interface PersonRepository extends PagingAndSortingRepository<Person, String> {

Friday, 3 June 2011

Have separate log files for each of your services running on jboss

This is trickier than you'd imagine. I wanted:

- Several services running on jboss
- Each service outputting to a separate log file
- Each log4j.xml configuration to be available to edit outside of the war file

1)

First of all, have the following configured in your web.xml file:


 <context-param>
<param-name>log4jConfigLocation</param-name>
<param-value>file:--file location on host--</param-value>
</context-param>

<context-param>
<param-name>log4jRefreshInterval</param-name>
<param-value>1000</param-value>
</context-param>

<listener>
<listener-class>org.springframework.web.util.Log4jConfigListener</listener-class>
</listener>


<context-param>
<param-name>log4jExposeWebAppRoot</param-name>
<param-value>false</param-value>
</context-param>


If you're using maven substitution you'll need to make sure that you have the following in your pom in order to have it replace variables in your web.xml:


<webResources>
<webResource>
<directory>${basedir}/src/main/webapp/WEB-INF</directory>
<includes>
<include>web.xml</include>
</includes>
<targetPath>WEB-INF</targetPath>
<filtering>true</filtering>
</webResource>
</webResources>


Then finally, you need to make sure that you are including log4j-1.2.14.jar, but excluding the other commons logging jars and excluding other log4j xml files in the war file that you produce. This weird and wonderful combination achieves the separate log files. In maven you can exclude jars by putting <scope>provided</scope> on dependencies.

Setting up our spring integration apps to run under jboss:

I found setting up my spring app with jboss 5 and maven tricky. You'll need to produce a war file in maven to deploy to the jboss server. Take the following steps:
 Ensure you have
<packaging>war</packaging>
near the top of your pom
 Then add the following plugin:

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-war-plugin</artifactId>
<version>2.1.1</version>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
</manifest>
</archive>
</configuration>
</plugin>

 Add the following directory structure to your existing maven project:

src/main --
|-- webapp
|
|----- WEB-INF
| |-- web.xml
| |-- lib
| |--- <libraries will eventually automatically be added here by the war plugin>
|----- jsp
|--- <you can put a jsp page here so you can verify your app is up once you deploy>

 Now add the web.xml file described in the structure above with contents similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<web-app id="WebApp_ID" version="2.4"
xmlns="http://java.sun.com/xml/ns/j2ee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee
http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
<display-name>app</display-name>

<!-- Configure the location of the Spring application context -->
<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>classpath:spring/app.xml</param-value>
</context-param>

<listener>
<listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
</listener>

<!-- Configure Log4j to get Configuration From Central Area -->
<context-param>
<param-name>log4jConfigLocation</param-name>
<param-value>file:${log4j.calculation.filePath}</param-value>
</context-param>

<context-param>
<param-name>log4jExposeWebAppRoot</param-name>
<param-value>false</param-value>
</context-param>

<context-param>
<param-name>log4jRefreshInterval</param-name>
<param-value>1000</param-value>
</context-param>
<listener>
<listener-class>org.springframework.web.util.Log4jConfigListener</listener-class>
</listener>

<!-- Status page accessed at http://hostname:port/CGO-calculation -->
<welcome-file-list>
<welcome-file>index.jsp</welcome-file>
</welcome-file-list>
</web-app>

 For the logging you cannot refer to a classpath resource (at least I couldn't ... anyone who cracks this let me know).
 Also I couldn't get jboss to call a main class or pass arguments to my app, so you'll have to:
 Use the maven variable substitution to have your app pull in its environment-specific config files, e.g. put a variable like ${dbadaptor.release.env} in your config file and then when you build from maven do something like this ... mvn clean install -Ddbadaptor.release.env=dev. For this to work you'll need to say in your pom which sets of files are eligible for substitution, using the <filtering> tag when declaring resources, like this:
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
<includes>
<include>**/*.xml</include>
</includes>
</resource>
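If it isn't obvious what filtering does to your files, the idea is plain text substitution of ${...} placeholders at build time. The Python below is a toy illustration of that idea only. It is not how maven implements it, and `filter_text` is a made-up name. In this sketch, placeholders with no matching property are left untouched.

```python
import re

# Matches ${some.property.name}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def filter_text(text, properties):
    """Replace each ${name} in text with properties[name] when it is defined."""
    def substitute(match):
        # Fall back to the original placeholder text if the property is unknown.
        return properties.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    props = {"dbadaptor.release.env": "dev"}
    print(filter_text("config-${dbadaptor.release.env}.xml", props))
```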

 Allow jboss to instantiate your spring context directly. This is in the web.xml above in this piece:
<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>classpath:spring/app.xml</param-value>
</context-param>
 Jboss ships with its own xerces xml parsers. If you package your war with the xerces and xml apis you'll get an error when you try to deploy to jboss. Exclude the libraries from the war build in your pom like this with the scope tag in the dependency (it won't affect your local running, although it may affect a one-jar that you produce (if you ever need to)):

<dependency>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
<version>2.6.2</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.0.b2</version>
<scope>provided</scope>
</dependency>
 Jboss also ships with its own log4j jars and prevents you from accessing your own in the war lib directory. Not only this, if you include the logging jars in your war lib directory your own classpath will resolve to these jars even though at a later time jboss will prevent them from being accessed, so you need to exclude these from your war also (again using the scope tag):

<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.1</version>
<scope>provided</scope>

<exclusions>
<exclusion>
<groupId>apache-log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>

<exclusion>
<groupId>logkit</groupId>
<artifactId>logkit</artifactId>
</exclusion>

<exclusion>
<groupId>avalon-framework</groupId>
<artifactId>avalon-framework</artifactId>
</exclusion>

<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
</exclusions>

</dependency>

<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.14</version>
<scope>provided</scope>
</dependency>

 Using standalone MBean servers like mx4j will cause issues with jboss, so you're better off just using the jboss jmx service and getting rid of the former from your spring context
 Once you run your maven build (mvn clean install), a *.war file will appear in your ./target directory. Copy this to your jboss server deploy directory (/server/default/deploy) and when you start jboss it should pick it up (this last step will be slightly different when we move from local application server to the jfarms ... once we get the jfarms we can update here)

Thursday, 2 June 2011

ORA-12541: TNS:no listener

I was getting this error, even though I had the tns name I wanted configured. It really bugged me. I had gone through the normal things that cause this error:

- Wrong or missing entry in my tnsnames.ora file (can be found in your thin client installation ... e.g. C:\oracle\product\10.2.0\client_1\network\ADMIN\ )
- I had multiple installations of different versions of the client, but I had updated all the tnsnames files in each of these installations.

But then I noticed the detail of the error:

Fatal NI connect error 12541, connecting to:
(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=MYDB)(CID=(PROGRAM=C:\SPBSource\myapp\trunk2\impl\target\spb#debug\mycompany\myappmanager\myapp.exe)(HOST=MYHOST)(USER=alegear)))(ADDRESS=(PROTOCOL=TCP)(HOST=10.11.16.4)(PORT=1521)))

The port it was trying to connect on was 1521, which was the wrong port for my database. My DBAs had configured 1525. But I had the correct port in the tnsnames file, so why was it still picking up 1521?

Well, if it has trouble reading the tnsnames file or parts of your oracle installation, oracle will always default to port 1521 when resolving the TNS Name.

So what was wrong? Well, this will be different for everyone, but in my case I found that there is an environment variable called TNS_ADMIN. I had this pointing to a different, old and broken installation of my oracle client. So even though tnsping was working, the client had trouble finding the tnsnames file, so it defaulted to 1521. When I repointed TNS_ADMIN to a good installation and restarted the apps that used it, things were hunky-dory.
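A quick way to rule this out is to check where TNS_ADMIN points and whether there is actually a tnsnames.ora in that directory. The snippet below is a minimal sketch of that check. `locate_tnsnames` is a name I made up, and it only looks at the environment variable, not at the registry or any other place an oracle client might take its settings from.

```python
import os
from pathlib import Path


def locate_tnsnames(env=os.environ):
    """Return (path, exists) for the tnsnames.ora that TNS_ADMIN points at.

    Returns (None, False) when TNS_ADMIN is not set at all.
    """
    tns_admin = env.get("TNS_ADMIN")
    if not tns_admin:
        return None, False
    candidate = Path(tns_admin) / "tnsnames.ora"
    return candidate, candidate.is_file()


if __name__ == "__main__":
    path, exists = locate_tnsnames()
    if path is None:
        print("TNS_ADMIN is not set")
    else:
        print(f"TNS_ADMIN -> {path} (file found: {exists})")
```

If the file is missing, or the directory belongs to an old installation, that is the first thing to fix.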