Directed Acyclic Graphs and Executing Tasks in Order (and in Parallel) Based on Dependencies

A little while ago, there was a requirement to write a tool that could take a number of tasks, each with a set of dependencies, and execute them in parallel while respecting those dependencies.

The tasks themselves were meant for data migration, but that is not particularly relevant. We were writing a number of tasks, each of which had a set of dependencies (some of the tasks had no dependencies at all, or of course the process could never start).

It was assumed that there were no cyclic dependencies (which would have been an error in this particular case anyway).

Bearing in mind that this was a quick and dirty tool for use three times, some of the bits in here could do with tidying up.

Each task was defined to implement the following interface

import java.util.Set;

public interface Task extends Runnable {

	public String getName();

	public Set<String> getDependencies();

}

It should all be self-explanatory. Extending the Runnable interface ensures that we can pass tasks into threads and other relevant bits of code. The getDependencies method is expected to return the names of the tasks that it depends on.

The basic task runner which I describe below does not check whether the tasks named in a dependency list actually exist. If a non-existent dependency is defined, it will likely just throw a NullPointerException. I wrote this a long time ago, so I don't actually remember.
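The original runner is not reproduced here, but the idea can be sketched roughly as follows. This is a minimal illustration, not the real tool: the task factory and the wave-based scheduling are my own simplifications, and it assumes no cycles and that every named dependency exists.

```java
import java.util.*;
import java.util.concurrent.*;

public class SimpleTaskRunner {

    public interface Task extends Runnable {
        String getName();
        Set<String> getDependencies();
    }

    // Convenience factory for illustration; real tasks did migration work.
    public static Task task(String name, String... deps) {
        Set<String> d = new HashSet<>(Arrays.asList(deps));
        return new Task() {
            public String getName() { return name; }
            public Set<String> getDependencies() { return d; }
            public void run() { /* the task's real work would go here */ }
        };
    }

    // Runs tasks in "waves": each wave submits every task whose dependencies
    // have all completed. Returns the order in which tasks finished.
    public static List<String> runAll(Collection<Task> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Set<String> done = ConcurrentHashMap.newKeySet();
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        List<Task> pending = new ArrayList<>(tasks);
        try {
            while (!pending.isEmpty()) {
                // Everything whose dependencies are satisfied can run now.
                List<Task> ready = new ArrayList<>();
                for (Task t : pending)
                    if (done.containsAll(t.getDependencies())) ready.add(t);
                pending.removeAll(ready);
                CountDownLatch latch = new CountDownLatch(ready.size());
                for (Task t : ready) {
                    pool.submit(() -> {
                        t.run();
                        order.add(t.getName());
                        done.add(t.getName());
                        latch.countDown();
                    });
                }
                latch.await(); // finish this wave before scheduling the next
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return order;
    }
}
```

Tasks within a wave run in parallel on the pool; a missing dependency in this sketch would loop forever rather than throw, which is one of the bits that could do with tidying up.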


Java Object Size In Memory

Anyone who has worked with Java in a high-end application will be well aware of the double-edged sword that is Java garbage collection. When it works, it is awesome; when it doesn't, it is an absolute nightmare. We work on a ticketing system where it is imperative that the system is as near real-time as possible. The biggest issue that we have found is the JVM running out of memory, which causes a stop-the-world garbage collection, which in turn results in cluster failures since an individual node is inaccessible for long enough that it is kicked out of the cluster.

There are various ways to combat this issue, and the first instinct would be to suggest that there is a memory leak. After eliminating this as a possibility, the next challenge was to identify where the memory was being taken up. This took some time and effort, and the Hibernate second-level cache was identified as the culprit. We were storing far too much in the second-level cache.
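One crude way to get a feel for per-object memory cost is to measure the heap before and after allocating a large number of instances. This is a rough sketch of my own (the class name and count are illustrative, not from the post), and the figures vary with JVM version, GC timing and flags, so treat it as an approximation:

```java
public class ObjectSizeEstimate {

    // Currently used heap = total allocated minus free.
    public static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        System.gc();                              // encourage a clean baseline
        long before = usedHeap();
        Object[] hold = new Object[count];        // keep references alive
        for (int i = 0; i < count; i++) hold[i] = new Object();
        long after = usedHeap();
        System.out.println("~" + (after - before) / count + " bytes per object");
    }
}
```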


Android – Parcel data to pass between Activities using Parcelable classes

Passing data between activities on Android is, unfortunately, not as simple as passing in parameters. What we need to do is tag the data onto the intent. If the information we need to pass across is a simple object like a String or an Integer, this is easy enough.

String stringParam = "String Parameter";
Integer intParam = 5;

Intent i = new Intent(this, MyActivity.class);
i.putExtra("uk.co.kraya.stringParam", stringParam);
i.putExtra("uk.co.kraya.intParam", intParam);

startActivity(i);

Passing in custom objects is a little more complicated. You could just mark the class as Serializable
and let Java take care of this. However, on Android, there is a serious performance hit that comes with using Serializable. The solution is to use Parcelable.
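As a rough sketch of what that looks like (the Contact class and its fields are illustrative, not from the original post):

```java
public class Contact implements Parcelable {

	private final String name;
	private final int age;

	public Contact(String name, int age) {
		this.name = name;
		this.age = age;
	}

	// Write the fields to the Parcel; the order must match the read order below.
	@Override
	public void writeToParcel(Parcel out, int flags) {
		out.writeString(name);
		out.writeInt(age);
	}

	@Override
	public int describeContents() {
		return 0;
	}

	// Android uses this CREATOR field to rebuild the object on the other side.
	public static final Parcelable.Creator<Contact> CREATOR =
			new Parcelable.Creator<Contact>() {
				public Contact createFromParcel(Parcel in) {
					return new Contact(in.readString(), in.readInt());
				}
				public Contact[] newArray(int size) {
					return new Contact[size];
				}
			};
}
```

You can then tag it onto the intent exactly as before, with i.putExtra("uk.co.kraya.contact", contact);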


Database Systems Compared

My first experiences of a computer started with DBase III+, which is now dBASE; I then went on to FoxPro, now Microsoft Visual FoxPro. I have since used Filemaker Pro, Microsoft Access, Microsoft SQL Server, MySQL, PostgreSQL, SQLite and HSQLDB. I have not yet used IBM DB2 or Oracle. Wikipedia has a list of database systems.

Having worked with this range of database systems, and having done copious amounts of research into DB2, Oracle and other DB systems I have not mentioned, I like answering the age-old question: which is the best database system?

Ah! If only it were that simple. There is no single database system that is appropriate for every requirement. But then, if you have been in the technology sector long enough, you would already know that. It's all about using the right tool for the job.

I separate these systems into two broad categories (and Oracle). First, there are the desktop-based database systems:

  • DBase
  • Foxpro
  • SQLite
  • HSQLDB
  • Filemaker Pro
  • Microsoft Access
  • MySQL

DBase, FoxPro, Filemaker Pro and Microsoft Access are essentially a GUI frontend that has a database backing.

Access is the best choice for this purpose under the majority of circumstances; Filemaker Pro is relevant in some. The usual reason to use DBase or FoxPro is simply that the developer is used to it. That is not a good enough reason.

I used DBase III+ to develop an office management suite back in 1994. I have since used Filemaker Pro to develop a simple contact management database in 1998, and Microsoft Access to develop a patient management system for a clinic.

SQLite, HSQLDB and MySQL are database engines that are to be utilised by popping a frontend on top; sometimes the frontend is Microsoft Access. Microsoft Access can also be used for its database engine.

Access is usually the worst choice for this, except as a stopgap. There are exceptions. One is for a web frontend, if the site is not too busy and it's running on a Microsoft platform: you don't have to go to the hassle of installing anything on the server, as the drivers will take care of it all.

HSQLDB becomes an obvious choice for a light Java-based application, and SQLite for any other lightweight applications.

MySQL is substantially more powerful and scales a lot better. I include it in this section because it is a server grade database system that can also work well in a desktop environment.

I have used Access for several web based systems and I have used HSQLDB for unit testing hibernate and for a quick and dirty MP3 library that linked into musicBrainz. I have used SQLite in passing to be utilised by open source products.

I have used MySQL with an Access frontend as a management suite for a website as well.

And we have the server based database systems:

  • MySQL
  • Microsoft SQL Server
  • IBM DB2
  • PostgreSQL

MySQL was used as the backend database system for the edFringe.com website. This was the perfect choice, since the most important requirement was speed. Particularly with the query cache and master-slave replication, MySQL was the best choice.

SQL Server was used as the backend system for an online course for Scottish Enterprise around 1999/2000. While MySQL would have been a good choice for this, it was not of production quality at the time.

We have also used MS SQL Server for an insurance company, since all the infrastructure was based on Windows and PostgreSQL did not have a viable Windows version at the time.

We use PostgreSQL for megabus. While speed is critical, it is a ticketing system, which means that transactionality is absolutely critical.

While MySQL now has transactionality with InnoDB, it is still nowhere near as good as the transactionality provided by PostgreSQL through MVCC (Multi-Version Concurrency Control). We could have used MS SQL Server, but the cost savings are dramatic.

To summarise, each system has specific uses, specific strengths and weaknesses, and which should be used is highly dependent on what it is to be used for. I am hopeful that this summary of what we have used each of these systems for is useful in determining which one is best placed to solve any specific problem 😀

We have not yet used Oracle. It was a strong contender for megabus, but the serious heavyweight functionality provided by Oracle comes at a price, and it is not yet a cost-effective option.

Eclipse TPTP on Ubuntu (64bit)

I run Ubuntu 64-bit (technically, I run an Ubuntu 64-bit vserver which I access from Ubuntu 32-bit, but that's not really relevant).

In the open source world, I expect that all things which are available as 32-bit are also available as 64-bit, and Ubuntu makes it automagic enough that everything just works. Yes, I run into problems with closed source software like Flash Player (recently resolved with Flash Player 10) and the Java plugin, but that is another story. I use Eclipse and wanted to do some performance analysis and benchmarking to find a bottleneck, so I installed the TPTP plugin; and ran into a problem. It just didn't work.

To resolve it, I turned to Google… In this instance, it turned out to be a distraction and a red herring. It led me in the direction of installing libstdc++2.10-glibc2.2_2.95.4-27_i386.deb, which was difficult at best since there was only a 32-bit version of the package, and that wasn't even in the standard repository.

In the end, digging deeper, I found that it was simply missing the shared object libstdc++.so.5.

All I had to do was install libstdc++5:

sudo aptitude install libstdc++5

and it worked… 😀

Now, I think that the ACServer which Eclipse uses for TPTP should not link to an outdated library, but that is another issue…

Hibernate Domain Model Testing

One of my pet peeves with Hibernate has always been how difficult it is to test. I want to test the persistence of data, loading the data back, and any specific functionality within the domain model.

Simple? NO! The main problem was the management of the data set. I had, in the past, set up fairly interesting classes to test the functionality using reflection, injecting the data from the classes themselves through the data provider mechanism of TestNG. However, this was error-prone and clunky at best. It also made dependency management of the data quite cumbersome.

With a view to resolving this, I also looked at DbUnit, unitils and Ejb3Unit. They all did some things that I liked but lacked some functionality that was important.

This led me to write a simple testing infrastructure. The goal was straightforward.

  • I need to be able to define data in a CSV (actually it was separated by the pipe character |, so PSV) based on entities.
  • The framework should automatically persist the data (and fail on errors)
  • It should test that it can load all that data back
  • It should run as many automated tests on the DOM as possible.

The framework uses the CSV files to read the data for each of the classes (using the excellent SuperCsv library). It needs an id field for internal reference. As long as the ids match within the CSV files for the relationships, the data will be persisted correctly into the database even when the persisted ids are different.

For example, I could have a Contact.csv with 5 records (ids 1 through 5) and a Company.csv with 3 records (ids 1 through 3).

The Contact.csv records can map to the ids specified in the Company.csv file, and when the records get persisted, they will be associated correctly, even if the ids in the database end up being different.
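As a sketch of what that pair of files might look like (the column names and values here are illustrative, not from the original data set; the files are pipe-separated, as noted above):

```
Company.csv
id|name
1|Kraya
2|Acme
3|Globex

Contact.csv
id|name|company
1|Alice|1
2|Bob|1
3|Carol|2
4|Dave|3
5|Eve|3
```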

The framework also looks for the CSV file which has the same name as the class within the location defined within the configuration file. This means that as long as the filename matches the class name, the data loading is automatic.

For simple classes, the Test case is as simple as:

public class CompanyTest extends DOMTest<Company> {

	public CompanyTest() {
		super(Company.class);
	}
}

The system (with the help of TestNG) is also flexible enough to define object-model dependencies. Just override the persist method (which simply calls super.persist()) and define the groups to be persist and <object>.persist.

In this particular case, it would be:

@Override
@Test(groups = {"persist", "Company.persist"})
public void persist() {
	super.persist();
}

For all dependent classes, I then depend on the Company.persist group (for the ContactTest class, for example, since it needs to link to the Company object).
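A hypothetical ContactTest would then look something like the following (this uses TestNG's dependsOnGroups, so its persist() will not run until everything in the "Company.persist" group has passed):

```java
public class ContactTest extends DOMTest<Contact> {

	public ContactTest() {
		super(Contact.class);
	}

	@Override
	@Test(groups = {"persist", "Contact.persist"},
	      dependsOnGroups = {"Company.persist"})
	public void persist() {
		super.persist();
	}
}
```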

You can specify OneToOne and ManyToOne relationships with just the CSV files – just defining the field name and the id of the object to pull in.

ManyToMany is more complex and requires an interim object to be created within the test section. If the Contact-to-Company relationship above were ManyToMany, we would create a ContactCompany class with just the two fields – Contact and Company – then create a CSV file with three fields: id, Contact and Company. The framework currently always needs an id field.
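The interim class itself is trivial; a sketch (matching a hypothetical ContactCompany.csv with the columns id, Contact and Company):

```java
public class ContactCompany {

	private Long id;
	private Contact contact;
	private Company company;

	// getters and setters omitted for brevity
}
```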

You would then need to write a method within the ContactTest or CompanyTest(I use the owning side) to read the CSV file in and pump the data. This process is a little bit complex just now.

With an appropriate amount of test data, you are able to write a test suite that can consistently test your domain model. More importantly, you can configure it to drop the database at the start of each run so that once the tests are complete, you have a database structure and data that can be used for testing of higher-level components (EJB/Spring/UI/WebApp).

We currently use this framework to test the domain model as well as distribute a data set for development and testing of the higher tier functionalities.

For the future, there are several additional features this framework needs:

  • It currently needs the setters/getters & constructors to be public. This needs to be FIXED
  • Refactor the ManyToMany Relationship code to make it easier and simpler to test and pump data
  • See if we can ensure that additional tests which modify data are run within a transaction and rolled back, so that the database is left in the “CSV Imported” state on completion of the tests
  • Easier Dependency management if possible

This framework is still inside the walls of Kraya, but once the above issues are resolved and it is in a releasable state, it will be published into the open source community. If you are interested in getting a hold of it, email me and I’ll provide you with the latest version.

The easier and quicker it is to test, the more time we can spend on writing code… 🙂 The higher the coverage of the tests, the more confident you can be of your final product.

To more testing…