
Validating CSV Files

What is CsvValidator?
  A Java framework that validates CSV files against a field specification, similar to validating XML against an XSD.

Why should I use this?
  You don't have to. In fact, it's easy to write something similar yourself, and you can check out the source code for reference.

Why did I write this?
  Some of our projects integrate with third-party applications that exchange information in CSV files, so I wrote a generic validator that can be hooked into multiple projects or used by QA for integration testing.

What is the license?
 GNU GPL v2

Are there any JUnit test cases for me to check out?
 Yes, source

How do I integrate it into my existing project?

Just download CsvValidator.jar and add it to your project's classpath, and you are good to go.

Instantiate a CsvValidator (the sample uses CsvValidatorImpl); the constructor takes these 3 arguments:

        // filename - the file to be validated
        // list - defines all the fields in the above csv file (a field has index, type, isOptional, regex)
        // delimiter - the file delimiter; it can be anything, not just a comma


Check out this sample code:


    public static void main(String[] args) {
        boolean optional = true;
        boolean notOptional = false;

        // One Field per column: index, optional name, type, optional flag, optional pattern
        List<Field> list = new ArrayList<Field>();

        list.add(new Field(1, Type.NUMBER, notOptional));
        list.add(new Field(2, Type.NUMBER, notOptional));
        list.add(new Field(3, Type.TEXT, notOptional));
        list.add(new Field(4, Type.TEXT, notOptional));
        list.add(new Field(5, Type.NUMBER, notOptional));
        list.add(new Field(6, "Purchase Date", Type.DATE, notOptional, "yyyy-MM-dd HH:mm:ss"));
        list.add(new Field(7, Type.NUMBER, notOptional));
        list.add(new Field(8, Type.NUMBER, notOptional));
        list.add(new Field(9, "Campaign ID", Type.TEXT, optional));
        list.add(new Field(10, "Promo Code", Type.TEXT, optional));
        list.add(new Field(11, Type.TEXT, notOptional));
        list.add(new Field(12, Type.NUMBER, optional));
        list.add(new Field(13, Type.NUMBER, optional));
        list.add(new Field(14, Type.TEXT, notOptional));

        // A pipe-delimited file; note that the pipe is passed escaped
        CsvValidator validator1 = new CsvValidatorImpl("somefile.txt", list, "\\|");

        // Print the validation details only when the file is invalid
        if (!validator1.isValid()) {
            System.out.println(validator1.getValidationDetails());
        }
    }
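
The sample above passes the pipe delimiter as "\\|", which suggests the delimiter is treated as a regular expression. Assuming that is the case (it is not confirmed by the library itself), the sketch below shows why the escape matters when splitting a line with plain String.split, and why a negative limit is useful when trailing optional columns are empty. DelimiterSplitDemo and splitLine are illustrative names, not part of CsvValidator.

```java
import java.util.Arrays;

public class DelimiterSplitDemo {

    // Splits one line on a delimiter given as a regex.
    // The limit of -1 keeps trailing empty columns, so optional
    // fields at the end of a line are not silently dropped.
    static String[] splitLine(String line, String delimiterRegex) {
        return line.split(delimiterRegex, -1);
    }

    public static void main(String[] args) {
        // Hypothetical pipe-delimited line whose last two columns are empty
        String line = "1001|USD|2012-01-15 10:30:00||";

        // "\\|" matches a literal pipe; an unescaped "|" is regex
        // alternation and would not split on the pipe as intended.
        String[] columns = splitLine(line, "\\|");

        System.out.println(Arrays.toString(columns));
        System.out.println("column count: " + columns.length);
    }
}
```

Without the -1 limit, String.split discards trailing empty strings, so a line ending in empty optional columns would appear to have too few fields.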

Can my QA use it in standalone mode?
 Yes, just check out the spec.txt file, which your QA needs to create.

java -jar "CsvValidator.jar" csv-file.txt spec.txt

The first line in spec.txt is the delimiter (, ' | etc.)
  Every other line describes one field, for example:
 
    Currency,T,R,
    Date of Purchase,D,R,yyyy-MM-dd HH:mm:ss

    first column - field name, which makes the validation results easier to understand
    second column - type (T, N, D), representing Text, Number, Date
    third column - required flag (R, O), representing Required or Optional
    fourth column - regex (for a Date field, the date format pattern, as in the example above)
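
To make the four-column layout concrete, here is a minimal sketch of reading one spec.txt field line. It assumes the columns are comma-separated as in the example above. SpecLineDemo, SpecLine and parse are hypothetical names for illustration, not CsvValidator's own classes. The split is limited to four parts so that commas inside the fourth column (some date patterns contain them) stay intact.

```java
public class SpecLineDemo {

    /** Hypothetical holder for one spec.txt field line. */
    static final class SpecLine {
        final String name;      // first column: field name
        final char type;        // second column: 'T', 'N' or 'D'
        final boolean required; // third column: R = true, O = false
        final String pattern;   // fourth column: regex or date pattern, may be empty

        SpecLine(String name, char type, boolean required, String pattern) {
            this.name = name;
            this.type = type;
            this.required = required;
            this.pattern = pattern;
        }
    }

    static SpecLine parse(String line) {
        // At most 4 parts: everything after the third comma is the pattern
        String[] parts = line.split(",", 4);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected name,type,required[,pattern]: " + line);
        }
        String type = parts[1].trim();
        String required = parts[2].trim();
        if (!type.matches("[TND]")) {
            throw new IllegalArgumentException("Type must be T, N or D: " + line);
        }
        if (!required.matches("[RO]")) {
            throw new IllegalArgumentException("Required flag must be R or O: " + line);
        }
        String pattern = parts.length == 4 ? parts[3].trim() : "";
        return new SpecLine(parts[0].trim(), type.charAt(0), required.equals("R"), pattern);
    }

    public static void main(String[] args) {
        SpecLine date = parse("Date of Purchase,D,R,yyyy-MM-dd HH:mm:ss");
        System.out.println(date.name + " / " + date.type + " / " + date.required + " / " + date.pattern);

        SpecLine currency = parse("Currency,T,R,");
        System.out.println(currency.name + " has a pattern: " + !currency.pattern.isEmpty());
    }
}
```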

  Date fields take a date pattern (java.text.SimpleDateFormat syntax) in place of a regex. Sample patterns and a value each one matches:

    Pattern                            Sample value
    "yyyy.MM.dd G 'at' HH:mm:ss z"     2001.07.04 AD at 12:08:56 PDT
    "EEE, MMM d, ''yy"                 Wed, Jul 4, '01
    "h:mm a"                           12:08 PM
    "hh 'o''clock' a, zzzz"            12 o'clock PM, Pacific Daylight Time
    "K:mm a, z"                        0:08 PM, PDT
    "yyyyy.MMMMM.dd GGG hh:mm aaa"     02001.July.04 AD 12:08 PM
    "EEE, d MMM yyyy HH:mm:ss Z"       Wed, 4 Jul 2001 12:08:56 -0700
    "yyMMddHHmmssZ"                    010704120856-0700
    "yyyy-MM-dd'T'HH:mm:ss.SSSZ"       2001-07-04T12:08:56.235-0700

 For all other fields, see a regex cheat sheet.
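
The date patterns listed above use java.text.SimpleDateFormat syntax. As a rough sketch of how a date value could be checked against such a pattern (this is an assumption about the approach, not CsvValidator's actual code), the example below parses strictly and also requires the whole value to be consumed. DatePatternDemo and matchesPattern are illustrative names.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class DatePatternDemo {

    // Returns true only if the entire value parses under the pattern.
    static boolean matchesPattern(String value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        // Strict mode: rejects out-of-range parts such as month 13
        format.setLenient(false);
        ParsePosition position = new ParsePosition(0);
        // parse returns null on failure instead of throwing
        boolean parsed = format.parse(value, position) != null;
        // Reject values with trailing text after a valid date
        return parsed && position.getIndex() == value.length();
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd HH:mm:ss";
        System.out.println(matchesPattern("2001-07-04 12:08:56", pattern));       // valid value
        System.out.println(matchesPattern("2001-13-04 12:08:56", pattern));       // month out of range
        System.out.println(matchesPattern("2001-07-04", pattern));                // time part missing
        System.out.println(matchesPattern("2001-07-04 12:08:56 extra", pattern)); // trailing text
    }
}
```

Lenient parsing is switched off because the default mode would quietly roll month 13 over into the next year rather than reporting the value as invalid.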

Comments

Anonymous said…
Excellent article and easy to understand explanation. How do I go about getting permission to post part of the article in my upcoming news letter? Giving proper credit to you the author and link to the site would not be a problem.
Anonymous said…
In my csv some headers are created dynamically, i.e they are unknown at compile time. Is there any way to validate those field using CsvValidator?
vijay krishna said…
Can you share gradle build path for this jar
