File Formats

Certificates

These are really just files, and you will see a few different extensions used for them. If you want to know about the actual file format then have a look at Certificates.

CSV

This is one of the oldest and most common file formats. It does have some serious issues, but its flexibility and cross-platform compatibility make it a frequent lowest common denominator.

Recently I have been using CSV files with Microsoft Excel and a couple of noteworthy points came to light. First, Microsoft Excel cannot cope with numbers longer than 15 digits. It does not warn you about this either; it just keeps the first 15 digits starting from the left and replaces the rest with zeros. Secondly, it is hard to get ID numbers with leading zeros into Excel as text. This is true of mobile phone numbers, for example, but also of IDs like 0001, which becomes 1. There are two options: you can write ="01" into your CSV file or "="01"", which, whilst looking odd, does actually work.
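As a rough illustration of the workaround above, this Python sketch writes values as Excel text formulas using the standard csv module. The helper names excel_text_cell and build_csv and the sample rows are invented for this example. Note that the csv module doubles the embedded quotes, so the cell lands in the file as "=""0001""", which is the strictly quoted CSV form of ="0001" and should be read by Excel the same way.

```python
import csv
import io


def excel_text_cell(value: str) -> str:
    """Wrap a value as ="value" so Excel treats it as text (hypothetical helper)."""
    return f'="{value}"'


def build_csv(rows) -> str:
    """Serialise rows to CSV text; the csv module takes care of the quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Sample data invented for this example: an ID with leading zeros and
# a number longer than 15 digits.
rows = [
    ["id", "account"],
    [excel_text_cell("0001"), excel_text_cell("1234567890123456789")],
]
csv_text = build_csv(rows)
print(csv_text)
```

The same wrapping can be applied only to the columns that need it; ordinary numeric columns can be left as plain values.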

GPX

The GPX file format is for exchanging GPS location-based data and is documented at GPX: the GPS Exchange Format. This file format is used by many systems, including my V60 Polestar.
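GPX is XML underneath, so a file can be read with any XML parser. The sketch below is a minimal example, assuming a GPX 1.1 document with a single track; the sample coordinates and the function name read_track_points are made up for illustration, and real files from a car or GPS device may carry extra elements and extensions.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Tiny sample document invented for this example.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Sample track</name>
    <trkseg>
      <trkpt lat="51.5007" lon="-0.1246"><ele>10.0</ele></trkpt>
      <trkpt lat="51.5014" lon="-0.1419"><ele>12.5</ele></trkpt>
    </trkseg>
  </trk>
</gpx>
"""


def read_track_points(gpx_text: str) -> list[tuple[float, float]]:
    """Return (latitude, longitude) pairs for every track point in the document."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.findall(".//gpx:trkpt", GPX_NS):
        points.append((float(trkpt.get("lat")), float(trkpt.get("lon"))))
    return points


points = read_track_points(SAMPLE_GPX)
print(points)
```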

JSON

The JSON (pronounced like the name Jason) format came about because XML was just too verbose. The specification for JSON Schema is available at JSON Schema, along with links to various implementations of validators and generators. There is a nice online editor at JSON Editor Online. However, I have also found ObjGen (a live JSON generator) useful, and its output can be put into JSON Schema Tool to get a proper schema.

If you want to do some very flexible JSON processing on the command line, then jq is a very fast, powerful and flexible utility that works across platforms.
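For comparison, the kind of filtering typically done with such a tool can also be sketched in a few lines of Python using only the standard json module. The sample document and the function name structured_names are invented for this example; it is not a replacement for jq, just an indication of what "processing JSON" looks like in practice.

```python
import json

# Sample document invented for this example.
SAMPLE_JSON = """
{
  "formats": [
    {"name": "CSV", "structured": false},
    {"name": "JSON", "structured": true},
    {"name": "YAML", "structured": true}
  ]
}
"""


def structured_names(json_text: str) -> list[str]:
    """Parse the text and return the names of entries flagged as structured."""
    data = json.loads(json_text)
    return [item["name"] for item in data["formats"] if item["structured"]]


names = structured_names(SAMPLE_JSON)
print(names)
```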

Markdown

Files with the extension ".md" which contain Markdown are becoming increasingly common. GitHub uses them, as does Stash from Atlassian. However, GitHub openly admit they use their own flavour of Markdown. Here are some handy hints:

  • New Line: this is achieved by typing space, space, enter, in other words the line needs to end with two space characters
  • New Paragraph: you need two new lines for this, in other words separate paragraphs with a blank line
  • Bullet List: start the line with "- ", "+ " or "* ", any will do
  • Numbered List: I would recommend starting all the items with "1. "; the renderer then numbers them in sequence, so each item still gets its own number and reordering is easy
  • Tables: the key is to separate cells with " | ", which is "space, vertical bar, space". You can add leading and trailing bars as well as do things with headers, see Markdown Cheatsheet · adam-p/markdown-here Wiki, but note I have not had success with the heading separator on Confluence
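The hints above can be sketched as a small Python script that generates Markdown text. The function names and sample content are invented for this example, and the "| --- |" heading separator follows the GitHub-style table syntax, so other engines may render it differently.

```python
def hard_break(first: str, second: str) -> str:
    """Join two lines with a Markdown line break: two trailing spaces, then a newline."""
    return f"{first}  \n{second}"


def bullet_list(items) -> str:
    """Build a bullet list; "- ", "+ " or "* " would all do as the marker."""
    return "\n".join(f"- {item}" for item in items)


def numbered_list(items) -> str:
    """Start every item with "1. " and let the renderer do the numbering."""
    return "\n".join(f"1. {item}" for item in items)


def table(headers, rows) -> str:
    """Build a table whose cells are separated by " | ", with leading and trailing bars."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",  # heading separator
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Paragraphs (and the blocks between them) are separated by a blank line.
document = "\n\n".join([
    hard_break("First line", "Second line of the same paragraph"),
    bullet_list(["CSV", "JSON"]),
    numbered_list(["Write", "Preview", "Commit"]),
    table(["Format", "Extension"], [["Markdown", ".md"], ["YAML", ".yaml"]]),
])
print(document)
```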

A handy place to start with Markdown is Markdown Cheatsheet · adam-p/markdown-here Wiki. There is also some good documentation at Daring Fireball: Markdown, which is the "original" Markdown; however, there are several variants and it is not always clear which one is in use. Use Mastering Markdown · GitHub Guides for "GitHub Flavoured Markdown" or GFM. Atlassian use CommonMark. I believe the community is moving towards CommonMark as the accepted standard; for example, GitHub are moving that way.

If you visit Babelmark 2 - Compare markdown implementations and try your Markdown, you will see how different engines parse the same input slightly differently, which makes it a helpful resource.

PDF

This is the Portable Document Format, originally developed by Adobe but now an ISO standard (ISO 32000). If you want a document readable on many different devices then it is a good choice. I generally use Adobe Acrobat Reader DC where possible. I also use PDFCreator, a free PDF converter that can create and merge PDF files, but Microsoft Office provides good support for PDF, as does LibreOffice. I have also looked at PDFsam, a free and open source tool to split and merge PDF files, and used PDFsam Basic, which worked very well for me; PDFsam Visual does look nice though.

There are many libraries for processing PDF files; Apache PDFBox, a Java PDF library, is one of them.

YAML

These files are becoming increasingly common. Browser Web Extensions use YAML, as do Drupal 8, Spring and other things such as Kubernetes. It is worth a quick look at Learn YAML through a personal example for an introduction. If you need some handy YAML utilities then Online YAML Tools - Simple, free and easy to use YAML utilities is worth looking at.