In my last post I mentioned that I wanted to put together an API for my malpdfobj tool so sharing could be easier. The good news is that I have the RESTful API functioning, complete with interactive API documentation, Python interfaces, and the ability to take in samples. I also have new statistics and malware-related details collected on my sample set. The bad news is that the material itself has been submitted to a conference, and I will have to hold off on releasing it until I hear back on the acceptance decision. The latest this will be is March 20th, but I hope to hear back sooner. In the meantime, I plan on adding more features to the backend processing, so when I do release the tool it will be well-tested and full-featured.
One of the other things I wanted to reflect on was my original choice of MongoDB as my backend data store. Initially I used MySQL to store the details, but that quickly got old once I was parsing larger PDF documents and could not account for all the new data I was collecting. I explained why MongoDB sounded like the solution, but when I posted I was still unsure. It has been about a month and a half since I started playing with MongoDB, and I am impressed with how well it works for this project.
When building the API, I wanted to use PHP as the backend, but I needed the ability to query my MongoDB collection. Connecting to Mongo and getting my data was as easy as using the MySQLi connector in PHP. I was impressed by how seamless the transition was between the two interfaces and didn't have to waste much time reading documentation. Once I got my data from Mongo, I was able to parse it how I wanted and then package it back up into JSON to send back to the client. In some cases I didn't even need to parse the results because they were already packaged as JSON. Having the data stored in BSON made life easier and also made querying straightforward.
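Since the tool side of the project lives in Python, here is a minimal sketch of that same round trip using pymongo rather than the PHP driver; the database, collection, and field names are placeholders, not the real schema.

```python
import json

from bson import json_util  # handles BSON types like ObjectId during encoding
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["pdfs"]["malpdfobj"]  # hypothetical database/collection names


def fetch_report(md5):
    """Look up a stored report by sample hash and return it as a JSON string."""
    doc = collection.find_one({"hash": md5})
    if doc is None:
        return json.dumps({"error": "sample not found"})
    # The document comes back as a plain dict, so it re-encodes to JSON
    # with no extra parsing step in between.
    return json.dumps(doc, default=json_util.default)
```

The PHP side ends up doing the equivalent: run the query, take the decoded document, and json_encode it straight back to the client.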
Of course I did need data for the whole API service to work, so there had to be an easy way to get a parsed PDF into Mongo. I briefly touched on this in a few of my posts, but never went too far into detail. When adding data to Mongo, all I had to do was get my data into JSON format, pick my collection, and insert the record. It was literally as simple as that. Python has very good support for MongoDB, and using the Mongo option on the tool, I was able to bulk insert into Mongo without a single error.
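The insert path is short enough to show in full. This is only a sketch, under the assumption that the tool emits one JSON report per PDF; the names below are illustrative rather than the tool's actual interface.

```python
import json

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["pdfs"]["malpdfobj"]  # hypothetical database/collection names


def store_report(report_json):
    """Insert a single parsed-PDF report (a JSON string) as a Mongo document."""
    doc = json.loads(report_json)  # the report is already JSON, so this is the whole conversion
    collection.insert_one(doc)


def store_reports(reports_json):
    """Bulk insert a list of JSON reports in one call."""
    collection.insert_many([json.loads(r) for r in reports_json])
```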
Querying the data can be a little confusing at times, because the results you expect are not necessarily what gets returned. In some cases it appears that Mongo is not capable of returning a single object based on certain filtering clauses in a single query. Instead, multiple queries need to be made to obtain the exact object you want. While this can be a bit frustrating, it is not a big deal for this project, given that I usually find myself querying the whole dataset and then looping through the results for further parsing.
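In practice that looks something like the sketch below: pull the documents back with a broad find() and do the finer-grained filtering client-side. The field names here are assumptions made for the sake of the example.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["pdfs"]["malpdfobj"]  # hypothetical database/collection names

# Grab every report, then filter in Python rather than in the query itself,
# e.g. picking out the individual PDF objects that carry JavaScript.
for doc in collection.find():
    for obj in doc.get("objects", []):
        if obj.get("contains_js"):
            print(doc.get("hash"), obj.get("id"))
```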
I am currently in the process of porting bighands and dirtyhands over to support MongoDB insertion. The goal is to constantly reach out to the Internet for PDF documents, mine the data, and then store it in an object format. While most of these PDFs will be considered good rather than malicious, they will still help in finding the commonalities within public PDFs on the web. I intend to use MongoDB for this task as well, and it will ultimately plug into the released API. Expect to see a fully functional application go live very soon that will hopefully change how we share this kind of data.