A weekend project turned into Project Iris - a serverless ETL pipeline that bridges the gap between modern vulnerability management and SIEM platforms. This first post dives into the initial decisions, challenges, and lessons learned in building a cost-effective, secure, and scalable solution on Google Cloud Platform.
Logs as Code: Building a Serverless ETL Security Pipeline #
Security engineering often runs into this painful truth: your tools don’t talk to each other — especially when you’re working with emerging platforms and SIEMs.
That’s exactly what happened when I tried integrating Vicarius VRx (a modern vuln management tool) with our SIEM stack. Waiting for a native integration? Not an option.
**We needed visibility, and we needed it now.**
This is the first post in a series I plan to publish, and it covers the design phase. I'll get into the technical details in later posts.
Link to GitHub repository:
The Integration Gap: A Common Security Challenge #
Emerging security vendors often lack established integrations with major SIEM platforms. Vicarius VRx, while offering advanced patch and vulnerability management capabilities, was no exception. This limitation threatened to create a critical gap in our security monitoring - one we couldn’t afford to have.
Figuring out a solution #
The project was also clear: build "logs as code", an engineering solution to "glue" things together.
Get logs from point A to point B. Use Python to retrieve logs from the VRx API and drop them into a GCP bucket. Very easy.
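In its most naive form, that is just a few lines. The sketch below is the "very easy" version; the endpoint URL, environment variables, and object name are placeholders, not the real VRx API or project configuration.

```python
# Naive first draft: pull events from the VRx API and drop them into a bucket.
# Endpoint, token handling, and object name below are illustrative placeholders.
import json
import os

import requests
from google.cloud import storage

API_URL = os.environ["VRX_API_URL"]      # placeholder endpoint
API_TOKEN = os.environ["VRX_API_TOKEN"]  # in the real pipeline this comes from Secret Manager
BUCKET_NAME = os.environ["GCS_BUCKET"]

def run_once() -> None:
    # Fetch a batch of events from the API
    resp = requests.get(API_URL, headers={"Authorization": f"Bearer {API_TOKEN}"}, timeout=30)
    resp.raise_for_status()
    events = resp.json()

    # Drop them into a GCS bucket as one JSON blob
    bucket = storage.Client().bucket(BUCKET_NAME)
    bucket.blob("vrx/events-latest.json").upload_from_string(
        json.dumps(events), content_type="application/json"
    )

if __name__ == "__main__":
    run_once()
```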
The idea was simple and elegant, but…
- **But I need to paginate the API.** Because both the API and the SIEM have some limitations…
- **But I also need to do pagination for storage in buckets.**
- **But I need to track time across executions.** Because "last 4 hours" isn't a thing when your API wants nanoseconds since epoch. (A minimal sketch of this follows the list.)
- **But I need to convert those nanoseconds into human time.** So the SIEM events actually make sense.
- **But I need to handle secrets securely.**
- **But I need to log everything.** Because I need to know what happened, when, and get good feedback when (not if) something goes wrong.
- **But I need to send Slack alerts.** So I know if it's working, or more importantly when it's not. Easily.
- **But I need to avoid duplicates.** Because confusing a monitoring team is exactly the opposite of my mission.
- **But I need to modularize it (eventually), or make it as generic as possible.** Because today it's VRx, tomorrow it's XYZ Corp and their weird GraphQL event feed.
- **But it needs to be serverless.** Because I'm not spinning up a VM just to run code for 5 seconds every 4 hours.
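For the "track time across executions" requirement, the pattern I had in mind is roughly the sketch below: keep a tiny state object in the bucket that stores the last processed timestamp (nanoseconds since epoch), read it at the start of each run, and write it back at the end. The object name, JSON layout, and helper names here are hypothetical, not Iris's actual schema.

```python
# Sketch: persisting a "last run" cursor between stateless executions.
# STATE_BLOB and the JSON layout are illustrative placeholders.
import json
import time

from google.cloud import storage

STATE_BLOB = "state/last_run.json"  # hypothetical object name

def load_last_timestamp_ns(bucket: storage.Bucket, default_lookback_s: int = 4 * 3600) -> int:
    """Return the last processed timestamp in nanoseconds since epoch."""
    blob = bucket.blob(STATE_BLOB)
    if not blob.exists():
        # First run ever: fall back to "now minus one scheduling interval".
        return int((time.time() - default_lookback_s) * 1_000_000_000)
    state = json.loads(blob.download_as_text())
    return int(state["last_timestamp_ns"])

def save_last_timestamp_ns(bucket: storage.Bucket, ts_ns: int) -> None:
    """Persist the cursor so the next execution starts where this one stopped."""
    bucket.blob(STATE_BLOB).upload_from_string(
        json.dumps({"last_timestamp_ns": ts_ns}), content_type="application/json"
    )
```

This same cursor is also what makes duplicate avoidance possible: the next run only asks the API for events newer than the stored timestamp.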
Me, thinking about this the whole weekend before actually starting the project:
Enter Project Iris: A Serverless Security Pipeline #
Rather than deploying a server to achieve this, I built Iris - a serverless security data pipeline leveraging Google Cloud Platform’s Cloud Run service. Here’s why this approach made sense:
- Cost-effective: Pay only for actual execution time instead of maintaining 24/7 infrastructure. Each run (every 4 hours) takes no more than about 5 seconds: grab logs from the API iteratively, push them to a GCP bucket.
- Resource-efficient: Process 500 events in milliseconds with minimal overhead
- Highly reliable: Leverages GCP’s robust platform and built-in redundancy
- Secure by design: Managed identity and secrets handling through Cloud Secret Manager
- Weapon of choice: Python (Of course)
Technical Deep Dive: The Power of Stateless Processing #
The Architecture: Simple Yet Powerful #
Or, in a nice Mermaid diagram (I'm loving Mermaid these days, so here it is):
```mermaid
graph TD;
    A["Cloud Scheduler"] -->|"Triggers every 4h"| B["Cloud Run Job (Iris)"];
    B -->|"Fetch Secrets"| C["Secret Manager"];
    B -->|"Pull Events"| D["Vicarius VRx API"];
    D -->|"Process & Transform"| B;
    B -->|"Store NDJSON"| E["Cloud Storage"];
    B -->|"Status Updates"| F["Slack Notifications"];
```
Iris processes security events in batches of 500, using intelligent pagination and timestamp tracking to ensure no event is missed or duplicated. The pipeline includes:
- API ingestion with built-in rate limiting and error handling
- Data transformation from raw API format to SIEM-compatible NDJSON
- Timestamp conversion from nanoseconds to RFC 3339 format (a sketch of this step follows the list)
- State management through Cloud Storage for consistent processing
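To make the transformation and timestamp-conversion steps concrete, here is a rough sketch of that stage. The event field names (like `eventTimestamp`) are assumptions standing in for whatever the VRx API actually returns, not its documented schema.

```python
# Sketch: turn a batch of raw events into SIEM-friendly NDJSON,
# converting nanosecond epoch timestamps to RFC 3339. Field names are assumed.
import json
from datetime import datetime, timezone

def ns_to_rfc3339(ts_ns: int) -> str:
    """Convert nanoseconds since epoch to an RFC 3339 timestamp string."""
    dt = datetime.fromtimestamp(ts_ns / 1_000_000_000, tz=timezone.utc)
    return dt.isoformat().replace("+00:00", "Z")

def to_ndjson(raw_events: list[dict]) -> str:
    """One JSON object per line, which is the shape the SIEM can ingest."""
    lines = []
    for event in raw_events:
        event = dict(event)
        # "eventTimestamp" stands in for the API's actual timestamp field.
        if "eventTimestamp" in event:
            event["timestamp"] = ns_to_rfc3339(int(event["eventTimestamp"]))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"

# Example: ns_to_rfc3339(1_700_000_000_000_000_000) -> "2023-11-14T22:13:20Z"
```

In the pipeline, each batch of 500 events goes through this kind of step before being written to Cloud Storage.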
Robust Logging: Because Security Tools Need Security #
A security pipeline needs comprehensive logging for accountability and troubleshooting. The logging strategy was dual:
- Detailed stdout logs in Cloud Run for complete execution tracing (Python's `logging` package; a rough sketch of both follows below)
- Real-time Slack notifications for critical status updates:
  - Success confirmations
  - Processing errors
  - Zero-event notifications
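As a rough illustration of that dual strategy: standard `logging` output goes to stdout, which Cloud Run captures automatically, and a small helper posts status messages to a Slack incoming webhook. The environment variable name and message format below are assumptions, not Iris's exact implementation.

```python
# Sketch of the dual logging strategy: stdout logs for Cloud Run, plus Slack
# notifications for success / error / zero-event runs. Names are illustrative.
import logging
import os

import requests

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("iris")

SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical env var

def notify_slack(message: str) -> None:
    """Best-effort Slack notification; it must never crash the pipeline itself."""
    if not SLACK_WEBHOOK_URL:
        return
    try:
        requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)
    except requests.RequestException:
        logger.exception("Failed to send Slack notification")

# Usage at the end of a run:
# logger.info("Processed %d events", count)
# notify_slack(f"Iris: processed {count} events" if count else "Iris: no new events this run")
```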
And so I started… #
On a personal note, I had never coded blindly for that long. There was no chance to fully test the code. Only "microtests": essentially sandbox or scratchpad files to try things out. "Will this secret load? Will this timestamp convert? Will GCP accept this blob?"
If the final deploy failed, cleanup was a pain. Success had to be one-shot clean.
Me several hours into this solution:
Lessons Learned #
Overall, it was a very enriching experience. Building this pipeline taught me valuable lessons about modern security integration:
- Serverless isn’t just for web applications - it’s perfect for scheduled security tasks
- State management in serverless requires careful design but enables reliable processing
- About AI: AI was definitely a factor in getting this done in 2 days instead of a week (minus the design and "mental modeling" phase). It also strengthened my feeling that we're now the slow part of working with LLMs: my typing speed is way behind my thinking speed, so I end up typing dozens or hundreds of words to prompt GPT-4 o1, which is sometimes miles behind my thinking, and that gets very annoying.
- I need to improve my Vim workflow: for this kind of development, Vim was a bit clunky and kicked me out of flow state several times, so I dropped it and used VSCode the whole time. There are some remaps I need to set up before I can breeze through this kind of workflow in Vim.
Looking Forward: Evolution of Security Integration #
While vendors will eventually provide native integrations, the ability to build secure, efficient data pipelines is becoming a crucial skill in security engineering. The approach with Iris demonstrates that with modern cloud services, we can bridge integration gaps without compromising on security or scalability.
Key Takeaway: Don’t let integration gaps create security blind spots. Modern cloud platforms provide the building blocks for secure, efficient, and cost-effective security data pipelines.