Our Firm



Data Engineer


Regulatory DataCorp | Information Technology

Job Summary

RDC is looking for a Data Engineer to join a team dedicated to building best-in-class machine learning solutions that protect the world’s financial systems. As part of the Architecture team, the Data Engineer will focus on data management, making it easy for other systems and teams (e.g., Data Science, Development, and Product) to use data to develop and enhance stable, scalable software solutions.

Essential Duties and Responsibilities

  • Architect, design, implement, monitor, and maintain big data pipelines and ETL/ELT pipelines
  • Gather data requirements, capture and maintain technical/operational/business metadata
  • Source data from different systems
  • Store data using the optimal technology (e.g., SQL, NoSQL, HDFS, S3) for each use case
  • Prepare data for analysis by performing data wrangling/munging
  • Cleanse data
  • Convert data from one format to another
  • De-duplicate data
  • Discover opportunities for data acquisition and pick the right tools to collect and analyze such datasets in batch and/or real-time
  • Recommend and implement methods to improve data governance, security, reliability, efficiency and quality
  • Implement best practices around data modeling, data partitioning and data backfilling on new and existing data
  • Help the team ensure compliance with all regulatory requirements related to data privacy
  • Work closely with the Architecture, Data Science, and Tech-Ops teams to ensure efficient and effective delivery of data solutions
  • Interface with Software Engineers, Product Managers, and Business Analysts to understand goals and data needs, and to implement data-driven features and products

Equal Employment Opportunity (EEO)

It is the policy of Regulatory DataCorp, Inc. and Regulatory DataCorp Limited (herein referred to as RDC) to provide equal employment opportunity to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, RDC will provide reasonable accommodations for qualified individuals with disabilities.

Job Description Disclaimer

This job description is not intended as, and does not create, an employment contract. RDC maintains its status as an at-will employer. All descriptions have been reviewed in an attempt to illustrate the job's functions and basic duties, and the minimum standards required to successfully perform the position. The list of duties, responsibilities, and requirements should not be interpreted as all-inclusive. RDC retains the right to change or assign other duties to this position.

Skills & Requirements


  • Bachelor’s degree in Computer Science or related field with a GPA of 3.0 or higher
  • At least 3 years of relevant data engineering experience
  • At least one year of professional experience with:
      • Amazon Web Services (EMR, S3, Glue, IAM, ECS)
      • SQL (MSSQL, PostgreSQL)
      • NoSQL databases (MongoDB)
      • Various ETL/ELT approaches and tools for building Data Warehouses or Data Reservoirs
      • Spark ecosystem (e.g., DataFrames, MLlib, Spark SQL) and Hadoop ecosystem (e.g., Hive, Sqoop, HDFS)
      • Various data serialization formats such as Apache Avro, Apache Parquet, JSON, CSV, YAML, and XML
      • Elasticsearch/Kibana or a similar distributed search and analytics engine
      • Kafka, Spark Streaming, or a similar real-time stream processing framework
      • ActiveMQ, RabbitMQ, or a similar messaging system
      • Databricks/AWS or a similar web-based platform for working with Spark and other big data tools
      • Apache Oozie, Apache Airflow, Luigi, or a similar workflow management system
  • Excellent programming skills, with 1+ years of software development experience writing production code in Java, Scala, or Python
  • Demonstrated proficiency with:
      • Unix/Linux OS
      • Database Management Systems
      • Distributed Systems
      • Big Data concepts/tools
  • Curious, self-driven, analytical, and excited to work with data
  • Demonstrated ability to work with ambiguous requirements, adapt, and learn
  • Excellent verbal and written communication skills