Importing Data into Cassandra with CSV

Balvinder Singh


Step A. Creating a CSV file

  1. First, create an Excel file containing the fields you want to import, with or without a header row.
  2. UUID creation (optional, if you use UUIDs for the id column in your Cassandra DB):
    2.1. Generate UUID v4 values with an online UUID generator and download them as a text file.
    2.2. Upload that file to a text-to-CSV converter site and download the result as a CSV.
    2.3. Open the CSV and copy all the ids (UUIDs) from it into your own CSV file.

  3. Timestamp generation (optional, if your fields include dates): go to the site below and copy a Unix timestamp such as 1551705863 into the timestamp column. If you need more than one row, drag the pasted cell down the column to create incremented timestamps, so that each row gets a different date-time.

www.unixtimestamp.com

  4. Then save the file as text.csv, using a comma (,) as the delimiter that separates fields and double quotes (" ") around strings.

Note: A CSV file is made of rows of data, and each row holds all of its field values separated by a delimiter. The delimiter is what lets the importer tell where one field ends and the next begins. You can use any single character, such as , or |, as long as you specify the same delimiter when importing the CSV.
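If you would rather skip the manual copy and paste in Step A, the same kind of file can be produced with a short script. This is only a minimal sketch using Python's standard library: the column names (id, name, created_at) and the sample names are assumptions made up for illustration, and the starting timestamp is the example value from above, incremented by one second per row.

```python
import csv
import io
import uuid


def build_rows(names, start_ts=1551705863, step=1):
    """Return one row per name: (UUID v4 string, name, Unix timestamp in seconds)."""
    return [
        (str(uuid.uuid4()), name, start_ts + i * step)
        for i, name in enumerate(names)
    ]


def write_csv(rows, out, delimiter=","):
    """Write a header row plus the data rows to a file-like object.

    QUOTE_MINIMAL wraps a field in double quotes only when it contains
    the delimiter, a quote character, or a line break.
    """
    writer = csv.writer(out, delimiter=delimiter, quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["id", "name", "created_at"])  # hypothetical column names
    writer.writerows(rows)


# Hypothetical sample data; the second name contains a comma, so it gets quoted.
rows = build_rows(["alice", "Singh, B", "carol"])
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of an in-memory buffer, pass `open("text.csv", "w", newline="")` as the `out` argument.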

Step B. Importing CSV File

  1. Copy the CSV file into the Cassandra Docker container, for example into a folder under the container's root, with the command below. (Skip this step if you are running Cassandra directly; just place the file wherever you like.)

sudo docker cp /file/file.csv 2b8f8989d6c6:/file/file.csv

where /file/file.csv is the host (local) file path and 2b8f8989d6c6:/file/file.csv is the path inside the container, written as the container id followed by the path.

Note: You can get the Docker container id with the command

sudo docker ps

which gives output like

CONTAINER ID          IMAGE                                     COMMAND                  CREATED             STATUS                      PORTS                                                                                                           NAMES
2b8f8989d6c6 cassandra:3.9 "/docker-entrypoint.…" 2 weeks ago Exited (255) 2 weeks ago 0.0.0.0:7000-7001->7000-7001/tcp, 0.0.0.0:7199->7199/tcp, 0.0.0.0:9042->9042/tcp, 0.0.0.0:9160->9160/tcp docker_cassandra_1_448ca2d53958

  2. Now open a bash shell in the Cassandra Docker container:

sudo docker exec -it containerID bash

where containerID is the id of the running Cassandra Docker container.

  3. Drop into the CQL shell:

cqlsh

  4. Select the keyspace (optional, since you can also prefix the table name with the keyspace in the import command):

USE keyspace;

where keyspace is the name of your project's keyspace.

  5. Now import the CSV with the command

COPY keyspace.tableName (col1, col2, col3, ...) FROM '/file/file.csv' WITH DELIMITER=',' AND HEADER=TRUE;

where keyspace is the name of the keyspace, tableName is the name of the table in Cassandra, and col1, col2, ... are the columns to fill from the CSV, in the same order as they appear in the file. /file/file.csv is the path of the CSV file inside the container. DELIMITER is the character that separates fields in a row, like the commas in age,name,dob. HEADER=TRUE means the first row of the file holds the header names and should be skipped.

  6. Your data will now be imported.

  7. You can check the result with the command

SELECT count(*) FROM keyspace.tableName;

where the count will be 0 if no records were imported, and greater than 0 if records were imported successfully.
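As a cross-check on steps 5 and 7, a small script can read the CSV header to assemble the COPY statement and count the data rows you should expect to see afterwards. This is a rough sketch, not part of cqlsh: the keyspace and table names (my_keyspace, users) and the sample rows are hypothetical, and it assumes the CSV header names match the table's column names. Keep in mind that Cassandra overwrites rows that share a primary key, so count(*) can come out lower than the number of CSV rows if the file contains duplicate ids.

```python
import csv
import io


def build_copy_statement(csv_file, keyspace, table, path, delimiter=","):
    """Build a cqlsh COPY ... FROM statement, using the CSV header as the column list."""
    header = next(csv.reader(csv_file, delimiter=delimiter))
    columns = ", ".join(header)
    return (f"COPY {keyspace}.{table} ({columns}) FROM '{path}' "
            f"WITH DELIMITER='{delimiter}' AND HEADER=TRUE;")


def count_data_rows(csv_file, delimiter=","):
    """Count the non-empty rows that follow the header row."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


# Hypothetical sample content standing in for /file/file.csv.
sample = "id,name,created_at\n1,alice,1551705863\n2,bob,1551705864\n"

statement = build_copy_statement(
    io.StringIO(sample), "my_keyspace", "users", "/file/file.csv")
expected_rows = count_data_rows(io.StringIO(sample))

print(statement)
print(expected_rows)
```

For a real file, replace the `io.StringIO(sample)` arguments with `open("/file/file.csv", newline="")`, then paste the printed statement into cqlsh.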

Hope you like the post, share your comments below and keep me motivated to write more. Also, visit my blog for other posts.
