The Ed-Fi ODS / API contains endpoints that allow client applications to send XML data files through the API for bulk loading. This is useful in a number of scenarios. Bulk loading is often the easiest way to populate a new instance of the ODS / API. In addition, some implementations only require periodic uploads from clients. Bulk loading is useful for these "batch loading" scenarios.
This article provides an overview and technical information to help platform hosts and client developers use the bulk load endpoints.
Note that platform hosts have an alternate way of bulk loading files directly from disk (i.e., not through the API) using the Ed-Fi Console ODS Bulk Loader. See the Console Bulk Loader documentation for more information.
A bulk operation can include thousands of records across multiple files to insert and update.
A few key points about the API surface are worth understanding before we dive into the details. In particular, it's useful to understand the differences between the transactional operations of the API surface and the bulk load services discussed in this article. The following table compares and contrasts the major differences:
| Transactional API Surface | Bulk Load Services |
|---|---|
| JSON | Ed-Fi Data Standard XML |
| Synchronous responses | Asynchronous responses |
| Near real-time, as data is changing in client applications | For initial load or batch mode updates |
| Full range of create, read, update, and delete operations | Upsert (i.e., create and update) only |
| Create and retrieve UniqueIds | No ability to create or retrieve UniqueIds |
This section outlines the basics of setting up and testing bulk loading through the ODS / API surface.
This walk-through demonstrates the sequence of operations clients use to load bulk data via the API. We'll use an XML file with student data as an example.
The high-level sequence of operations from the client is as follows:

1. Create the bulk operation by POSTing a description of the files to /bulkOperations.
2. Upload each file in one or more chunks by POSTing to /uploads/{fileId}/chunk.
3. Commit each uploaded file by POSTing to /uploads/{fileId}/commit.
4. Check processing status by performing a GET against /bulkOperations/{bulkOperationId}.

Detail on each step follows.
POST a representation of the files to upload to /bulkOperations.
```json
{
  "uploadFiles": [
    {
      "format": "text/xml",
      "interchangeType": "student",
      "size": 699
    }
  ]
}
```
Create one `uploadFiles` entry for every file you're including. The `format` should always be "text/xml", `interchangeType` should be the type of interchange, and `size` is the total bytes of the file you're uploading. You can easily get the file size by using `new FileInfo(filePath).Length`, or by using the `Length` property of the file stream if you're opening a file stream to send it up.
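The request body above can be assembled programmatically. The following is a minimal sketch in Python (rather than the C# used elsewhere in this article); `build_bulk_operation_payload` is an illustrative helper name, not part of any Ed-Fi SDK, and the sample file is created locally just to make the example self-contained.

```python
import json
import os
import tempfile

# Write a tiny stand-in interchange file so the sketch runs anywhere.
# In a real client, file_path would point at your Ed-Fi XML interchange file.
file_path = os.path.join(tempfile.gettempdir(), "Sample-InterchangeStudent.xml")
with open(file_path, "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?><InterchangeStudent/>')

def build_bulk_operation_payload(files):
    """files: list of (path, interchange_type) pairs.
    Returns the JSON body for POST /bulkOperations."""
    return {
        "uploadFiles": [
            {
                "format": "text/xml",               # always text/xml for bulk loads
                "interchangeType": interchange_type, # e.g., "student"
                "size": os.path.getsize(path),       # total bytes of the file
            }
            for path, interchange_type in files
        ]
    }

payload = build_bulk_operation_payload([(file_path, "student")])
print(json.dumps(payload, indent=2))
```

The same shape works for multiple files: add one `(path, interchangeType)` pair per file, and one `uploadFiles` entry is produced for each.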
Sample Response (should have a status code of 201 Created):
```http
HTTP/1.1 201 Created
Content-Length: 290
Content-Type: application/json; charset=utf-8
Location: http://localhost:54746/api/v2.0/2016/BulkOperations/1b8e6786-53ef-4ec0-9ee1-2d1194e1374c
Server: Microsoft-IIS/10.0
X-Powered-By: ASP.NET
Date: Wed, 22 Jun 2016 18:51:57 GMT

{
  "id": "1b8e6786-53ef-4ec0-9ee1-2d1194e1374c",
  "uploadFiles": [
    {
      "id": "1b9e9c6d-56b1-4489-b485-454528b18602",
      "size": 699,
      "format": "text/xml",
      "interchangeType": "student",
      "status": "Initialized"
    }
  ],
  "status": "Initialized"
}
```
From the response, you can obtain the overall operation id (the root-level `id`) as well as the id assigned to each file in the `uploadFiles` collection.
For each file to upload, take the returned fileId and then submit the file as one-to-many "chunks." Each chunk of the file can be up to 150 MB. The attached example file can be submitted as a single chunk. We'll use a single chunk to keep this walkthrough simple.
POST the file to `/uploads/{fileId}/chunk?offset={offset}&size={size}`, where `fileId` is the value returned from creating the bulk operation, `offset` is the current offset in the file starting with 0, and `size` is the actual size of the chunk being uploaded. This POST must be submitted as `multipart/form-data` with the binary data streamed along in the body. An easy way to do this correctly is to use (or deconstruct) the code provided in the generated SDK for the UploadsApi, as it will handle submitting the appropriate headers and data.
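If you're not using the generated SDK, the multipart body can be assembled by hand. The following Python sketch shows the general shape; `build_chunk_body` is a hypothetical helper, and the boundary string is arbitrary (it only needs to match the `Content-Type` header and not occur in the payload).

```python
def build_chunk_body(file_id: str, chunk: bytes, boundary: str) -> bytes:
    """Assemble a multipart/form-data body with a single binary part,
    using the fileId as both the part name and filename (as in the
    sample request below)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_id}"; filename="{file_id}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + chunk + tail

boundary = "-----------------------------28947758029299"
body = build_chunk_body("1b9e9c6d-56b1-4489-b485-454528b18602", b"<xml/>", boundary)
```

The request's `Content-Type` header would then be `multipart/form-data; boundary=<boundary>`, and `Content-Length` is simply `len(body)`.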
The following is an example `HttpRequest` with headers and embedded XML:
```http
POST http://localhost:54746/api/v2.0/2016/uploads/1b9e9c6d-56b1-4489-b485-454528b18602/chunk?offset=0&size=699 HTTP/1.1
Authorization: Bearer ea8110623bcb478c917aa30c7d65e392
Accept: application/json, application/xml, text/json, text/x-json, text/javascript, text/xml
User-Agent: RestSharp/105.2.3.0
Content-Type: multipart/form-data; boundary=-----------------------------28947758029299
Host: localhost:54746
Content-Length: 965
Accept-Encoding: gzip, deflate

-------------------------------28947758029299
Content-Disposition: form-data; name="1b9e9c6d-56b1-4489-b485-454528b18602"; filename="1b9e9c6d-56b1-4489-b485-454528b18602"
Content-Type: application/octet-stream

<?xml version="1.0" encoding="UTF-8"?>
<InterchangeStudent xmlns="http://ed-fi.org/0200" xmlns:ann="http://ed-fi.org/annotation" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ed-fi.org/0200 ../../Schemas/Interchange-Student.xsd">
  <Student id="TEST_STUDENT_68">
    <StudentUniqueId>68</StudentUniqueId>
    <Name>
      <FirstName>Student</FirstName>
      <LastSurname>Sixty Eight</LastSurname>
    </Name>
    <Sex>Male</Sex>
    <BirthData>
      <BirthDate>1969-06-09</BirthDate>
    </BirthData>
    <HispanicLatinoEthnicity>false</HispanicLatinoEthnicity>
  </Student>
</InterchangeStudent>
-------------------------------28947758029299--
```
The expected response is a status code of `201`, with no body.
For operations that submit data in chunks, you would simply repeat this process until the entire file has been uploaded, adding the size of the chunk to the offset value for each subsequent upload. For example, when submitting two 300-byte chunks, the first `offset` would be 0, the second would be 300, and both would have a `size` of 300.
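The offset arithmetic just described can be sketched as a simple generator (Python; `iter_chunks` is an illustrative helper, not part of the Ed-Fi SDK):

```python
def iter_chunks(data: bytes, chunk_size: int):
    """Yield (offset, chunk) pairs covering the whole byte string,
    advancing the offset by the size of each chunk."""
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        yield offset, chunk
        offset += len(chunk)

# Two 300-byte chunks: offsets 0 and 300, each with a size of 300.
chunks = [(off, len(c)) for off, c in iter_chunks(b"x" * 600, 300)]
print(chunks)  # -> [(0, 300), (300, 300)]
```

A file that isn't an exact multiple of the chunk size simply gets a smaller final chunk, with `size` set to its actual length.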
The following is example code for handling a large file:
```csharp
int offset = 0;
int bytesRead = 0;
var buffer = new byte[3 * 1000000];

this.Logger.DebugFormat("Uploading file {0}", filePath);

using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        // The final chunk is usually smaller than the buffer; trim it to size.
        if (bytesRead != buffer.Length)
        {
            var newBuffer = new byte[bytesRead];
            Array.Copy(buffer, newBuffer, bytesRead);
            buffer = newBuffer;
        }

        // Submit the chunk through the SDK's UploadsApi
        var response = uploadApi.PostUploads(new Upload
        {
            id = fileId,
            size = bytesRead,
            offset = offset,
            fileBytes = buffer
        });

        offset += bytesRead;

        if (response.StatusCode != HttpStatusCode.Created)
        {
            this.Logger.DebugFormat("Error uploading file {0}.", filePath);
            break;
        }

        this.Logger.DebugFormat("{0} bytes uploaded.", offset);
    }
}
```
For each file, after finishing the upload, take the `fileId` and commit the upload.
POST to `/uploads/{fileId}/commit`, where `fileId` is the same fileId that was uploaded to. The expected response is a `202 Accepted` with no body.
At this point, the client's part of the bulk operation is complete, and the operation will be processed on the server asynchronously. Once the commit command is received, the operation is pushed to a queue that triggers the actual processing. Status can be checked at any time by performing a GET against `/bulkOperations/{bulkOperationId}`, where `bulkOperationId` is the id sent back from the original creation of the operation.
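Since processing is asynchronous, clients typically poll that status endpoint until a terminal status is reached. A minimal sketch (Python; `fetch_status` is an injected stand-in for the HTTP GET, so nothing here is a real Ed-Fi SDK call, and the set of terminal statuses is assumed from the examples below):

```python
import time

# Statuses after which polling can stop, per the examples in this article.
TERMINAL_STATUSES = {"Completed", "Error"}

def poll_operation(fetch_status, interval_seconds=0.0, max_polls=10):
    """Call fetch_status() until it reports a terminal status
    or max_polls attempts have been made."""
    status = None
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_seconds)
    return status

# Simulate a server that reports Started twice, then Completed.
responses = iter(["Started", "Started", "Completed"])
final = poll_operation(lambda: next(responses))
```

In a real client, `fetch_status` would GET `/bulkOperations/{bulkOperationId}` and return the root-level `status` field, and the polling interval would be considerably longer than zero.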
On the happy path, after committing all the files, the `status` should be Started, as in this example:
```json
{
  "id": "1b8e6786-53ef-4ec0-9ee1-2d1194e1374c",
  "uploadFiles": [
    {
      "id": "1b9e9c6d-56b1-4489-b485-454528b18602",
      "size": 699,
      "format": "text/xml",
      "interchangeType": "student",
      "status": "Started"
    }
  ],
  "status": "Started"
}
```
Once the operation is done processing, the `status` should be Completed, as in this example:
```json
{
  "id": "1b8e6786-53ef-4ec0-9ee1-2d1194e1374c",
  "uploadFiles": [
    {
      "id": "1b9e9c6d-56b1-4489-b485-454528b18602",
      "size": 699,
      "format": "text/xml",
      "interchangeType": "student",
      "status": "Completed"
    }
  ],
  "status": "Completed"
}
```
If any of the data elements don't load correctly, the `status` will come back as Error, as in this example:
```json
{
  "id": "d3b18de4-1f1b-482c-802a-0ba9b71bbf8f",
  "uploadFiles": [
    {
      "id": "83bafffe-377a-4962-844e-88d0d3fcf5e9",
      "size": 699,
      "format": "text/xml",
      "interchangeType": "student",
      "status": "Error"
    }
  ],
  "status": "Error"
}
```
An Error status doesn't necessarily mean every record failed to load. To see which parts failed to load, you can perform a GET against `/bulkoperations/{operationId}/exceptions/{fileId}?offset=0&limit=50` to get 50 exceptions per file at a time. You can adjust the offset and limit to page through the exceptions until you've received them all.
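The paging just described can be sketched as a small loop (Python; `fetch_page` is an injected stand-in for the HTTP GET against the exceptions endpoint, and the short-page end-of-data convention is an assumption of this sketch):

```python
def fetch_all_exceptions(fetch_page, limit=50):
    """Page through exceptions by advancing the offset in steps of `limit`.

    fetch_page(offset, limit) stands in for the HTTP GET and returns a list;
    a page shorter than `limit` signals that all exceptions have been read.
    """
    exceptions, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        exceptions.extend(page)
        if len(page) < limit:
            return exceptions
        offset += limit

# Simulate 120 stored exceptions served in pages of up to 50.
store = [f"exception {i}" for i in range(120)]
all_exc = fetch_all_exceptions(lambda off, lim: store[off:off + lim])
print(len(all_exc))  # -> 120
```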
This section contains a few additional resources related to bulk loading through the API:
The following link is the sample XML file used in the walkthrough above. Platform hosts and client application developers may find it useful for testing their implementations.